
MapR Simplifies End-to-End Workflow for Data Scientists

Release includes new advancements for MapR-DB, MapR Data Science Refinery, and Apache Drill.

Note: TDWI’s editors carefully choose vendor-issued press releases about new or upgraded products and services. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the vendor's statements.

MapR Technologies, Inc. has released the MapR Expansion Pack (MEP) 4.1, enabling data scientists and engineers to create scalable deep learning pipelines, make operational data instantly available for data science, and achieve performance improvements across a variety of data discovery and ad hoc queries. MEP 4.1 expands the ability to build real-time pipelines and brings data science capabilities to a broad set of users with new language support.

The latest release of the MapR Expansion Pack lets the data science community use its language of choice, Python, for high-performance analysis of operational data. Data scientists can define real-time workflows that read from and write to MapR-DB, and deployment is easier because the release simplifies distribution of Python libraries across the cluster.

Complementary to the MapR Converged Data Platform, new features in MEP 4.1 include:

  • MapR Data Science Refinery extends support for distributing Python archives for PySpark. This allows data scientists to distribute popular Python data science libraries to create scalable deep learning pipelines.
  • MapR Data Science Refinery enables Apache Zeppelin to easily leverage a diverse set of Python libraries and environments that can be shared and stored in MapR-XD.
  • PySpark jobs can directly read and write to MapR-DB OJAI, making operational data instantly available for data science.
  • Python and Java bindings for the MapR-DB OJAI Connector for Apache Spark enable developers to read from and write to MapR-DB from Spark, so they can now build data-intensive business applications in Java and Python.
  • A new version of Apache Drill, Drill 1.12, gives data scientists fast exploration of operational data in MapR-DB and historical data in Parquet, significantly improving performance across a variety of data discovery and ad hoc queries.
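
To make the first bullet concrete, the following is a minimal sketch of how a packed Python environment is commonly shipped to a cluster with a PySpark job. It assumes generic Apache Spark on YARN and uses the standard spark-submit `--archives` flag with the `PYSPARK_PYTHON` settings; the release does not describe the exact mechanism MapR Data Science Refinery uses, so this should not be read as MapR's procedure. The helper name `build_spark_submit` and the file names `pipeline.py` and `pyenv.tar.gz` are hypothetical.

```python
import shlex


def build_spark_submit(app_path, archive_path, env_alias="environment"):
    """Return a spark-submit argument list that ships a packed Python environment.

    Assumption: generic Spark on YARN in cluster mode. The archive is unpacked
    on each node under the alias given after '#', and the driver and executors
    are pointed at the Python interpreter inside it.
    """
    python_bin = "./{}/bin/python".format(env_alias)
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # Distribute the archive; '#alias' sets the directory name it unpacks to.
        "--archives", "{}#{}".format(archive_path, env_alias),
        # Use the shipped interpreter for the driver (application master)...
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_PYTHON=" + python_bin,
        # ...and for every executor.
        "--conf", "spark.executorEnv.PYSPARK_PYTHON=" + python_bin,
        app_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_spark_submit("pipeline.py", "pyenv.tar.gz")
    # Print a shell-safe command line rather than running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The sketch only assembles and prints the command, so it can be inspected without a cluster; passing the returned list to `subprocess.run` would submit the job on a machine where spark-submit is installed.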

MapR-DB is a high-performance NoSQL (“Not Only SQL”) database management system built into the MapR Converged Data Platform. It is a highly scalable multimodel database that brings together operations and analytics as well as real-time streaming and database workloads to enable a broader set of next-generation, data-intensive applications.

The MapR Data Science Refinery provides complete, self-service access for DataOps teams to all data from within the same cluster. Data scientists are a driving force behind the DataOps movement, in which data analysis is increasingly powered by machine learning and artificial intelligence to gain quick, accurate, and actionable insights.

Apache Drill is an open source distributed SQL query engine integrated into the MapR Converged Data Platform, offering fast and secure self-service BI SQL analytics at scale. With the ability to discover schemas on the fly, Drill’s distributed shared-nothing architecture enables incremental scale-out with low-cost hardware to meet increasing demands of query response and user concurrency.

The new features of MapR-DB, MapR Data Science Refinery, and Apache Drill 1.12 are available now in the MapR Expansion Pack 4.1.

For more information, visit www.mapr.com.
