Concurrent’s Open Source Scoring Engine Helps Reduce Barriers to Hadoop Adoption
Company launches Pattern, which, in combination with Cascading and Lingual, helps enterprises fulfill the promise of Hadoop.
Note: TDWI’s editors carefully choose vendor-issued press releases about new or upgraded products and services. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the vendor's statements.
Concurrent, Inc., an enterprise big data application platform company, has released Pattern, a free, open source, standards-based scoring engine that enables analysts and data scientists to quickly deploy machine-learning applications on Apache Hadoop.
Built on the Cascading application framework and its broad platform support, Pattern lowers the barrier to Hadoop adoption by letting companies reuse their existing intellectual property (IP) in predictive models, their investments in software tooling, and the core competencies of their analytics staff. Big data applications can be run from existing machine-learning models, either exported in Predictive Model Markup Language (PMML) or defined through a simple programming interface.
Hadoop is rapidly becoming the tool of choice for tackling enterprise big data analytics needs in an effort to make the most of growing volumes of unstructured and semi-structured data. The need for Hadoop to easily integrate with existing data management and analytics systems, however, has created a barrier to comprehensive Hadoop adoption.
Pattern: PMML for Cascading and Hadoop
With the introduction of Pattern, enterprises can now carry existing skill sets, core competencies, and product investments over to Hadoop via the standards-based PMML technology. PMML is the standard export format for tools such as R, MicroStrategy, and SAS.
With Pattern, analysts and data scientists familiar with these technologies can run predictive data models at scale and integrate ETL, data preparation, and predictive analytics in the same application, greatly reducing development time and opening up access to large Hadoop data sets.
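To make the idea of PMML-based scoring concrete, the following is a minimal sketch in Python of what a scoring engine does with an exported model: it reads the model definition and applies it to incoming records. It is not Pattern's API or implementation, and it does not run on Hadoop. The PMML document, the field names (age, income, spend), and the helper functions load_regression and score are all hypothetical and exist only for illustration; the sketch handles only the numeric terms of a PMML regression table.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML regression model (illustration only).
# A real model would be exported from an analytics tool.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Namespace used by the PMML 4.1 document above.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_regression(pmml_text):
    """Extract the intercept and numeric terms from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionTable found in PMML document")
    intercept = float(table.get("intercept"))
    terms = [
        # (field name, coefficient, exponent); exponent defaults to 1 in PMML
        (p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
        for p in table.findall("pmml:NumericPredictor", NS)
    ]
    return intercept, terms


def score(model, record):
    """Apply the regression formula to one record (a dict of field values)."""
    intercept, terms = model
    return intercept + sum(coef * record[name] ** exp for name, coef, exp in terms)


model = load_regression(PMML_DOC)
records = [
    {"age": 40.0, "income": 100.0},
    {"age": 20.0, "income": 60.0},
]
scores = [score(model, r) for r in records]
print(scores)
```

In this sketch the model definition stays in the PMML file and the scoring code contains no model-specific logic, which is the property that lets a model built in one tool be applied in another environment without being rewritten.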
Pattern runs on Cascading, a widely used and deployed application framework for building robust, enterprise big data applications on Hadoop. Enterprises use Cascading to streamline data processing, filter data, and optimize workflows for large volumes of unstructured and semi-structured data. Cascading is also at the core of popular language extensions including PyCascading (Python + Cascading), Scalding (Scala + Cascading), and Cascalog (Clojure + Cascading) -- open source projects sponsored by Twitter. Cascading offers a reliable and repeatable way to build and deploy big data applications.
By leveraging the Cascading framework, enterprises can apply Java, SQL, and predictive modeling investments, and combine the respective outputs of multiple departments into a single application on Hadoop. This is a powerful step forward in delivering on the full promise of the business of big data.
Availability and Pricing
Pattern is free, open source software that is available under the Apache 2.0 License. To learn more about the Pattern project, visit http://www.cascading.org/pattern. Concurrent also offers standard and premium support subscriptions for enterprise use. To learn more about Concurrent’s offerings, please visit http://concurrentinc.com.