
Alpine Enables Predictive Analytics for the Rest of Us

Chorus from Alpine Data Labs is a predictive analytics tool designed for data scientists and business analysts. With its new Chorus 5.0 release, Alpine says it's reaching out to not-so-techie consumers, too.

By big data start-up standards, Alpine Data Labs Inc. is relatively mature. It was spun off from EMC Corp. three years ago, and its flagship product, Alpine Chorus, is now in its 5.0 release.

It's also branching out. In earlier iterations, Chorus combined a statistical and analytical workbench with a set of collaborative capabilities. The company focused on facilitating collaboration among data scientists and statisticians, along with business analysts, power users, and other advanced users. This was a good growth strategy because it promised to permit non-statisticians and non-data-scientists to build in-database predictive analytic models. The rub, says chief marketing officer Bruno Aziza, is that Alpine's success in this segment meant that it was neglecting a much, much larger market of potential consumers: i.e., everybody else.

With Chorus 5.0, Alpine aims to change that.

"What [Chorus 5.0 is] doing is essentially opening up access to all of the people inside of your organization to this analytic ecosystem that you've been building," he argues.

This pitch isn't unique to Alpine Data Labs, however. Los Angeles, Calif.-based Predixion Inc. has long targeted a similar market; from the beginning, in fact, it's positioned its Insight product as a tool for mainstream predictive analytic usage. More recently, SAS Institute Inc. introduced SAS Visual Statistics, a statistical workbench for non-statistical users. (SAS may be targeting Alpine's own bread-and-butter market -- namely, business analysts, power users, and other advanced, or tech-savvy, users.) Other players, such as IBM Corp., Pentaho Inc., SAP AG, and Teradata Corp., are targeting a similar -- or at least much bigger -- market with easier-to-use predictive analytic solutions.

Nevertheless, argues Dillon Woods, the company's "field-facing" CTO, Alpine's pitch is distinctly different. For one thing, he says, Chorus has emphasized collaboration from the very beginning. Established players such as SAS and IBM have had to retrofit existing tools with collaborative capabilities, Woods argues. Chorus 5.0 builds on an existing collaborative backbone by incorporating new wizards, visualizations, social-media-like features, and automated export features, including "Publish to Tableau" and "Open to Tableau" options.

"The core of what we do is math in Hadoop, but you could think of it as us having this wrapper around [this math] that we call a collaboration layer. There are many collaborative features, such as activity streams that can connect people in groups," Woods comments. "The biggest problem is that people spend a lot of money on these analytics projects ... and they never actually operationalize them. We believe that the reason for this is that you have to get people involved -- you have to get business users closer to the analytics process, get them to participate in [the design and optimization of predictive analytic artifacts]. The second step is getting those finished results and pushing them out to the process. This [second step] follows from the first."

Woods claims that Chorus 5.0 extends this collaborative model to data source connectivity.

"Users can see what data they have [as well as] where it exists. They can add human-readable annotations to the data. They could be a DBA, they could be a data scientist, they could be an expert on several different data sets, or they could just be the business analyst or user who owns [the data and its associated business process]," he explains. "We try to surface that information. At an even deeper level, we want to use our own analytic engine to surface the data that you might be interested in, so we're using that to track data lineage. Based on the project [someone is working on], we'll even try to suggest related or alternate data sets."

Chorus also aims to simplify data source preparation. Like data virtualization or data federation technology, it can expose a single logical view of data -- namely, data in Hadoop, along with data managed by SQL database systems. It offers optimized connectors for Hadoop, Greenplum, and Pivotal's HAWQ (a port of Greenplum to Hadoop), as well as JDBC connectivity for SQL query access. In this way, says Woods, Chorus can push data transformations down to an Oracle DBMS and then move that (reduced, conformed) data set into Hadoop -- where it can be stored, analyzed, and/or joined with data from other sources.
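The push-down pattern Woods describes can be illustrated with a minimal sketch. It is not Alpine's implementation: Python's built-in sqlite3 module stands in for the Oracle DBMS, a list of tab-delimited lines stands in for a file written to Hadoop, and the `orders` table and all function names are hypothetical. The point is only that the aggregation runs inside the database, so just the reduced result set has to be moved.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 10.0), ("east", 5.0), ("west", 7.5)],
    )
    conn.commit()
    return conn


def push_down_aggregate(conn):
    """Run the transformation in the database and fetch only the reduced rows."""
    query = """
        SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM orders
        GROUP BY region
        ORDER BY region
    """
    return conn.execute(query).fetchall()


def stage_for_cluster(rows):
    """Serialize the reduced rows as tab-delimited lines.

    Assumption: this stands in for writing the result set to a file in Hadoop.
    """
    return ["\t".join(str(value) for value in row) for row in rows]


if __name__ == "__main__":
    connection = build_demo_db()
    reduced = push_down_aggregate(connection)
    for line in stage_for_cluster(reduced):
        print(line)
```

Three source rows collapse to two summary rows before anything leaves the database; at production scale, that reduction is what makes the subsequent move into Hadoop cheap.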

Speaking of Hadoop, analysts can use Chorus to visually explore or profile data in Hadoop itself -- Chorus offers several visualizations, including the requisite heat maps and histograms -- as well as to select relevant data and either extract it for analysis or schedule additional prep/analysis in Hadoop. Behind the scenes, Chorus pushes the relevant work down to Hadoop's MapReduce engine or (increasingly) to the Apache Spark engine.
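To make the idea of pushing a profiling job down to a MapReduce-style engine concrete, here is a minimal pure-Python sketch of how a histogram might be computed in that style. It does not use Hadoop, Spark, or any Chorus API; the partitions, the bin width, and the function names are all hypothetical. Each partition is binned and counted locally (the map step), and the partial counts are then merged (the reduce step).

```python
from collections import Counter


def map_to_bin(value, bin_width):
    """Map a value to the lower edge of the bin that contains it."""
    return int(value // bin_width) * bin_width


def histogram(partitions, bin_width):
    """Count values per bin across partitions, map/reduce style."""
    total = Counter()
    for partition in partitions:
        # Map step: bin each value and count locally within the partition.
        local_counts = Counter(map_to_bin(v, bin_width) for v in partition)
        # Reduce step: merge the partition's partial counts into the total.
        total.update(local_counts)
    return dict(sorted(total.items()))


if __name__ == "__main__":
    # Two hypothetical partitions of a numeric column.
    data = [[1, 12, 25], [3, 14, 29, 31]]
    print(histogram(data, bin_width=10))
```

Because only small per-bin counts travel between steps, the same shape of computation scales to data that never leaves the cluster.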

Alpine doesn't market Chorus as a data prep tool, says Woods -- although he says there have been a few inquiries. "With [respect to] data prep, basically you're trying to get your data into shape that you can do something with it. There are companies that are fully dedicated to that feature. We're not that company, even though we've had customers come to us and ask us about [our] Hadoop support."

Chorus supports the open source R statistical programming environment, the new(ish) Spark machine learning library (MLlib), Python (which is emerging as a popular alternative for coding machine learning libraries), Java, and the Predictive Model Markup Language (PMML). On the R side, Chorus's "R Execute" feature permits data scientists to embed existing R code artifacts into Chorus workflows and to combine them with machine learning algorithms.

"We're trying to give [customers] as much flexibility as possible," says Aziza. "We're letting you do the work in the language that you know, but we're layering the manageability on top of that, the collaboration and governance on top of that," he continues, noting that "R Execute" is a relatively new feature. "The next one [i.e., similar feature] we'll do is probably Python. We're going to continue that trend of integrating scripting languages based on demand."
