TDWI Upside - Where Data Means Business

Three Introductory Machine-Learning Platforms

Three services offer a simple entry point to advanced analytics.

When it comes to advanced analytics, the choice of toolsets can be daunting. From the raw power of programming languages such as Python, R, and Scala to the megaplatforms of SAS and IBM, the technology can be intimidating. These options each provide a solid path to success but often represent a long-term investment in infrastructure and skill development.

Three major providers have identified an alternative path to success: the analytics-as-a-service route. Amazon, Google, and Microsoft have all released offerings that get organizations up and running quickly with advanced analytics. Their focus is on offering a simple entry point without the long-term commitment of infrastructure and skill development. To achieve this simplicity, these services give up the flexibility of choosing among hundreds of statistical models and tuning the configurations that optimize how those models perform. Instead, they let organizations get started quickly using a set of predefined models and configurations that work relatively well in most cases.

Amazon Machine Learning

In 2015, Amazon launched a new prediction service as part of its Amazon Web Services platform. This service is built to consume data in the Amazon ecosystem, including data files in S3, data stored in Amazon Redshift, and data in Amazon's Relational Database Service (RDS). The idea is to create a dataset that has several independent variables and a single dependent variable and submit it to create a model. Amazon takes care of partitioning the data, creating the model, and evaluating its performance.
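To make the partition, train, and evaluate steps concrete, here is a minimal local sketch in plain Python. It is not Amazon's algorithm or API: the toy dataset, the nearest-centroid model, the 70/30 split, and every function name are assumptions chosen only to illustrate what the service automates.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and partition them into training and evaluation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_centroids(train):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for features, label in rows if predict(centroids, features) == label)
    return correct / len(rows)

# Hypothetical toy dataset: two independent variables, one dependent variable.
rows = [([float(x), float(x % 3)], "high" if x >= 10 else "low") for x in range(20)]

train, held_out = split_rows(rows)   # partition the data
model = fit_centroids(train)         # create the model
score = accuracy(model, held_out)    # evaluate on rows the model never saw
```

Evaluating on held-out rows rather than on the training rows is what makes the reported score a fair estimate of how the model will behave on new data.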

Once the model is complete, batch data from these same data sources can be run against it; the results are returned as a predicted value of the dependent variable and an associated confidence level. Each model can also be easily turned into a Web service to be consumed from within an application.
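An application consuming such a service typically has to decide what to do with a prediction and its confidence level. The sketch below shows one possible pattern, a simple confidence threshold. The JSON payload shape and the field names `predicted_label` and `confidence` are hypothetical and are not taken from any provider's documentation.

```python
import json

def decide(response_text, min_confidence=0.8):
    """Accept a prediction only when its confidence clears the threshold."""
    payload = json.loads(response_text)
    label = payload["predicted_label"]          # hypothetical field name
    confidence = float(payload["confidence"])   # hypothetical field name
    if confidence >= min_confidence:
        return label
    return None  # caller can fall back to a default or a manual review

# Hypothetical response body, for illustration only.
sample = '{"predicted_label": "churn", "confidence": 0.91}'
accepted = decide(sample)
rejected = decide(sample, min_confidence=0.95)
```

The right threshold depends on the cost of acting on a wrong prediction, so it is worth treating as a business setting rather than a fixed constant.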

With this service, there is no hardware to set up or configure and no software to install. The cost is very attractive at both the batch level and the per-transaction level.

Google Prediction API

Google's service, the Google Prediction API, was first released in 2010 and has since gone through many improvements. It is similar in nature to Amazon Machine Learning and uses Google's Cloud Storage as its base. Files containing independent variables and a dependent variable are uploaded to Google Cloud Storage, and models are then generated through the administrative console. From there, new data is passed through the Web-service interface, and a prediction and confidence level are returned.

To get started with the Google Prediction API, there is a nominal monthly fee that includes an initial set of predictions. Additional predictions are available for a low cost.

As with Amazon, there is no hardware or software to set up to get up and running on this platform.

Microsoft Azure Machine Learning

Finally, Microsoft has taken a hybrid approach to its offering. Unlike the Amazon and Google services, Microsoft's solution offers more configuration options. Building on its success with SQL Server Data Mining, Microsoft's Machine Learning Studio allows users to choose different types of models and configure those models for optimization. It can use data from the user's local computer or data that exists in the Azure ecosystem. Because they are configurable, models can take more skill to develop, but the environment provides more flexibility to answer different types of questions.

Once models are created, they can be exposed as Web services that can be consumed in real time. The Microsoft pricing model is similar to Google's: there is a minimal base monthly rate plus additional charges for predictions and hourly usage. This pricing covers access to the Machine Learning Studio and the prediction engine.
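The arithmetic behind a pricing model of this shape (a base monthly rate, a per-prediction charge, and an hourly usage charge) can be sketched as follows. All rates and volumes here are hypothetical placeholders, not actual prices from any provider, and the optional allowance of included predictions is an assumption modeled on plans that bundle an initial set.

```python
from decimal import Decimal

def monthly_cost(base_rate, predictions, price_per_prediction,
                 usage_hours, price_per_hour, included_predictions=0):
    """Estimate one month's bill: base fee + billable predictions + hourly usage."""
    # Only predictions beyond the included allowance are billed.
    billable = max(0, predictions - included_predictions)
    return (base_rate
            + billable * price_per_prediction
            + usage_hours * price_per_hour)

# Hypothetical rates for illustration only.
estimate = monthly_cost(
    base_rate=Decimal("10.00"),
    predictions=5000,
    price_per_prediction=Decimal("0.0005"),
    usage_hours=3,
    price_per_hour=Decimal("1.00"),
)
```

Using `Decimal` rather than floating point avoids rounding surprises when very small per-prediction prices are multiplied by large volumes.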

Getting Started

Amazon, Google, and Microsoft all see a niche in the market: members of the data science community who want results without being encumbered by the effort and cost of setting up and configuring hardware and software. For a minimal cost, enterprises can get onto these platforms and start generating predictions quickly and accurately.

About the Author

Troy Hiltbrand is the chief digital officer at Kyäni, where he is responsible for digital strategy and transformation. You can reach the author at thiltbrand@kyanicorp.com.

