TDWI Upside - Where Data Means Business

Machine Learning Surprisingly Widespread in Development

A new survey basically confirms it: we're in the midst of a renaissance in which machine learning technologies are both widely available and increasingly easy to use.

Are you (or somebody you know) developing big data-related software? If so, there's a pretty good chance you're working with machine learning (ML) technologies of some kind.

That's the upshot of a new survey from Evans Data, a firm that specializes in developer-oriented research. Evans Data's Big Data and Advanced Analytics Survey found that more than one-third (36 percent) of developers say they're using ML technologies in their big data and advanced analytics projects.

"Machine learning includes many techniques that are rapidly being adopted at this time and the developers who already work with big data and advanced analytics are in an excellent position to lead the way," said Evans Data CEO Janel Garvin in a prepared release.

Widespread Machine Learning Resources

ML used to be the province of academics and researchers. Increasingly, however, ML technology is commoditized: vendors including Actian, IBM, Hewlett Packard Enterprise, Microsoft, Pivotal, SAP, SAS, and Teradata have built ML algorithms and functions into their RDBMS and analytics technologies.

From a developer perspective, ML resources -- libraries, algorithms, and source code -- are nearly ubiquitous, particularly with respect to open source software (OSS). Developers and others who work with Hadoop, Spark, and other big data platforms have access to thousands of free or open source libraries, code snippets, and similar resources.

In the OSS space, there's an abundance of ML libraries, including Apache MADlib, a library of in-database machine learning algorithms; Google's TensorFlow; MLlib, an ML library for the Spark cluster computing framework; and MLOSS, a huge online repository of machine learning libraries.

That's just scratching the surface, according to those in the know. "There is a plethora of data sets, case studies, and libraries available for people to use and share," observes data scientist Denny Lee, a technology evangelist with Spark parent company Databricks. Lee specifically cites the availability of Python libraries for statistics, machine learning, and other advanced analytics.

In the Microsoft world, ML technology has also been commoditized to some extent. Many of the libraries available at MLOSS will also run in Windows or Azure. SQL Server boasts built-in support for in-database ML algorithms and Redmond's new Azure Machine Learning service exposes ML capabilities via Web services APIs.

We're in the midst of a renaissance in which ML technologies are both widely available and increasingly easy to use. As a result, developers can basically pick and choose from among a slew of ML resources.

Uses for Machine Learning Technologies

What kind of work are coders actually doing with ML? According to Garvin and Evans Data, they're primarily making use of decision trees, one of the core building blocks of predictive models. Decision trees are also used with artificial intelligence.

Other common ML models include linear regression and logistic regression. According to Evans Data, the internal units most likely to make use of ML and other advanced analytics technologies are logistics, distribution, and operations.
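For readers unfamiliar with decision trees, the idea can be sketched in a few lines of Python. The example below is only an illustration, not anything drawn from the Evans Data survey: it fits a one-level tree (a "stump") by choosing the threshold that minimizes Gini impurity, one common splitting criterion. The function names and the toy shipment data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(xs, ys):
    """Find the threshold on one numeric feature that minimizes the
    weighted Gini impurity of the two groups it creates."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            # Each side predicts its most common label.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            best = (score, t, left_label, right_label)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, t, left_label, right_label = best
    return t, left_label, right_label

def predict(stump, x):
    """Classify x by asking the stump's single threshold question."""
    t, left_label, right_label = stump
    return left_label if x <= t else right_label

# Hypothetical toy data: shipment weight (kg) and whether delivery was late.
weights = [1.0, 2.0, 3.0, 10.0, 12.0, 15.0]
late = ["on_time", "on_time", "on_time", "late", "late", "late"]

stump = fit_stump(weights, late)
print(stump)
print(predict(stump, 2.5), predict(stump, 11.0))
```

A full decision tree repeats this split recursively on each resulting group and across many features. In practice, developers would more likely reach for a library implementation than write their own.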

Garvin also says that developers aren't just hip to the usefulness of ML technologies and, thanks to their widespread availability, their ease of use. They're also knowledgeable about, and in many cases working in, a number of other cutting-edge analytics fields.

"We are seeing more and more interest from developers in all forms of cognitive computing, including pattern recognition, natural language recognition, and neural networks, and we fully expect that the programs of tomorrow are going to be based on these nascent technologies of today."

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.

