Four More Advances in Predictive Analytics
The technology and the practices around predictive analytics are also evolving. Here are four ways predictive analytics is changing.
- By Fern Halper, Ph.D.
- March 4, 2014
Predictive analytics -- a statistical or data mining solution consisting of algorithms and techniques that can be used on both structured and unstructured data to determine outcomes -- is not a new technology. In fact, it has been used on structured data for decades. However, several factors -- including cheap compute power, ever-increasing amounts of disparate data with various frequencies (aka, big data), ease of use, and the need to more effectively compete -- have helped to fuel the more recent adoption of the technology. The technology is becoming red-hot.
Predictive analytics is being used across industries in use cases as diverse as retention analysis, fraud detection, behavior analysis (human and machine), propensity to spend, risk assessment, and medical diagnosis. The technology and the practices around predictive analytics are also evolving. In the past, when people described predictive analytics, it was all about statistics. However, newer computational science and linguistics techniques are now part of the analytics mix. Open source is also now an important piece of the pie. Infrastructure is evolving as well, as high-performance computing and other technologies that support big data become part of the predictive analytics landscape.
I've been writing about some trends I'm seeing in predictive analytics for some time (for instance, see my May 2013 blog posting on 5 Trends in Predictive Analytics). Here are four more:
Trend #1: Techniques
As part of my research for the recently published Predictive Analytics Best Practices Report, I asked survey respondents currently using predictive analytics as well as those investigating it about popular techniques for predictive analytics. The top three techniques were linear regression, decision trees, and clustering. However, other techniques are becoming more widespread.
-- Time series data consists of equally spaced observations through time. For instance, weather observations, stock market prices, and machine-generated data are all examples of time series data. Time series analysis takes into account that there may be an internal structure to this data. One goal of the analysis can be to help extract the signal and identify these patterns. This can be useful in a wide variety of applications and especially in analyzing big data. For instance, machine data generated by sensors can be analyzed to determine mean time to failure for engine maintenance. The popularity of time series analysis is starting to grow; it was ranked number 4 on the list of predictive analytics techniques in the Best Practices study.
-- Machine learning grew out of computational sciences. Some of the first algorithms were actually created decades ago. The idea behind machine learning is to discover patterns in data that were not previously known. The emphasis of machine learning algorithms is often on processing large amounts of data for prediction, which makes it a useful technology to consider for big data analytics. Machine learning can be supervised or unsupervised. In supervised learning, an algorithm is given a set of inputs and a set of corresponding outcomes or target variables, commonly for classification problems. In unsupervised learning, an algorithm is given a set of input variables but no outcome variables.
-- In ensemble modeling, predictions from a group of models are used to generate more accurate results. Although not yet used extensively, this kind of analysis is starting to gain traction.
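To make two of the techniques above concrete, here is a minimal Python sketch that combines them: three very simple time series forecasters whose predictions are averaged into an ensemble forecast. Everything in it is an illustrative assumption rather than anything taken from the Best Practices Report: the function names, the choice of models, the plain averaging scheme, and the sample sensor readings are all hypothetical. Averaging is only one way to combine models, and it is not guaranteed to beat the best individual model on any given data set.

```python
from statistics import mean


def naive_forecast(series):
    """Predict the next value as the last observed value."""
    return series[-1]


def moving_average_forecast(series, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    return mean(series[-window:])


def linear_trend_forecast(series):
    """Fit a least-squares line to equally spaced points and extrapolate one step."""
    n = len(series)
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(series)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def ensemble_forecast(series, models):
    """Combine the individual forecasts by simple (unweighted) averaging."""
    return mean(model(series) for model in models)


# Hypothetical, equally spaced sensor readings (made up for illustration).
readings = [10.0, 12.0, 14.0, 16.0, 18.0]
models = [naive_forecast, moving_average_forecast, linear_trend_forecast]

for model in models:
    print(model.__name__, model(readings))

forecast = ensemble_forecast(readings, models)
print("ensemble", forecast)
```

In practice, an ensemble might weight each model by how well it performed on held-out data instead of treating all models equally.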
Trend #2: Open Source
Open source solutions are becoming increasingly important to predictive analytics because they enable a wide community to engage in innovation. There are several open source solutions that address predictive analytics. Probably the most popular is the R language (http://www.r-project.org). It is a free software environment for data manipulation, statistics, and graphics. Historically, R was used primarily in academia, but enterprises are now adopting it at an increasing rate. Vendors are also incorporating R as part of their solution stack. Newer vendors have emerged to make R easier to use.
In some of the interviews I conducted for the Predictive Analytics Best Practices Report, users reported that they often used R to try out certain techniques. As one user stated, "Analytics like R provide more innovative packages and more sophisticated models. It may not be an enterprise system rigor-wise, but it helps us to come up with good initial model structure. It can also provide business rules that become part of a process." Interestingly, while use of open source ranked fairly low among current users of predictive analytics, it ranked higher with those investigating the technologies.
Trend #3: In-Memory Analytics
In-memory analytics processes data in RAM rather than on disk. This can be a big boon for predictive analytics, especially against big data. For example, models that might have taken hours or even days to run now take minutes. This means that users can analyze large data sets in-memory with better performance. This, in turn, means that you can iterate on models more effectively, an important practice for predictive analytics. In my Predictive Analytics Best Practices Report research, for instance, respondents noted the use of in-memory computing was poised to almost double in the next three years.
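The iteration benefit can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any in-memory analytics product: the file, the column names `x` and `y`, the helper functions, and the candidate models are all hypothetical. The data is read from disk once, held in RAM, and every candidate model is then scored against the in-memory copy.

```python
import csv
import os
import tempfile


def load_into_memory(path):
    """Read the whole CSV file once and keep the rows in RAM as (x, y) floats."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        return [(float(row["x"]), float(row["y"])) for row in reader]


def mean_squared_error(rows, slope, intercept):
    """Score the candidate line y = slope * x + intercept on the in-memory rows."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows)


# Write a tiny, made-up data set to disk so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["x", "y"])
    writer.writerows([(1, 3), (2, 5), (3, 7), (4, 9)])
    data_path = tmp.name

rows = load_into_memory(data_path)  # the only disk read
os.remove(data_path)                # later iterations never touch the file

# Iterate over hypothetical candidate models as (slope, intercept) pairs.
candidates = [(1.0, 1.0), (2.0, 1.0), (3.0, 0.0)]
scores = {c: mean_squared_error(rows, *c) for c in candidates}
best = min(scores, key=scores.get)
print("best candidate:", best, "error:", scores[best])
```

With real workloads the same pattern applies at a much larger scale: the cost of loading the data is paid once, and each additional model iteration runs against memory rather than disk.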
Trend #4: Predictive Analytics in the Cloud
Although use of the public cloud for analytics has not yet met market expectations, at TDWI we're seeing evidence that this is starting to change. More companies are starting to investigate the cloud for BI. Additionally, in conversations with end users, more of them are considering putting their big data in the public cloud, especially for exploration. They believe this makes sense, especially if they want to marry their on-premises data with cloud data. They are using the cloud to process real-time data as well as to run predictive models on extremely large, multi-source data sets. We expect to see this trend continue.