3 Flavors of Predictive Analytics Automation
Simplifying tasks is a key benefit of automation, and in the world of predictive analytics, these three areas are getting the most automated help.
- By Fern Halper
- May 15, 2017
I'm in the middle of writing a new TDWI report that will help organizations navigate the predictive analytics market. As part of my research, I'm examining a number of predictive analytics developments. One that interests me is automation. In speaking to vendors, I find that automation, which involves reducing human intervention, means different things to different providers. Here are three flavors of automation I'm seeing.
Automation of Basic Model Building
Data scientists and statisticians who build predictive analytics models are often hard to come by. Many organizations are planning to "up-skill" their data science teams by including and training business analysts. As part of this trend, vendors are making their tools easier to use so that business people not necessarily trained in predictive analytics can "effortlessly" build models.
Some vendors provide different user interfaces for different user personas (e.g., business analyst, data scientist). In the business analyst interface, the analyst simply specifies the target (or outcome) variable of interest, and the software creates the best model from the attributes provided. In other words, the software automates the model building. Of course, it is still up to the business analyst to prepare the data, engineer features, and understand and communicate the model results, although some tools provide explanations to help.
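As a rough illustration (not any vendor's actual interface), this kind of automated model building can be sketched as a loop that tries several candidate models against the analyst's chosen target variable and keeps the one that scores best on held-out data. The candidate rules and the sample records below are made up for the example:

```python
def accuracy(model, rows, target):
    """Fraction of rows where the model's prediction matches the target."""
    hits = sum(1 for r in rows if model(r) == r[target])
    return hits / len(rows)

def auto_build(rows, target):
    """Toy 'one-click' model selection: try several simple candidate
    classifiers and keep the one that scores best on a holdout split."""
    split = int(len(rows) * 0.7)
    train, holdout = rows[:split], rows[split:]

    # Baseline: always predict the most common target value in training data
    majority = max({r[target] for r in train},
                   key=lambda v: sum(r[target] == v for r in train))

    # Illustrative candidate models; a real tool would fit many algorithms
    candidates = {
        "majority_class": lambda r: majority,
        "income_rule": lambda r: 1 if r["income"] > 50 else 0,
        "age_rule": lambda r: 1 if r["age"] < 40 else 0,
    }
    scores = {name: accuracy(m, holdout, target)
              for name, m in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best], scores

# Hypothetical "propensity to buy" records
rows = [
    {"age": 25, "income": 60, "buys": 1}, {"age": 52, "income": 40, "buys": 0},
    {"age": 30, "income": 75, "buys": 1}, {"age": 61, "income": 30, "buys": 0},
    {"age": 28, "income": 55, "buys": 1}, {"age": 45, "income": 45, "buys": 0},
    {"age": 33, "income": 80, "buys": 1}, {"age": 58, "income": 35, "buys": 0},
    {"age": 27, "income": 65, "buys": 1}, {"age": 38, "income": 38, "buys": 0},
]
best_name, best_model, scores = auto_build(rows, "buys")
```

The analyst's only input here is the target column name ("buys"); everything else -- candidate generation, scoring, and selection -- is the automated part.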
Automating Model Building to Generate Hundreds or Thousands of Models
Whereas model-building automation addresses the expertise needed to build a model, there is also the issue of automating repetitive tasks. Sometimes organizations want to build models across many segments (groups such as products, gender, or region). That can be time-consuming -- think of a large company trying to build "propensity to buy" models across all of its product segments.
Some vendors offer automated approaches for building these many models. The model builder can put a model flow together and then reuse it against many other segments, thereby significantly reducing the time taken versus building all of these models manually. Some vendors refer to this as a "factory" approach to predictive modeling. This can also include running algorithms in parallel to determine the best model.
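The factory idea can be sketched as reusing a single model-building flow across every segment in the data. The segment key, sample records, and deliberately trivial "flow" below are illustrative stand-ins, not a vendor implementation:

```python
from collections import defaultdict

def build_flow(rows):
    """Stand-in for a full modeling flow; here it just learns the
    segment's average propensity as the prediction."""
    return sum(r["bought"] for r in rows) / len(rows)

def model_factory(rows, segment_key):
    """Partition the data by segment and run the same flow on each
    partition -- one model per segment, built automatically."""
    segments = defaultdict(list)
    for r in rows:
        segments[r[segment_key]].append(r)
    # A real tool might fit these segment models in parallel
    return {seg: build_flow(seg_rows) for seg, seg_rows in segments.items()}

rows = [
    {"product": "laptops", "bought": 1},
    {"product": "laptops", "bought": 0},
    {"product": "phones", "bought": 1},
    {"product": "phones", "bought": 1},
    {"product": "tablets", "bought": 0},
]
models = model_factory(rows, "product")
# models -> {"laptops": 0.5, "phones": 1.0, "tablets": 0.0}
```

The point is that the flow is defined once and applied everywhere, so adding a hundredth product segment costs no additional modeling effort.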
Automated Obsolescence Detection
Finally, there is the issue that models can get stale. A critical step in deploying a model into production is making sure that it doesn't outlive its usefulness and start to degrade. Bringing some sort of automation to this process is important, especially once there are numerous models running in production.
Vendors are building automation into the model management process. Some tools let a model builder specify rules that alert the organization when a model is degrading and needs to be retrained. Other tools go further and perform automated detection of model degradation, based on lift or some other parameter.
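A rule-based degradation check of the kind described might look like the following sketch, where the lift threshold, window size, and history values are invented for illustration:

```python
def check_degradation(lift_history, min_lift=1.5, window=3):
    """Alert if the model's lift stays below min_lift for `window`
    consecutive scoring runs -- a rule the model builder would specify."""
    recent = lift_history[-window:]
    if len(recent) == window and all(l < min_lift for l in recent):
        return "retrain"   # model has degraded; schedule retraining
    return "ok"

# Lift recorded after each scoring run, drifting downward over time
history = [2.4, 2.1, 1.9, 1.4, 1.3, 1.2]
status = check_degradation(history)
# status -> "retrain"
```

A production monitor would run this automatically after each scoring cycle; the same pattern works with AUC, accuracy, or whatever metric the organization tracks.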
Of course, other aspects of the predictive analytics life cycle are also being automated. Many vendors are providing automation help for the data preparation phase, such as tools that identify duplicates, outliers, and so on (see the Best Practices Report: Improving Data Preparation for Business Analytics).
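Two simple checks of the kind such data prep tools automate -- exact-duplicate detection and z-score outlier flagging -- can be sketched as follows (the sample values are illustrative):

```python
from statistics import mean, stdev

def find_duplicates(rows):
    """Return records that are exact duplicates of an earlier record."""
    seen, dupes = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            dupes.append(r)
        seen.add(key)
    return dupes

def find_outliers(values, z=2.0):
    """Flag values more than z sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > z * s]

records = [{"id": 1, "region": "east"},
           {"id": 2, "region": "west"},
           {"id": 1, "region": "east"}]
dupes = find_duplicates(records)

readings = [10, 11, 9, 10, 12, 10, 11, 9, 12, 10, 11, 95]
outliers = find_outliers(readings)
# outliers -> [95]
```

Commercial tools apply far more sophisticated techniques (fuzzy matching, distribution-aware outlier scoring), but the automated principle is the same: surface suspect records without the analyst inspecting each one.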
One question that people ask is whether predictive analytics can be completely automated. I don't believe this will happen any time soon. Although certain specific tasks within the analytics life cycle are being automated, data scientists and analysts are still needed to ask the right questions in the first place, oversee and participate in the process, interpret results, and provide context. Many aspects of the life cycle can be partly automated to eliminate some of the drudgery, and certain kinds of models are automated in production (think recommendation engines), but I don't see people being completely out of the loop for many years to come.
[Editor's note: Halper's recent research is part of a new line of reports called TDWI Navigators that will help readers navigate a particular technology. Each report will examine trends, adoption, maturity, challenges, and value. Her first report will provide information about vendors that offer predictive analytics solutions -- who they are, what they offer, and who they're targeting -- and explain the vendor landscape. Watch for the report this summer.]
Fern Halper, Ph.D., is well known in the analytics community, having published hundreds of articles, research reports, speeches, webinars, and more on data mining and information technology over the past 20 years. Halper is also co-author of several “Dummies” books on cloud computing, hybrid cloud, and big data. She is the director of TDWI Research for advanced analytics, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and “big data” analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at firstname.lastname@example.org, on Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.