5 Skills You Need to Build Predictive Analytics Models
These five competencies are required to build a successful predictive model.
By Fern Halper
June 26, 2017
Predictive analytics is a changing market. Vendors are making it easier and easier to build models using automated predictive modeling tools designed for business analysts. Developers are utilizing machine learning algorithms from open source marketplaces or automated model building via APIs to build predictive applications.
Enterprises are extremely interested in deploying predictive capabilities. In a recent TDWI survey on data science, about 35 percent of respondents said they had already implemented predictive analytics in some way. In a 2017 TDWI education survey, predictive analytics was the top analytics-related topic respondents wanted to learn more about.
Although it can be easy to build models, you still need skills to be successful. Here are five competencies that are key for anyone looking to build a predictive model.
#1: Think with a predictive mindset.
Predictive analytics is different from descriptive analytics. Descriptive analytics involves using historical data to understand what has happened. It generally involves using some visualization tool or SQL to slice-and-dice and shake-and-bake the data. Typical questions for descriptive analytics include: How many units did we sell last month? What region sold the most of product X? The mindset is reactive.
In predictive analytics, the mindset is proactive. Here, it is important to think about an outcome or target variable of interest. For instance, an HR department might be interested in predicting which employees are at risk of leaving the company. Here the outcome is leave or stay. Historical data, based on known outcomes of those employees who either stayed or left, is used to train the model to understand the probability that a current employee with certain characteristics will take a certain action.
This is a proactive, outcome-based mindset rather than a reactive one focused on slicing historical data. That proactive mindset will be important in formulating the question you need to answer.
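The attrition example can be sketched in a few lines of Python. This is a minimal illustration, not a recommended production approach: it assumes a single made-up attribute (years since last promotion), invented historical outcomes, and a hand-rolled logistic regression. The data, the function names, and the choice of logistic regression are all assumptions for illustration.

```python
import math

# Hypothetical history: years since last promotion, and whether the
# employee left (1) or stayed (0). Values are invented for illustration.
years_since_promotion = [0.5, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
left_company = [0, 0, 0, 0, 1, 0, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(leave) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Difference between predicted probability and known outcome.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


def leave_probability(w, b, x):
    """Estimated probability that an employee with attribute value x leaves."""
    return sigmoid(w * x + b)


w, b = train_logistic(years_since_promotion, left_company)
for years in (1.0, 5.0):
    p = leave_probability(w, b, years)
    print(f"{years} years since promotion -> P(leave) = {p:.2f}")
```

The known outcomes (left or stayed) are the target variable; the trained model then scores current employees whose outcome is not yet known.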
#2: Understand the basics of predictive techniques.
Although many available tools make it easy to build a predictive model, you should still understand the basics of the techniques you might be using.
For example, you might be asked to defend your analysis. If you can't explain what you did or how the technique works, you will have a hard time building confidence in your results. Additionally, automated model building can help get an app to market faster, but that doesn't mean that the model is going to be right all the time. It is important to have some inkling of what is going on behind the scenes so you'll be prepared when you run into issues down the road.
The good news is that if you've taken a college statistics class, you probably understand the basics of one of the most popular techniques -- regression. Another popular technique is some form of classification, such as a decision tree. You need to be able to explain the strengths and weaknesses of each algorithm.
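To make the two techniques concrete, here is a rough sketch of each in its simplest form: a least-squares line fit with one predictor, and a one-split decision tree (often called a decision stump). The data and function names are invented for illustration, and real tools handle many more cases than this.

```python
def fit_line(xs, ys):
    """Simple linear regression: least-squares slope and intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def fit_stump(xs, labels):
    """One-split decision tree: predict 1 when x is above the threshold.

    Tries the midpoint between each pair of adjacent values and keeps the
    threshold with the fewest misclassified training examples.
    """
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(xs) + 1
    for threshold in candidates:
        errors = sum((x > threshold) != bool(label) for x, label in zip(xs, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Regression predicts a number; the stump predicts a class.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
threshold, errors = fit_stump([1, 2, 3, 6, 7, 8], [0, 0, 0, 1, 1, 1])
print(slope, intercept)
print(threshold, errors)
```

Even at this scale the trade-offs show: the line assumes a straight-line relationship, while the stump gives an easily explained rule but only a coarse yes/no answer.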
#3: Know how to think critically about variables.
The attributes you provide your predictive algorithm for training are important. The data quality needs to be sound (remember: garbage in, garbage out). That means knowing how to deal with missing data, outliers, and other data problems. It also means knowing how to pick the best attributes for the model. For instance, you don't want to include attributes that are highly correlated with each other (e.g., people who live in Boston also live in Massachusetts) because redundant attributes can confuse the model and make it harder to interpret.
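As a rough sketch of these checks, the snippet below fills missing values with the median, flags outliers with the common 1.5 x IQR rule, and measures how strongly two numeric attributes are correlated. The data, the choice of median imputation and the IQR rule, and the numeric stand-in for the Boston/Massachusetts example are all assumptions for illustration; the right treatment depends on the data and the problem.

```python
import math
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric attributes."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical salaries in thousands, with one missing and one suspicious value.
salaries = [52, 48, None, 61, 55, 50, 250, 58]
filled = impute_median(salaries)
suspicious = iqr_outliers(filled)

# Two attributes that carry the same information in different units.
tenure_years = [1, 3, 5, 2, 8]
tenure_months = [12, 36, 60, 24, 96]
redundancy = pearson(tenure_years, tenure_months)

print(filled)
print(suspicious)
print(round(redundancy, 3))
```

A correlation near 1 or -1 between two attributes suggests that one of them adds little and is a candidate to drop.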
Some of the most valuable predictors are often those you derive yourself. In predictive analysis, the data will typically need to be shaped to create attributes (called features) of interest that might be good predictors of the outcome. For instance, you might want to perform a length-of-time calculation or create a meaningful ratio. Some tools do simple transformations automatically; others do not. This skill requires critical thinking as well as subject-matter expertise.
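The two derived features mentioned above might look like the following sketch. The employee record, field names, and reference date are hypothetical, chosen only to show a length-of-time calculation and a ratio.

```python
from datetime import date


def tenure_years(hire_date, as_of):
    """Length-of-time feature: years between hire date and a reference date."""
    return (as_of - hire_date).days / 365.25


def safe_ratio(numerator, denominator):
    """Ratio feature that returns None instead of failing on a zero denominator."""
    return numerator / denominator if denominator else None


# Hypothetical raw record as it might come from an HR system.
employee = {
    "hire_date": date(2012, 3, 1),
    "salary": 60000,
    "team_median_salary": 75000,
}
as_of = date(2017, 6, 26)

features = {
    "tenure_years": tenure_years(employee["hire_date"], as_of),
    "salary_to_team_median": safe_ratio(
        employee["salary"], employee["team_median_salary"]
    ),
}
print(features)
```

Neither feature exists in the raw record; both come from knowing the domain well enough to guess what might drive the outcome.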
#4: Understand how to interpret results and validate models.
It is one thing to build a model by providing a tool with data. However, in order to trust the model, you need to understand whether the model is meaningful and how good it is. That means knowing how to interpret model metrics.
For instance, a tool might tell you that a classification model is 95 percent accurate. Do you know what that really means? Do you know if that is good? It is important to become familiar with different ways to interpret the quality of models. You must be knowledgeable about confusion matrices and precision/recall as well as ROC, gain and lift charts, and root mean square error, to name a few.
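Two of the metrics named above are simple enough to compute by hand, which can help in interpreting what a tool reports. The sketch below, on made-up numbers, computes root mean square error for numeric predictions and lift for the top-scored fraction of a classification model. The function names and the 20 percent cutoff are assumptions for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: typical size of a numeric prediction error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def lift_at(scores, labels, fraction=0.2):
    """Positive rate in the top-scored fraction divided by the overall rate.

    A lift of 1.0 means the model ranks no better than random selection.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate


# Hypothetical regression output.
error = rmse([3, 5, 7, 9], [2, 5, 8, 11])

# Hypothetical classifier scores and true outcomes (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift = lift_at(scores, labels, fraction=0.2)

print(round(error, 3), round(lift, 2))
```

RMSE is in the same units as the target, so whether a given value is good depends on the scale of what is being predicted.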
You must also anticipate when certain problems might arise with different kinds of metrics. For instance, in a classification problem where there are many examples from one class but only a few from another, accuracy may not be the best metric because of the accuracy paradox, where a less-accurate model can be more useful for prediction. A different metric, such as precision or recall, might be a better choice. A common example is a fraud model that is 98 percent accurate, where accuracy could be increased to 98.5 percent simply by always predicting "no fraud." (See this article for more information.)
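The fraud example can be worked through with a confusion matrix. The counts below are assumptions chosen to reproduce the 98 and 98.5 percent figures: 1,000 transactions, 15 of them fraudulent, and a hypothetical model that catches 10 of the 15 while raising 15 false alarms.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives, treating 1 (fraud) as positive."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(actual, predicted):
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        # By convention here, precision is 0.0 when nothing is flagged as fraud.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Assumed data: 15 fraudulent transactions followed by 985 legitimate ones.
actual = [1] * 15 + [0] * 985

# Hypothetical model: 10 frauds caught, 5 missed, 15 false alarms.
model = [1] * 10 + [0] * 5 + [1] * 15 + [0] * 970

# Baseline that always predicts "no fraud".
always_no_fraud = [0] * 1000

model_metrics = summarize(actual, model)
baseline_metrics = summarize(actual, always_no_fraud)
print("model:   ", model_metrics)
print("baseline:", baseline_metrics)
```

The baseline wins on accuracy but has zero recall: it never catches a single fraudulent transaction, which is the whole point of the model.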
#5: Know what it means to validate a model.
It's easy to build a model that looks good when you know the answers. However, it is important to validate models against new data. This is done in multiple ways.
Some analysts will use a test set of data called a hold-out sample (perhaps 20 percent of the data) to test the model against. Others use a test sample and then a validation sample because the model may be tweaked during testing. The validation sample provides a fresh data set. Other analysts use different kinds of cross-validation, such as K-fold cross-validation or a leave-one-out method. Depending on the tools you use, building these validation skills will also be important.
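A hold-out sample and K-fold cross-validation can be sketched as index bookkeeping, independent of any particular modeling tool. The function names, the fixed random seed, and the 20 percent default are assumptions for illustration; most tools provide their own versions of these splits.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and set aside a fraction as the hold-out sample."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_indices, test_indices) so each row is tested exactly once.

    Setting k equal to n_rows gives the leave-one-out method.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


rows = list(range(50))  # stand-in for 50 records
train_rows, test_rows = holdout_split(rows)
print(len(train_rows), len(test_rows))

for train_idx, test_idx in k_fold_indices(10, k=5):
    # A model would be fit on train_idx and scored on test_idx here.
    print(sorted(test_idx))
```

In each case the model is scored only on rows it did not see during training, which is what makes the resulting metric a fair estimate of performance on new data.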
A Word of Advice: Keeping Current is Key
These are five of the skills you need today, but the field is changing all the time. You need to keep your skills fresh by reading everything you can on the topic [Editor's note: the Upside website and TDWI research are two good places to start], attending webinars or listening to replays of past presentations, enrolling in continuing education (refresher) classes in person or online, attending conferences, or visiting a local user group.
Fern Halper, Ph.D., is well known in the analytics community, having published hundreds of articles, research reports, speeches, webinars, and more on data mining and information technology over the past 20 years. Halper is also co-author of several “Dummies” books on cloud computing, hybrid cloud, and big data. She is the director of TDWI Research for advanced analytics, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and “big data” analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at firstname.lastname@example.org, on Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.