Who Should Be Building Predictive Models?
With statistical and data mining skills in short supply, are the right people building your models? These three safeguards can help you avoid model-building problems.
- By Fern Halper, Ph.D.
- October 21, 2014
There has been quite a bit of talk recently about the democratization and consumability of analytics. Democratization refers to extending the deployment of BI and analytics tools to more users in an organization. The idea is to give all people, regardless of technical prowess, access to data they can analyze in order to make more informed decisions. Democratization originally focused on descriptive analytics, reports, and dashboards. Now this movement includes visualization as well as more advanced techniques. Consumability, which is related to democratization, can mean either that BI or analytics tools can be used easily by a lot of people or that the results of BI or analytics can be consumed by the masses. I'm referring to the latter here.
Concurrent with these market trends is the movement to empower business analysts to use more advanced forms of analytics, such as predictive analytics. Is this a good idea? The market seems to think so. In a 2014 TDWI Best Practices Report on Predictive Analytics, 86 percent of respondents currently using predictive analytics expected that business analysts would be the primary builders of predictive models in the near future. Seventy-nine percent said that statisticians or other quantitative types would be the primary builders. Additionally, when asked what the top three skills for model building were, knowledge of the business, knowledge of the data, and critical thinking were at the top of the list. Statistical knowledge or knowledge of the tool ranked low on the list!
What's going on here? Is this a push by the vendors to put their tools in more hands or a movement by the business to do so?
It is probably a bit of both. On the one hand, vendors have made their software easier to use. They are including drag-and-drop interfaces rather than requiring the model builder to use a scripting language. Some software tools suggest models based on input data and the specification of targets of interest (such as buy or don't buy, or remain a customer or drop service). On the other hand, the reality is that statistical and data mining skills are in short supply. Some organizations feel that their highly skilled staff can work together with business analysts to build models, especially if the business analysts understand the data and can deal with it.
The question still remains: is this a good idea, or will we be reading in the newspaper a few years from now that someone who was not really qualified to do so built a predictive model that was put in production and cost a company millions of dollars?
Here are three safeguards to take to help ease this transition:
Training: I don't believe that just any business analyst can build a predictive model, even a critical thinker with knowledge of the business and the data. I think that training is necessary for any kind of predictive analysis for two reasons. First, it is important to understand what you are doing, especially if you have to defend your analysis. Second, it makes sense to get training on the tool you are using so you can use it correctly.
There are pitfalls to predictive modeling. These can include overfitting, giving irrelevant data too much credence, or thinking that correlation means causation. It might involve not understanding how models get stale or how to treat certain kinds of data or build the right attributes for the task. If you're building predictive models and don't understand the last two sentences, then my point is made. I'm not saying you necessarily need to have a Ph.D. in statistics, but there are data mining boot camps (TDWI offers one) as well as online courses or vendor-provided training that can help increase your knowledge.
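To make the first of these pitfalls concrete, here is a minimal sketch of overfitting. Everything in it is an assumption made for illustration: the customer data is synthetic, the `usage score` and the 25 percent label noise are invented, and the two toy "models" (one that memorizes every training record, one that learns a single cutoff) stand in for whatever a real tool would build. The point is only that a model can look perfect on the data it was built from and still do worse on records it has never seen.

```python
import random

def make_data(n, noise, rng):
    """Synthetic, hypothetical customers: x is a usage score in [0, 1];
    label 1 means 'drops service'. A fraction of labels is flipped at random."""
    rows = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label  # noise: the score does not fully explain behavior
        rows.append((x, label))
    return rows

def accuracy(predict, rows):
    """Share of rows where the prediction matches the recorded label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def fit_memorizer(train):
    """Overly flexible model: copy the label of the closest training record."""
    def predict(x):
        nearest = min(train, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict

def fit_threshold(train):
    """Simple model: pick the single cutoff that scores best on the training data."""
    candidates = [i / 20 for i in range(21)]
    best = max(candidates,
               key=lambda c: accuracy(lambda x: 1 if x > c else 0, train))
    return lambda x: 1 if x > best else 0

rng = random.Random(42)                 # fixed seed so the run is repeatable
train = make_data(200, 0.25, rng)       # data the models are built from
holdout = make_data(1000, 0.25, rng)    # data the models never see while fitting

memorizer = fit_memorizer(train)
simple = fit_threshold(train)

memo_train = accuracy(memorizer, train)
memo_holdout = accuracy(memorizer, holdout)
simple_train = accuracy(simple, train)
simple_holdout = accuracy(simple, holdout)

print(f"memorizer: train={memo_train:.2f} holdout={memo_holdout:.2f}")
print(f"threshold: train={simple_train:.2f} holdout={simple_holdout:.2f}")
```

The memorizing model scores perfectly on its own training data because it has learned the noise along with the signal. Comparing that figure with its score on the held-out records is the kind of check that training teaches a model builder to run before trusting a model.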
Collaboration: Several companies that I speak to say that they might let a business analyst build a predictive model, but in order for it to get into production to make decisions, an enterprise needs a control process in place. This control process might involve the business analyst getting approval from the data scientist for any model that the business analyst builds. It might include business analysts and data scientists working together on the models. Vendors are helping with this by providing collaboration and sharing features in their toolkits. This makes sense.
Risk analysis: An enterprise needs to decide for itself what kinds of models it might let its business analysts create. For instance, it might decide that only a data scientist or statistician is allowed to build models that carry high risk or a high dollar cost if there are problems. A business analyst might be able to create some less risky models. It is up to the organization to decide.
Just because something is easier to do doesn't mean that everyone should be doing it. The question of who builds models should be taken seriously. However, if companies take precautions, such as those I've just described, they may be able to take the first steps towards opening up predictive analytics to more builders.