Overcoming the Data Scientist Shortage: Democratizing Predictive Analytics
Collaboration, validation, education, and organization can help your company democratize advanced analytics.
- By Fern Halper
- March 22, 2016
The move to democratize advanced analytics, such as predictive analytics, raises the question of which best practices need to be in place before non-quants take on more advanced analytics.
I recently spoke with Florian Douetteau, CEO and founder of Dataiku. The three-year-old company offers a platform to help users analyze big data and build models with it. The platform provides capabilities for data sourcing, cleansing, and preparation, as well as for building and deploying predictive models. It even helps users select the best model to use and tune that model. One of the company's goals is to help democratize more advanced analytics, such as predictive analytics, in response to the shortage of data scientists. Dataiku is not alone. Products such as BeyondCore and Watson Analytics also determine the right model to use.
In our conversation, we touched on whether it makes sense for someone who isn't trained in predictive modeling to build a predictive model. Florian told me that the most frequent error his company sees non-data scientists make when using predictive analytics is misinterpreting the data and predicting the obvious. Sometimes this occurs because the person building the model isn't thinking through the attributes used for the prediction. Are they correlated with each other? Are the input variables somehow derived from the target variable? Either problem can distort the outcome: correlated inputs make it hard to tell which attribute is driving the prediction, and an input derived from the target makes the model look accurate when it is only restating what is already known.
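The two questions above can be turned into a quick screening step before any model is built. The following is a minimal sketch, not Dataiku's method or any vendor's: the function names, the column names, and the 0.95 threshold are all hypothetical choices made for illustration. It uses Pearson correlation, which only catches linear relationships, so a flagged column is a prompt for a closer look rather than proof of a problem.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def screen_inputs(columns, target, threshold=0.95):
    """Flag suspicious inputs (hypothetical helper, arbitrary threshold).

    Returns two lists:
      - pairs of inputs that are strongly correlated with each other
      - inputs so strongly correlated with the target that they may
        have been derived from it
    """
    inputs = {name: vals for name, vals in columns.items() if name != target}
    correlated_pairs = [
        (a, b)
        for a, b in combinations(inputs, 2)
        if abs(pearson(inputs[a], inputs[b])) >= threshold
    ]
    leakage_suspects = [
        name
        for name, vals in inputs.items()
        if abs(pearson(vals, columns[target])) >= threshold
    ]
    return correlated_pairs, leakage_suspects


# Made-up customer data for a churn model.
customers = {
    "monthly_spend": [20, 35, 50, 65, 80, 95],
    "annual_spend": [240, 420, 600, 780, 960, 1140],  # 12 x monthly_spend
    "support_calls": [1, 2, 3, 0, 4, 2],
    "cancellation_fee_paid": [0, 1, 0, 1, 0, 1],  # only known after churn
    "churned": [0, 1, 0, 1, 0, 1],
}

pairs, suspects = screen_inputs(customers, target="churned")
print("Correlated inputs:", pairs)
print("Possibly derived from target:", suspects)
```

In this made-up data, annual spend is simply twelve times monthly spend, and the cancellation fee is recorded only after a customer has already left. A model trained on the fee column would "predict" churn perfectly while telling the business nothing new, which is the kind of obvious prediction described above.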
Florian explained that he has seen companies take steps to overcome this challenge. First, when data scientists are in short supply, a company will often turn to its business intelligence analysts to work with the data scientists. This makes sense. A business intelligence analyst is a logical choice to learn the intricacies of data science.
The data scientist and the business intelligence analyst often collaborate, with the analyst handling most of the data preparation and modeling. The data scientist then validates the model before it is put into production. In this way, the data scientist acts as a control.
This is a best practice TDWI has suggested, as well. Dataiku also offers training packs to help others get started building models and understanding predictive analytics. Training is another best practice we at TDWI believe in. Additionally, Florian mentioned that his company is seeing the emergence of what he termed "data labs," or central analytics teams that combine data scientists, developers, BI professionals, and others to build data-centric analysis. Some organizations refer to these as centers of excellence. Others talk about multimember "SWAT" teams that tackle more complex problems.
These best practices -- collaboration, validation, education, and organization -- can be very useful first steps in helping to democratize more advanced analytics. For more information and best practices on this topic, check out the TDWI Best Practices Report, Predictive Analytics for Business Advantage.
Fern Halper, Ph.D., is well known in the analytics community, having published hundreds of articles, research reports, speeches, webinars, and more on data mining and information technology over the past 20 years. Halper is also co-author of several “Dummies” books on cloud computing, hybrid cloud, and big data. She is the director of TDWI Research for advanced analytics, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and “big data” analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at firstname.lastname@example.org, on Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.