Don’t Forget the Back End of the Machine Learning Process
Not planning for putting models into production can lead to project failure or increase the amount of time needed to deploy a model.
By Fern Halper
December 8, 2020
As organizations undergo digital transformation, they are seeking out more advanced analytics. TDWI sees growing interest in machine learning across different parts of the company. For instance, a manufacturing operations team might use machine learning for predictive maintenance related to assembly lines. A call center agent might use a churn model built with machine learning to provide special offers to customers at risk of churn. Although these use cases might be in different parts of the organization, they both require models to be deployed into production.
At TDWI, we see that organizations often worry about hiring data scientists to build models and put considerable emphasis and resources into the model development stage of the analytics life cycle. At the same time, they don’t consider all of the nuts and bolts that need to go into operationalizing that model. That can lead to project failure or increase the amount of time needed to deploy a model.
There are at least four steps to consider:
Model management. Just as a piece of software is versioned and registered, a machine-learning model should be, too. It is important to version the model during the model-building stages and again each time a changed version is deployed. That means registering each version and capturing metadata about it -- including when it was built, who built it, and what data was used to train it. This way, the organization can track models that are put into production and know which version of each model is running.
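As a minimal sketch of what registering and versioning a model can look like, the Python below keeps an in-memory registry that records who built each version, when, and on what data, and tracks which version is in production. The class and field names (`ModelRegistry`, `ModelRecord`, `promote`) are hypothetical and chosen for illustration; a real deployment would more likely rely on a dedicated registry tool with persistent storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModelRecord:
    """Metadata captured for one registered version of a model (hypothetical schema)."""
    name: str
    version: int
    built_by: str
    training_data: str
    built_at: datetime


class ModelRegistry:
    """Illustrative in-memory registry; not a substitute for a real registry service."""

    def __init__(self):
        self._records = {}     # model name -> list of ModelRecord, ordered by version
        self._production = {}  # model name -> version currently in production

    def register(self, name, built_by, training_data):
        """Store a new version of a model and return its metadata record."""
        versions = self._records.setdefault(name, [])
        record = ModelRecord(
            name=name,
            version=len(versions) + 1,
            built_by=built_by,
            training_data=training_data,
            built_at=datetime.now(timezone.utc),
        )
        versions.append(record)
        return record

    def promote(self, name, version):
        """Mark an already registered version as the one running in production."""
        if not any(r.version == version for r in self._records.get(name, [])):
            raise KeyError(f"{name} v{version} is not registered")
        self._production[name] = version

    def in_production(self, name):
        """Return the metadata record of the version currently in production."""
        version = self._production[name]
        return self._records[name][version - 1]


registry = ModelRegistry()
registry.register("churn", built_by="alice", training_data="customers_2020_q2.csv")
registry.register("churn", built_by="bob", training_data="customers_2020_q3.csv")
registry.promote("churn", version=2)
print(registry.in_production("churn"))
```

Even a structure this small answers the questions raised above: which version is running, who built it, and what data it was trained on.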
Model deployment. Once a model is built and validated, it can be deployed into production. This involves exporting the model as well as developing the pipeline to score fresh data.
There are different approaches to model deployment. Some organizations will rewrite the model so it fits into a production system or application. This is generally not advisable because the rewrite can introduce errors, and the rewritten model may no longer behave like the one that was validated. Others export models using APIs, and a growing number export them in containers. Regardless of the method used, it is important to be able to feed data to the model in production. This will require gathering data, preprocessing it, and recalculating the features that need to be input into the model.
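The scoring pipeline described above (gather data, preprocess it, recalculate features, score) can be sketched as a chain of small functions. Everything specific here is an assumption made for illustration: the field names `tenure_months` and `support_calls`, the hand-set weights, and the logistic form of the model. The point is only that production data passes through the same preprocessing and feature code the model expects.

```python
import math

# Hypothetical parameters standing in for a trained, exported model.
WEIGHTS = [-0.8, 0.6]
BIAS = -0.5


def preprocess(raw):
    """Clean one raw record: treat missing or null fields as 0 and coerce to float."""
    return {
        "tenure_months": float(raw.get("tenure_months") or 0),
        "support_calls": float(raw.get("support_calls") or 0),
    }


def compute_features(clean):
    """Recalculate the features the model takes as input."""
    return [clean["tenure_months"] / 12.0, clean["support_calls"]]


def score(features):
    """Apply the (assumed) logistic model and return a churn probability."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def score_batch(raw_records):
    """Run fresh records through preprocessing, feature calculation, and scoring."""
    return [score(compute_features(preprocess(r))) for r in raw_records]


fresh_data = [
    {"tenure_months": 24, "support_calls": 5},
    {"tenure_months": None},  # incomplete record, handled by preprocess
]
print(score_batch(fresh_data))
```

Whether this logic sits behind an API or inside a container, keeping the preprocessing and feature steps in one shared code path avoids the mismatch that rewriting the model by hand tends to introduce.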
Model monitoring. A model is good for a certain period of time -- months, weeks, or days (or shorter, depending on the use case). After that, models can get stale. Model degradation can be a serious problem, so organizations need to monitor models in production to see if they are drifting, that is, whether the data they score or the accuracy of their predictions has shifted away from what was seen when the model was built.
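One common way to check for drift, offered here as an illustration rather than as a method TDWI prescribes, is the population stability index (PSI), which compares the distribution of a feature or score at training time with its distribution in production. The sketch below uses equal-width bins taken from the training data; the bin count and the small floor that avoids taking the log of zero are assumptions.

```python
import math


def psi(expected, actual, bins=10):
    """Population stability index between a baseline sample and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        # A small floor keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), 1e-6) for c in counts]

    e = fractions(expected)
    a = fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training_scores = [i / 100 for i in range(100)]           # baseline distribution
production_scores = [0.5 + i / 200 for i in range(100)]   # shifted toward high values

print(psi(training_scores, training_scores))    # identical samples give 0.0
print(psi(training_scores, production_scores))  # a shifted sample gives a larger value
```

A frequently cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, but these thresholds are conventions, and the right alert level depends on the use case.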
Retraining the model. After a model is put into production, the assumptions it was built on might no longer hold. For example, the assumptions for customer behavior used as input to models put into production pre-COVID have certainly changed. The last step of operationalizing analytics, then, is retraining the model when monitoring shows that its performance in production has degraded.
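Tying retraining to monitoring can be as simple as a rule that compares recent performance with the level measured at validation time. The sketch below is one possible rule, not a recommended policy: the function name `should_retrain`, the tolerance of 0.05, and the three-period window are all assumptions chosen for illustration.

```python
from statistics import mean


def should_retrain(baseline, recent_scores, tolerance=0.05, window=3):
    """Return True when the average of the last `window` scores falls
    more than `tolerance` below the baseline measured at validation time."""
    if len(recent_scores) < window:
        return False  # not enough production history to judge yet
    return mean(recent_scores[-window:]) < baseline - tolerance


# Accuracy measured on labeled production data, one value per monitoring period.
print(should_retrain(0.90, [0.89, 0.88, 0.90]))        # False
print(should_retrain(0.90, [0.90, 0.82, 0.80, 0.79]))  # True
```

Averaging over a window rather than reacting to a single bad period keeps one noisy measurement from triggering an unnecessary retraining run.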
At TDWI, we typically see that organizations have perhaps three to five models in production. That means they are currently getting away with doing many of the steps mentioned above manually. For instance, they may use some sort of file system to keep track of models. They may manually monitor their models. However, this is not a scalable solution. Additionally, their data scientists may be tasked with doing this work. Again, this is not scalable.
Deploying models into production will ultimately require DataOps or ModelOps teams to do this work. It will also require tooling and automation to deploy and manage models in production. Some commercial vendors provided this “back-end” functionality early on. The open source community is trying to provide these kinds of tools as well, but it is still playing catch-up.
TDWI recommends that organizations think about model management early on as machine learning is taking hold, so they can put the right people, processes, and technologies in place to be ready as they begin to scale.
Fern Halper, Ph.D., is well known in the analytics community, having published hundreds of articles, research reports, speeches, webinars, and more on data mining and information technology over the past 20 years. Halper is also co-author of several “Dummies” books on cloud computing, hybrid cloud, and big data. She is VP and senior research director, advanced analytics at TDWI Research, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and “big data” analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at [email protected], on Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.