How the COVID-19 Pandemic is Accelerating the Need for Model Monitoring
Data models that predate the pandemic may not reflect today's business environment. It's time to give models a checkup to make sure they reflect current conditions.
- By Joshua Poduska
- March 8, 2021
It's no secret that the COVID-19 pandemic has affected nearly every facet of business operations, and organizations that depend on artificial intelligence (AI) and machine learning (ML) to automate critical business decisions and processes have been particularly vulnerable. Thanks to dramatic changes in both the overall economic environment and specific consumer behaviors since the onset of the pandemic, AI/ML models in organizations of all sizes and in every industry have been rendered largely ineffective: the pre-pandemic data on which the models were trained is no longer relevant or predictive of current behavior.
Once in production, a model's behavior can change if production data diverges from the data used to train it. A model's behavior is determined by the picture of the world captured during training; when real-world data diverges from the picture the model "learned" -- a phenomenon known as drift -- models perform poorly or, in the worst cases, fail completely.
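To make the idea of input-data drift concrete, here is a minimal, self-contained sketch of one common drift metric, the Population Stability Index (PSI), which compares how a feature's values are distributed in training data versus production data. The feature values and the rule-of-thumb thresholds are illustrative, not from any particular monitoring product:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a training ('expected')
    sample and a production ('actual') sample of one feature.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def frac(values, i):
        # Fraction of values falling in bin i; the last bin is
        # closed on the right so the maximum value is counted.
        left = lo + i * width
        right = lo + (i + 1) * width
        n = sum(left <= v < right or (i == bins - 1 and v == hi)
                for v in values)
        return max(n / len(values), 1e-6)  # floor avoids log(0)

    return sum((frac(actual, i) - frac(expected, i))
               * math.log(frac(actual, i) / frac(expected, i))
               for i in range(bins))

# Illustrative data: a pre-pandemic training sample vs. a shifted
# production sample of the same feature.
train = [40 + (i % 20) for i in range(200)]  # values 40-59
prod = [55 + (i % 20) for i in range(200)]   # values 55-74

drift_score = psi(train, prod)
```

Run on this deliberately shifted production sample, the score lands well above the 0.25 "significant drift" threshold, which is the signal a monitoring system would use to alert the team that the model's inputs no longer match its training data.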
Model Drift Has Real-World Consequences
Model drift can result in severe consequences for everyone involved if it's not observed, caught, and remediated quickly. Think critical supply shortages resulting from inaccurate demand projections (e.g., toilet paper), customers unjustly denied a loan due to pre-COVID-19 data feeding new approval decisions, and misdirected PPE shipments causing a glut of equipment in a rural hospital in Alaska while a California hospital experiencing a surge in COVID-19 patients faces empty supply closets.
These scenarios are not hypothetical; they are real-world examples of model drift that have already happened -- and will continue unabated if companies don't address the problem with effective and thorough model monitoring solutions.
Model Monitoring Is Critical in 2021
As the pandemic continues to impact businesses, model monitoring is one of the most critical areas that must be addressed by companies using AI/ML models. It requires a shift in thinking about data science governance in general.
Unlike traditional software, in which engineers specify explicit logic in code, AI/ML models learn their behavior from data. IT organizations, although highly adept at monitoring the health and performance of traditional software applications, are often ill-equipped to monitor the "health" of their AI/ML models, or are entirely unaware of the risks associated with model drift. A major blind spot is the failure to recognize that because data science models are probabilistic while traditional software is deterministic, monitoring them requires a fundamentally different approach.
Typically, data science leaders carry the monitoring burden because, ultimately, their teams are responsible for the quality of the predictions their models make. Data scientists then spend significant time analyzing models in production instead of doing new research, or, in an attempt to reduce manual effort, they develop ad hoc monitoring solutions for each model, which leads to a proliferation of disparate, inconsistent, and half-baked monitoring tools.
This "Wild West" of monitoring solutions has already become unsustainable. Instead, companies should look to operationalize model monitoring to ensure their models are functioning correctly and the decisions they are making based on those models are sound.
Think Different About Model Monitoring
A good way to think about monitoring a machine learning model is to approach it the way you approach your annual physical or your car's regular oil changes and tune-ups: routine checkups that catch problems before they become serious.
Model monitoring helps ensure that your AI/ML models are healthy and performing to the best of their abilities.
A holistic approach to model monitoring can also detect problems before models degrade or cause serious business issues. It can track drift in both input data and output predictions, and it can track prediction quality by comparing predictions against "ground truth" -- the actual outcomes observed after the fact. For example, you need to know how many loans actually defaulted before you can compare that figure to the predicted default rate. Without these model quality checks, faith in the accuracy of your predictions wavers.
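The loan example above can be sketched in a few lines: once ground-truth outcomes arrive, a monitoring check compares them against the model's earlier predictions. The function name, scores, and 0.5 decision threshold are all hypothetical, chosen only to illustrate the comparison:

```python
def prediction_quality(predicted_probs, actual_defaults, threshold=0.5):
    """Compare a loan model's default predictions against ground
    truth: the defaults actually observed once outcomes are known.
    Illustrative sketch, not any specific product's API."""
    predicted = [p >= threshold for p in predicted_probs]
    accuracy = (sum(p == a for p, a in zip(predicted, actual_defaults))
                / len(actual_defaults))
    return {
        "accuracy": accuracy,
        "predicted_default_rate": sum(predicted) / len(predicted),
        "actual_default_rate": sum(actual_defaults) / len(actual_defaults),
    }

# Hypothetical scores from a pre-pandemic model vs. observed outcomes
scores = [0.9, 0.2, 0.7, 0.1, 0.4, 0.8]
observed = [True, False, True, True, False, True]
report = prediction_quality(scores, observed)
```

A gap between the predicted and actual default rates, like the one in this toy report, is exactly the kind of signal that would prompt a team to retrain a model on post-pandemic data.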
Operationalizing model monitoring has several other benefits as well. With it, data scientists can focus on value-added projects and new research rather than wasting time analyzing existing models in production. They can quickly assess whether a model needs to be retrained or rebuilt and analyze failure conditions by running different tests interactively. By providing a "single pane of glass" to monitor the performance of all models across the organization, you can prevent drift that renders your models ineffectual.
A Final Word
Model monitoring is a vital operational task that helps ensure your models are performing to the best of their abilities. It gives you peace of mind that the models on which you base strategic decisions are healthy, avoiding negative impacts on the business and protecting customer loyalty and satisfaction. As your company moves more machine learning systems into production, update your monitoring practices so you can stay vigilant about model health, and about your business's success, in a consistent, efficient manner.
Joshua Poduska is the chief data scientist with Domino Data Lab, a data science platform that accelerates the development and deployment of models while enabling best practices like collaboration and reproducibility. He has 18 years of experience in analytics. His work experience includes leading the statistical practice at one of Intel’s largest manufacturing sites, working on smarter cities data science projects with IBM, and leading data science teams and strategy with several big data software companies. Josh has a master's degree in applied statistics from Cornell University. You can reach the author via Twitter or LinkedIn.