TDWI Upside - Where Data Means Business

The Shortcomings of Predictive Analytics

Data scientist Claudia Perlich explains why we must use machine learning and predictive technologies ethically, responsibly, and mindfully.

Do data scientists need a refresher course in the Hippocratic precept "first, do no harm"?

This is a question that data scientist Claudia Perlich has spent considerable time grappling with.

Predictive Models Have Unintended Side Effects

Perlich, chief data scientist with marketing analytics specialist Dstillery, believes data science and advanced analytics are powerful tools for human good, and she'll be making this case at TDWI's upcoming Accelerate conference, in Boston April 3-5. Accelerate will feature tutorials and presentations by Perlich and other industry luminaries.

In all likelihood, few sessions will be as provocative as Perlich's. Even though she's an unapologetic champion of predictive analytics, Perlich recognizes that machine learning and other technologies can only be forces for good if people use them ethically, responsibly, and mindfully.

"I'm a huge fan of this technology. I love what I do and I've been doing it for almost 20 years. In that time, I've collected a deep understanding of why things don't work, often for very surprising reasons that have nothing to do with classical reasons," she explains. "I'm really interested in when and why things fail. 'Failing' isn't [the right word]. I'm talking about 'unintended side effects' -- [things] you didn't really count on when you decided to build models and put them out there in the wild."

First, Perlich says, we have to recognize that predictive models embody the acknowledged and unacknowledged biases of the people who created them.

"If you use a machine learning system to automatically screen job candidates ... your predictive model may propagate historical biases. If a model makes predictions [based on] what has happened in the past, it is bounded by [the selection criteria of] the past," Perlich says. "All of us who are enthusiastically building these models need to develop a moral sense of responsibility ... about how and when they are put to use."

Models Give You Exactly What You Ask For

This "moral" sense isn't limited to scrubbing biases out of models. In some cases, a predictive model is optimized to satisfy the letter, but not the spirit, of what the modeler wants.

"I have seen the exact analogous effect in advertising ... when we talk about models that predict who will click on ads and we try to select those opportunities with the highest probability [of click-through]. You're trying to find the people most interested in the product -- people who will actually buy the product," she explains.

"This ignores the fact that people tend to accidentally click on ads. A person has eyesight problems; a person has lent their device to their three year old; a person is distracted. If you base your model on all [click-through data], you're going to ... end up with something that is technically correct but doesn't actually do what you want it to do."

Data scientists have a responsibility not just to the strict letter of a requirement -- e.g., predicting successful job applicants or click-through opportunities -- but also to the spirit of what they're trying to model and measure, she argues.

"The model is doing its job. It will find you a set of opportunities with the highest click-through rate. The applicant recommended by the [candidate-screening] model will be highly likely to succeed. [However,] you are stuck with this incompatibility where you're saying you want one thing and your model is giving you something else entirely," Perlich says.

"The discrepancy between the two objectives will increase as you are more able to do [the one thing] really, really well, [be it identifying] higher click-through rates or successful job applicants."
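The gap Perlich describes between click-through and actual buying interest can be illustrated with a toy calculation. The sketch below is not Dstillery's method; the segment names and every rate in it are hypothetical numbers invented for illustration. It simply shows that choosing the audience with the highest click rate can select a different group than choosing the one most likely to buy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    click_rate: float     # fraction of ad impressions that get clicked
    purchase_rate: float  # fraction of ad impressions that end in a purchase


# Hypothetical audience segments; all numbers are made up for illustration.
SEGMENTS = [
    Segment("interested shoppers", click_rate=0.020, purchase_rate=0.0040),
    Segment("casual browsers", click_rate=0.010, purchase_rate=0.0005),
    Segment("accidental clickers", click_rate=0.050, purchase_rate=0.0001),
]


def best_segment(segments, metric):
    """Return the segment with the highest value of the named metric."""
    return max(segments, key=lambda s: getattr(s, metric))


# The "letter" of the requirement: maximize clicks.
by_clicks = best_segment(SEGMENTS, "click_rate")

# The "spirit" of the requirement: find people who will buy.
by_purchases = best_segment(SEGMENTS, "purchase_rate")

print("Optimizing for clicks selects:   ", by_clicks.name)
print("Optimizing for purchases selects:", by_purchases.name)
```

With these invented numbers, the click-optimized choice lands on the accidental clickers, which is technically correct for the stated metric and useless for the goal behind it.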

Designing Better Models

When you're designing predictive models, there are a couple of things to be alert to, Perlich says.

"You should never have any single technical criteria -- you should never focus just on click-through rates, for example. You should never try to do too much with your [individual] models. It's hard to build models that are optimized [for] many things at the same time," she observes.

"If your model is getting too good, it's almost always a problem. There was an example where we built a really good model that predicted breast cancer -- except it didn't. The only thing it had basically learned was ... that people in a [breast cancer] treatment center are more likely to have cancer than people in a breast cancer screening center."
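The breast cancer example is a case of what practitioners often call leakage: the model scores well because of where a record came from, not because of anything it learned about the disease. A minimal sketch of that effect follows. The patient counts are hypothetical, invented for illustration, and the one-line "model" is a stand-in rule rather than the model Perlich's team built.

```python
# Each record is (facility, has_cancer). Counts are hypothetical,
# chosen only so that facility type is strongly tied to the label.
records = (
    [("treatment", True)] * 90
    + [("treatment", False)] * 10
    + [("screening", True)] * 5
    + [("screening", False)] * 95
)


def leaky_model(facility):
    """Stand-in 'model' that has only learned the facility type."""
    return facility == "treatment"


def accuracy(rows):
    """Share of rows where the prediction matches the true label."""
    correct = sum(leaky_model(facility) == label for facility, label in rows)
    return correct / len(rows)


def recall(rows):
    """Share of true cancer cases that the model flags."""
    positives = [(facility, label) for facility, label in rows if label]
    found = sum(leaky_model(facility) for facility, _ in positives)
    return found / len(positives)


overall_accuracy = accuracy(records)

# Restrict to the screening center, where a real detector would be used.
screening_only = [row for row in records if row[0] == "screening"]
screening_recall = recall(screening_only)

print(f"Overall accuracy: {overall_accuracy:.3f}")
print(f"Recall inside the screening center: {screening_recall:.3f}")
```

The rule looks excellent on the pooled data and finds none of the cancers among screening patients. Evaluating within a single facility, as the last step does, is one simple way to check whether a suspiciously good score reflects the signal you care about.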

Perlich sees the zero-sum character of societal debate about data mining and data science as a distraction. "The criticism being brought forward against data mining and data science is, in principle, often correct, but at the same time the antagonism between the critics of data science and its actual practitioners is exaggerated and nonproductive," she points out. "We're being told from a privacy point of view that everything we do is evil. What we need to ... collaborate on are better options to do these things the right way."

Because of its power, predictive technology will be used. It's inevitable. The challenge is to promote ethical and responsible usage.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he's particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.

