Moneyball and the Limits of Predictive Analysis
Could predictive analytics be used as extensively in business as it is in baseball?
- By Stephen Swoyer
- August 7, 2012
If you've seen the movie Moneyball, you likely have a good sense of how statistical analysis and predictive analytics (PA) have completely transformed the management of baseball.
You might even have asked yourself whether widespread use of PA could transform the way businesses are managed. The answer is yes -- PA has already done so.
A better question, experts say, concerns the extent to which PA can transform business. Put another way, can predictive analytics be used as extensively and effectively in business as it is in baseball?
The short answer is no. Baseball is a game with fairly well-defined parameters. Business is a much "fuzzier" proposition: like life itself, it has an infinite set of parameters, not all of which are understood, and some of which have yet to be identified. To model business is to model reality.
In that sense, argues Neil Raden, co-author of Smart Enough Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions, business simply can't be modeled, much less made as predictable as baseball.
"In the business world, things are far more uncertain and fuzzy than [they are in] baseball. So I wouldn't be thinking about cases [in business] that defy prediction, I'd be looking at how ... organizations deal with uncertainty in their models," says Raden, a principal with information management consultancy Hired Brains Inc.
Predictive modeling makes or breaks the success of PA, but modeling is hard. The best models are those that best account for -- that most effectively manage -- uncertainty. Predictive modeling involves making choices (and sometimes guesses) about which variables you're going to model and about how you're going to understand them. Because we have a more complete understanding of the set of possible "choices" in baseball, its "reality" can be modeled -- and its uncertainty managed -- more effectively than can the "reality" of business.
It isn't a question of throwing more computing horsepower or modeling complexity at the problem, either. John MacGregor, vice president and head of the Centre for Predictive Analytics at SAP AG, says he's built enormous models, including ensemble models with more than 5,000 variables. Even when he was working with his largest and most sophisticated models, MacGregor says, he wasn't naïve enough to believe that he was capturing reality.
Something -- a hidden or as-yet-unknown variable of some kind -- would always be missing, says MacGregor. "If you're trying to predict which customers will churn and which customers will stay, you can collect lots of variables and you can [collect data] from millions of customers," he comments. "But if the reason your customers are leaving is that someone has just brought out a new product [that's just much better than] your own [product], and you don't have something in the data or the algorithms that can account for this, you're going to miss it."
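MacGregor's churn scenario is easy to reproduce in miniature. The sketch below uses synthetic data and hypothetical feature names -- nothing here comes from SAP's tooling -- to train a simple churn model on two observed variables, then show how its accuracy erodes once an unobserved factor (exposure to a competitor's new product) starts driving the churn decision.

```python
# A minimal sketch of the hidden-variable problem, on synthetic data.
# Feature names ("usage", "tenure") and all numbers are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 10_000

# Observed variables: monthly usage and tenure (both standardized).
usage = rng.normal(size=n)
tenure = rng.normal(size=n)
X = np.column_stack([usage, tenure])

# Before the competitor's launch, churn is driven by low usage and
# short tenure -- variables the model can see.
p_before = 1 / (1 + np.exp(2.0 * usage + 1.0 * tenure))
y_before = rng.binomial(1, p_before)

model = LogisticRegression().fit(X, y_before)
print("accuracy before launch:", accuracy_score(y_before, model.predict(X)))

# After the launch, a variable the model never sees -- exposure to the
# competitor's product -- dominates the churn decision.
exposure = rng.binomial(1, 0.5, size=n)           # unobserved confounder
p_after = np.where(exposure == 1, 0.8, p_before)  # exposed customers mostly leave
y_after = rng.binomial(1, p_after)
print("accuracy after launch: ", accuracy_score(y_after, model.predict(X)))
```

No amount of refitting on the two observed columns recovers the lost accuracy; the variable doing the work simply isn't in the data, which is MacGregor's point.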
Hiccups
This is true even in baseball, where -- in spite of its comparatively fixed parameters -- hiccups nonetheless do occur.
The best-known Moneyball hiccup is San Francisco Giants pitcher Matt Cain. Moneyball -- or "sabermetrics," as it's known to baseball stat-heads -- doesn't completely miss Cain; it simply fails to account for his greatness. Prior to this season, in fact, Cain routinely confounded sabermetricians, who projected him as merely a better-than-average major league pitcher.
Although Moneyball rated Cain a good pitcher, it didn't consider him elite. The Giants, however, think Cain is elite. If nothing else, they're paying him at an elite level: Cain's new $22 million contract makes him one of the five highest-paid hurlers in the game.
Industry veteran Mark Madsen, a principal with consultancy Third Nature Inc., says an example such as Cain's helps illustrate an important lesson about the limits of predictive analytics.
"Many people find correlation, but just because the model says thus-and-such should happen based on the distribution, there are always other factors not accounted for in the model," he argues. "I bet the psychological game of reading the batter or pitcher is a variable that can't be easily measured, so it isn't, but it could very well be how ... [hitters] are outfoxed on called third strikes. Metrics aren't reality. They are approximated abstractions of reality. Their interrelationships might likewise be improperly modeled."
Madsen's reference to "called third strikes" -- i.e., what happens when an umpire "calls" a pitch a strike without the batter having swung at it -- invites comparison with another ostensible Moneyball hiccup: Philadelphia Phillies pitcher Vance Worley, who strikes out batters at a healthy, if not spectacular, clip. Moneyball values strikeouts. If a ball isn't put into play, it can't do much harm. Moneyball-wise, Worley projects as an above-average major league pitcher.
This, too, might be a case in which the most common or facile Moneyball metrics don't accurately account for what's happening. Although Worley racks up strikeouts at an above-average clip, he does so without inducing lots of swinging strikes. For two years running, in fact, he's led his league in called third strikes. Batters aren't necessarily swinging at his pitches and missing; instead, a human variable -- the umpire's judgment -- is factoring into Worley's performance. Moneyball has a metric for understanding or valuing swing-and-miss strikes; it hasn't yet developed a metric for understanding or valuing called strikes.
Are they even predictable? A swing-and-a-miss is -- for the most part -- an objective thing; a called strike involves a host of different variables, starting with an always-fluctuating strike zone. On the other hand, there's empirical reality: Worley has led his league in called third strikes for two years running. This has prompted at least one sabermetrician to suggest that Worley's ability to induce called third strikes might actually be a sustainable (i.e., a predictable) trend.
In this case, it isn't as if Moneyball fails to account for performance; it's that it simply needs more information to accurately project what's going on. Moneyball is predicting an outcome and getting it "right," but not because it's capturing what's actually happening. This, says Madsen, is the eternal question of predictive modeling: Are we sure we understand what's really going on?
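Madsen's question can be made concrete with a toy decomposition. In the sketch below, the pitchers and counts are invented for illustration: two pitchers post comparable strikeout totals, but splitting those totals into swinging and called third strikes exposes very different -- and possibly differently sustainable -- underlying mechanisms.

```python
# A toy illustration of "metrics aren't reality": the same headline
# number (strikeouts) can hide different mechanisms. All data invented.
strikeouts = {
    # pitcher: (swinging third strikes, called third strikes)
    "Pitcher A": (70, 20),   # misses lots of bats
    "Pitcher B": (40, 50),   # leans on the umpire's judgment
}

for pitcher, (swinging, called) in strikeouts.items():
    total = swinging + called
    print(f"{pitcher}: {total} K "
          f"({swinging / total:.0%} swinging, {called / total:.0%} called)")
```

A model that values only the strikeout total treats the two pitchers identically; whether it should is exactly the open question Worley poses.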
He cites the popular urban legend of the correlation between sales of beer and diapers.
"Take [the urban legend] as 'true' and you merchandise beer and diapers in a certain way, but there are many, many confounding variables dictating when and under what circumstances it's valid, so you may very well be screwing things up by doing that," he points out.
The Human Factor
As the cases of baseball players Cain and Worley demonstrate, uncertainty can't be completely eliminated in even well-understood problem areas.
Human beings, notes Hired Brains' Raden, dislike uncertainty. He uses the example of "black swans" -- i.e., people, events, or things that not only don't fit the mold but are egregious outliers. Because they're outliers, they make us uncomfortable: they alarm us. Something about our inability to account for their characteristics or their performances troubles us.
Perhaps one day we'll succeed in narrowing the problem set, much as Moneyball is doing with the issue of called third strikes; we can't, and shouldn't, expect to eliminate uncertainty entirely, however.
In this regard, Raden says, understanding how our predictive models deal with uncertainty is of the utmost importance. "I think what makes black swans is not their anomalous characteristics, it's the limits of modeling. After all, a model is only that: a model. It isn't reality," he observes, "and it's impossible to model everything about a situation perfectly. Sabermetrics has overlooked something. Like George Box said, 'All models are wrong, some are useful.'"