TDWI Upside - Where Data Means Business

Take Care When Applying Predictive Analytics

Predictive analytics can get things wrong. Here's what your enterprise needs to do to prevent problems or public relations disasters.

In the 2002 film Minority Report, human "precogs" were able to predict future crimes and thus allow the government to intervene to prevent them from happening. At the time, this was viewed as pure science fiction, but it may be moving closer to reality if we substitute data mining and big data analytics for the mutant precogs.

For example, although profiling may not be politically correct, it is often deployed in homeland security and other applications to target potential threats. Furthermore, whereas source data once focused on relational data generated by our enterprise transaction-based systems, updated at most daily, we now analyze data from a wide variety of structured, semistructured, and unstructured sources -- perhaps in real time -- that may also include sensors, social media, and GPS trackers.

Although most of us may not be employed in DOD or homeland security environments, as data warehouse practitioners we are often called upon to assist our users in gathering, storing, and analyzing data to be used in predictive analytics applications. These applications are often deployed to improve marketing, increase sales, reduce risk, anticipate equipment failures, or prevent fraud.

There is little doubt that we can gain insight and derive real benefits from predictive analytics, but we must be careful not to abuse this technology and provoke negative reactions. It is important to consider whether the benefits might be outweighed by issues such as invasion of privacy.

For example, a few years ago a major retail chain used data mining to determine which product purchases were likely to be made by pregnant women in order to target them for other products that they would likely purchase for themselves or their babies. The exercise identified approximately 25 specific items that could be used to generate an accurate "pregnancy prediction" score and even predict how far along a woman was in her pregnancy. This allowed the company to accurately time direct mailings to these customers.

Although the results were highly accurate, the company received a great deal of negative publicity after it was revealed that a father had complained angrily that the company had sent his teenage daughter -- still in high school and (unknown to him) pregnant -- coupons for cribs and baby clothes. He thought that she could not have been pregnant and accused the company of encouraging her to engage in sexual relations.

The exercise clearly demonstrated the potential marketing benefits that can be derived from data mining, yet the results became a public relations disaster that could have been averted. After the birth of a child or the purchase of a home becomes public record, most parents or new homeowners receive a slew of relevant (or perhaps not-so-relevant) offers. Those offers follow events that have already happened. Predictive analytics, by contrast, predicts events that have a high probability of occurring but are usually less than certain. We need to be sensitive to this. We must recognize that placing someone in the wrong prospect segment might cause problems. We must devise ways to minimize potential damage if the predictions are wrong -- or even when they are accurate. For example, the company might have included the pregnancy-related coupons in a more general mailing with other coupons that were not obviously targeted at pregnant customers.
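
The precaution described above can be expressed as a simple decision rule. The following Python sketch is purely illustrative: the Prediction fields, the choose_mailing function, the "sensitive" flag, and the 0.8 threshold are all hypothetical and are not drawn from any retailer's actual system. It shows only the idea that a prediction is a probability, and that a sensitive segment might receive its offers blended into a general mailing rather than in an obviously targeted one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One model output for one customer (all fields are hypothetical)."""
    customer_id: str
    segment: str        # e.g. "expecting_parent"
    probability: float  # the model's estimate, never a certainty
    sensitive: bool     # could a wrong or revealing offer cause harm?


def choose_mailing(pred: Prediction, min_probability: float = 0.8) -> str:
    """Return 'none', 'blended', or 'targeted' for a single prediction.

    'none'     -> send no segment-specific offers (prediction too weak)
    'blended'  -> mix segment offers into a general mailing
    'targeted' -> send an openly segment-specific mailing
    The 0.8 default is an arbitrary illustrative threshold.
    """
    if pred.probability < min_probability:
        return "none"
    if pred.sensitive:
        return "blended"
    return "targeted"


if __name__ == "__main__":
    examples = [
        Prediction("c1", "expecting_parent", 0.92, sensitive=True),
        Prediction("c2", "new_homeowner", 0.85, sensitive=False),
        Prediction("c3", "expecting_parent", 0.40, sensitive=True),
    ]
    for p in examples:
        print(p.customer_id, p.segment, choose_mailing(p))
```

Which segments count as sensitive, and how confident a prediction must be before it is acted upon, are business decisions rather than technical ones; a rule like this merely records those decisions where they can be reviewed.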

When customers realize their data is being gathered and stored for potential data mining, some respond to what they consider an "invasion of privacy" by simply providing inaccurate or false data. For example, they might provide a false name, address, or phone number when obtaining a customer loyalty card from a supermarket or pharmacy. By presenting the card, they still receive their discounts, but the issuing company misses an opportunity to gain a more complete view of that customer.

It is more important than ever that IT, marketing, finance, public relations, and perhaps legal departments work together to ensure that data analysis activities do not benefit one department to the detriment of the overall organization. It is also up to us, as data warehouse practitioners, to try to understand how these analyses will be used.

Our users may not (at least at first) appreciate our unsolicited opinions and advice, but once they realize we are trying to do what is right for the entire organization, they will likely move closer to accepting us as true business partners. If at all possible, we should strive to further understand our organizations' businesses and work to acquire appropriate domain expertise.

About the Author

Michael A. Schiff is founder and principal analyst of MAS Strategies, which specializes in formulating effective data warehousing strategies. With more than four decades of industry experience as a developer, user, consultant, vendor, and industry analyst, Mike is an expert in developing, marketing, and implementing solutions that transform operational data into useful decision-enabling information.

His prior experience as an IT director and systems and programming manager provides him with a thorough understanding of the technical, business, and political issues that must be addressed for any successful implementation. With Bachelor and Master of Science degrees from MIT's Sloan School of Management and as a certified financial planner, Mike can address both the technical and financial aspects of data warehousing and business intelligence.

