TDWI FlashPoint: Exclusive Excerpt for The Modeling Agency Subscribers
We'd like to extend a special welcome to TMA's newsletter subscribers! Below, you'll find an excerpt from "Making Predictive Analytics 'Predictive'" by Thomas A. "Tony" Rathburn. This article was published in the March 2013 issue of TDWI FlashPoint.
Distributed monthly via e-mail to thousands of BI/DW professionals, TDWI FlashPoint features unique how-to articles, key findings from TDWI Research, and tips on building and managing BI/DW teams. Its articles are written by TDWI Premium Members, fellows, and instructors, and focus on timely BI and DW issues.
If you are interested in reading the full article, we invite you to become a TDWI Premium Member. TDWI Premium Membership comes with a wide range of benefits, including a comprehensive selection of industry research, news, and information; access to all of TDWI's current and archived research and publications in password-protected areas of the TDWI website; and discounts to TDWI conferences and seminars.
Thank you for considering Premium Membership with TDWI! Please send us your questions and feedback.
Making Predictive Analytics “Predictive”
By Thomas A. "Tony" Rathburn
Much of the conversation surrounding predictive analytics tends to focus on advanced mathematics and sophisticated algorithms, as well as data-related issues—especially our current fascination with big data. Perhaps the most misunderstood aspect of predictive analytics is what makes it “predictive.”
Our analytics projects are intended to provide insights that help us understand the dynamics of the relationship between a set of conditions (our “input” or “independent” variables) and a behavior we are attempting to predict (our “output” or “dependent” variable). The key to developing a “predictive” model lies in the appropriate construction of records in the training, test, and validation data sets, and in delivery applications.
Traditionally, analysts build their record structures by identifying a variable of interest as the output variable in their models, and then applying a variety of techniques to select and transform candidate input variables. This work prepares the data for the application of a selected algorithm, which generates the formula that constitutes our model.
The vast majority of the effort is focused on data extraction, data quality issues, transforming the data into a form consistent with the requirements and assumptions of the algorithm we intend to use, and ensuring that the sampled data is consistent with the current environment.
These are all important considerations. However, this literal extraction of data from the data warehouse is flawed in its time perspective on the relationship between the candidate input variables and the output variable. When data sets are constructed in this manner, the relationship captured is between the current state of the output variable and the corresponding current state of the candidate input variables. ...
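To make the time-perspective point concrete, here is a minimal Python sketch of the two record structures. The customer data, the field names, and both helper functions are hypothetical and are not taken from the article. The same-period construction mirrors what the paragraph above describes. The lagged construction is only one common alternative, included for contrast; the excerpt ends before the author describes his own approach, so it should not be read as his method.

```python
# Hypothetical monthly observations: (customer_id, month, balance, churned).
# All names and values are illustrative assumptions, not data from the article.
observations = [
    ("c1", 1, 500.0, 0),
    ("c1", 2, 120.0, 0),
    ("c1", 3, 0.0, 1),
    ("c2", 1, 300.0, 0),
    ("c2", 2, 310.0, 0),
    ("c2", 3, 305.0, 0),
]


def same_period_records(rows):
    """Pair each input value with the output observed in the SAME month.

    This is the literal extraction: current inputs against current output.
    """
    return [(cid, month, balance, churned) for cid, month, balance, churned in rows]


def lagged_records(rows, horizon=1):
    """Pair inputs from month t with the output observed at month t + horizon.

    Assumed alternative for contrast: inputs precede the outcome in time.
    """
    # Look up each customer's outcome by (customer_id, month).
    outcome = {(cid, month): churned for cid, month, _, churned in rows}
    records = []
    for cid, month, balance, _ in rows:
        future = outcome.get((cid, month + horizon))
        if future is not None:  # skip rows whose future outcome is not yet known
            records.append((cid, month, balance, future))
    return records


if __name__ == "__main__":
    for record in same_period_records(observations):
        print("same-period:", record)
    for record in lagged_records(observations, horizon=1):
        print("lagged:     ", record)
```

In the same-period records, customer c1's zero balance appears alongside the churn flag it coincides with. In the lagged records, the month 2 balance is paired with the month 3 outcome, so the input is something that would have been known before the behavior occurred. The lagged set is also smaller, because the most recent month has no known future outcome.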
About the Author
Thomas A. “Tony” Rathburn is a senior consultant with The Modeling Agency and has over 25 years of predictive analytics development experience. Tony will present two new courses at the TDWI Chicago World Conference in May 2013.