This one-day, vendor-neutral session will expose analytics practitioners, data scientists, and those looking to get started in predictive analytics to the critical importance of selecting, transforming, and properly preparing data in advance of model building. The instructor will present the characteristics of different data types, discuss how to address data quality issues, and show how to identify data representations that are suited to various project types.
Participants will learn that data outliers are often not errors in the data, but the data points of most interest. Live demonstrations will reinforce why problem context is required to understand how to deal with outliers and why mishandling extreme values can introduce model bias. This session will also cover data preparation exercises ranging from data sandbox construction to the creation of training, test, and validation data sets for model development.
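As a rough illustration of two of these ideas (not taken from the session materials), the sketch below flags extreme values instead of deleting them and then splits the rows into training, validation, and test sets. The function names, the 1.5 × IQR rule, the 60/20/20 proportions, and the sample amounts are all assumptions chosen for the example; whether a flagged value is an error or the most interesting point depends on the problem context.

```python
import random
import statistics


def flag_outliers_iqr(values, k=1.5):
    """Return one boolean per value: True if it lies outside the IQR fences.

    The values are only flagged, never removed, so the decision about how
    to treat them can be made later with the problem context in mind.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def split_train_validation_test(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle a copy of the rows and cut it into three disjoint sets.

    Whatever is left after the training and validation shares goes to test.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_validation = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validation],
        shuffled[n_train + n_validation:],
    )


# Hypothetical transaction amounts; the last one is extreme.
amounts = [12, 15, 14, 13, 16, 15, 14, 13, 15, 950]
flags = flag_outliers_iqr(amounts)

# Keep the flag next to each value so no information is lost in the split.
rows = list(zip(amounts, flags))
train_rows, validation_rows, test_rows = split_train_validation_test(rows)
```

In a fraud-detection setting, for instance, the flagged amount might be exactly the case the model needs to learn from, which is why the sketch keeps it in the data with a marker attached.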
You Will Learn
- How to prepare a data sandbox for predictive analytics
- Ways to detect and treat missing data and address data quality issues
- Methods to match data representations to suitable project types
- Construction methods for various data transformations
- How to handle data outliers without biasing model performance
- How to build "train–test–validate" data sets for model development
- Resources, skills, and plans that you can take with you to confidently process raw data for analytics
Geared To
- Analytics practitioners; data scientists; IT professionals; technology planners; consultants; business analysts; analytics project leaders