This one-day, vendor-neutral session will show analytics practitioners, data scientists, and those looking to get started in predictive analytics why properly preparing data before model building is critically important. The instructor will present the characteristics of different data types, how to address data quality issues, and how to identify data representations suited to various project types. Participants will also learn how to move through the five major tasks of data preparation: selection, integration, cleaning, formatting, and construction.
Examples of each task will be given. Emphasis will be placed on the tasks that must be overseen by the modeler and cannot be done outside the context of a specific modeling project. Data is carefully "crafted" by the modeler to improve the ability of modeling algorithms to find patterns. Techniques specific to this type of data preparation, such as balancing and partitioning, will also be explored. The session will also discuss how purposeful collaboration among data scientists, subject matter experts, leadership, and IT directly affects data preparation.
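As a rough illustration of partitioning and balancing, the sketch below shuffles records into train, test, and validate partitions and then balances only the training partition by random undersampling. The 60/20/20 split, the `target` field name, the helper names, and the choice of undersampling (rather than oversampling or weighting) are all assumptions for illustration, not the course's prescribed method.

```python
import random
from collections import Counter

def partition(rows, train=0.6, test=0.2, seed=42):
    """Shuffle rows and split them into train, test, and validate partitions."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    # Whatever remains after train and test becomes the validate partition.
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])

def undersample(rows, label_key="target", seed=42):
    """Balance classes by randomly keeping only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 10 responders (target=1) and 90 non-responders (target=0).
data = [{"id": i, "target": 1 if i < 10 else 0} for i in range(100)]
train, test, validate = partition(data)
balanced_train = undersample(train)  # balance the training partition only

print(len(train), len(test), len(validate))  # 60 20 20
print(Counter(r["target"] for r in balanced_train))
```

Balancing is applied after partitioning so that the test and validate partitions keep the natural class mix and give an honest estimate of how the model will behave on real data.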
You Will Learn
- How to prepare a data sandbox for predictive analytics
- Ways to detect and treat missing data and address data quality issues
- Methods to match data representations to suitable project types
- Construction methods for various data transformations
- How to handle data outliers without biasing model performance
- How to build "train–test–validate" data sets for model development
- Resources, skills, and plans that you can take with you to confidently process raw data for analytics
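One common way to treat missing values and outliers without biasing evaluation is to learn the treatment rules from the training partition only and then apply the same rules to every partition. The minimal sketch below assumes median imputation with a missing-value flag and capping (winsorizing) at the 5th and 95th percentiles, using a simple index-based percentile; the function names and thresholds are hypothetical choices, not a method taken from the course.

```python
import statistics

def fit_cleaning_rules(train_values, lower_pct=0.05, upper_pct=0.95):
    """Learn imputation and capping rules from training values only (None = missing)."""
    observed = sorted(v for v in train_values if v is not None)
    median = statistics.median(observed)
    # Simple index-based percentiles on the sorted observed values.
    lo = observed[int(lower_pct * (len(observed) - 1))]
    hi = observed[int(upper_pct * (len(observed) - 1))]
    return {"median": median, "lo": lo, "hi": hi}

def apply_cleaning_rules(values, rules):
    """Return (cleaned_value, was_missing) pairs using previously fitted rules."""
    cleaned = []
    for v in values:
        missing = v is None
        x = rules["median"] if missing else v       # impute missing values with the training median
        x = min(max(x, rules["lo"]), rules["hi"])   # cap outliers at the training percentiles
        cleaned.append((x, missing))                # keep a flag so the model can see missingness
    return cleaned

# Hypothetical training column with one missing value and one extreme outlier.
train_col = [1, 2, 3, 4, 5, 6, 7, 8, 9, None, 1000]
rules = fit_cleaning_rules(train_col)
print(rules)                                        # {'median': 5.5, 'lo': 1, 'hi': 9}
print(apply_cleaning_rules([None, 1000, 4], rules)) # [(5.5, True), (9, False), (4, False)]
```

Because the median and caps come only from training data, the test and validate partitions never influence the treatment, which keeps their performance estimates from being optimistically biased.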
Who Should Attend
- Analytics practitioners; data scientists; IT professionals; technology planners; consultants; business analysts; analytics project leaders