Level: Beginner to Intermediate
Prerequisite: None
Lost in the excitement around analytics and AI is the central role of finding and preparing data. Analytic modeling requires data to be sourced, understood, cleansed, and prepared. Modeling is not possible without these steps, which typically consume more time than any other stage of a data science project. This course provides an overview of the data sourcing and preparation activities in data science and predictive analytics projects, highlighting key principles and practices and using business examples to reinforce each concept.
Data sourcing and preparation activities are incorporated into multiple phases of a data science project and are repeated as part of an iterative process as models are designed, validated, and deployed. Establishing the business and analytics goals of a project requires evaluating and profiling potential data sources to ensure they match analytics objectives. Data preparation activities create integrated data fit for analytic modeling, and they are adjusted as models are developed, features are selected, and algorithms are tuned. Deployment of production models requires the development of automated data pipelines, and operations processes must monitor sources and pipelines that feed analytic models.
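As a rough illustration of what profiling a potential data source can involve, the sketch below summarizes one column of a small dataset: how much of it is missing, how many distinct values it holds, and which value is most common. The `profile_column` helper and the sample `customers` records are hypothetical and are not taken from the course material.

```python
from collections import Counter

def profile_column(rows, column):
    """Summarize one column of a dataset held as a list of dicts."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing (an assumption for this sketch).
    present = [v for v in values if v is not None and v != ""]
    missing = len(values) - len(present)
    return {
        "rows": len(values),
        "missing_rate": missing / len(values) if values else 0.0,
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }

# Hypothetical sample of a candidate customer data source.
customers = [
    {"customer_id": 1, "region": "West", "age": 34},
    {"customer_id": 2, "region": "", "age": 51},
    {"customer_id": 3, "region": "West", "age": None},
    {"customer_id": 4, "region": "East", "age": 29},
]

region_profile = profile_column(customers, "region")
age_profile = profile_column(customers, "age")
print(region_profile)
print(age_profile)
```

A summary like this is one simple way to judge whether a source is complete and varied enough to support a given analytics objective before any modeling begins.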
This course is part of an optional Data Science Bootcamp and can also be attended as an individual course.
You Will Learn
- How data sourcing and preparation fit into the major stages of a data science project
- Principles for matching data sources to business goals and analytics techniques
- The role of exploratory data analysis (EDA) in evaluating potential data sources
- The importance of assessing data quality when choosing data sources
- Data preparation activities such as cleansing, integration, and feature engineering
- The purpose and importance of data pipelines for analytics solutions
- Key technologies that enable data sourcing and preparation
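The preparation activities named in the list above (cleansing, integration, and feature engineering) can be sketched in a few lines. This is a minimal illustration using only the Python standard library. The sample records, the function names, and the chosen features are all hypothetical, and a real project would more likely use a dataframe library or a pipeline tool.

```python
from datetime import date

# Hypothetical raw inputs from two separate sources.
raw_customers = [
    {"customer_id": "1", "region": " west "},
    {"customer_id": "2", "region": "EAST"},
    {"customer_id": "2", "region": "EAST"},  # duplicate record
    {"customer_id": "3", "region": ""},      # missing region
]
raw_orders = [
    {"customer_id": "1", "amount": "120.50", "order_date": "2024-01-15"},
    {"customer_id": "1", "amount": "80.00", "order_date": "2024-03-02"},
    {"customer_id": "2", "amount": "45.25", "order_date": "2024-02-20"},
]

def cleanse(customers):
    """Cleansing: fix types, normalize text, drop duplicates, fill gaps."""
    seen, cleaned = set(), []
    for row in customers:
        cid = int(row["customer_id"])
        if cid in seen:
            continue  # keep only the first record per customer
        seen.add(cid)
        region = row["region"].strip().title() or "Unknown"
        cleaned.append({"customer_id": cid, "region": region})
    return cleaned

def integrate(customers, orders):
    """Integration: attach each order to its customer by customer_id."""
    by_customer = {c["customer_id"]: [] for c in customers}
    for order in orders:
        cid = int(order["customer_id"])
        if cid in by_customer:
            by_customer[cid].append({
                "amount": float(order["amount"]),
                "order_date": date.fromisoformat(order["order_date"]),
            })
    return by_customer

def engineer_features(customers, orders_by_customer, as_of):
    """Feature engineering: derive model inputs from the integrated data."""
    features = []
    for customer in customers:
        orders = orders_by_customer[customer["customer_id"]]
        last = max((o["order_date"] for o in orders), default=None)
        features.append({
            **customer,
            "order_count": len(orders),
            "total_spend": sum(o["amount"] for o in orders),
            "days_since_last_order": (as_of - last).days if last else None,
        })
    return features

clean_customers = cleanse(raw_customers)
orders_by_customer = integrate(clean_customers, raw_orders)
feature_rows = engineer_features(clean_customers, orders_by_customer, date(2024, 4, 1))
for row in feature_rows:
    print(row)
```

Keeping each activity in its own function mirrors how these steps are often chained into an automated data pipeline once a model moves toward production.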
Geared To
This course is geared to technical and non-technical professionals getting started with data science, including:
- Business analysts
- Business stakeholders
- Data scientists
- Analytics practitioners
- Data engineers
- Analytics project leads
- BI and data management professionals
Experienced data scientists will find this course to be a review, but those who have not been formally exposed to these key principles and practices will still benefit from the class.