TDWI Data Science Bootcamp

A TDWI Certificate Track

Virtual Classroom
August 26–28, 2024
9:00am – 5:00pm CT

Data Science Processes and Data Preparation

August 26, 2024

9:00 am - 5:00 pm

Duration: Full Day Course

Central Time (CT)

Prerequisite: None

Deanne Larson, Ph.D., DM, CBIP

President

Larson & Associates

Data science has been called “the sexiest job of the 21st century,” and with good reason: the size and breadth of our data are growing exponentially, and businesses are leveraging advances in machine learning and AI to make sense of it all for competitive advantage. This course provides a complete overview of the data science process and examines in detail the key tasks that occur before analytic model building begins.

A project-oriented framework is used to introduce the discipline of data science, placing activities in the context of business value and covering the key concepts that every data scientist and business stakeholder needs to know. Every data science project begins by establishing business and analytics objectives, then proceeds to collect and integrate data, prepare it for analysis, develop analytic models, and deploy the results. For each stage, the course describes key principles and illustrates them with real-world examples. These principles apply whether the goal of the project is to make a single decision, develop dashboards and visualizations, deploy new reports, or automate key business activities.

Next, the course breaks down the data science activities that occur before analytic modeling can begin. Data sourcing and preparation take place in multiple phases of a data science project and are repeated iteratively as models are designed, validated, and deployed. Establishing the business and analytics goals of a project requires evaluating and profiling potential data sources to ensure they match analytics objectives. Data preparation activities create integrated data fit for analytic modeling, and these activities are adjusted as models are developed, features are selected, and algorithms are tuned. Deploying production models requires automated data pipelines, and operational processes must monitor the sources and pipelines that feed analytic models.

You Will Learn

  • How data science is applied to business challenges to produce useful results
  • How data science programs are organized, including key roles such as business stakeholders, subject matter experts, data engineers, and analytic modelers
  • The major stages of a data science project, including establishment of goals, data preparation, analytic modeling, and deployment
  • The relationship of data science to statistical analysis, machine learning, and AI
  • Tools and technologies used in data science
  • How data sourcing and preparation fit into the major stages of a data science project
  • Principles for matching data sources to business goals and analytics techniques
  • The role of exploratory data analysis (EDA) in evaluating potential data sources
  • The importance of assessing data quality when choosing data sources
  • Data preparation activities such as cleansing, integration, and feature engineering
  • Key technologies that enable data sourcing and preparation

Geared To

This course is geared to technical and non-technical professionals getting started with data science, including:

  • Business analysts
  • Business stakeholders
  • Data scientists
  • Analytics practitioners
  • Data engineers
  • Analytics project leads
  • BI and data management professionals

Experienced data scientists will find much of this course to be a review, but they will still benefit from the class if they have not been formally exposed to its key principles and practices.
