


Hands-On: Cluster Analysis for Data Science with Python

Duration: One Day Course

Prerequisite: See below

Course Outline

This course can also be delivered using R.

Cluster analysis is one of the most useful machine learning techniques. It splits data into groups (i.e., clusters) that are meaningful, useful, or both.

What makes cluster analysis different from other forms of machine learning is that it finds meaningful groups using only the data itself, with no predefined labels to learn from. The results of a cluster analysis are often used as input to other forms of machine learning (e.g., classification).

Cluster analysis applies broadly across industries and business domains. It is used in scenarios such as:

  • Customer segmentation
  • Anomaly detection
  • Document classification
  • Supply chain optimization

This course is an introduction to cluster analysis using Python. Your team will learn how to employ two of the most popular clustering techniques (k-means and DBSCAN) to craft new insights from data via hands-on labs.
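
To give a feel for what k-means does, here is a minimal sketch written with only the Python standard library. It is not taken from the course labs, which may well rely on a clustering library instead. The function name `kmeans`, the sample points, and the choice to seed the centroids with the first k points are all assumptions made for illustration.

```python
import math


def kmeans(points, k, max_iter=100):
    """Minimal k-means (Lloyd's algorithm) on a list of numeric tuples."""
    # Assumption: the first k points serve as the initial centroids. This keeps
    # the sketch deterministic; libraries typically use smarter seeding.
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Note that k must be chosen up front; choosing it well is one of the topics listed in the outline below.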

Although this course contains some mathematics, the level of math is accessible to a broad audience and the focus is on concepts, not calculations.

Your Team Will Learn

  • How cluster analysis differs from other forms of machine learning
  • Use cases for cluster analysis
  • The different types of clustering algorithms
  • The k-means clustering algorithm
  • How to optimize the number of k-means clusters
  • The DBSCAN clustering algorithm
  • How to optimize the clusters found by DBSCAN
  • How to reduce dimensionality using principal component analysis (PCA)
  • How to handle categorical data with one-hot encoding
  • How to handle categorical data with factor analysis of mixed data (FAMD)
  • Additional resources for honing skills
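
As one illustration of the topics above, the following is a minimal DBSCAN sketch using only the Python standard library. It is an assumption-laden teaching aid, not the course's lab code: the function name `dbscan`, the parameter names `eps` and `min_pts`, and the sample points are hypothetical. It shows how density-based clustering discovers the number of clusters on its own and flags isolated points as noise, which is why the technique is often associated with anomaly detection.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # point i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is also a core point, so keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical data: two dense groups plus one isolated point.
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.1, 7.9), (7.9, 8.2), (4.5, 15)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

The results depend heavily on `eps` (the neighborhood radius) and `min_pts` (the density threshold), so tuning those two values is the main lever for improving the clusters DBSCAN finds.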

Geared To

  • Business and data analysts
  • BI and analytics developers and managers
  • Business users
  • Data scientists
  • Anyone interested in using cluster analysis with their business data

No background in advanced mathematics or statistics is required.


Prerequisite

Students must be familiar with Python and Jupyter notebooks, or complete the prerecorded course “Python Quick Start” before the class. This prerecorded course will be made available in advance to any students who need it.

Laptop Setup

Attendees will need a laptop computer with specific software installed before the session. In advance of the class, attendees will receive detailed software download and installation instructions.

Download the TDWI Course Catalog to Get Started Today