What Is Cohort Analysis? How to Track Behavior Over Time
Suppose a subscription business is trying to understand whether its product is getting better. It looks at its overall retention rate and sees that it's been stable for two years. That looks like a good sign.
But stable overall retention could mean a lot of things. It could mean every group of new customers retains at roughly the same rate. It could also mean that newer customers are churning much faster than older ones, while the older customers, who retain well, are numerous enough to keep the average stable.
The aggregate number looks fine while something important is getting worse underneath it.
Cohort analysis is the tool that makes the difference visible.
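To make the masking effect concrete, here is a small arithmetic sketch. The customer counts are hypothetical and chosen only to illustrate the point: the overall rate is identical in both years even though the newer customers' retention falls from 60% to 40%.

```python
def aggregate_retention(segments):
    """Overall retention from (customers, retained) pairs."""
    segments = list(segments)
    total = sum(customers for customers, _ in segments)
    retained = sum(kept for _, kept in segments)
    return retained / total

# Hypothetical numbers: (customers at start of year, customers still active at end).
year_one = {"older customers": (1000, 800), "newer customers": (1000, 600)}
year_two = {"older customers": (3000, 2400), "newer customers": (1000, 400)}

for label, year in (("year one", year_one), ("year two", year_two)):
    overall = aggregate_retention(year.values())
    by_group = {name: kept / customers for name, (customers, kept) in year.items()}
    print(label, f"overall={overall:.0%}", by_group)
```

The aggregate holds steady only because the well-retained older group has grown. Splitting the same customers by group is what exposes the decline.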
A cohort is a group of users or customers who share a common characteristic at a specific point in time. The most common definition is the acquisition cohort: all customers who first signed up, made their first purchase, or became active users in a given time period. January 2024 customers are one cohort. February 2024 customers are another. The analysis then tracks each cohort's behavior forward in time, measuring how it performs at one month, three months, six months, and twelve months after its starting point.
What you get is not a single number but a grid. Rows are cohorts. Columns are time periods since the starting point. Each cell contains a metric, typically retention rate, revenue per user, or engagement rate, for that cohort at that stage of their lifecycle. Reading across a row tells you how a specific cohort evolved over time. Reading down a column tells you how cohorts at the same stage of their lifecycle compare to each other. Both directions are informative, and they often tell different stories.
Reading down the columns is where cohort analysis tends to be most revealing. If your January cohort retained 60% of users at month three, your June cohort retained 55%, and your November cohort retained 45%, you have a degrading trend in early retention that an aggregate retention metric would smooth over entirely. That trend might indicate a product problem, a change in the quality or composition of acquired users, or a seasonal effect. Whatever the cause, the cohort view surfaced it in a way that aggregate analysis couldn't.
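The two reading directions can be sketched with a small grid held in a dictionary. Only the month-three values (60%, 55%, 45%) come from the example above. Every other number, and the helper names `read_row` and `read_column`, are made up for illustration.

```python
# Retention grid: rows are acquisition cohorts, columns are months since start.
# Only the month-3 values match the example in the text; the rest are invented.
grid = {
    "Jan": [1.00, 0.78, 0.67, 0.60],
    "Jun": [1.00, 0.75, 0.63, 0.55],
    "Nov": [1.00, 0.70, 0.55, 0.45],
}

def read_row(grid, cohort):
    """Reading across: one cohort's retention over its lifecycle."""
    return grid[cohort]

def read_column(grid, month):
    """Reading down: every cohort's retention at the same lifecycle stage."""
    return {cohort: values[month] for cohort, values in grid.items() if month < len(values)}

month_three = read_column(grid, 3)
cohorts = list(month_three)
# Compare each cohort with the one acquired before it.
for earlier, later in zip(cohorts, cohorts[1:]):
    change = month_three[later] - month_three[earlier]
    print(f"{earlier} -> {later}: {change:+.0%} at month three")
```

The length check in `read_column` skips cohorts that have not yet reached the requested month, so the comparison only includes cohorts with data at that stage.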
Cohorts don't have to be defined by acquisition date. Behavioral cohorts group users by something they did: customers who used a specific feature in their first week, customers who made more than three purchases in their first month, customers who contacted support within thirty days of signing up. These cohorts are useful for understanding whether specific behaviors predict long-term outcomes. If customers who use feature X in their first week retain at twice the rate of those who don't, that's a signal worth acting on, and it's a signal that only emerges when you construct the cohort deliberately and track it forward.
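A behavioral cohort comparison can be sketched in a few lines. The user records below are hypothetical, as is the record layout (a user ID, a flag for whether they used feature X in their first week, and a flag for whether they were still active at month six).

```python
from collections import defaultdict

# Hypothetical records: (user_id, used_feature_x_in_first_week, active_at_month_six)
users = [
    (1, True, True), (2, True, True), (3, True, True), (4, True, True), (5, True, False),
    (6, False, True), (7, False, True), (8, False, False), (9, False, False), (10, False, False),
]

def retention_by_behavior(users):
    """Group users by the behavior flag, then compute retention per group."""
    counts = defaultdict(lambda: [0, 0])  # flag -> [cohort size, retained count]
    for _, used_feature, retained in users:
        counts[used_feature][0] += 1
        counts[used_feature][1] += int(retained)
    return {flag: kept / size for flag, (size, kept) in counts.items()}

rates = retention_by_behavior(users)
print(f"used feature X: {rates[True]:.0%}, did not: {rates[False]:.0%}")
```

The key design choice is that the behavior is measured in a fixed early window and the outcome is measured later, so the cohort is defined before the result it is meant to predict.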
The mechanics of cohort analysis in SQL involve a few consistent patterns. You need to identify each user's cohort assignment, typically their first event date truncated to a week or month. You need to determine, for each subsequent time period, whether the user was active and how many periods have elapsed since their cohort start date. Then you aggregate by cohort and time period. The logic is not complicated, but it requires thinking carefully about what constitutes an event, how to handle users who have no activity in a given period, and what time granularity is appropriate for the question being asked.
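The three steps above can be sketched as a single query. This is a minimal example run through Python's built-in sqlite3 module, assuming a hypothetical `events` table with `user_id` and `event_date` columns, monthly granularity, and SQLite's date functions. Other databases spell the date truncation differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [  # hypothetical activity log
        (1, "2024-01-05"), (1, "2024-02-10"), (1, "2024-03-02"),
        (2, "2024-01-20"), (2, "2024-03-15"),
        (3, "2024-02-03"), (3, "2024-03-09"),
        (4, "2024-02-11"),
    ],
)

QUERY = """
WITH first_event AS (          -- step 1: cohort assignment from first event
    SELECT user_id, MIN(event_date) AS first_date
    FROM events
    GROUP BY user_id
),
activity AS (                  -- step 2: months elapsed since cohort start
    SELECT DISTINCT
        e.user_id,
        strftime('%Y-%m', f.first_date) AS cohort,
        (CAST(strftime('%Y', e.event_date) AS INTEGER)
           - CAST(strftime('%Y', f.first_date) AS INTEGER)) * 12
        + (CAST(strftime('%m', e.event_date) AS INTEGER)
           - CAST(strftime('%m', f.first_date) AS INTEGER)) AS period
    FROM events e
    JOIN first_event f ON f.user_id = e.user_id
),
cohort_size AS (
    SELECT strftime('%Y-%m', first_date) AS cohort, COUNT(*) AS size
    FROM first_event
    GROUP BY strftime('%Y-%m', first_date)
)
SELECT a.cohort, a.period, COUNT(*) AS active,   -- step 3: aggregate
       1.0 * COUNT(*) / s.size AS retention
FROM activity a
JOIN cohort_size s ON s.cohort = a.cohort
GROUP BY a.cohort, a.period, s.size
ORDER BY a.cohort, a.period
"""

rows = conn.execute(QUERY).fetchall()
for cohort, period, active, retention in rows:
    print(cohort, period, active, f"{retention:.0%}")
```

Note that a cohort and period with no active users produces no row at all in this sketch. When building the grid, those missing cells should be filled with zero rather than treated as unknown, provided the cohort is old enough to have reached that period.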
Interpreting cohort analysis carefully matters as much as constructing it correctly. Newer cohorts have less history, so comparisons between a cohort with twelve months of data and one with two months of data have to account for that asymmetry. Cohort sizes vary, and a cohort with fifty users produces noisier metrics than one with five thousand. Seasonality can make cohorts acquired in different months look different for reasons that have nothing to do with product quality or user behavior. These are not reasons to avoid cohort analysis, but they are reasons to be thoughtful about what conclusions it actually supports.
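Two of these cautions can be made mechanical. The sketch below is one possible approach, not a prescribed method: it uses a Wilson score interval to show how much wider the uncertainty is for a small cohort, and it restricts comparisons to the periods every cohort has actually reached. The function names and the sample figures are hypothetical.

```python
import math

def wilson_interval(retained, cohort_size, z=1.96):
    """Approximate 95% Wilson score interval for a retention proportion."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    p = retained / cohort_size
    denom = 1 + z**2 / cohort_size
    centre = (p + z**2 / (2 * cohort_size)) / denom
    half = z * math.sqrt(p * (1 - p) / cohort_size + z**2 / (4 * cohort_size**2)) / denom
    return centre - half, centre + half

def comparable_periods(months_observed):
    """Periods that every cohort has had time to reach (period 0 is the start)."""
    return list(range(min(months_observed.values()) + 1))

# Same observed retention (60%), very different certainty.
small = wilson_interval(30, 50)
large = wilson_interval(3000, 5000)
print(f"50 users:   {small[0]:.1%} to {small[1]:.1%}")
print(f"5000 users: {large[0]:.1%} to {large[1]:.1%}")

# A cohort with twelve months of history and one with two can only be
# compared on periods 0 through 2.
print(comparable_periods({"older cohort": 12, "newer cohort": 2}))
```

Seasonality is harder to handle mechanically. A common starting point is to compare a cohort with the one acquired in the same month a year earlier, when that much history exists.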
For anyone working in product analytics, growth, or customer success, cohort analysis is one of the more powerful tools in a standard analytical toolkit. It doesn't replace aggregate metrics, which are useful for a quick read on overall health. It complements them by providing the segmented, time-aware view that explains what's driving those aggregates and where the real patterns lie.