TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

Data 101

00 Days

00 Hrs

00 Min

00 Sec

Slowly Changing Dimensions: What They Are and Why They Matter

Most business events don't happen in a vacuum. A sale involves a customer. A support ticket involves an employee. A shipment involves a location. In a data warehouse, the tables that provide that context, describing who the customer is, what region the employee belongs to, what category the product falls into, are called dimensions. And for a data warehouse to answer historical questions accurately, those dimension tables need to reflect not just what's true now but what was true at the time each event occurred.

That requirement sounds simple. It turns out to be one of the more consequential design challenges in data warehousing.

The problem arises because real-world entities change. Customers move to different cities. Products get recategorized. Employees transfer between departments. Sales territories get redrawn. In an operational system, updating a record to reflect the current state of the world is exactly the right thing to do. In a data warehouse, blindly overwriting historical values with current ones means that when you run a report covering a past time period, you're joining historical events to current attributes. The numbers may look plausible. They may also be wrong in ways that are hard to detect.

Consider a simple example. A customer based in the Northeast buys something in January. In February they relocate to the Southwest and you update their record. In March someone asks for Q1 sales broken down by region. If your dimension table only stores the current address, that January sale now appears under Southwest, because that's where the customer lives today. The event is correct. The context is not.

Dimensions where this kind of change happens are called slowly changing dimensions, a term coined by data warehousing pioneer Ralph Kimball. "Slowly" is relative: the changes happen less frequently than the events being recorded, but often enough that ignoring them produces inaccurate history. The question every dimensional modeler has to answer is: when an attribute changes, what should happen to the historical record?

There are several established approaches, typically referred to as types.

Type 1 is the simplest: overwrite the old value with the new one and keep no record of what it used to be. This is appropriate when historical accuracy on that particular attribute doesn't matter, or when the change is a correction rather than a real update. If a customer's name was misspelled and you fix it, you probably don't need to preserve the misspelling. Type 1 is the right choice in those cases. It's the wrong choice when the attribute has analytical significance and the history of its changes is something you'll need to query.

Type 2 is the most powerful and the most widely used approach for attributes that matter historically. Instead of overwriting the existing row, you preserve it and insert a new row representing the updated state. Each version of the dimension record gets its own surrogate key, a start date, and an end date, so fact table rows can be joined to whichever version of the dimension was active at the time the event occurred. The January sale points to the Northeast version of the customer record. The history is preserved. The query returns the right answer.

Type 3 takes a middle path: rather than creating new rows, it adds a column to store the previous value alongside the current one. This works when you only ever need to compare the current state with the immediately prior one, but it breaks down quickly if you need deeper history or if attributes change more than once.

Beyond Types 1, 2, and 3, the literature describes several additional variants, including hybrid approaches that apply different types to different attributes within the same dimension. In practice, most dimensional modeling decisions come down to understanding what historical questions the warehouse needs to answer and choosing the type that makes those questions answerable without introducing more complexity than the problem warrants.

The slowly changing dimension concept is a good example of something that seems like a technical detail but is actually a business decision in disguise. Whether to preserve the history of a particular attribute depends on whether that history has analytical value, which depends on what the business actually needs to know. Getting that right requires a conversation between the people building the model and the people who will use it, not just a choice made at the database level.

Data 101

Slowly Changing Dimensions: What They Are and Why They Matter

TDWI

Engage

Research