TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

Data 101

00 Days

00 Hrs

00 Min

00 Sec

Blog archive

What Is Temporal Data Modeling? How Databases Track Time-Dependent Truth

Data has a complicated relationship with time.

When you store a fact in a database, you're implicitly answering a question: true as of when? Most database designs don't ask that question explicitly. They store the current state of the world and overwrite it when things change. That's appropriate for many use cases, but it means the database can only answer questions about the present. Ask it what a customer's address was eight months ago, or what the agreed contract price was on a specific date, and it either can't answer or gives you the wrong answer.

Temporal data modeling is the discipline of designing databases that can answer those questions correctly.

The foundational concept is the distinction between two different kinds of time that are relevant to any piece of data. The first is valid time: the period during which a fact is true in the real world. A contract is valid from the date it's signed to the date it expires. An employee holds a particular title from the date of their promotion to the date of their next role change. Valid time is about reality.

The second is transaction time, sometimes called system time: the period during which the database believes a fact to be true. This is about the record itself. When was this row inserted? When was it updated or deleted? Transaction time tracks what the database knew and when it knew it.

These two timelines are independent, and that independence matters more than it might initially seem. Consider a scenario where an insurance claim is filed in January for an incident that occurred in November. The valid time of the incident is November. The transaction time, when the database learned about it, is January. If you're calculating risk exposure as of November, you need to account for incidents that were valid in November even if they weren't recorded until later. If you're auditing what the system knew at a specific point in time, you need transaction time. Conflating the two produces queries that answer the wrong question.

A database that tracks only valid time is called a valid-time temporal table. A database that tracks only transaction time is called a transaction-time temporal table. A database that tracks both is called a bitemporal table, and bitemporal modeling is where the real expressive power, and the real complexity, lives.

SQL:2011, the international SQL standard published in that year, added native support for temporal tables, introducing syntax for defining and querying both valid-time and transaction-time periods. The AS OF SYSTEM TIME clause lets you query the state of the database as it existed at a specific point in transaction time. The FOR PORTION OF clause lets you update or delete records for a specific valid-time range without affecting data outside that range. Support for these features varies across database systems. Some implement them fully, some partially, and some not at all, which means temporal modeling in practice often involves implementing the patterns manually rather than relying on native support.

The manual approach typically involves adding period columns to tables: a valid_from and valid_to column to represent the valid-time period, and a recorded_at or system_from and system_to column pair to represent the transaction-time period. Each change to a record produces a new row rather than overwriting the existing one, with appropriate period columns set to reflect when the change occurred and what interval it covers. Queries then filter on these columns to retrieve the state of the data as of a specific combination of valid time and transaction time.

The slowly changing dimension patterns used in dimensional modeling, covered elsewhere in this blog, are a specialized form of valid-time temporal modeling. Type 2 slowly changing dimensions track the valid-time history of dimension attributes by preserving old rows and inserting new ones. What they typically don't track is transaction time: when the warehouse learned about the change, as opposed to when the change happened in the real world. A fully bitemporal data warehouse tracks both, which makes it possible to reconstruct not just what was true at any point in the past but what the warehouse believed to be true at any point in the past. For audit purposes, regulatory compliance, and certain classes of financial reporting, that distinction is not academic.

Temporal modeling adds real complexity to both schema design and query writing. Queries that would be simple against a current-state table become more involved when you're filtering across two time dimensions. ETL logic has to correctly manage period columns through inserts, updates, and late-arriving data. Indexes need to be designed with temporal query patterns in mind. None of this is intractable, but it's enough additional overhead that temporal modeling is most justified when the business genuinely needs to answer time-dependent questions accurately, rather than as a default design choice.

The cases where it's genuinely necessary tend to cluster around industries with strong regulatory requirements, financial services, healthcare, insurance, legal, and around analytical problems where the question isn't just what happened but what was known when. In those contexts, a database that can only answer questions about the present is a database that can't do the job.

Data 101

What Is Temporal Data Modeling? How Databases Track Time-Dependent Truth

TDWI

Engage

Research