Mike Schiff

Data Warehousing: Remembering the Basics

The data warehouse concept has continued to evolve over the past two decades and now includes a variety of architectures such as data marts, operational data stores, data warehouse appliances, analytic databases, and federated and virtual implementations.

The data warehouse concept has continued to evolve over the past two decades and now includes a variety of architectures such as data marts, operational data stores, data warehouse appliances, analytic databases, and federated and virtual implementations. Its functionality has also grown to encompass real-time data feeds, master data management repositories, operational business intelligence, and the inclusion of both structured and unstructured data.

As the evolution continues, we must not lose sight of the early concepts that served as the catalyst for the growth of the data warehousing industry. In particular, Bill Inmon's classic definition that "a data warehouse is a subject oriented, integrated, nonvolatile, time variant collection of data in support of management's decisions" (Building the Data Warehouse," John Wiley & Sons, Inc., 1992).

One of my concerns is that some of us are losing sight of the fourth characteristic of the definition, the time- variant collection of data. This is what allows us to make period-to-period comparisons and enables us to spot trends that raise issues that may need attention. I am especially concerned when I hear some vendors deliver the message that a data warehouse may no longer be needed since their technology allows organizations to access and consolidate data from multiple operational systems without having to first extract the data and store it in another database in a data warehouse. There are several potential problems with this approach. Among these: the absence of historical values and data inconsistency among the systems being consolidated.

Operational vs. Analytical Systems

In general, operational systems contain current data values such as inventory on-hand, the availability of a seat on an airline flight, a customer's outstanding balance, or which vendors supply a specific manufacturing part. Although many operational systems will also contain some historical data, this is not their primary design purpose. These systems were, by definition, designed to help operate the organization, not necessarily to help analyze it. For example, although an order entry system will contain detailed data about current year sales to a given customer, it may only contain limited (and likely highly summarized) data about last year's sales to that customer, and perhaps no sales data for earlier years.

Reconciling Data Among Multiple Systems

Additionally, as many operational systems were designed to serve specific functional areas, they may not all use the same data definitions and value lists. For example, the same customer may be represented by different customer numbers in different operational systems. Many view the process of cleansing and reconciling data (for example, unit-of-measure, value list, and format conversions) from multiple systems before loading it into a data warehouse as a necessary burden, but I consider it to be a act of purification. It ensures that those utilizing the data warehouse are not comparing apples to oranges.

Historically, operational systems, in particular legacy applications, were designed to meet specific functional needs and did not necessarily conform to corporate data standards (assuming data standards even existed when the systems were developed). Even when organizations made the investment to develop enterprisewide data standards, they frequently did not have the luxury of reworking existing operational systems to conform the these standards. These organizations could, however, ensure that when data from these systems was loaded into a data warehouse, the data was transformed so that data residing in the warehouse conformed to these standards.

Federation Has Its Place

This is not to say that a federated approach to data warehousing does not have its place in an overall data warehouse architecture, especially if the potential limitations are recognized and understood. One possible use could be to access operational data in disparate systems to prototype potential reports and analyses for "one-time requests" prior to making a decision to incorporate additional data into a data warehouse.

Furthermore, if an organization has matured to the point of having established (and more importantly enforcing) corporate data standards, it may be possible to utilize a federated approach that includes a data warehouse to access historical values in conjunction with operational systems to access the most current values. At the very least, this approach should be evaluated against near-real-time data feeds when implementing a data warehouse solution for analyses that requires both immediate and historic values.

About the Author

Michael A. Schiff is founder and principal analyst of MAS Strategies, which specializes in formulating effective data warehousing strategies. With more than four decades of industry experience as a developer, user, consultant, vendor, and industry analyst, Mike is an expert in developing, marketing, and implementing solutions that transform operational data into useful decision-enabling information.

His prior experience as an IT director and systems and programming manager provide him with a thorough understanding of the technical, business, and political issues that must be addressed for any successful implementation. With Bachelor and Master of Science degrees from MIT's Sloan School of Management and as a certified financial planner, Mike can address both the technical and financial aspects of data warehousing and business intelligence.

TDWI Membership

Get immediate access to training discounts, video library, BI Teams, Skills, Budget Report, and more

Individual, Student, & Team memberships available.