TDWI FlashPoint Newsletter


TDWI Experts is a twice-monthly e-newsletter where BI/DW thought leaders share opinions and commentary about relevant industry topics and the latest technologies.


September 23, 2010

Emerging Tech: Data Integration for Real-Time DW

Philip Russom
Senior manager of TDWI Research

Topic: Emerging techniques and practices for real time

In a 2009 TDWI survey, a paltry 17 percent of survey respondents reported using real-time functionality with their data warehouse. Yet, a whopping 92 percent said they would be using it within three years. This puts real-time functions at the top of the priority list for next-generation data warehouses.

Why the rush to real-time data warehousing? Because it's a foundation for time-sensitive business practices such as operational business intelligence, frequently refreshed management dashboards, just-in-time inventory, facility monitoring, self-service information portals, on-the-fly recommendations in eCommerce, and so on. Although these real-time business practices first appeared a few years ago, they are only now being broadly adopted.

Ironically, real-time data warehousing is less about the warehouse and more about the data integration (DI) functions that feed the warehouse. In fact, accelerating a data warehouse into real time is mostly accomplished via real-time data integration functionality. If you're facing this kind of work, consider the following recommendations:

Enable real-time data warehousing with real-time data integration

Most of the real-time functionality applied in DW/BI is enabled by various types of data integration tools and techniques, along with the diverse data delivery speeds and interfaces these support. Even so, accommodating real-time information delivery may also involve adjustments to data warehouse architecture and data models, as well as how reports are refreshed. For example, DW architecture may need a data staging area to accommodate data arriving in real time. The DW platform may need to support fast or reactive interfaces such as services, messages, events, triggers, and real-time alerts.
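As a rough illustration of the staging idea, the sketch below buffers rows as they arrive and hands them to the warehouse in small, frequent batches. The `StagingArea` class, its `flush_size` parameter, and the `load` callback are hypothetical names chosen for this example, not part of any particular DW platform.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class StagingArea:
    """Hypothetical in-memory staging area that loads rows in micro-batches."""

    def __init__(self, flush_size: int, load: Callable[[List[Row]], None]) -> None:
        if flush_size < 1:
            raise ValueError("flush_size must be at least 1")
        self._flush_size = flush_size
        self._load = load              # assumed callback that writes a batch to the warehouse
        self._pending: List[Row] = []

    def arrive(self, row: Row) -> None:
        """Accept one row arriving in real time; flush once the batch is full."""
        self._pending.append(row)
        if len(self._pending) >= self._flush_size:
            self.flush()

    def flush(self) -> None:
        """Send any pending rows to the warehouse as a single batch."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._load(batch)
```

A production staging area would normally be a persistent table or queue, and would also flush on a timer so that a slow trickle of rows does not wait indefinitely; both are left out here to keep the sketch short.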

Know the available real-time data integration techniques and tool types

Those that are conducive to real-time operation include data federation, frequent batches, change data capture (CDC), complex event processing (CEP), on-the-fly data quality, and interoperability with message buses and service architectures. Are all these necessary? Yes, because diverse data and business processes have diverse requirements for information delivery. That's why "real time" is a blanket term for multiple information delivery speeds and frequencies. In turn, these requirements are satisfied by multiple DI technologies that operate at various speeds, frequencies, interface types, and other performance characteristics.
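To make one of these techniques concrete, here is a minimal sketch of change data capture. It assumes the simplest possible approach, comparing two snapshots of a source table keyed by primary key; commercial CDC tools more commonly read database transaction logs or use triggers. The function names `capture_changes` and `apply_changes` are illustrative only.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Snapshot = Dict[int, Row]          # primary key -> row
Change = Tuple[str, int, Row]      # (operation, key, row)


def capture_changes(previous: Snapshot, current: Snapshot) -> List[Change]:
    """Describe how `current` differs from `previous` as insert/update/delete operations."""
    changes: List[Change] = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key, row in previous.items():
        if key not in current:
            changes.append(("delete", key, row))
    return changes


def apply_changes(target: Snapshot, changes: List[Change]) -> None:
    """Apply captured changes to a target copy, moving only the rows that changed."""
    for operation, key, row in changes:
        if operation == "delete":
            target.pop(key, None)
        else:
            target[key] = dict(row)
```

The point of the sketch is that only the changed rows travel to the warehouse, which is what allows CDC to run far more frequently than a full reload.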


Recognize that real-time data warehousing usually virtualizes data

For the purposes of this article, let's define data virtualization as a pooling of many DI resources. That's similar to how pooling server resources creates a virtualized cloud, but data virtualization pools data, as seen through data sources, targets, transformations, interfaces, and other components of a data integration solution. This way, data that's virtualized via data integration techniques provides an abstraction layer that enables data provisioning -- even in real time. Real-time provisioning is key because the kind of data you want to move in real time for DW/BI changes so rapidly that only the most recent update of it is useful for a business process. Note that data federation can be a component of this, but data virtualization involves many more capabilities.
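The abstraction-layer idea can be sketched in a few lines. The example below covers only the federation component of data virtualization as defined above: a view that stores no data of its own and assembles a record from its sources at the moment it is requested. The class name `VirtualCustomerView` and the source names `crm` and `orders` are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, object]
Lookup = Callable[[int], Optional[Row]]   # returns a row for a key, or None if absent


class VirtualCustomerView:
    """Hypothetical virtual view: nothing is persisted, sources are read on demand."""

    def __init__(self, sources: Dict[str, Lookup]) -> None:
        self._sources = sources

    def get(self, customer_id: int) -> Row:
        """Assemble one record from every source, prefixing fields with the source name."""
        merged: Row = {"customer_id": customer_id}
        for name, lookup in self._sources.items():
            row = lookup(customer_id)
            if row is not None:
                for field, value in row.items():
                    merged[f"{name}.{field}"] = value
        return merged


# Usage: two plain dictionaries stand in for operational systems.
crm = {7: {"name": "Acme"}}
orders = {7: {"open_orders": 2}}
view = VirtualCustomerView({"crm": crm.get, "orders": orders.get})
record = view.get(7)
```

Because the view reads its sources on every call, a change in an operational system is visible on the very next request, which is the property that makes virtualized data suitable for real-time provisioning.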

Wrap real-time data warehouse functionality in data integration services

The point is that a data integration service can be invoked in multiple ways, causing it to operate in real time, with high latency, or at some gradation between the two, as evolving requirements demand. The problem is that many organizations have developed a rich library of data integration routines, logic, and interfaces for their high-latency, history-oriented data warehouses. They then developed yet another library of similar DI functions that run in real time. Instead of this inefficient redundancy, a better approach is to have one library that can operate flexibly, and hence be reusable across a number of workloads, interfaces, and projects.
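A small sketch shows what "one library, multiple invocations" might look like. It assumes a single shared transformation, `standardize_customer`, wrapped by a batch entry point and a per-event entry point; all function names, field names, and the `load` callback are hypothetical.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]
Loader = Callable[[Record], None]   # assumed callback that writes one record to the warehouse


def standardize_customer(record: Record) -> Record:
    """Shared DI logic: trim and title-case the name, upper-case the country code."""
    return {
        "id": record["id"],
        "name": str(record["name"]).strip().title(),
        "country": str(record["country"]).strip().upper(),
    }


def run_batch(records: Iterable[Record], load: Loader) -> int:
    """High-latency invocation: process a whole extract, return the row count."""
    count = 0
    for record in records:
        load(standardize_customer(record))
        count += 1
    return count


def on_event(record: Record, load: Loader) -> None:
    """Real-time invocation: process a single record as it arrives."""
    load(standardize_customer(record))
```

Both entry points call the same routine, so a change to the standardization rules is made once and takes effect for the nightly load and the real-time feed alike.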

In summary, accelerating a data warehouse into real time is mostly accomplished via real-time data integration functionality. Diverse data and business processes have diverse requirements for information delivery, which is why real-time data warehousing can require multiple data integration techniques that operate at differing speeds, frequencies, and interface types, handling both physical (persistent) data and virtual (instantiated) data. An advanced solution for real-time data warehousing will support data virtualization and data services.

To learn more, replay Philip Russom's recent Webinar on Data Integration for Real-Time Data Warehousing available at the Webinar Replay list.

Philip Russom is senior manager of TDWI Research.

Copyright 2010. TDWI. All rights reserved.
