RESEARCH & RESOURCES

Philip Russom

Emerging Tech: Data Integration for Real-Time DW

Real-time data warehousing is best enabled by real-time data integration tools and techniques

In a 2009 TDWI survey, a paltry 17 percent of respondents reported using real-time functionality with their data warehouse. Yet a whopping 92 percent said they would be using it within three years. This puts real-time functions at the top of the priority list for next-generation data warehouses.

Why the rush to real-time data warehousing? Because it’s a foundation for time-sensitive business practices such as operational business intelligence, frequently refreshed management dashboards, just-in-time inventory, facility monitoring, self-service information portals, on-the-fly recommendations in eCommerce, and so on. Although these real-time business practices emerged a few years ago, their broad adoption is only now getting under way.

(Commentary continues below)


Related Resources

White Papers:
  The Top 10 Reasons for Choosing Open Source Data Integration
  The Role of Open Source in Data Integration

Webinars:
  Trusted Data for BI: Integrating Data for Success
  Data Governance for Business Leaders


(Commentary continues)

Ironically, real-time data warehousing is less about the warehouse and more about the data integration (DI) functions that feed the warehouse. In fact, accelerating a data warehouse into real time is mostly accomplished via real-time data integration functionality. If you’re facing this kind of work, consider the following recommendations:

Enable real-time data warehousing with real-time data integration

Most of the real-time functionality applied in DW/BI is enabled by various types of data integration tools and techniques, along with the diverse data delivery speeds and interfaces these support. Even so, accommodating real-time information delivery may also involve adjustments to data warehouse architecture and data models, as well as how reports are refreshed. For example, DW architecture may need a data staging area to accommodate data arriving in real time. The DW platform may need to support fast or reactive interfaces such as services, messages, events, triggers, and real-time alerts.
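
As a rough illustration of the staging idea, here is a minimal Python sketch, not a description of any particular DW platform. It assumes a hypothetical StagingArea class that buffers records as they arrive and hands them to a warehouse load routine in small groups; the class name, the max_rows threshold, and the sample sensor readings are all invented for this example.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class StagingArea:
    """Hypothetical staging buffer that sits between real-time feeds and the warehouse."""

    def __init__(self, load: Callable[[List[Record]], None], max_rows: int) -> None:
        self._load = load          # routine that writes a group of rows to the warehouse
        self._max_rows = max_rows  # assumed threshold; real systems may also flush on a timer
        self._rows: List[Record] = []

    def receive(self, record: Record) -> None:
        """Accept one record as it arrives and flush once the buffer is full."""
        self._rows.append(record)
        if len(self._rows) >= self._max_rows:
            self.flush()

    def flush(self) -> None:
        """Load whatever is buffered, then start a fresh buffer."""
        if self._rows:
            self._load(self._rows)
            self._rows = []


# Usage: a plain list stands in for the warehouse load interface.
warehouse_loads: List[List[Record]] = []
staging = StagingArea(load=warehouse_loads.append, max_rows=2)
for reading in (
    {"sensor": "a", "temp": 20},
    {"sensor": "b", "temp": 21},
    {"sensor": "a", "temp": 22},
):
    staging.receive(reading)
staging.flush()  # push the final partial group
```

The threshold is the design choice to watch: a smaller value brings the warehouse closer to real time at the cost of more frequent loads.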

Know the available real-time data integration techniques and tool types

Techniques and tool types that are conducive to real-time operation include data federation, frequent batches, change data capture (CDC), complex event processing (CEP), on-the-fly data quality, and interoperability with message buses and service architectures. Are all these necessary? Yes, because diverse data and business processes have diverse requirements for information delivery. That’s why “real time” is a blanket term for multiple information delivery speeds and frequencies. In turn, these requirements are satisfied by multiple DI technologies that operate at various speeds, frequencies, interface types, and other performance characteristics.
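
To make one of these techniques concrete, the following Python sketch shows change data capture in its simplest form, by comparing two keyed snapshots of a source table. This is an assumption made for brevity: commercial CDC tools more often read database logs or use triggers. The function names capture_changes and apply_changes and the sample inventory rows are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Change = Tuple[str, Any, Optional[Row]]  # (operation, key, new row or None)


def capture_changes(previous: Dict[Any, Row], current: Dict[Any, Row]) -> List[Change]:
    """Compare two keyed snapshots and return only the rows that changed."""
    changes: List[Change] = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes


def apply_changes(target: Dict[Any, Row], changes: List[Change]) -> None:
    """Apply captured changes to the target instead of reloading every row."""
    for operation, key, row in changes:
        if operation == "delete":
            target.pop(key, None)
        else:
            target[key] = row


# Usage: only three changes move, however large the table is.
source_before = {1: {"sku": "A", "qty": 10}, 2: {"sku": "B", "qty": 5}}
source_after = {1: {"sku": "A", "qty": 7}, 3: {"sku": "C", "qty": 2}}
warehouse = dict(source_before)
changes = capture_changes(source_before, source_after)
apply_changes(warehouse, changes)
```

A frequent batch would copy the whole table on each run; CDC moves only the changed rows, which is why it suits shorter delivery intervals.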

Recognize that real-time data warehousing usually virtualizes data

For the purposes of this article, let’s define data virtualization as a pooling of many DI resources. That’s similar to how pooling server resources creates a virtualized cloud, but data virtualization pools data, as seen through data sources, targets, transformations, interfaces, and other components of a data integration solution. This way, data that’s virtualized via data integration techniques provides an abstraction layer that enables data provisioning -- even in real time. Real-time provisioning is key because the kind of data you want to move in real time for DW/BI changes so rapidly that only the most recent update of it is useful for a business process. Note that data federation can be a component of this, but data virtualization involves many more capabilities.
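
The abstraction layer can be pictured with a short Python sketch. It is only an illustration under stated assumptions: VirtualView is a hypothetical class, the sources are plain callables, and each record is assumed to carry a key field and an increasing version field. The view stores no data of its own and returns the most recent update per key each time it is asked.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Source = Callable[[], Iterable[Record]]


class VirtualView:
    """Hypothetical abstraction layer: resolves records at query time, stores nothing."""

    def __init__(self, key: str, version: str) -> None:
        self.key = key          # field that identifies a business entity
        self.version = version  # field that orders updates (assumed to increase)
        self.sources: List[Source] = []

    def register(self, source: Source) -> None:
        """Add a source to the pool; the consumer never sees it directly."""
        self.sources.append(source)

    def latest(self) -> Dict[Any, Record]:
        """Return the most recent update per key across all pooled sources."""
        newest: Dict[Any, Record] = {}
        for source in self.sources:
            for record in source():
                k = record[self.key]
                if k not in newest or record[self.version] > newest[k][self.version]:
                    newest[k] = record
        return newest


# Usage: two sources pooled behind one view.
orders_db = [{"order_id": 1, "status": "placed", "updated": 1}]
event_feed = [
    {"order_id": 1, "status": "shipped", "updated": 2},
    {"order_id": 2, "status": "placed", "updated": 1},
]
view = VirtualView(key="order_id", version="updated")
view.register(lambda: orders_db)
view.register(lambda: event_feed)
current = view.latest()

# A later update is visible on the next call, with no reload step.
event_feed.append({"order_id": 2, "status": "cancelled", "updated": 3})
refreshed = view.latest()
```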

Wrap real-time data warehouse functionality in data integration services

The point is that a data integration service can be invoked multiple ways, causing it to operate in real time, with latency, or at some gradation between the two, as evolving requirements demand. The problem is that many organizations have developed a rich library of data integration routines, logic, and interfaces for their high-latency, history-oriented data warehouses, and then developed yet another library of similar DI functions that run in real time. Instead of this inefficient redundancy, a better approach is to have one library that can operate flexibly, and hence be reusable across a number of workloads, interfaces, and projects.
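
The single-library idea might look like the Python sketch below. The names are hypothetical and the transformation is deliberately trivial: one routine, standardize_customer, holds the logic, while two thin entry points invoke it either over a nightly batch or once per arriving message.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def standardize_customer(record: Record) -> Record:
    """The one shared routine: every invocation style reuses this logic."""
    return {
        "customer_id": int(record["customer_id"]),
        "name": str(record["name"]).strip().title(),
        "country": str(record["country"]).strip().upper(),
    }


def run_batch(records: Iterable[Record]) -> List[Record]:
    """High-latency invocation: transform a whole extract in one pass."""
    return [standardize_customer(r) for r in records]


def on_message(record: Record, target: List[Record]) -> None:
    """Real-time invocation: transform a single record as it arrives."""
    target.append(standardize_customer(record))


# Usage: the same input yields the same output either way.
raw = [
    {"customer_id": "7", "name": "  ada lovelace ", "country": "gb"},
    {"customer_id": "8", "name": "alan turing", "country": " gb "},
]
batch_result = run_batch(raw)

realtime_result: List[Record] = []
for message in raw:
    on_message(message, realtime_result)
```

Because both entry points call the same routine, a rule change is made once and applies to the batch and real-time paths alike.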

In summary, accelerating a data warehouse into real time is mostly accomplished via real-time data integration functionality. Diverse data and business processes have diverse requirements for information delivery, which is why real-time data warehousing can require multiple data integration techniques that operate at differing speeds, frequencies, and interface types, handling both physical (persistent) data and virtual (instantiated) data. An advanced solution for real-time data warehousing will support data virtualization and data services.

To learn more, replay Philip Russom’s recent Webinar, Data Integration for Real-Time Data Warehousing, available from the Webinar Replay list.


Related Resources

White Papers:
  The Top 10 Reasons for Choosing Open Source Data Integration
  The Role of Open Source in Data Integration

Webinars:
  Trusted Data for BI: Integrating Data for Success
  September 30, 2010
  Speaker: Philip Russom

  Data Governance for Business Leaders
  October 13, 2010
  Speaker: John Ladley
