Operational BI: To DW or Not?
Operational business intelligence (BI) means many things to many people. But the nub is that it delivers information to decision makers in near real time, usually within seconds, minutes, or hours. The purpose is empower front-line workers and managers with timely information so they can work proactively to improve performance.
Key Architectural Decision. This sounds easy but it’s hard to do. Low latency or operational BI systems have a lot of moving parts and there is not much time to recover from errors, especially in high-volume environments. The key decision you need to make when architecting a low-latency system is whether to use the data warehouse (DW) or not. The ramifications of this decision are significant.
On one hand, the DW will ensure the quality of low-latency data; but doing so may disrupt existing processes, add undue complexity, and adversely impact performance. On the other hand, creating a stand-alone operational BI system may be simpler and provide tailored functionality and higher performance, but potentially creates redundant copies of data that compromise data consistency and quality.
So take your pick: either add complexity by rearchitecting the DW or undermine data consistency by deploying a separate operational BI system. It’s kind of a Faustian bargain, and neither option is quick or cheap.
Within the DW
If you choose to deliver low-latency data within your existing DW, you have three options:
1. Mini Batch. One option is simply to accelerate ETL jobs by running them more frequently. If your DW supports mixed workloads (e.g. simultaneous queries and updates), this approach allows you to run the DW 24x7. Many start by loading the DW hourly and then move to 15-minute loads, if needed. Of course, your operational systems may not be designed to support continuous data extracts, so this is a consideration. Many companies start with this option since it uses existing processes and tools, but just runs them faster.
2. Change Data Capture. Another option is to apply change data capture (CDC) and replication tools which extract new records from system logs and move them in real-time to an ETL tool, staging table, flat file, or message queue so they can be loaded into the DW. This approach minimizes the impact on both your operational systems and DW since you are only updating records that have changed instead of wiping the slate clean with each load. Some companies combine both mini-batch and CDC to streamline processes even further.
3. Trickle Feed. A final option is to trickle feed records into the DW directly from an enterprise service bus (ESB), if your company has one. Here, the DW subscribes to selected events which flow through a staging area and into the DW. Most ETL vendors sell specialized ESB connectors to trickle feed data or you can program a custom interface. This is the most complex of the three approaches since there is no time to recover from a failure, but it provides the most up-to-date data possible.
Working Outside the DW
If your existing DW doesn’t lend itself to operational BI for architectural, political, or philosophical reasons, then you need to consider building or buying a complementary low-latency decision engine. There are three options here: 1) data federation 2) operational data stores 3) event-driven analytic engines.
1. Data federation. Data federation tools query and join data from multiple source systems on the fly. These tools create a virtual data mart that can combine historical and real-time data without the expense of creating a real-time DW infrastructure. Data federation tools are ideal when the number of data sources, volume of data, and complexity of queries are low.
2. ODS. Companies often use operational data stores (ODS) when they want to create operational reports that combine data from multiple systems and don’t want users to query source systems directly. An ODS extracts data from each source system in a timely fashion to create a repository of lightly integrated, current transaction data. To avoid creating redundant ETL routines and duplicate copies of data, many organizations load their DW from the ODS.
3. Event-driven engines. Event-driven analytic engines apply analytics to event data from an ESB as well as static data in a DW and other applications. The engine filters events, applies calculations and rules in memory, and triggers alerts when thresholds have been exceeded. Although tailored to meet high-volume, real-time requirements, these systems can also support general purpose BI applications.
In summary, you can architect operational BI systems in multiple ways. The key decision is whether to support operational BI inside or outside the DW. Operational BI within a DW maintains a single version of truth and ensures high quality data. But not all organizations can afford to rearchitect a DW for low latency data and must look to alternatives.
Posted by Wayne Eckerson on November 6, 2009