Business Intelligence Experts' Perspective
- By Stephen Brobst, Evan Levy, Craig Muzilla
- May 23, 2005
Jeet Kumar is the director of data warehousing at a large regional bank. He was hired five years ago to implement a data warehouse to support the bank’s CRM business strategy. Using the data warehouse, the bank has been successful in integrating customer information, understanding customer profitability, attracting customers, enhancing customer relationships, and retaining customers.
Over the years, the bank’s data warehouse has moved closer to real time through more frequent refreshes of warehouse data. Now, the bank wants to implement customer self-service and call center applications that require even fresher data than is currently available in the warehouse.
Jeet needs some help in thinking through the possibilities for providing fresher data. One option is to fully commit to implementing real-time data warehousing. His ETL vendor is ready to help him make this move. However, Jeet has been hearing more about enterprise application integration (EAI) and enterprise information integration (EII) technologies and wonders how they might fit into his plans.
In particular, he has the following questions:
- What exactly are EAI and EII technologies?
- How are they related to ETL?
- How are they related to real-time data warehousing?
- Are they required, complementary, or alternatives to real-time data warehousing?
Stephen Brobst
The architectural evolution in data warehouse deployment that Jeet is experiencing in his regional bank is typical of the progression toward real-time data warehousing. There was a time when data warehouse implementations were relatively isolated within an organization’s IT infrastructure. A select group of knowledge workers (usually in marketing, finance, strategic planning, and so on) had access to this information for reporting and analysis, but the requirements for connectivity to large numbers of users outside the corporate ivory tower—and interoperability with production systems—were minimal.
With the expanded scope of successful data warehouses to encompass tactical decision making, in addition to traditional decision-support applications, better integration into the mainstream of IT infrastructure has become essential. As a result, EAI has emerged as a critical component in the architecture of advanced data warehouse implementations (Brobst, November 2002). When properly deployed, EAI provides a vehicle for transitioning from back-office decision making to tactical decision support on the front lines to impact execution of the enterprise strategy. EAI plays a particularly important role when implementing the extreme data freshness service levels required for real-time data warehousing (Brobst, Spring 2002).
EAI and EII are two very different frameworks in the context of data warehouse deployment. EAI provides a vehicle for “pushing” data from source systems into the data warehouse. EII, on the other hand, is a mechanism for “pulling” data from source systems to satisfy a request for information. EII attempts to integrate data on the fly through use of integration logic embedded in the middleware used to federate information across multiple data sources. EII would normally be used to combine integrated information from a data warehouse with data sources external to the warehouse that may have more up-to-date data (e.g., an operational source system) or external data sources that have not yet been integrated into the data warehouse. It is inadvisable to use EII with high data volumes because this technique implies pulling data across the network from disparate data sources and dynamically integrating data for each query executed.
EAI, on the other hand, can be used to facilitate data acquisition directly into a (near) real-time data warehouse or to deliver decisions to the OLTP systems that will be responsible for the associated bookkeeping activities. EAI with process integration allows for “closed-loop” decision making. Data fed from the bookkeeping environment into the real-time data warehouse will (selectively) cause event-based triggers to fire (based on business rules) and initiate decisions that are fed back into the operational bookkeeping systems for execution.
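The closed loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Warehouse`, `Rule`, and `OperationalSystem` classes, the event fields, and the "large withdrawal" rule are all hypothetical and stand in for whatever EAI tooling and business rules an organization actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    customer_id: str
    kind: str
    amount: float


@dataclass
class Decision:
    customer_id: str
    action: str


@dataclass
class Rule:
    name: str
    condition: Callable[[Event], bool]
    action: str


@dataclass
class Warehouse:
    """Hypothetical stand-in for a real-time data warehouse with event triggers."""
    rules: List[Rule]
    events: List[Event] = field(default_factory=list)

    def load(self, event: Event) -> List[Decision]:
        # Store the event, then fire every rule whose condition matches it.
        self.events.append(event)
        return [Decision(event.customer_id, rule.action)
                for rule in self.rules if rule.condition(event)]


@dataclass
class OperationalSystem:
    """Hypothetical stand-in for the OLTP (bookkeeping) system."""
    executed: List[Decision] = field(default_factory=list)

    def execute(self, decision: Decision) -> None:
        self.executed.append(decision)


# Assumed business rule: a large withdrawal triggers a retention call.
rules = [Rule("large_withdrawal",
              lambda e: e.kind == "withdrawal" and e.amount >= 10_000,
              "offer_retention_call")]
warehouse = Warehouse(rules)
oltp = OperationalSystem()

# Data flows from bookkeeping into the warehouse; decisions flow back out.
feed = [Event("C1", "deposit", 500.0), Event("C2", "withdrawal", 25_000.0)]
for event in feed:
    for decision in warehouse.load(event):
        oltp.execute(decision)
```

Both events are stored, but only the second one matches the rule, so only one decision is fed back for execution. That selectivity is the point of rule-based triggers.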
There are many different approaches and tools for EAI implementation. The Hurwitz taxonomy of EAI implementations (described in Gold-Bernstein, 1999) provides a useful framework for understanding differences between the options available in the marketplace. From the perspective of data warehouse deployment, three categories of EAI are most interesting:
- Data-level EAI
- Message-level EAI
- Process-level EAI
These options typically reflect increasing maturity in the integration of the business intelligence environment with the overall enterprise application framework. Each has a different approach, and there are clear implementation tradeoffs.
The data-level EAI technique implements information exchange between multiple application data stores using traditional extract, transform, and load (ETL) techniques common in data warehouse deployments. Metadata is used to define transformations between the source system data and the target data required by the data warehouse. Data-level EAI is typically implemented using batch-file processing techniques.
A major advantage of data-level EAI is that it is extremely non-intrusive. As long as a time window can be defined during which an extract from the operational systems will not adversely affect production workloads, the impact to legacy systems infrastructure should be minimal. No complex integration between systems is required; metadata specifications are used to translate between source data definitions and the target definitions for the data warehouse.
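As a rough illustration of metadata-driven translation in a batch setting, the sketch below applies a mapping table to an extract file. The column names, the `MAPPING` structure, and the `transform_batch` helper are assumptions made for this example; commercial ETL tools express the same idea through their own metadata repositories.

```python
import csv
import io

# Hypothetical mapping metadata: target column -> (source column, converter).
MAPPING = {
    "customer_id": ("CUST_NO", str.strip),
    "balance": ("BAL_CENTS", lambda value: int(value) / 100),
    "branch": ("BR_CD", str.upper),
}


def transform_batch(source_file, mapping):
    """Translate every row of a batch extract into the target definitions."""
    reader = csv.DictReader(source_file)
    return [
        {target: convert(row[source])
         for target, (source, convert) in mapping.items()}
        for row in reader
    ]


# A small in-memory stand-in for the nightly extract file.
extract = io.StringIO(
    "CUST_NO,BAL_CENTS,BR_CD\n"
    " 1001 ,250075,nyc\n"
    "1002,9900,sfo\n"
)
rows = transform_batch(extract, MAPPING)
```

The source system is never touched beyond producing the file, and changing a transformation means editing the mapping rather than the loader.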
The message-level EAI technique is much more compatible with (near) real-time information sharing between source systems and a real-time data warehouse. Message-level EAI manages the message exchange among multiple applications using reliable queuing systems. Business events can be published to one or more message queues as they occur in real time rather than relying on a batch-file processing model for information exchange.
The disadvantage of message-level EAI is that it requires more involvement from participating applications than data-level EAI. Participating applications in message-level EAI must create interfaces for sending and receiving messages. In some cases, a non-trivial amount of coding is needed for a legacy application to implement message-level interfaces. This is particularly an issue with older, batch-oriented systems. It is difficult to leverage a real-time messaging capability if the source system is a batch billing system with no hooks for delivering information at a lower level than files.
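The message exchange can be sketched with Python's standard `queue` and `threading` modules. This is a simplification: `queue.Queue` is an in-process queue with no durability, so it only imitates the reliable queuing systems mentioned above, and the `publish` and `consume` functions and the event fields are hypothetical.

```python
import json
import queue
import threading

event_queue = queue.Queue()   # in-process stand-in for a reliable message queue
warehouse_rows = []           # stand-in for the real-time warehouse table


def publish(event):
    """Sending interface a source application would have to implement."""
    event_queue.put(json.dumps(event))


def consume():
    """Receiving interface on the warehouse side; loads events as they arrive."""
    while True:
        message = event_queue.get()
        if message is None:          # sentinel: no more messages
            event_queue.task_done()
            break
        warehouse_rows.append(json.loads(message))
        event_queue.task_done()


consumer = threading.Thread(target=consume)
consumer.start()

# Business events are published as they occur, not collected into a batch file.
publish({"customer_id": "C1", "type": "deposit", "amount": 500})
publish({"customer_id": "C2", "type": "address_change"})

event_queue.put(None)
consumer.join()
```

Even in this toy version, the source side needs a `publish` call at each business event, which is the extra involvement that a batch-oriented legacy system may not be able to provide.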
The process-level EAI technique goes beyond message-level EAI by overlaying a workflow management capability on top of message delivery capability. While message-level EAI is usually point-to-point, process-level EAI tools typically rely on a publish/subscribe messaging model. Process-level EAI can be thought of as an extension of message-level EAI with a middleware layer that performs the business process management using metadata derived from workflow automation tools.
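One way to picture the layering is a publish/subscribe broker whose subscriptions are generated from workflow metadata. The sketch below is an assumption-laden toy: the `Broker` class, the topic names, the scoring threshold, and the `WORKFLOW` table are invented for illustration and do not correspond to any particular EAI product.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory publish/subscribe broker (hypothetical)."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.log = []

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        self.log.append(topic)
        for handler in list(self.subscribers[topic]):
            handler(payload)


delivered = []


def score(payload):
    return {**payload, "score": 0.9 if payload["balance"] > 10_000 else 0.2}


def decide(payload):
    return {**payload, "offer": "premium" if payload["score"] > 0.5 else "none"}


def deliver(payload):
    delivered.append(payload)
    return payload


# Workflow metadata: (topic to listen on, step to run, topic to publish next).
WORKFLOW = [
    ("account.updated", score, "customer.scored"),
    ("customer.scored", decide, "offer.decided"),
    ("offer.decided", deliver, None),
]


def install(broker, workflow):
    """Turn the workflow metadata into broker subscriptions."""
    for topic, step, next_topic in workflow:
        def handler(payload, step=step, next_topic=next_topic):
            result = step(payload)
            if next_topic is not None:
                broker.publish(next_topic, result)
        broker.subscribe(topic, handler)


broker = Broker()
install(broker, WORKFLOW)
broker.publish("account.updated", {"customer_id": "C1", "balance": 25_000})
```

The individual steps know nothing about each other. The routing lives entirely in the workflow table, so the process can be rearranged without touching the applications.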
Process-level EAI may be overkill for organizations just beginning with EAI implementation. However, full-blown, process-level EAI definitely emerges as a requirement as more sophistication is required for routing information between applications, orchestrating business processes for decision making and task execution, and interfacing to external business processes (Schmidt, 2000).
Organizations are finding that it is high time they remove their data warehouse assets from “solitary confinement” in the corporate ivory tower. Leveraging information assets in a real-time data warehouse in support of tactical decision making requires that the business intelligence solution within an enterprise cooperate more closely with OLTP applications to facilitate near-real-time data acquisition as well as delivery of decisions.
EAI plays a critical role in providing seamless integration of decision-making capability with traditional bookkeeping systems. The specific tools used for EAI deployment will depend on maturity of the technical infrastructure and sophistication of requirements within the enterprise. EII can complement real-time data warehouse deployment by allowing on-the-fly integration of data from external sources, but should not replace the integration of core decision-making data into the warehouse through the use of EAI tools.
Evan Levy
Good news! The data warehouse has proven to be a business-critical system to the bank. Business use and access driving the need for more timely data means there’s buy-in. But the options for growth can sometimes be a bit overwhelming.
What’s important now is that Jeet and his team understand and distinguish the functional needs to ensure that the right technologies are matched to the appropriate information uses.
The data warehouse can act as the data-provisioning system for the bank’s new operational CRM applications (customer self-service and the call center application). Data provisioning platforms have traditionally leveraged specially built middleware to enable operational systems to access their data. Given the current state of ETL, EAI, and EII technologies, the bank has a new set of alternatives that can leverage the data warehouse as a data provisioning platform, thus providing these systems with all the benefits a data warehouse environment offers (data hygiene and quality, standard data representation, and integrity checking, to name a few). The result can be a new way of providing data to a new operational system, while dramatically reducing ongoing maintenance and improving data quality.
Let’s first review the different technology alternatives for supporting data migration and connectivity between the data warehouse and the bank’s operational CRM systems.
ETL is probably the simplest (and most widely understood) method for data migration. In order to utilize ETL technology correctly, the actual migration and load requirements must be clearly articulated: an upfront designation of the source-data location and format, the documentation of all data element conversion rules, and the designation of the target database.
Because code must be developed to support each of these areas, each must be predefined. The up-front coding and design can be resource intensive. However, the ongoing processing of ETL code is usually highly efficient. Most of my clients use ETL to address data migration activities that are identified once but executed on an ongoing and regular basis (e.g., daily sales update, monthly sales figures, and so on).
It’s important to realize that not every data migration situation can be established in a predefined manner. Sometimes data migration (and, for that matter, the business decisions that drive it) must occur based on specific events. Back in the 1980s, it was common for grocers to replenish store shelves at the end of each month. Today, those same grocers replenish their products once the inventory falls below a particular level (e.g., an order is placed once only three items remain on the shelf). This scenario highlights one of the differences between EAI and ETL.
EAI technology aids data migration between application systems when a flat-file interface doesn’t exist or can’t support the required data migration functionality. Because operational applications facilitate business processes, these applications store important information and are aware of particular business events, which can be critical to determining the need for data migration.
In the grocery example, a “low inventory” event triggers a message to the order system for replenishment. This type of data migration process is better suited to EAI than ETL because the data is generated on an event basis. ETL technologies weren’t designed to automatically recognize such events, whereas EAI excels here.
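The grocery scenario can be reduced to a small sketch in which the application itself recognizes the event and sends the message. The `Shelf` class, the callback signature, and the `send_to_order_system` function are hypothetical; the reorder point of three comes from the example above.

```python
from typing import Callable, List, Tuple

REORDER_POINT = 3  # the article's example: order once only three items remain


class Shelf:
    """Tracks on-hand stock and raises an event when it reaches the reorder point."""

    def __init__(self, sku: str, on_hand: int,
                 on_low_inventory: Callable[[str, int], None]) -> None:
        self.sku = sku
        self.on_hand = on_hand
        self.on_low_inventory = on_low_inventory

    def sell(self, quantity: int = 1) -> None:
        was_above = self.on_hand > REORDER_POINT
        self.on_hand -= quantity
        # Fire only on the transition, so one shortage produces one message.
        if was_above and self.on_hand <= REORDER_POINT:
            self.on_low_inventory(self.sku, self.on_hand)


orders: List[Tuple[str, int]] = []


def send_to_order_system(sku: str, remaining: int) -> None:
    """Stand-in for the message an EAI layer would route to the order system."""
    orders.append((sku, remaining))


shelf = Shelf("soup", on_hand=5, on_low_inventory=send_to_order_system)
for _ in range(3):
    shelf.sell()
```

A scheduled ETL job would only notice the shortage at its next run. Here the order message is produced at the moment the third-to-last sale happens.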
EII technology focuses on providing a single view to multiple disparate data sources. This is accomplished through sophisticated and integrated metadata architecture that allows the EII system to find and query different data sources and return the results. The power of EII is in delivering data integration (and access) on demand. The only data that is moved or transformed is the specific records necessary to answer the query. EII can be an optimal solution for on-the-fly data integration.
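A minimal sketch of on-demand integration follows. The three dictionaries stand in for separate source systems, and `CATALOG` plays the role of the metadata that tells the EII layer where each attribute lives. All names and sample records here are assumptions made for illustration.

```python
# Hypothetical source systems, each keyed by customer id.
CRM = {"C1": {"name": "Ana Diaz", "segment": "gold"},
       "C2": {"name": "Raj Patel", "segment": "silver"}}
BILLING = {"C1": {"balance_due": 120.5}, "C2": {"balance_due": 0.0}}
SUPPORT = {"C1": {"open_tickets": 2}}

# Metadata catalog: which source system holds which attribute.
CATALOG = {
    "name": CRM,
    "segment": CRM,
    "balance_due": BILLING,
    "open_tickets": SUPPORT,
}


def customer_view(customer_id, attributes):
    """Assemble a single view by pulling only the records the query needs."""
    result = {"customer_id": customer_id}
    for attribute in attributes:
        source = CATALOG[attribute]
        record = source.get(customer_id, {})
        result[attribute] = record.get(attribute)  # None if the source has no record
    return result


profile = customer_view("C1", ["name", "balance_due", "open_tickets"])
missing = customer_view("C2", ["name", "open_tickets"])
```

Nothing is copied ahead of time. Each call touches only the one customer's records, which is why the approach suits small, targeted queries better than large analytical scans.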
With these short profiles in hand, let’s now review how Jeet can leverage the strengths of each.
ETL would be the technology of choice for the initial loading of data into the bank’s new CRM operational applications. This data may include any non-event-based or historical detail, including existing customer lists, purchases, and third-party data. A telecommunications firm we work with uses ETL technology to migrate its customers’ payment details onto the CRM system at the end of each business day, once the payments have been posted to the billing system. Jeet’s firm might do the same.
EAI should be positioned to support the loading of any details from other applications where the data is event-based (for example, order placement or customer contact). Another bank I know uses EAI technology to transfer order details to their CRM system once their ERP system approves a customer order. EAI technology may also play a role should the new CRM applications require data from other packaged application systems (such as accounting or human resources).
EII provides a creative solution when just-in-time data access and integration is required. EII works well when a limited amount of data is required to support query processing, such as operational reporting or discrete data integration. A retail client uses EII technology to support its call center operations. When a customer contacts customer service, the customer service desktop uses EII software to query more than 15 source systems for the specific details for that customer. EII isn’t a substitute for traditional data warehouse query support, but it provides a good solution when query processing requires a limited amount of data spread across multiple systems.
Jeet is right to consider data-integration alternatives, and wise to understand that different processing needs call for different solutions. His situation exposes the fallacy that there’s a one-size-fits-all integration solution. As data integration becomes ever more critical for enterprise initiatives like CRM, there will probably be more solutions to come.
Craig Muzilla
The problem facing Jeet Kumar is typical in businesses today. Many companies successfully implemented data warehouses to support their marketing initiatives and better analyze customer behavior and patterns. This greater understanding of customers led to success in attracting new customers, selling more products and services, increasing profitability, and retaining existing customers. But now Jeet’s regional bank wants to go further. They want to better serve and sell to customers by accessing all customer data from disparate sources and making it available in real time to operational applications such as the call center and self-service portal.
To solve this operational problem, replicating data into a single data warehouse or ODS will not be adequate. As the optimal alternative, EII technology will provide the means to access heterogeneous customer data from many disparate operational data stores in real time and provide a standard, consistent view of customer data. In fact, whether the goal is more real-time operational data integration, business intelligence, or support for analytical applications, EII systems complement the warehouse by extending its reach to hard-to-get data sources not available in the warehouse today.
The data warehouse, populated by replicating data from source systems, was not originally designed or intended to support real-time operational application needs. From its advent in the late 1980s, the data warehouse was conceived as the place to consolidate disparate data from many operational applications, making that data available for analysis. The warehouse was the venue for capturing large volumes of historical information about the customer and “normalizing” the information to provide a consistent view. Large-scale analysis could be conducted without jeopardizing the integrity or performance of the operational systems. The warehouse excels in this analytical capacity.
However, data warehouses weren’t conceived as the data store for operational applications. As soon as one tries to convert the warehouse, even a “real-time” warehouse, into the operational data store for applications, a myriad of problems arise. One needs to ask whether it is possible to achieve real-time data delivery in a reasonably cost-effective manner using a data warehouse. In theory, one can establish a real-time data update and replication process between the operational data stores and the data warehouse, but the costs will be high. Burdens will be placed on the operational stores (just the problem that the warehouse was trying to solve in the first place!), consuming processing resources and jeopardizing the service levels of the source systems. Overall maintenance of the system will increase substantially and the risk of system failure is high. Furthermore, the data warehouse won’t have an inherent ability to write back to the operational sources.
What is EII?
Enterprise Information Integration (EII) is a type of data integration software that enables applications to access data from multiple, disparate data sources without replicating the data. The data is federated. EII creates a data abstraction layer, where details about data structure differences, data location, data sources, and security differences are hidden from the application that needs the data. To the application using the data, such as Jeet’s customer portal, the EII system appears as a single virtual data source or database. The EII system provides integrated views of data from multiple systems, performing all the necessary joins and transformations while the data is being delivered. The best EII systems can integrate any type of data: relational, file-based, and application-centric. Moreover, the EII system can deliver this data in real time or near real time, depending on the application need.
The best EII systems enable companies to expand beyond one project and extend the data abstraction layer across the enterprise. This allows for greater use and re-use of data assets across many decision-support and operational applications while reducing the redundancy of data integration efforts. These best-of-breed EII systems are based on distributed architectures, rather than centralized architectures, enabling companies to support thousands of data sources and applications. The best EII systems also provide a variety of integration approaches to support many different needs. For Jeet, the EII system can represent the integrated customer profile data needed by the call center application as callable services, optimized for application integration and performance. Or, for random analysis of customer data, they can choose to represent the data from disparate systems as a set of virtual database tables that can be queried via SQL, as if the data were coming from a single database.
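The "virtual database tables" idea can be imitated with Python's built-in `sqlite3` module. This is only an analogy: two attached in-memory SQLite databases play the part of separate source systems, and a temporary view plays the part of the virtual table. A real EII product would federate across genuinely separate systems, and the schema and sample rows here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two attached databases stand in for two independent source systems.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (customer_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.accounts (customer_id TEXT, balance REAL)")

conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [("C1", "Ana Diaz"), ("C2", "Raj Patel")])
conn.executemany("INSERT INTO billing.accounts VALUES (?, ?)",
                 [("C1", 1200.0), ("C1", 300.0), ("C2", 50.0)])

# The view is the "virtual table": it stores no data of its own and is
# evaluated against both sources each time it is queried.
conn.execute("""
    CREATE TEMP VIEW customer_profile AS
    SELECT c.customer_id, c.name, SUM(a.balance) AS total_balance
    FROM crm.customers AS c
    JOIN billing.accounts AS a ON a.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
""")

rows = conn.execute(
    "SELECT name, total_balance FROM customer_profile ORDER BY customer_id"
).fetchall()
```

The querying application sees one table and writes ordinary SQL, with no knowledge of which source holds which column.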
EII is fundamentally different from EAI and ETL, as these technologies were designed to tackle other types of integration problems. Whereas EII concentrates on providing integration and access to data from disparate sources, EAI is focused on solving the problem of process-to-process integration among applications. EAI orchestrates business process flow among applications and coordinates high volumes of small transactions. EII delivers small or large volumes of data and provides the necessary data integration, metadata management, and caching infrastructure to support these data-intensive needs. ETL technologies were built to help load a warehouse or other persistent data stores and transform that data before it is loaded. In any case, EII easily complements EAI and ETL. EII provides the data abstraction capability and EAI coordinates the business process flow. EII can leverage ETL for complex transformations while EII delivers the data in real time.
Applying EII to Jeet’s Operational Applications
An EII system can make data available to the call center and self-service portal applications directly from multiple operational systems. Replication will not be required, thereby ensuring that data will be fresh and consistent. The EII system will create integrated views of customer data, helping to standardize how customer data should be presented to these applications, and ensuring one version of the truth. These integrated views can be represented as coarse-grained, callable services that are loosely coupled with the application. To the call center application, these services will appear as single data sources, even though the data is actually drawn directly from multiple source systems. Furthermore, these data services can be re-used by many applications. The EII system will also enable coordinated writing back to the original source systems.
The data warehouse will continue to be an important part of the data integration arsenal for this bank. However, it can’t do it all. Jeet should complement the bank’s data warehouse strategy and grow its overall data integration capabilities with the addition of EII.
Stephen Brobst is Chief Technology Officer at Teradata, a division of NCR.
Evan Levy is a partner and co-founder of Baseline Consulting.
Craig Muzilla is Vice President of Strategy and Marketing at Avaki Corporation.
REFERENCES
Brobst, Stephen. “Active Data Warehousing and Enterprise Application Integration.” Proceedings of Data Warehousing 2002: From Data Warehousing to the Corporate Knowledge Center. Physica-Verlag Heidelberg, November 12-13, 2002.
Brobst, Stephen. “Delivering Extreme Data Freshness with Active Data Warehousing.” Journal of Data Warehousing. Spring 2002.
Gold-Bernstein, Beth. “EAI Market Segmentation.” EAI Journal. July/August 1999.
Schmidt, J. “Enabling Next Generation Enterprises.” EAI Journal. July 1, 2000.
This article originally appeared in the issue of TDWI.