The Evolution of Data Federation
Data federation is not a new technique. The notion of virtualizing multiple back-end data sources has been around for a long time, reemerging every decade or so with a new name and mission.
Database Gateways. In the 1980s, database vendors introduced database gateways that transparently query multiple databases on the fly, making it easier for application developers to build transaction applications in a heterogeneous database environment. Oracle and IBM still sell these types of gateways.
VDW. In the 1990s, vendors applied data federation to the nascent field of data warehousing, touting its ability to create “virtual” data warehouses. However, data warehousing purists said VDW stood for “voodoo and witchcraft,” and the technology never caught on, largely because standardizing disparate data from legacy systems was nearly impossible without creating a dedicated data store.
EII. By the early 2000s, with more powerful computing resources available, data federation was positioned as a general-purpose data integration tool, adopting the moniker “enterprise information integration,” or EII. The three-letter acronym was designed to mirror ETL (extract, transform, and load), which had become the predominant method for integrating data in data warehouses.
Data Services. In addition, the rise of Web services and service-oriented architectures during the past decade gave data federation another opportunity. It was positioned as a data service, abstracting back-end data sources behind a single query interface. It is now being adopted by many companies that are implementing service-oriented architectures.
Data Virtualization. Today, data federation vendors prefer the label data virtualization, capitalizing on the popularity of hardware virtualization in corporate data centers and the cloud. The term reinforces the idea that data federation tools abstract databases and data systems behind a common interface.
Data Integration Toolbox
Over the years, data federation has gained a solid foothold as an important tool in any data integration toolbox. Companies use the technology in a variety of situations that require unified access to data in multiple systems via high-performance distributed queries. These include data warehousing, reporting, dashboards, mashups, portals, master data management, SOA, post-acquisition systems integration, and cloud computing.
One of the most common uses of data federation is augmenting data warehouses with current data from operational systems. In other words, data federation enables companies to “real-time-enable” their data warehouses without rearchitecting them. Another common use case is supporting “emergency” applications that need to be deployed quickly, where the organization doesn’t have the time or money to build a data warehouse or data mart. Finally, data federation is often used to create operational reports that require data from multiple systems.
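To make the first use case concrete, here is a minimal sketch of a federated view that combines warehouse history with current operational data at query time. It is an illustration only, not any vendor's implementation: two in-memory SQLite databases stand in for the warehouse and the operational system, and the table, view, and function names (sales_history, orders_today, sales_current, build_federated_connection, sales_by_region) are hypothetical.

```python
import sqlite3


def build_federated_connection() -> sqlite3.Connection:
    """Set up two separate databases and a virtual view that spans both."""
    # "main" stands in for the data warehouse (assumption for illustration).
    conn = sqlite3.connect(":memory:")
    # A second, separate in-memory database stands in for the operational system.
    conn.execute("ATTACH DATABASE ':memory:' AS ops")

    conn.execute(
        "CREATE TABLE main.sales_history (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE ops.orders_today (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO main.sales_history VALUES (?, ?, ?)",
        [(1, "East", 100.0), (2, "West", 250.0)],
    )
    conn.executemany(
        "INSERT INTO ops.orders_today VALUES (?, ?, ?)",
        [(3, "East", 40.0)],
    )

    # The view stores no data. Each query against it reads both sources.
    # SQLite only lets TEMP views reference tables in more than one database.
    conn.execute(
        """
        CREATE TEMP VIEW sales_current AS
        SELECT order_id, region, amount FROM main.sales_history
        UNION ALL
        SELECT order_id, region, amount FROM ops.orders_today
        """
    )
    return conn


def sales_by_region(conn: sqlite3.Connection) -> dict:
    """Report total sales per region, including today's operational orders."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales_current GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)
```

The report queries sales_current as if it were a single table. The warehouse schema is untouched, and a new order in the operational table shows up in the next query without a load job.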
A decade ago there were several pure-play data federation vendors, but now the only independent is Composite Software, which is OEM’d by several BI vendors, including IBM Cognos. Other BI vendors support data federation natively, including Oracle (OBIEE) and MicroStrategy. And many data integration vendors, including Informatica and SAP, have added data federation to their data integration portfolios.
Federation versus Integration
Traditionally, the pros and cons of data federation are weighed against those of data integration toolsets, especially when creating data warehouses. The question has always been “Is it better to build virtual data warehouses with federation tools or physical data marts and data warehouses with ETL tools?”
Data federation offers many advantages: it’s a fast, flexible, low-cost way to integrate diverse data sets in real time. But data integration offers benefits that data federation doesn’t: scalability, complex transformations, and data quality and data cleansing.
But what if you could combine the best of these two worlds and deliver a data integration platform that offered data federation as an integrated module, not a bolt-on product? What if you could get all the advantages of both data federation and data integration in a single toolset?
If you could have your cake and eat it, too, you might be able to apply ETL and data quality transformations to real-time data obtained through federation tools. You wouldn’t have to create two separate semantic models, one for data federation and another for ETL; you could use one model to represent both modalities. Basically, you would have one tool instead of two. This would make it easier, quicker, and cheaper to apply both data federation and data integration capabilities to any data challenge you might encounter.
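The idea of one model serving both modalities can be sketched in a few lines. This is a hypothetical illustration, not how any particular product works: the record layout, the standardize_customer rule, and the federated_query and etl_load functions are all invented for the example. A single cleansing rule is defined once, then applied either on the fly (federation) or as part of a batch load (ETL).

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Source = Callable[[], Iterable[Record]]


def standardize_customer(rec: Record) -> Record:
    """The shared rule: one definition of a clean customer record."""
    return {
        "customer_id": int(rec["id"]),
        "name": str(rec["name"]).strip().title(),
        "country": str(rec.get("country") or "UNKNOWN").strip().upper(),
    }


def federated_query(sources: Iterable[Source]) -> Iterator[Record]:
    """Virtual mode: read each source at request time and store nothing."""
    for fetch in sources:
        for rec in fetch():
            yield standardize_customer(rec)


def etl_load(sources: Iterable[Source], target: List[Record]) -> int:
    """Physical mode: apply the same rule, but persist the rows in a target."""
    before = len(target)
    target.extend(federated_query(sources))
    return len(target) - before
```

Because both paths call the same rule, a change to the cleansing logic is made once and takes effect in the virtual view and in the next batch load alike.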
This seamless combination is the goal of some data integration vendors. I recently did a Webcast with Informatica, which shipped a native data federation capability this year that runs on the same platform as its ETL tools. This is certainly a step forward for data integration teams that want a single, multipurpose environment instead of multiple, independent tools, each with its own architecture, metadata, semantic model, and functionality.
Posted by Wayne Eckerson on June 15, 2010