The Many Uses of Data Federation
Data federation is now regarded as a viable, necessary, and complementary DM practice, tapped alongside ETL to support an array of data integration scenarios.
- By Stephen Swoyer
- March 17, 2010
When Composite Software Inc., IBM Corp. and other players first started promoting data federation a decade ago, some data management (DM) experts cried foul. The temptation to tap data federation technology in what DM pros deemed a kludgey manner -- as an instant-enabling technology for data mart applications, for example, or as a quick-and-dirty replacement for a full-fledged enterprise data warehouse (EDW) -- would simply be too great, they cautioned.
Years later, the worst-case scenarios DM pros feared haven't come to pass. More importantly, federation is now regarded as a viable, necessary, and typically complementary DM practice, tapped alongside ETL (or ELT) to support an increasingly complex array of data integration (DI) scenarios.
In the meantime, DM pros seem to have warmed to data federation's strong suits, chiefly its use as an enabling technology in frequently refreshed reporting, data warehouse modeling, or other rapid time-to-value scenarios.
"It makes a lot of sense to use data federation tools when it takes too long or costs too much to create a persistent store of consolidated data, such as a data warehouse or data mart," writes Wayne Eckerson, director of research with The Data Warehousing Institute (TDWI), in his recent TDWI Checklist Report: Data Federation. Eckerson also praises data federation's ability to facilitate on-the-fly access to heterogeneous data sources.
At the same time, Eckerson notes, today's data federation tools are considerably more powerful than their predecessors. Consequently, they're increasingly used in a number of non-traditional (or not-strictly-DM) scenarios.
"[D]ata federation tools have broadened their capabilities and appeal and go by many labels, including data virtualization, data services, and distributed query," he writes. "They are used in a variety of situations that require unified access to data in multiple systems via high-performance distributed queries, such as data warehousing, reporting, dashboards, mashups, portals, master data management, data services in a service-oriented architecture [SOA], post-acquisition systems integration, and cloud computing."
From a data management perspective, data federation offers a number of compelling advantages. The trick, Eckerson says, is to use it when and where it makes sense. "Data federation software is designed for query-based applications that require current data from multiple systems," he writes. "It's ideal when the business needs a solution fast, doesn't have a sizable budget for infrastructure and staffing, and wants to minimize the risk involved in deploying a new solution."
In this respect, he explains, it's a big improvement over the DM status quo.
"The traditional way to build query-based applications is to create a data warehouse or data mart," Eckerson writes. "However, creating a new data mart … takes at least three months. The process involves understanding user requirements, creating target data models, and building and testing ETL transformations as well as purchasing, deploying, and testing server hardware and database software." Federation can be a godsend in such scenarios. Its strongest selling point, proponents claim, is a rapid time-to-value.
"Data federation can minimize the risks, costs, and time needed to deliver query-based solutions because it doesn't require a lot of upfront coding and doesn't need an additional database to store source data," he explains. "All you do is install the data federation development and runtime software on an industry standard server, create the views and services that form the global semantic layer, and tune the major queries. There are no ETL programs to develop or staging areas, data warehouses, or data marts to instantiate."
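To make the idea of a global semantic layer more concrete, here is a minimal Python sketch, assuming two hypothetical sources: an in-memory SQLite table standing in for an operational orders database, and an XML document standing in for customer master data. The hypothetical `customer_revenue_view` function plays the role of a federated view: it reads both sources at query time and joins the results in memory, so no staging area, warehouse, or mart is created. All table, field, and function names are illustrative; commercial federation tools typically define such views in SQL and add query optimization, caching, and security on top.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source 1: an operational relational database of orders.
def make_orders_source() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "C1", 120.0), (2, "C2", 75.5), (3, "C1", 30.0)],
    )
    return conn

# Hypothetical source 2: customer master data published as XML.
CUSTOMERS_XML = """
<customers>
  <customer id="C1"><name>Acme</name><region>East</region></customer>
  <customer id="C2"><name>Globex</name><region>West</region></customer>
</customers>
"""

def customer_revenue_view(orders_db: sqlite3.Connection, customers_xml: str) -> list:
    """Virtual view: both sources are read at query time; nothing is persisted."""
    # Push the aggregation down to the relational source so only totals travel.
    totals = dict(orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ))
    rows = []
    for cust in ET.fromstring(customers_xml.strip()).findall("customer"):
        rows.append({
            "customer": cust.findtext("name"),
            "region": cust.findtext("region"),
            "revenue": totals.get(cust.get("id"), 0.0),  # 0.0 if no orders
        })
    return rows

for row in customer_revenue_view(make_orders_source(), CUSTOMERS_XML):
    print(row)
```

Pushing the `SUM ... GROUP BY` down to the relational source, rather than fetching every order row, is one simple form of query tuning: the less data that leaves the source system, the lighter the load placed on it.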
Nevertheless, Eckerson stresses, data federation is not a silver bullet.
"Data federation software is ideal when source systems are consistently available and have enough capacity to handle streams of ad hoc queries without slowing down transaction processing tasks," he counsels. "Also, it's best if source data doesn't require significant transformation or cleansing, and if the business application consumes mostly current data in relational or XML formats."
For this reason, federation isn't suitable for applications that involve very large data sets or that require complex transformations. Nor does it make sense where a DM infrastructure is already overloaded.
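The criteria in the two paragraphs above can be read as a screening checklist. The Python sketch below encodes them; the `Workload` fields and the `federation_red_flags` function are hypothetical names, and reducing each criterion to a yes/no flag is a deliberate simplification of what is, in practice, a judgment about degrees of availability, capacity, and data volume.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Workload:
    """Hypothetical profile of a candidate application (field names are illustrative)."""
    sources_consistently_available: bool
    sources_have_spare_capacity: bool
    needs_significant_transformation: bool
    mostly_current_data: bool
    relational_or_xml_data: bool
    very_large_data_sets: bool
    infrastructure_overloaded: bool

# Each entry pairs a "red flag" predicate with the reason it matters.
CHECKS: List[Tuple[Callable[[Workload], bool], str]] = [
    (lambda w: not w.sources_consistently_available, "source systems are not consistently available"),
    (lambda w: not w.sources_have_spare_capacity, "ad hoc queries could slow transaction processing"),
    (lambda w: w.needs_significant_transformation, "data needs significant transformation or cleansing"),
    (lambda w: not w.mostly_current_data, "application does not consume mostly current data"),
    (lambda w: not w.relational_or_xml_data, "data is not mostly relational or XML"),
    (lambda w: w.very_large_data_sets, "very large data sets are involved"),
    (lambda w: w.infrastructure_overloaded, "the DM infrastructure is already overloaded"),
]

def federation_red_flags(w: Workload) -> List[str]:
    """Return reasons federation may be a poor fit; an empty list means none were found."""
    return [reason for is_flag, reason in CHECKS if is_flag(w)]

ok = Workload(True, True, False, True, True, False, False)
big = Workload(True, True, True, True, True, True, False)
print(federation_red_flags(ok))   # []
print(federation_red_flags(big))  # ['data needs significant transformation or cleansing', 'very large data sets are involved']
```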
Instead, Eckerson and TDWI outline a quartet of new or compelling uses for data federation: extending a DW with additional (external) sources; augmenting existing ETL processes and accelerating ETL development; creating a single (virtual) view of multiple data warehouses, which is particularly compelling after a merger or acquisition; and, lastly, serving as an interim solution to help facilitate data warehouse migration.
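For the multiple-warehouse (M&A) scenario, a virtual view can map two differently named schemas onto one common shape without copying any data. The Python sketch below uses SQLite's `ATTACH DATABASE` so that two in-memory databases can stand in for the acquiring and acquired companies' warehouses; the table and column names are invented for illustration, and a real federation tool would typically reach across separate servers rather than within one SQLite connection.

```python
import sqlite3

def build_virtual_view() -> sqlite3.Connection:
    # Autocommit mode (isolation_level=None) keeps ATTACH outside any open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Hypothetical acquired company's warehouse, attached as a separate database.
    conn.execute("ATTACH DATABASE ':memory:' AS acquired")

    # Hypothetical acquiring company's warehouse.
    conn.execute("CREATE TABLE main.sales (region TEXT, revenue REAL)")
    conn.executemany("INSERT INTO main.sales VALUES (?, ?)",
                     [("East", 100.0), ("West", 50.0)])

    # The acquired warehouse uses different table and column names.
    conn.execute("CREATE TABLE acquired.bookings (territory TEXT, amount_usd REAL)")
    conn.executemany("INSERT INTO acquired.bookings VALUES (?, ?)",
                     [("East", 40.0), ("South", 25.0)])

    # One virtual view maps both schemas onto a common shape; no rows are copied.
    # SQLite only allows TEMP views to reference tables in other attached databases.
    conn.execute("""
        CREATE TEMP VIEW all_sales AS
        SELECT region, revenue, 'parent' AS source FROM main.sales
        UNION ALL
        SELECT territory, amount_usd, 'acquired' FROM acquired.bookings
    """)
    return conn

conn = build_virtual_view()
print(conn.execute(
    "SELECT region, SUM(revenue) FROM all_sales GROUP BY region ORDER BY region"
).fetchall())  # [('East', 140.0), ('South', 25.0), ('West', 50.0)]
```

Reports query `all_sales` as if it were a single table; the `source` column preserves which warehouse each row came from, which can help during reconciliation.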
"If you need to migrate or replace a data warehouse, data federation can help minimize the impact on downstream reports, reducing costs and risks," Eckerson writes. "To migrate the data warehouse, you need to rewrite report queries to run against data federation software instead of directly against the old data warehouse. Once the new warehouse is in place, you modify the semantic layer in the data federation tool to point at the new source. This insulates your reports and queries from source system changes."
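Eckerson's migration argument hinges on one level of indirection: reports address logical names in the semantic layer, and only the layer knows which physical source backs each name. The minimal Python sketch below assumes a hypothetical `SemanticLayer` class and two in-memory SQLite databases standing in for the old and new warehouses; repointing a single binding at cutover leaves the report function untouched.

```python
import sqlite3

class SemanticLayer:
    """Hypothetical minimal semantic layer mapping logical names to physical queries."""

    def __init__(self) -> None:
        self._bindings = {}

    def bind(self, name: str, conn: sqlite3.Connection, sql: str) -> None:
        # (Re)point a logical name at a physical source and query.
        self._bindings[name] = (conn, sql)

    def query(self, name: str) -> list:
        conn, sql = self._bindings[name]
        return conn.execute(sql).fetchall()

def make_warehouse(table: str, rows: list) -> sqlite3.Connection:
    # Stand-in warehouse; the table name is a trusted literal in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (month TEXT, revenue REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn

def revenue_report(layer: SemanticLayer) -> list:
    # The report knows only the logical name, never the physical table.
    return layer.query("monthly_revenue")

data = [("2010-01", 10.0), ("2010-02", 12.0)]
old_dw = make_warehouse("fact_sales", data)      # old warehouse schema
new_dw = make_warehouse("sales_monthly", data)   # new warehouse schema

layer = SemanticLayer()
layer.bind("monthly_revenue", old_dw, "SELECT month, revenue FROM fact_sales ORDER BY month")
before = revenue_report(layer)

# Cutover: repoint the binding; revenue_report itself is unchanged.
layer.bind("monthly_revenue", new_dw, "SELECT month, revenue FROM sales_monthly ORDER BY month")
after = revenue_report(layer)
print(before == after)  # True
```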
Eckerson's report discusses a number of additional federation pros and cons. It's available as a free download at http://tdwi.org/research/2009/11/data-federation/asset.aspx.