Data Integration in a Nutshell: Four Essential Guidelines
These four guidelines can shape how you fundamentally think about data integration (DI) and how you measure the quality, modernity, and maintainability of DI solutions.
- By Philip Russom, Ph.D.
- March 24, 2010
I really hate déjà vu. Its feeling of repetition seems non-productive and time-consuming to me. That’s why I’m uncomfortable when I find myself explaining the same high-level points about data integration (DI) over and over to different people. I realize that I need to have conversations that define terms or get me and the other person onto the same page before we dive into the meat of the matter, but I can’t help feeling that the points I’m making about DI should be more self-evident to more people. I’m frustrated that many otherwise brilliant people still cling to a 1990s vision of DI.
I’ve compiled a list of four points that keep coming up in conversations, interviews, and consulting about data integration. I think of these points as guidelines in a nutshell that can shape how you fundamentally think of DI, as well as how you measure the quality, modernity, and maintainability of DI solutions.
At the risk of being self-serving, I’m hoping more people will absorb my view of DI, so I suffer fewer non-productive déjà vu moments. In a more altruistic spirit, I also hope these nutshell guidelines can help DI specialists and the people who work with them to see a more future-facing vision of what DI can and should be.
Guideline #1: Data integration is a family of techniques and best practices
The unfortunate knee-jerk reaction of many data warehouse professionals is that the term data integration is synonymous with ETL (extract, transform, and load) simply because ETL is the most common form of data integration found in data warehousing. However, there are other techniques (and best practices to go with them), including data federation, database replication, and data synchronization. Different techniques have different capabilities and prominent use cases, so it behooves a data integration specialist to know and apply them all.
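To make the contrast between two of these techniques concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is an illustration under stated assumptions, not a depiction of any particular DI tool: the table names (customers, dim_customer), the source names (crm, billing), and the helper functions are all hypothetical. ETL copies and reshapes data into a target ahead of time, whereas federation reads the sources at query time and copies nothing.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory source database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


def etl(sources, warehouse):
    """ETL: extract from each source, transform, and load a physical copy."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer (id INTEGER, name TEXT)"
    )
    for src in sources:
        rows = src.execute("SELECT id, name FROM customers").fetchall()  # extract
        cleaned = [(i, n.strip().title()) for i, n in rows]  # transform
        warehouse.executemany(
            "INSERT INTO dim_customer VALUES (?, ?)", cleaned
        )  # load
    warehouse.commit()


def federated_query(sources):
    """Federation: read the sources at request time; nothing is copied."""
    result = []
    for src in sources:
        result.extend(src.execute("SELECT id, name FROM customers").fetchall())
    return sorted(result)


# Two hypothetical operational systems and one warehouse.
crm = make_source([(1, " alice ")])
billing = make_source([(2, "BOB")])
warehouse = sqlite3.connect(":memory:")

etl([crm, billing], warehouse)
loaded = warehouse.execute(
    "SELECT id, name FROM dim_customer ORDER BY id"
).fetchall()
live = federated_query([crm, billing])
```

In this sketch, a row added to crm after the ETL run shows up in the next federated query but not in the warehouse until ETL runs again. That difference in freshness versus preparation is one reason the techniques suit different use cases.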
Guideline #2: Data integration practices reach across both analytics and operations
In analytic DI, one or more DI techniques are applied in the context of business intelligence (BI) or data warehousing (DW). Operational DI applies DI techniques outside BI/DW, typically for migrating or consolidating operational databases, synchronizing operational databases, or exchanging data in a business-to-business context. Analytic DI and operational DI are both growing practice areas, and both are increasingly staffed from a common competency center or similar organization.
Guideline #3: Data integration is an autonomous data management practice
In some old-fashioned organizations, DI is considered a mere subset of DW. It can be that, but it can also be independent. For example, the existence of operational DI proves DI’s independence from DW. Furthermore, hundreds of DI competency centers have sprung up in the last ten years or so as shared-service organizations for staffing all DI work -- not just DI for DW.
Guideline #4: A data integration solution should have architecture
After all, other types of IT solutions have architecture. DI architecture helps you with DI development standards, the reuse of DI objects, and the maintenance of solutions. The preferred architecture among integration technologies -- whether for data or application integration -- is hub-and-spoke. For this reason, most DI tools today lend themselves to hub-and-spoke. However, there are many variations of it, so you need to actively design an architecture for your DI solutions.
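As a rough illustration of the hub-and-spoke idea, the Python sketch below routes records between systems through a central hub and a shared canonical format. Everything in it is an assumption made for the example: the Spoke and Hub classes, the canonical field customer_name, and the crm and erp field names are hypothetical and do not come from any specific DI product. It shows only one simple variation of the pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Record = Dict[str, str]


@dataclass
class Spoke:
    """One connected system plus its mappings to and from the canonical format."""
    name: str
    to_canonical: Callable[[Record], Record]
    from_canonical: Callable[[Record], Record]


@dataclass
class Hub:
    """Central point that every spoke connects to; spokes never talk directly."""
    spokes: Dict[str, Spoke] = field(default_factory=dict)

    def register(self, spoke: Spoke) -> None:
        self.spokes[spoke.name] = spoke

    def route(self, source: str, target: str, record: Record) -> Record:
        # Source format -> canonical format -> target format.
        canonical = self.spokes[source].to_canonical(record)
        return self.spokes[target].from_canonical(canonical)


hub = Hub()
hub.register(Spoke(
    name="crm",
    to_canonical=lambda r: {"customer_name": r["full_name"]},
    from_canonical=lambda c: {"full_name": c["customer_name"]},
))
hub.register(Spoke(
    name="erp",
    to_canonical=lambda r: {"customer_name": r["CUST_NM"]},
    from_canonical=lambda c: {"CUST_NM": c["customer_name"]},
))

erp_record = hub.route("crm", "erp", {"full_name": "Ada Lovelace"})
```

The design choice worth noting is that each new system needs only one pair of mappings to the canonical format, rather than a separate mapping to every other system. Because each mapping lives in one place, it is also a natural unit for reuse and maintenance.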
Philip Russom is senior manager of TDWI Research. Philip can be reached at firstname.lastname@example.org .