Data Integration Defined
To help you make your way through the many powerful case studies and “lessons from the experts” articles in What Works in Data Integration, we have arranged them into specific categories: general data integration, data governance, data warehouse appliances, data warehousing, and master data management (MDM). What do these terms mean, and how do they apply to your organization?
Fundamentally, data warehousing is an exercise in data integration. A data warehouse attempts to re-integrate, for analytic purposes, data that organizations have maintained in multiple, heterogeneous systems. Pulling together and reconciling dispersed data is a difficult task. Data needs to be accessed and extracted, moved and loaded, validated and cleaned, and standardized and transformed. Data integration tools support all these processes and make it possible to execute the rules created by developers in the design phase of data warehousing.
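To make the sequence of steps concrete, here is a minimal sketch of an extract, validate, transform, and load pass over two heterogeneous sources. Everything in it is an assumption made for illustration: the two source systems (`CRM_ROWS`, `BILLING_ROWS`), their field names, the country lookup table, and the rule that a row needs a numeric key to be valid. A commercial data integration tool would express the same rules through its own design interface rather than hand-written code.

```python
# Hypothetical rows from two source systems that describe the same customers
# with different field names and formats.
CRM_ROWS = [
    {"cust_id": "17", "name": "  Acme Corp ", "country": "us"},
    {"cust_id": "", "name": "No Key Ltd", "country": "DE"},
]
BILLING_ROWS = [
    {"customer": 17, "company": "ACME CORP", "ctry": "USA"},
    {"customer": 42, "company": "Globex", "ctry": "de"},
]

# Assumed standardization table for country values.
COUNTRY_CODES = {"US": "US", "USA": "US", "DE": "DE"}


def extract():
    """Access and extract: read each source and rename fields to common names."""
    for row in CRM_ROWS:
        yield {"source": "crm", "id": row["cust_id"],
               "name": row["name"], "country": row["country"]}
    for row in BILLING_ROWS:
        yield {"source": "billing", "id": row["customer"],
               "name": row["company"], "country": row["ctry"]}


def is_valid(row):
    """Validate: reject rows that have no usable numeric key."""
    return str(row["id"]).strip().isdigit()


def transform(row):
    """Clean and standardize: trim whitespace, normalize case and codes."""
    return {
        "customer_id": int(str(row["id"]).strip()),
        "name": " ".join(row["name"].split()).title(),
        "country": COUNTRY_CODES.get(row["country"].strip().upper(), "UNKNOWN"),
        "source": row["source"],
    }


def load(rows, warehouse):
    """Load: reconcile rows that share a key into one warehouse record."""
    for row in rows:
        record = warehouse.setdefault(
            row["customer_id"],
            {"name": row["name"], "country": row["country"], "sources": []},
        )
        record["sources"].append(row["source"])
    return warehouse


def run_pipeline():
    valid_rows = (row for row in extract() if is_valid(row))
    return load((transform(row) for row in valid_rows), {})


if __name__ == "__main__":
    for key, record in sorted(run_pipeline().items()):
        print(key, record)
```

In this sketch the row without a key is dropped at validation, and the two rows for customer 17 are reconciled into a single record that remembers both source systems.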
Data Governance
Data governance is usually manifested as an executive-level data governance board, committee, or other organizational structure that creates and enforces policies and procedures for the business use and technical management of data across the entire organization. Common goals of data governance are to improve data’s quality; remediate its inconsistencies; share it broadly; leverage its aggregate for competitive advantage; manage change relative to data usage; and comply with internal and external regulations and standards for data usage. In a nutshell, data governance is an organizational structure that oversees the broad use and usability of data as an enterprise asset.
Data Warehouse Appliances
A strict definition of data warehouse appliance is: “server hardware and database software built specifically to be a data warehouse platform.” A looser definition allows appliances to be hardware and software designed for any purpose, though bundled and pre-integrated for data warehousing. In a February 2007 TDWI Technology Survey, roughly half of respondents chose the strict definition, a quarter the loose one. However, the focus of data warehouse appliances is shifting from proprietary to commodity hardware, as well as more generally from hardware to software components. In fact, some of the newer data warehouse appliance vendors openly describe their products as software-based accelerators, not hardware boxes. When added to a user organization’s existing BI technology stack (or another vendor’s appliance), these accelerate BI development, and—once in place—they accelerate query performance.
Data Warehousing
At the highest level, designing a data warehouse involves creating, manipulating, and mapping models. These models are conceptual, logical, and physical (data) representations of the business and end-user information needs. Some models already exist in source systems and must be reverse engineered. Other models, such as those defining the data warehouse, are created from scratch. Creating a data warehouse requires designers to map data between source and target models, capturing the details of the transformation in a metadata repository. Tools that support these various modeling, mapping, and documentation activities are known as data warehouse design tools.
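The source-to-target mapping described above can be sketched in a few lines. This is an illustrative assumption, not how any particular design tool works: the `ColumnMapping` class, the `orders_db` source names, the target column names, and the list-of-dicts "repository" are all hypothetical. The point is only that each mapping carries both an executable transformation and a description that can be stored as metadata.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ColumnMapping:
    source: str                      # "system.table.column" in the source model
    target: str                      # column name in the warehouse model
    rule: str                        # description kept as metadata
    apply: Callable[[Any], Any]      # the transformation itself


# Hypothetical mappings from an order-entry system to a warehouse table.
MAPPINGS = [
    ColumnMapping("orders_db.orders.ord_dt", "order_date",
                  "copy ISO date string unchanged", lambda value: value),
    ColumnMapping("orders_db.orders.amt_cents", "amount_usd",
                  "divide integer cents by 100", lambda value: value / 100),
]


def map_row(source_row, mappings):
    """Build one target row by applying every mapping to a source row."""
    target_row = {}
    for mapping in mappings:
        column = mapping.source.rsplit(".", 1)[-1]
        target_row[mapping.target] = mapping.apply(source_row[column])
    return target_row


def metadata_records(mappings):
    """Return the lineage details a metadata repository could store."""
    return [
        {"source": m.source, "target": m.target, "rule": m.rule}
        for m in mappings
    ]


if __name__ == "__main__":
    print(map_row({"ord_dt": "2007-02-01", "amt_cents": 1250}, MAPPINGS))
    for record in metadata_records(MAPPINGS):
        print(record)
```

Keeping the rule description next to the transformation means the documentation and the executed logic come from the same definition, which is the role the metadata repository plays in the paragraph above.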
Master Data Management
Master data management is the practice of defining and maintaining consistent definitions of business entities, then sharing them via integration techniques across multiple IT systems within an enterprise and sometimes beyond to partnering companies or customers. Many technical users consider MDM to be an integration practice, enabled by integration tools and techniques for ETL, EAI, EII, and replication. When the system of record is a hub that connects many diverse systems, multiple integration technologies may be required, including newer ones like Web services and service-oriented architecture (SOA). More simply put: MDM is the practice of acquiring, improving, and sharing master data.
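As a rough illustration of "acquiring, improving, and sharing," the sketch below consolidates customer records from two systems into one master record and then copies it to subscribing systems. The matching rule (normalized email address), the survivorship rule (the newest non-empty value wins), and all record and system names are assumptions chosen for the example. Real MDM hubs use far richer matching and would share data through ETL, EAI, replication, or Web services rather than in-memory dictionaries.

```python
# Hypothetical records for the same person held by two systems.
RECORDS = [
    {"system": "crm", "email": "Ann@Example.com ", "name": "Ann Lee",
     "phone": "", "updated": "2007-01-10"},
    {"system": "billing", "email": "ann@example.com", "name": "A. Lee",
     "phone": "555-0100", "updated": "2006-11-02"},
]


def match_key(record):
    """Assumed matching rule: same normalized email means same entity."""
    return record["email"].strip().lower()


def consolidate(records):
    """Acquire and improve: one master record per key, newest non-empty value wins."""
    masters = {}
    # ISO dates sort correctly as strings, so later records overwrite earlier ones.
    for record in sorted(records, key=lambda r: r["updated"]):
        key = match_key(record)
        master = masters.setdefault(key, {"email": key})
        for field, value in record.items():
            if field in ("system", "updated", "email"):
                continue
            if value not in ("", None):
                master[field] = value
    return masters


def share(masters, subscribers):
    """Share: give each subscribing system its own copy of the master data."""
    for store in subscribers.values():
        store.update({key: dict(master) for key, master in masters.items()})


if __name__ == "__main__":
    masters = consolidate(RECORDS)
    systems = {"crm": {}, "billing": {}}
    share(masters, systems)
    print(systems["billing"])
```

Here the newer CRM name replaces the older billing name, while the phone number survives from billing because the CRM value is empty.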