Data Integration and Data Warehousing Defined
To help you make your way through the many powerful case studies and “lessons from the experts” articles in What Works in Data Integration, we have arranged them into specific categories: data governance, data integration, data management, and data warehousing. What do these terms mean, and how do they apply to your organization?
Data governance usually takes the form of an executive-level board, committee, or other organizational structure that creates and enforces policies and procedures for the business use and technical management of data across the organization. Common goals of data governance are to improve data’s quality; remediate its inconsistencies; share it broadly; leverage it in aggregate for competitive advantage; manage change relative to data usage; and comply with internal and external regulations and standards for data usage. In a nutshell, data governance is an organizational structure that oversees the broad use and usability of data as an enterprise asset.
Data integration (DI) is a family of techniques and best practices that repurpose data by transforming it as it’s moved. ETL (extract, transform, and load) is the most common form of DI found in data warehousing. There are other techniques, including data federation, database replication, data synchronization, and so on. Solutions based on these techniques may be hand coded, based on a vendor’s tool, or a mix of both. DI breaks into two broad practice areas. Analytic DI supports business intelligence (BI) and data warehousing (DW), and operational DI is applied outside BI/DW to the migration, consolidation, and synchronization of operational databases, as well as in exchanging data in a business-to-business context.
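To make the ETL pattern concrete, the following is a minimal sketch in Python using the standard library's sqlite3 module. It is an illustration under stated assumptions, not a description of any vendor's tool: the table names (src_orders, dw_orders), the columns, and the sample rows are all hypothetical, and two in-memory databases stand in for an operational source and a warehouse target.

```python
import sqlite3


def extract(conn):
    """Extract: read raw order rows from the (hypothetical) source table."""
    return conn.execute(
        "SELECT order_id, customer, amount_cents FROM src_orders"
    ).fetchall()


def transform(rows):
    """Transform: tidy customer names and convert cents to dollars."""
    return [(oid, cust.strip().upper(), cents / 100.0) for oid, cust, cents in rows]


def load(conn, rows):
    """Load: write the repurposed rows into the (hypothetical) target table."""
    conn.executemany(
        "INSERT INTO dw_orders (order_id, customer, amount_usd) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def run_etl():
    """Run extract, transform, and load between two in-memory databases."""
    # Assumed source system: an operational table with untidy values.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?)",
        [(1, " acme ", 1250), (2, "Globex", 9900)],
    )

    # Assumed target: a warehouse-style table with cleaned, converted values.
    target = sqlite3.connect(":memory:")
    target.execute(
        "CREATE TABLE dw_orders (order_id INTEGER, customer TEXT, amount_usd REAL)"
    )

    load(target, transform(extract(source)))
    return target.execute(
        "SELECT order_id, customer, amount_usd FROM dw_orders ORDER BY order_id"
    ).fetchall()
```

Calling run_etl() returns the rows as they appear in the target after the data has been transformed in transit. A hand-coded solution like this one is the simplest case; a vendor tool would typically add scheduling, error handling, and lineage tracking around the same three steps.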
Data management (DM) and its synonym, information management, are broad terms that encompass several data-oriented technical disciplines, such as data integration, data quality, master data management, data architecture, database administration, metadata management, and so on. DM may also include practices that rely heavily on these disciplines, such as business intelligence, data warehousing, and data governance. By extension, enterprise data management (EDM) is a high-level practice that seeks to coordinate DM disciplines, align them with business-oriented goals, and give them consistency and quality through shared data standards and policies for data usage. Synonyms for EDM include unified data management (UDM) and enterprise information management (EIM).
At the highest level, designing a data warehouse involves creating, manipulating, and mapping models. These models are conceptual, logical, and physical (data) representations of the business and end-user information needs. Some models already exist in source systems and must be reverse engineered. Other models, such as those defining the data warehouse, are created from scratch. Creating a data warehouse requires designers to map data between source and target models, capturing the details of the transformation in a metadata repository. Tools that support these various modeling, mapping, and documentation activities are known as data warehouse design tools.
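The mapping activity described above can be sketched in a few lines of Python. This is only an illustration: the column names (cust_nm, ord_amt, customer_name, order_amount), the transformation rules, and the use of a plain list as a "metadata repository" are all assumptions made for the example, not features of any particular design tool.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """One source-to-target mapping, with its transformation documented."""
    source: str  # column name in the source model
    target: str  # column name in the target (warehouse) model
    rule: str    # human-readable description of the transformation


# Hypothetical mappings: each pairs the documentation with the function
# that actually performs the transformation.
MAPPINGS = [
    (ColumnMapping("cust_nm", "customer_name", "trim and title-case"),
     lambda value: value.strip().title()),
    (ColumnMapping("ord_amt", "order_amount", "parse string as float"),
     float),
]


def apply_mappings(source_row, mappings):
    """Build a target-model row by applying each mapping to a source row."""
    return {m.target: fn(source_row[m.source]) for m, fn in mappings}


def metadata_repository(mappings):
    """Return the mapping details as plain records, standing in for a repository."""
    return [asdict(m) for m, _ in mappings]
```

For example, apply_mappings({"cust_nm": "  jane DOE ", "ord_amt": "42.50"}, MAPPINGS) produces a row in the target model, while metadata_repository(MAPPINGS) yields the documented source, target, and rule for each column. Keeping the description next to the function is one simple way to ensure the recorded metadata and the executed transformation do not drift apart.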