Data Integration and Data Warehousing Defined

To help you make your way through the many powerful case studies and “lessons from the experts” articles in What Works in Data Integration, we have arranged them into specific categories: data integration, data quality, master data management, open source data integration, and data warehousing. What do these terms mean, and how do they apply to your organization?

Data Integration

Fundamentally, data warehousing is an exercise in data integration. A data warehouse attempts to reintegrate, for analytic purposes, data that organizations have maintained in multiple, heterogeneous systems. Pulling together and reconciling dispersed data is a difficult task. Data needs to be accessed and extracted, moved and loaded, validated and cleansed, and standardized and transformed. Data integration tools support all these processes and make it possible to execute the rules created by developers in the design phase of data warehousing.
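
To make the sequence of steps concrete, here is a minimal Python sketch of an extract, validate, transform, and load pass over two sources. Everything in it is an assumption made for illustration: the sample rows, the field names, the country lookup, and the function names are hypothetical, and a real data integration tool would do far more.

```python
# Hypothetical rows, as if extracted from two heterogeneous source systems.
CRM_ROWS = [{"cust_id": "17", "name": " Ada Lovelace ", "country": "uk"}]
BILLING_ROWS = [
    {"customer": 17, "full_name": "ADA LOVELACE", "ctry": "GB"},
    {"customer": None, "full_name": "No Id", "ctry": "US"},
]

# Hypothetical standardization rule: map country spellings to one code set.
COUNTRY_CODES = {"uk": "GB", "gb": "GB", "us": "US"}


def extract():
    """Access and extract: bring each source's fields into one common shape."""
    for r in CRM_ROWS:
        yield {"id": r["cust_id"], "name": r["name"],
               "country": r["country"], "source": "crm"}
    for r in BILLING_ROWS:
        yield {"id": r["customer"], "name": r["full_name"],
               "country": r["ctry"], "source": "billing"}


def is_valid(row):
    """Validate: a row needs a numeric identifier to be loaded."""
    return row["id"] is not None and str(row["id"]).strip().isdigit()


def transform(row):
    """Cleanse and standardize: trim names, unify case and country codes."""
    return {
        "id": int(row["id"]),
        "name": " ".join(row["name"].split()).title(),
        "country": COUNTRY_CODES.get(row["country"].strip().lower(), "UNKNOWN"),
        "source": row["source"],
    }


def run_pipeline():
    """Load valid rows into an in-memory 'warehouse'; set aside the rest."""
    warehouse, rejected = [], []
    for row in extract():
        if is_valid(row):
            warehouse.append(transform(row))
        else:
            rejected.append(row)
    return warehouse, rejected
```

Calling `run_pipeline()` returns the cleansed rows alongside the rejected ones, so rows that fail validation can be reviewed rather than silently dropped.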

Data Quality

Data quality is a complex concept that encompasses many data management techniques and business quality practices, applied repeatedly over time as the state of quality evolves, to achieve levels of quality that vary per data type and seldom aspire to perfection. The most common technique is name-and-address cleansing, whereas the least common is the internationalization of data for quality purposes. Between these two extremes are numerous data quality techniques, including data standardization, verification, profiling, monitoring, matching, merging, householding, geocoding, postal standards, enrichment, and so on.
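
Two of the techniques named above, standardization and matching, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abbreviation table, the 0.9 similarity threshold, and the function names are hypothetical, and the similarity score comes from the standard library's `difflib.SequenceMatcher` rather than from any dedicated data quality product.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real postal standard is far larger.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def standardize(address):
    """Lowercase, strip punctuation, and expand common abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def is_match(a, b, threshold=0.9):
    """Treat two addresses as the same if their standardized forms are similar."""
    ratio = SequenceMatcher(None, standardize(a), standardize(b)).ratio()
    return ratio >= threshold
```

Standardizing before matching is the design choice worth noting: `"12 Main St."` and `"12 main street"` differ as raw strings but become identical once both are standardized.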

Master Data Management

Master data management is the practice of defining and maintaining consistent definitions of business entities, then sharing them via integration techniques across multiple IT systems within an enterprise and sometimes beyond to partnering companies or customers. Many technical users consider MDM to be an integration practice, enabled by integration tools and techniques for ETL, EAI, EII, and replication. When the system of record is a hub that connects many diverse systems, multiple integration technologies may be required, including newer ones like Web services and service-oriented architecture (SOA). More simply put: MDM is the practice of acquiring, improving, and sharing master data.
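
As a rough illustration of "acquiring, improving, and sharing," the Python sketch below merges records for one business entity from two systems into a single master record. The merge rule (the most recently updated non-empty value wins), the field names, and the sample records are all assumptions chosen for the example; the article does not prescribe a particular rule.

```python
# Fields that describe the record itself rather than the business entity.
METADATA_FIELDS = ("source", "updated")


def build_master_record(records):
    """Merge records for one entity; the latest non-empty value wins per field."""
    master = {}
    # ISO-format date strings sort chronologically, so later records overwrite.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record.items():
            if field in METADATA_FIELDS:
                continue
            if value not in (None, ""):
                master[field] = value
    return master


# Hypothetical records for the same customer held in two systems.
crm = {"source": "crm", "updated": "2024-01-10", "name": "Acme Corp",
       "phone": "", "email": "info@acme.example"}
erp = {"source": "erp", "updated": "2024-03-02", "name": "ACME Corporation",
       "phone": "555-0100", "email": None}

master = build_master_record([crm, erp])
```

Here the merged record takes its name and phone number from the newer ERP record but keeps the e-mail address from the CRM record, because the ERP value is empty. The resulting record is what a hub would then share with other systems.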

Open Source Data Integration

Open source data integration tools are, for the most part, like other data integration tools in terms of functionality. One difference is that, being based on open source code, the functionality is developed primarily by a software vendor, but augmented by the efforts of a developer community. Another difference is that most are not licensed to users per se; instead, most users download and use the open source tool at no charge, then pay a minimal charge for support and maintenance later, if they choose to deploy a solution. Open source data integration tools are known for their low cost, ease of customization, quick procurement via download, and ease of embedding within various applications.

Data Warehousing

At the highest level, designing a data warehouse involves creating, manipulating, and mapping models. These models are conceptual, logical, and physical (data) representations of the business and end-user information needs. Some models already exist in source systems and must be reverse engineered. Other models, such as those defining the data warehouse, are created from scratch. Creating a data warehouse requires designers to map data between source and target models, capturing the details of the transformation in a metadata repository. Tools that support these various modeling, mapping, and documentation activities are known as data warehouse design tools.
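
The idea of source-to-target mappings captured as metadata can be sketched in Python as follows. The table and column names, the rule names, and the list standing in for a metadata repository are hypothetical; this is a simplified picture of the concept, not how any particular design tool stores its mappings.

```python
# A stand-in for a metadata repository: each entry documents one mapping
# from a source column to a target column, plus the transformation rule.
MAPPINGS = [
    {"source": "crm.cust_nm", "target": "dim_customer.customer_name", "rule": "trim"},
    {"source": "crm.cntry", "target": "dim_customer.country_code", "rule": "upper"},
]

# Named transformation rules that the mappings refer to.
RULES = {"trim": str.strip, "upper": str.upper}


def apply_mappings(source_row, mappings=MAPPINGS):
    """Build a target row by applying each documented mapping to a source row."""
    target_row = {}
    for mapping in mappings:
        source_column = mapping["source"].split(".", 1)[1]
        target_column = mapping["target"].split(".", 1)[1]
        target_row[target_column] = RULES[mapping["rule"]](source_row[source_column])
    return target_row


def lineage(target_column, mappings=MAPPINGS):
    """Answer 'where does this target column come from?' using the metadata."""
    return [m["source"] for m in mappings if m["target"] == target_column]
```

Because the mappings are data rather than hard-coded logic, the same entries both drive the transformation and document it, which is what makes lineage questions answerable later.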

