TDWI FlashPoint Newsletter
ABOUT TDWI EXPERTS

TDWI Experts is a twice-monthly e-newsletter where BI/DW thought leaders share opinions and commentary about relevant industry topics and the latest technologies.

Feature

November 17, 2011

Information Integration: The Whole May be Greater than the Sum of its Parts

Michael A. Schiff
Principal Consultant, MAS Strategies

Topic: Data Integration

When you make a purchase from a Web site, the vendor will often suggest additional items, verify credit card or PayPal accounts and sometimes even credit ratings, check and decrement its inventory, update the customer's purchase history, and perform other tasks that interface with other systems that utilize a variety of databases. When analyzing operations or making forecasts, an organization's efforts frequently involve using a data warehouse where data was collected from several sources.

These examples rely on data integration to pull the necessary pieces together. Data integration can provide numerous benefits in both operational and analytic environments. For example, it is the enabling technology for obtaining an accurate and complete view of a customer or a product while making it possible for organizations to combine data from disparate sources so it can be analyzed in their data warehouses.

We often find that when we try to obtain a "360-degree view" of a business entity (such as a customer or product) or when we process a transaction, we need to combine data from multiple sources both inside and outside our organizations. A decade ago this might have involved collecting data from our operational systems and augmenting it with additional data such as credit ratings, DUNS numbers, demographic data (e.g., gender, income, age), and lifestyle or psychographic data obtained from a variety of commercial and governmental sources.

Today we seek to integrate additional sources such as call-center conversations, instant messages, email messages, geo-coded location data, blogs, Facebook postings, and tweets. However, if we miss a critical piece of data (for example, that a customer placing a large order has not yet paid for its last three orders and has, in fact, just filed a lawsuit against our company), we might miss a potential problem. Imagine mining health-care databases to identify possible causes and cures for a patient's symptoms when those databases did not contain data about a recently discovered "miracle cure."

Other Examples

Although these examples are customer-centric, integrating data from multiple sources is desirable for almost all subject areas. For example, sales figures, product recall data, complaints (and praise), third-party reviews, bills of materials, inventory levels, and supplier performance are all valuable when manufacturing and marketing an organization's products.

Many of our organizations focus on "bet-your-business" applications; Homeland Security is concerned with "bet-your-borders" applications that may involve integrating information such as travel histories, telephone call data, affinity group memberships, facial recognition data, known associates, and perhaps even the results of mining communications for suspicious keywords to identify potential threats. I can cite examples from many other industries or functional areas; indeed, it would be very difficult to find one that does not rely on information integration.

Additional Benefits

Many organizations still have islands of information resulting from non-integrated operational systems, especially if they purchased commercial enterprise applications for some functions while still supporting internally developed transaction systems for others. One of the basic goals of data warehousing is to collect data from multiple heterogeneous sources so that it can be aggregated and analyzed. However, it is unlikely that every data warehouse source system uses the same code sets and value lists, or even the same entity definitions, even if the organization has matured to the point of developing strong corporate data definitions.

More often than not, these organizations don't have the inclination or the resources to modify their existing (and sometimes legacy) operational systems to conform to corporate standards that were established after these systems were implemented. Even if an organization actually achieves this somewhat elusive goal, the next time the organization implements another third-party application system or acquires another company, it will also acquire data definitions and value lists that probably don't conform to its corporate standards. Fortunately, one of the benefits of a data warehouse is its ability to integrate and reconcile data from these disparate systems by transforming their data to conform to the corporate standards when loaded into the data warehouse.
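
The transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems (`crm` and `billing`), their status codes, and the helper names `to_corporate_status` and `conform` are hypothetical and do not come from any particular product. The idea is that each source keeps its own value list, and the load step maps every value onto one corporate standard.

```python
# Hypothetical value lists: each source system encodes customer status differently.
SOURCE_TO_CORPORATE = {
    "crm": {"A": "ACTIVE", "I": "INACTIVE"},
    "billing": {"1": "ACTIVE", "0": "INACTIVE", "9": "SUSPENDED"},
}


def to_corporate_status(source: str, code: str) -> str:
    """Translate a source-specific status code to the corporate standard.

    Unknown sources or codes raise an error instead of being loaded silently,
    so unmapped values surface during the load rather than in reports.
    """
    try:
        return SOURCE_TO_CORPORATE[source][code]
    except KeyError:
        raise ValueError(f"no corporate mapping for {source!r} code {code!r}") from None


def conform(rows):
    """Return copies of the rows with status rewritten to the corporate value."""
    return [
        {**row, "status": to_corporate_status(row["source"], row["status"])}
        for row in rows
    ]


# Illustrative input rows, as they might arrive from the two assumed sources.
rows = [
    {"source": "crm", "customer": "Acme", "status": "A"},
    {"source": "billing", "customer": "Acme", "status": "1"},
    {"source": "billing", "customer": "Globex", "status": "9"},
]
conformed = conform(rows)
```

Rejecting unmapped codes is a design choice in this sketch; a real load process might instead route such rows to an exception table for review.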

A Word of Caution

Data cleansing is (or should be!) a primary component of any data integration effort when populating a centralized data warehouse. However, this process is sometimes overlooked when integrating data for operational purposes and federated data warehouse efforts. Just because two databases use the same field name for a key, it does not necessarily follow that the key represents the same entity identifier or that, for example, vendor number 12345 in one data source represents the same vendor as vendor number 12345 in another. Furthermore, numeric fields in one data source may be expressed in a different unit of measure than in another. It is imperative that any data integration effort ensure that bushels of apples are not being added to sacks of oranges.
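
Both cautions can be illustrated with a minimal Python sketch. The source names (`erp`, `legacy`), the record layout, and the helpers `qualified_vendor_key` and `total_kg` are assumptions made for the example. Vendor numbers are qualified by their source system so that two vendors numbered 12345 are never merged by accident, and quantities are converted to a common unit before they are added.

```python
# Conversion factors to a common unit (kilograms).
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def qualified_vendor_key(source: str, vendor_number: str) -> tuple:
    """Qualify a vendor number with its source system.

    Vendor 12345 in one source is not assumed to be vendor 12345 in another.
    """
    return (source, vendor_number)


def total_kg(records):
    """Sum quantities per (source, vendor) after converting to kilograms."""
    totals = {}
    for rec in records:
        factor = KG_PER_UNIT[rec["unit"]]  # raises KeyError on an unknown unit
        key = qualified_vendor_key(rec["source"], rec["vendor"])
        totals[key] = totals.get(key, 0.0) + rec["quantity"] * factor
    return totals


# Illustrative records: the same vendor number appears in two assumed sources,
# and quantities arrive in three different units.
records = [
    {"source": "erp", "vendor": "12345", "quantity": 2, "unit": "lb"},
    {"source": "erp", "vendor": "12345", "quantity": 500, "unit": "g"},
    {"source": "legacy", "vendor": "12345", "quantity": 3, "unit": "kg"},
]
totals = total_kg(records)
```

In this sketch the two vendors stay separate until someone has confirmed, for example through a cross-reference table, that they really are the same entity.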

The Bottom Line

Geometry teaches us that the whole is equal to the sum of its parts. However, in many operational and analytic systems, we must recognize that when we truly obtain all the data we require, the value we achieve is much greater than the sum of the individual pieces. Information integration allows the whole to be greater than the sum of its parts, but only if all of the relevant data sources are included and the data being integrated is consistent and accurate.

Michael A. Schiff is a principal consultant for MAS Strategies. He can be reached at mschiff@mas-strategies.com.

Copyright 2011. TDWI. All rights reserved.


