Analysis: Data Quality is Job Number 1

BI tools are only as good as the quality of the data they work with. Analyst Michael Schiff is surprised at how many BI professionals still ignore this fact.

Much attention has been focused recently on integrating data from multiple sources to populate data warehouses or data marts for analysis purposes or as part of a migration effort for new enterprise applications. For example, a recent press release from a well-established business intelligence vendor highlights the ability of its BI platform to access multiple data sources where they reside without first having to move the data into a data warehouse or a data mart.

Rather than dwelling on the tradeoffs (perhaps a topic for future analysis) among centralized data warehouses, federated databases, enterprise information integration (EII), or even the old concept of virtual data warehouses, I would like to point out a common property they share: the quality of the information obtained from any of them is directly dependent on the quality of the data they access or contain. In other words, GIGO (garbage in, garbage out) always has applied (and always will apply) to both analytical and operational systems.

We all recognize this, yet I continue to be amazed at how often we ignore the fact. In most cases, it's not deliberate; rather, it results from taking preliminary assumptions about data quality at face value instead of validating them with techniques such as data profiling.
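To make the idea of profiling concrete, here is a minimal Python sketch (assuming pandas and a hypothetical extract file with hypothetical column names) of the kind of per-column summary that can confirm or refute assumptions about a source before its data is trusted:

# Minimal data-profiling sketch using pandas (hypothetical file and column names).
# It summarizes each column so assumptions about quality can be checked against
# the actual data rather than taken at face value.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column row counts, null rates, distinct counts, and top values."""
    rows = []
    for col in df.columns:
        s = df[col]
        rows.append({
            "column": col,
            "rows": len(s),
            "nulls": int(s.isna().sum()),
            "null_pct": round(100 * s.isna().mean(), 2),
            "distinct": int(s.nunique(dropna=True)),
            "top_values": s.value_counts(dropna=True).head(5).to_dict(),
        })
    return pd.DataFrame(rows)

# Example: profiling an inventory extract before trusting its quantities.
inventory = pd.read_csv("inventory_extract.csv")   # hypothetical file name
print(profile(inventory))

A profile like this makes surprises, such as an unexpected unit-of-measure column or a code field with far more distinct values than expected, visible before the data reaches an analysis.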

When integrating data from multiple sources, the data at each source may be accurate and internally consistent, yet the sources may be inconsistent with one another. Consider a simple example: inventory quantities in one data source are recorded in different units of measure from the others (for example, dozens versus pieces). A company I once worked at was about to place an order for a large amount of optical cable until it discovered, at the last minute, that the quantity it thought it had on hand was recorded in reels rather than feet; different inventory systems had implemented different units of measure.
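As a sketch of how such a discrepancy is avoided during consolidation, the Python snippet below normalizes quantities to a single unit of measure before they are combined. The conversion factor (feet per reel) is an assumed, illustrative value, not one from the example above:

# Sketch of normalizing inventory quantities to a single unit of measure
# before consolidating sources. The conversion factors are illustrative
# assumptions only.
UOM_TO_FEET = {
    "FT": 1,        # already in feet
    "REEL": 1000,   # assumed feet per reel
}

def to_feet(quantity: float, uom: str) -> float:
    """Convert a quantity in a known unit of measure to feet."""
    try:
        return quantity * UOM_TO_FEET[uom.upper()]
    except KeyError:
        raise ValueError(f"Unknown unit of measure: {uom!r}")

# Two systems report the same cable: one in reels, one in feet.
on_hand_feet = to_feet(12, "REEL") + to_feet(500, "FT")   # 12,500 feet, not 512 "units"
print(on_hand_feet)

The important design point is that an unrecognized unit raises an error rather than being silently added to the total.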

A more complex example involves different code sets so that the vendor number or customer number in one database is inconsistent with the codes used to represent a vendor or customer in another. Worse yet is using the same customer number to represent different customers in two different systems without users being aware that the numbers in each system were assigned independently. This is an example of the classic issue of "we understand the data in our department (or division); it is the data we receive from the other departments (or divisions) that doesn't make any sense." In fact, consistency across departments and divisions is one of the drivers for master data management.

One of the advantages of consolidating data from multiple sources into a data warehouse is that inconsistencies among sources quickly become apparent (hopefully before the data warehouse goes "live"), and remedial steps can be taken, through proper data transformation (or reformatting of the source data itself), to correct the problem and prevent it from recurring in future loads. For example, it is certainly possible to define a set of corporate code sets and value lists for the data warehouse and then convert data from each source to these corporate values when loading the data warehouse. Although this may be obvious to BI professionals, it was one of the selling points for data warehousing when it was first popularized in the late 1980s and early 1990s. Unfortunately, this lesson is now sometimes forgotten, especially when organizations try to access data directly from multiple sources rather than first cleansing it and loading it into a data warehouse.
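A minimal sketch of that conversion step, again in Python with pandas and with hypothetical system, table, and column names: each source's customer codes are translated through a crosswalk into corporate identifiers as part of the load, and rows with unmapped codes are rejected rather than loaded inconsistently. It also illustrates the earlier point that the same code ("C100") can identify different customers in different systems:

# Sketch of applying a corporate code set during a warehouse load.
# Names and values are hypothetical.
import pandas as pd

# Crosswalk: (source system, source customer code) -> corporate customer id
crosswalk = pd.DataFrame({
    "source_system": ["ORDERS_EAST", "ORDERS_EAST", "ORDERS_WEST"],
    "source_code":   ["C100", "C250", "C100"],      # same code, different customers
    "corporate_id":  ["CUST-0001", "CUST-0002", "CUST-0093"],
})

def conform(extract: pd.DataFrame, system: str) -> pd.DataFrame:
    """Replace source-specific customer codes with corporate ids; reject unmapped codes."""
    mapped = extract.merge(
        crosswalk[crosswalk.source_system == system],
        left_on="customer_code", right_on="source_code", how="left",
    )
    unmapped = mapped[mapped.corporate_id.isna()]
    if not unmapped.empty:
        # Quarantine or reject rows whose codes have no corporate mapping yet.
        raise ValueError(f"{len(unmapped)} rows from {system} have unmapped customer codes")
    return mapped.drop(columns=["source_system", "source_code", "customer_code"])

# Example: conforming an extract from the (hypothetical) ORDERS_EAST system.
east = pd.DataFrame({"customer_code": ["C100", "C250"], "amount": [100.0, 42.5]})
print(conform(east, "ORDERS_EAST"))

However the transformation is implemented, the principle is the same: inconsistencies are resolved once, at load time, instead of being rediscovered in every analysis.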

There are several reasons why the lesson gets forgotten, not the least of which is the spread of business intelligence to business users who work with a single operational system. In these operational BI implementations, the users are typically very familiar with the data contained in "their own" operational system and, as long as they are using it as their sole data source, they usually avoid consistency problems. However, when they try to incorporate data from multiple operational systems into their analyses, the inconsistencies cause problems.

Both BI practitioners and IT professionals must recognize these risks and, together with their business users, take steps (such as those associated with a data governance program) to avoid them. Although BI tools are now easier to use than ever before and make it possible for business users to graphically link and consolidate data from multiple heterogeneous sources, these tools do not ensure that the underlying data sources are consistent. The same potential for data consistency problems also exists with composite applications and mashups. As BI becomes more pervasive within our organizations, we may be doing our business users a disservice if the underlying data they can now analyze is not accurate and consistent.

About the Author

Michael A. Schiff is founder and principal analyst of MAS Strategies, which specializes in formulating effective data warehousing strategies. With more than four decades of industry experience as a developer, user, consultant, vendor, and industry analyst, Mike is an expert in developing, marketing, and implementing solutions that transform operational data into useful decision-enabling information.

His prior experience as an IT director and systems and programming manager provides him with a thorough understanding of the technical, business, and political issues that must be addressed for any successful implementation. With Bachelor and Master of Science degrees from MIT's Sloan School of Management and as a certified financial planner, Mike can address both the technical and financial aspects of data warehousing and business intelligence.

