TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing


August 14, 2008: TDWI FlashPoint - Building and Using a Data Quality Scorecard

A data quality scorecard is the centerpiece of any data quality management program.


Welcome to TDWI FlashPoint. In this issue, Arkady Maydanchik discusses the benefits of building and using a data quality scorecard.

FlashPoint Snapshot: Best Practices in Operational BI: Converging Analytical and Operational Processes
Column: Building and Using a Data Quality Scorecard
FlashPoint Rx: Don't Rely on Source System Data Accuracy

FlashPoint Snapshot

FlashPoint Snapshots highlight key findings from TDWI's wide variety of research.

What is the status of your operational BI environment?

Based on 423 respondents.

Source: Best Practices in Operational BI: Converging Analytical and Operational Processes (TDWI Best Practices Report, Q3 2007).


Building and Using a Data Quality Scorecard

Arkady Maydanchik, Data Quality Group

Data Quality Scorecard Defined

A data quality scorecard is the centerpiece of any data quality management program. It provides comprehensive information about the quality of data in a database and allows both aggregated analysis and detailed drill-downs. A well-designed data quality scorecard is the key to understanding how well the data supports various reports, analytical and operational processes, and data-driven projects. It is also critical for making good decisions about data quality improvement initiatives.

A common misconception is that the objective of a data quality assessment project is to produce error reports. Such a view significantly diminishes the ROI of assessment initiatives. Project teams spend months designing, implementing, and fine-tuning data quality rules; they build neat rule catalogues and produce extensive error reports. Without a data quality scorecard, however, all they have are raw materials and no value-added product to justify further investment in data quality management. Indeed, no amount of firewood will make you warm in the winter unless you can make a decent fire. The main product of data quality assessment is the data quality scorecard!

The image below represents the data quality scorecard as an information pyramid. At the top level are aggregate scores, which are high-level measures of the data quality. Well-designed aggregate scores are goal driven. They allow us to evaluate data fitness for various purposes and to indicate the quality of various data collection processes. From the perspective of understanding the data quality and its impact on the business, aggregate scores are the key piece of data quality metadata. In the middle are score decompositions and error reports that allow us to analyze and summarize data quality across several dimensions and for different objectives. Let's consider these components in more detail.

Aggregate Scores

On the surface, the data quality scorecard is a collection of aggregate scores. Each score consolidates errors identified by the data quality rules into a single number: the percentage of good data records among all target data records. Aggregate scores help make sense of the error reports produced in the course of data quality assessment. Without aggregate scores, error reports often discourage, rather than enable, data quality improvement.
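As a minimal sketch of that definition, the snippet below computes one aggregate score from rule results. The record ids, rule names, and the `aggregate_score` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical rule results: record id -> list of data quality rules it violated.
rule_violations = {
    1: [],
    2: ["missing_birth_date"],
    3: [],
    4: ["negative_salary", "missing_birth_date"],
}


def aggregate_score(violations_by_record):
    """Return the percentage of target records with no rule violations."""
    if not violations_by_record:
        raise ValueError("no target records selected")
    good = sum(1 for errors in violations_by_record.values() if not errors)
    return 100.0 * good / len(violations_by_record)


print(f"{aggregate_score(rule_violations):.1f}% of target records are good")
```

Note that a record violating several rules still counts as one bad record, so the score reflects records rather than individual errors.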

Be careful when choosing which aggregate scores to measure. Scores that are not tied to a meaningful business objective are useless. For instance, a simple aggregate score for the entire database is usually meaningless. Suppose we know that 6.3% of all records in the database have some errors. So what? This number does not help if we cannot say whether it is a good or a bad value, and we cannot make any decisions based on it.

On the other hand, consider an HR database that is used to calculate employee retirement benefits, among other things. If you can build an aggregate score that says 6.3% of all calculations are incorrect because of data quality problems, such a score is extremely valuable. You can use it to measure the annual cost of data quality to the business through its impact on a specific business process, or to decide whether to initiate a data-cleansing project by estimating its ROI.

It is possible, and desirable, to build many different aggregate scores by selecting different groups of target data records. The most valuable scores measure data fitness for various business uses. These scores allow us to estimate the cost of bad data to the business, to evaluate the potential ROI of data quality initiatives, and to set appropriate expectations for data-driven projects. In fact, if you define the objective of a data quality assessment project as calculating one or more such scores, you will have a much easier time finding sponsors for your initiative.
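To make the cost and ROI reasoning concrete, here is a rough back-of-the-envelope sketch. Only the 6.3% figure comes from the example above; the number of calculations, the cost per incorrect calculation, the share of errors a cleansing project would fix, and the project cost are all invented for illustration. The one-year ROI formula is a common simplification, not a method prescribed here.

```python
def annual_cost_of_bad_data(calculations_per_year, pct_incorrect, cost_per_error):
    """Estimated yearly cost of calculations that go wrong because of bad data."""
    return calculations_per_year * pct_incorrect / 100.0 * cost_per_error


def first_year_roi(annual_cost, share_of_errors_fixed, project_cost):
    """(savings - project cost) / project cost, counting one year of savings."""
    savings = annual_cost * share_of_errors_fixed
    return (savings - project_cost) / project_cost


# Assumed inputs: 10,000 calculations a year, 200 per incorrect calculation,
# a cleansing project that costs 50,000 and fixes 80% of the errors.
cost = annual_cost_of_bad_data(10_000, 6.3, 200.0)
roi = first_year_roi(cost, 0.8, 50_000.0)
print(f"annual cost: {cost:,.0f}; first-year ROI: {roi:.0%}")
```

A score that is tied to a business process in this way gives a sponsor a number to weigh against the cost of the initiative.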

Other important aggregate scores measure the quality of data collection procedures. For example, scores based on the data origin will provide estimates of the quality of the data obtained from a particular data source or through a particular data interface. A similar concept involves measuring the quality of the data collected during a specific period of time. It is usually important to know if the data errors are historic or were introduced recently. The presence of recent errors indicates a greater need for data collection improvements. Such measurement can be accomplished by an aggregate score with constraints on the timestamps of the relevant records.
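The same score can be restricted by data origin or by timestamp. The sketch below assumes each record carries a source name, a load date, and an error flag; the source names, dates, and helper functions are hypothetical.

```python
from datetime import date

# Hypothetical records: (source system, date loaded, has_error)
records = [
    ("payroll_feed", date(2005, 3, 1), True),
    ("payroll_feed", date(2008, 6, 1), False),
    ("payroll_feed", date(2008, 7, 20), False),
    ("hr_portal", date(2004, 1, 10), False),
    ("hr_portal", date(2008, 5, 15), True),
    ("hr_portal", date(2008, 7, 2), True),
]


def score(subset):
    """Percentage of good records in the subset, or None if it is empty."""
    subset = list(subset)
    if not subset:
        return None
    good = sum(1 for _, _, has_error in subset if not has_error)
    return 100.0 * good / len(subset)


def score_by_source(records):
    """One aggregate score per data source."""
    sources = sorted({source for source, _, _ in records})
    return {s: score(r for r in records if r[0] == s) for s in sources}


def score_since(records, cutoff):
    """Aggregate score for records loaded on or after the cutoff date."""
    return score(r for r in records if r[1] >= cutoff)


for source, value in score_by_source(records).items():
    print(f"{source}: {value:.1f}% good")
print(f"loaded since 2008: {score_since(records, date(2008, 1, 1)):.1f}% good")
```

Comparing the score for recent records with the score for older ones is one way to tell whether errors are historic or still being introduced.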

To summarize, analysis of the aggregate scores answers these key data quality questions:

What is the impact of the errors in your database on business processes?
What are the sources and causes of the errors in your database?
Where in the database can most of the errors be found?

Score Decompositions

The next layer in the data quality scorecard is composed of score decompositions, which show contributions of different components to the data quality. Score decompositions can be built along many dimensions, including data elements, data quality rules, subject populations, and record subsets.

For instance, in the previous example we may find that 6.3% of all calculations are incorrect. Decomposition may indicate that 80% of these errors are caused by a problem with the employee compensation data; in 15% of cases the reason is missing or incorrect employment history; and in 5% of cases the culprit is an invalid birth date. This information can be used to prioritize a data cleansing initiative. Another score decomposition may indicate that more than 70% of errors are for employees from a specific subsidiary. This may suggest a need to improve data collection procedures in that subsidiary.
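A decomposition like this can be sketched as a simple tally along one dimension. The error list below is fabricated so that it reproduces the 80/15/5 split from the example; it assumes each incorrect calculation has a single primary cause, and the subsidiary labels "A" and "B" are hypothetical.

```python
from collections import Counter

# Hypothetical data: one entry per incorrect calculation, as (cause, subsidiary).
errors = (
    [("compensation", "A")] * 12
    + [("compensation", "B")] * 4
    + [("employment_history", "A")] * 2
    + [("employment_history", "B")] * 1
    + [("birth_date", "A")] * 1
)


def decompose(errors, dimension):
    """Share of errors (in percent) per value of the chosen dimension.

    dimension is the tuple position to group by: 0 = cause, 1 = subsidiary.
    """
    counts = Counter(e[dimension] for e in errors)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.most_common()}


print("by cause:", decompose(errors, 0))
print("by subsidiary:", decompose(errors, 1))
```

Running the same tally over a different dimension is what turns one aggregate score into several decompositions.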

The level of detail obtained through score decompositions is sufficient to reveal the source of most data quality problems. However, if we want to investigate data quality further, more drill-downs are necessary. The next step would be to produce reports of individual errors that contribute to the score (or sub-score) tabulation. These reports can be filtered and sorted in various ways so that we can better understand the causes, nature, and magnitude of the data problems.

The bottom of the data quality scorecard pyramid represents reports showing the quality of individual records or subjects. These atomic-level reports identify records and subjects affected by errors and could estimate the probability that each data element is erroneous.
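As a small sketch of such an atomic-level report, the snippet below groups rule violations by record so that each affected employee can be inspected individually. The employee ids, data elements, and rule names are hypothetical, and the sketch does not attempt to estimate error probabilities.

```python
from collections import defaultdict

# Hypothetical rule violations: (employee_id, data_element, rule)
violations = [
    (101, "birth_date", "birth_date_not_in_future"),
    (101, "salary", "salary_positive"),
    (205, "hire_date", "hire_before_termination"),
]


def record_report(violations):
    """Map each affected record to the (data element, rule) pairs it failed."""
    report = defaultdict(list)
    for employee_id, element, rule in violations:
        report[employee_id].append((element, rule))
    return dict(report)


for employee_id, failures in sorted(record_report(violations).items()):
    print(employee_id, failures)
```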

Summary

The data quality scorecard is a valuable analytical tool that allows us to measure the cost of bad data to the business and to estimate ROI of data quality improvement initiatives. Building and maintaining a dimensional time-dependent data quality scorecard must be one of the first priorities in any data quality management initiative.

Arkady Maydanchik is a recognized practitioner and educator in the field of data quality and information integration. A cofounder of Data Quality Group LLC, he is the author of Data Quality Assessment (Technics Publications LLC, 2007).


FlashPoint Rx

FlashPoint Rx prescribes a "Mistake to Avoid" for business intelligence and data warehousing professionals from TDWI's Ten Mistakes to Avoid series.

Ten Mistakes to Avoid When Planning Your CDI/MDM Project

Mistake 4. Relying on Source System Data Accuracy

If you’ve had any involvement with your company’s enterprise data warehouse, you’ve probably encountered the challenge of operational system accountability: that is, convincing source system owners that it’s their job to address data quality.

CDI technologies allow the merging of content from multiple sources to create a master record about a customer. While any data quality tool can correct a customer address, it can’t identify and resolve duplicate or disparate records and reconcile them into one when subordinate attributes are different.

The quality of the master record is not dependent on the accuracy of the data from an individual source system, since the CDI or MDM technology can spot synonyms, duplicates, and errors in the source data. For instance, when an operational system has duplicate customer entries because of inconsistent descriptive detail (the customer goes by both “Bob” and “Robert,” or has different home addresses), the hub can selectively match other details to determine which descriptive attribute is best to include in the master record.
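The idea of matching on other details can be illustrated with a deliberately simple sketch. It is not how any particular CDI or MDM product works; the field names, sample rows, and the two-field threshold are all assumptions made for illustration.

```python
# Hypothetical customer rows from one operational system.
record_a = {
    "name": "Bob Smith",
    "email": "bsmith@example.com",
    "phone": "555-0100",
    "birth_date": "1970-02-03",
    "address": "1 Main St",
}
record_b = {
    "name": "Robert Smith",
    "email": "bsmith@example.com",
    "phone": "555-0100",
    "birth_date": "1970-02-03",
    "address": "9 Oak Ave",
}

# Attributes compared instead of the inconsistent name and address.
MATCH_FIELDS = ("email", "phone", "birth_date")


def likely_same_customer(r1, r2, min_matches=2):
    """True if enough non-name attributes are present and equal in both rows."""
    matches = sum(
        1 for f in MATCH_FIELDS if r1.get(f) and r1.get(f) == r2.get(f)
    )
    return matches >= min_matches


print(likely_same_customer(record_a, record_b))
```

Here the two rows disagree on name and address but agree on email, phone, and birth date, so the sketch treats them as one customer.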

The good news about CDI is that the hub can identify unique customers without affecting the day-to-day development activities of operational system programmers. When the time comes and the operational system team decides to correct its data, it can leverage CDI to identify duplicate or disparate customer records.

This excerpt was pulled from the Q3 2006 TDWI Ten Mistakes to Avoid series, Ten Mistakes to Avoid When Planning Your CDI/MDM Project, by Jill Dyché and Evan Levy.

