LESSON - Data Quality Monitoring: The Basis for Ongoing Information Quality Management
- By Todd Goldman
- May 8, 2007
By Tom Golden, Director of Corporate Communications (Data Quality), Informatica Corporation
The key elements of a good data quality program include establishing a baseline, continuous improvement, appropriate metrics, and scorecarding.
Establishing a Baseline
The first step is establishing a baseline of the current state of data quality. The baseline should identify critical failure points and determine improvement targets, and those targets must be tied to business objectives.
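As a rough illustration of how a baseline exposes failure points, the Python sketch below compares a set of baseline scores against targets and ranks the shortfalls. The metric names, scores, targets, and the `failure_points` helper are all hypothetical. Scores are assumed to be the fraction of records passing each check, so higher is better (for duplication, the share of records that are not duplicates).

```python
from typing import Dict, List, Tuple


def failure_points(baseline: Dict[str, float],
                   targets: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (metric, gap) pairs for metrics below target, largest gap first."""
    gaps = []
    for metric, target in targets.items():
        score = baseline.get(metric, 0.0)  # assumption: an unmeasured metric counts as 0
        if score < target:
            gaps.append((metric, round(target - score, 4)))
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


# Hypothetical baseline scores (fraction of records passing each check; higher is better)
baseline = {"completeness": 0.82, "conformity": 0.97, "duplication": 0.91, "accuracy": 0.76}
# Hypothetical targets agreed with the business
targets = {"completeness": 0.95, "conformity": 0.95, "duplication": 0.98, "accuracy": 0.90}

for metric, gap in failure_points(baseline, targets):
    print(f"{metric}: {gap:.2f} below target")
```

Ranking by gap is only one possible ordering; a real program would likely also weigh each gap by its business impact.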
Data quality must be tracked, managed, and monitored if it is to improve business efficiency and transparency. The ability to measure and monitor data quality throughout the data lifecycle, and to compare the results over time, is therefore essential to the proactive management of ongoing data quality improvement and data governance.
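Comparing results over time can be as simple as storing dated snapshots of metric scores. The sketch below is a minimal illustration under assumed names (`Snapshot`, `trend`, and `regressions` are hypothetical, not part of any product): it reports the net change in one metric across the history and flags metrics that slipped since the previous run.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Snapshot:
    """Metric scores (fraction passing, higher is better) captured on one date."""
    taken_on: date
    scores: Dict[str, float]


def trend(history: List[Snapshot], metric: str) -> float:
    """Net change in `metric` between the earliest and latest snapshot."""
    ordered = sorted(history, key=lambda s: s.taken_on)
    return ordered[-1].scores[metric] - ordered[0].scores[metric]


def regressions(history: List[Snapshot], tolerance: float = 0.0) -> List[str]:
    """Metrics whose latest score dropped by more than `tolerance` since the previous snapshot."""
    ordered = sorted(history, key=lambda s: s.taken_on)
    if len(ordered) < 2:
        return []
    prev, last = ordered[-2].scores, ordered[-1].scores
    return sorted(m for m in last if m in prev and prev[m] - last[m] > tolerance)


# Hypothetical monthly snapshots
history = [
    Snapshot(date(2007, 1, 1), {"completeness": 0.80, "conformity": 0.90}),
    Snapshot(date(2007, 2, 1), {"completeness": 0.85, "conformity": 0.93}),
    Snapshot(date(2007, 3, 1), {"completeness": 0.88, "conformity": 0.91}),
]

print(f"completeness change: {trend(history, 'completeness'):+.2f}")
print("regressed since last run:", regressions(history))
```

The `tolerance` parameter is there so that small run-to-run noise does not trigger an alert; its value would be a business decision.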
Organizations need a formal way to set targets, measure conformance to those targets, and communicate tangible data quality metrics effectively to senior management and data owners. Standard metrics give everyone (executives, IT, and line-of-business managers) a unified view of data and data quality. They can also provide the basis for regulatory reporting under frameworks such as Basel II, which carry specific data quality reporting requirements.
Metrics to Suit the Job
Ultimately, data quality monitoring and reporting based on a well-understood set of metrics provides important knowledge about the value of the data in use, and enables knowledge workers to determine how the data can best be used to meet their own business needs.
The critical attributes of data quality (completeness, conformity, consistency, accuracy, duplication, and integrity) should map to specific business requirements. Duplicate records in a data warehouse, for example, make it difficult to analyze customer habits and to segment customers by market. Inaccurate data leads to poor targeting, budgeting, and staffing, unreliable financial projections, and so on. (The Informatica white paper “Monitoring Data Quality Performance Using Data Quality Metrics” outlines a more comprehensive list of metrics and examples.)
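To make these attributes concrete, here is a minimal Python sketch of three of them (completeness, conformity, and duplication) computed over in-memory records. The field names, the five-digit ZIP code pattern, and the choice of name plus email as the duplicate key are illustrative assumptions; a production tool would typically use fuzzier matching for duplicates and reference data for accuracy checks.

```python
import re
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def completeness(records: List[Record], field: str) -> float:
    """Fraction of records with a non-blank value for `field`."""
    if not records:
        return 0.0
    return sum(1 for r in records if (r.get(field) or "").strip()) / len(records)


def conformity(records: List[Record], field: str, pattern: str) -> float:
    """Fraction of non-blank values for `field` that fully match `pattern`."""
    values = [v for v in (r.get(field) for r in records) if v and v.strip()]
    if not values:
        return 0.0
    rx = re.compile(pattern)
    return sum(1 for v in values if rx.fullmatch(v.strip())) / len(values)


def duplicate_rate(records: List[Record], key_fields: List[str]) -> float:
    """Fraction of records whose normalized key repeats an earlier record."""
    if not records:
        return 0.0
    seen = set()
    dupes = 0
    for r in records:
        # Normalize by trimming whitespace and ignoring case before comparing keys
        key = tuple((r.get(f) or "").strip().lower() for f in key_fields)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes / len(records)


# Hypothetical customer records
customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "zip": "94301"},
    {"name": "ada lovelace ", "email": "ADA@example.com", "zip": "94301"},
    {"name": "Bo Chen", "email": "", "zip": "1000"},
    {"name": "Cy Diaz", "email": "cy@example.com", "zip": "60601-1234"},
]

print("email completeness:", completeness(customers, "email"))
print("zip conformity:", conformity(customers, "zip", r"\d{5}"))
print("duplicate rate:", duplicate_rate(customers, ["name", "email"]))
```

Note that `duplicate_rate` is a "lower is better" measure, unlike the other two; a scorecard would usually report `1 - duplicate_rate` so that every attribute reads the same way.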
A well-defined set of metrics should be used to establish a baseline understanding of data quality levels, and this baseline should be used to build a business case that justifies investment in data quality. Beyond that, the same metrics become central to the ongoing data quality process, enabling business users and data stewards to track progress and quickly identify problem areas that need to be addressed.
Figure 1. The critical attributes of data quality should map to specific business requirements.
Breaking data issues down into these key measures shows where best to focus your improvement efforts: it identifies the most important data quality issues and attributes for the lifecycle stage each of your projects is in. For example, early in a data migration the focus may be on the completeness of key master data fields, whereas implementing an e-banking system may call for greater attention to accuracy during individual authentication.
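One way to express stage-dependent priorities is to weight each attribute's shortfall by how much it matters at the project's current stage. The weights, stage names, scores, and the `priorities` function below are hypothetical and only mirror the migration and e-banking examples in the text; this is a sketch of one possible scheme, not a standard method.

```python
from typing import Dict, List

# Hypothetical weights: how much each attribute matters at a given project stage.
STAGE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "early_migration": {"completeness": 0.6, "conformity": 0.2, "accuracy": 0.2},
    "e_banking_rollout": {"completeness": 0.2, "conformity": 0.2, "accuracy": 0.6},
}


def priorities(scores: Dict[str, float], targets: Dict[str, float], stage: str) -> List[str]:
    """Attributes below target, ordered by stage-weighted shortfall (largest first)."""
    weighted = []
    for metric, weight in STAGE_WEIGHTS[stage].items():
        shortfall = max(0.0, targets[metric] - scores[metric])
        if shortfall > 0:
            weighted.append((weight * shortfall, metric))
    return [metric for _, metric in sorted(weighted, reverse=True)]


scores = {"completeness": 0.80, "conformity": 0.90, "accuracy": 0.85}
targets = {"completeness": 0.95, "conformity": 0.95, "accuracy": 0.95}

print(priorities(scores, targets, "early_migration"))    # completeness first
print(priorities(scores, targets, "e_banking_rollout"))  # accuracy first
```

The same measured scores produce different priority orders, which is the point the paragraph makes: what to fix first depends on where the project is in its lifecycle.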
Inherent in the metrics-driven approach is the ability to aggregate results from across the company into data quality scorecards. A scorecard is the key visual aid for steering the data quality process in the right direction: it enables data analysts to set accurate, focused quality targets and to define improvement processes accordingly, including setting priorities for data quality improvement in upstream information systems.
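A company-wide scorecard can be sketched by pooling pass/fail counts for each metric across source systems and comparing each pooled score with its target. The system names, counts, targets, and the `scorecard` function are hypothetical.

```python
from typing import Dict, Tuple

# (system, metric) -> (records checked, records passing)
Results = Dict[Tuple[str, str], Tuple[int, int]]


def scorecard(results: Results, targets: Dict[str, float]) -> Dict[str, Tuple[float, str]]:
    """Pool counts across systems for each metric and compare the pooled score to its target."""
    checked: Dict[str, int] = {}
    passed: Dict[str, int] = {}
    for (_system, metric), (n_checked, n_passed) in results.items():
        checked[metric] = checked.get(metric, 0) + n_checked
        passed[metric] = passed.get(metric, 0) + n_passed
    card: Dict[str, Tuple[float, str]] = {}
    for metric, n in checked.items():
        score = passed[metric] / n if n else 0.0
        # Assumption: a metric with no agreed target must be perfect to count as on target.
        status = "on target" if score >= targets.get(metric, 1.0) else "below target"
        card[metric] = (round(score, 3), status)
    return card


# Hypothetical results from two source systems
results = {
    ("crm", "completeness"): (1000, 900),
    ("billing", "completeness"): (3000, 2940),
    ("crm", "conformity"): (1000, 990),
    ("billing", "conformity"): (3000, 2830),
}
targets = {"completeness": 0.95, "conformity": 0.97}

for metric, (score, status) in sorted(scorecard(results, targets).items()):
    print(f"{metric:<13} {score:.3f}  {status}")
```

Pooling raw counts weights each system by its record volume. A simple average of per-system scores would instead give a small system the same influence as a large one, which may or may not be what the business wants.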
Metrics and scorecards that report on data quality, audited and monitored at multiple points across the enterprise, help ensure that data quality is managed in accordance with real business requirements. They provide both the carrot and the stick to support ownership, responsibility, and accountability. Beyond the data quality function itself, the metrics used to monitor data quality can also roll up into higher-level performance indicators for the business as a whole.
This article originally appeared in the issue of .
Todd Goldman is the vice president of marketing for Palo Alto, California-based Infoworks.io, a software company that automates data engineering for BI and machine learning analytics for companies worldwide. You can contact the company via email.