Question and Answer: Data Quality Best Practices
Analysis of your customer data is only as good as the quality of the data you’re working with. We look at best practices for one-time cleaning and ongoing data maintenance.
- By James E. Powell
- May 13, 2009
Conclusions you draw from your customer data analysis are only as good as the quality of the data you’re working with. Getting data clean (and keeping it that way) is no easy task; we look at what’s involved, explain the role of governance, discuss who’s responsible for data quality, and show how you can measure the effectiveness of your data-governance and data quality initiatives.
To learn more, we spoke with Katherine Hamilton, director of product marketing, Enterprise Business Solutions for Pitney Bowes Business Insight (http://www.g1.com).
BI This Week: According to analyst reports, the volume of enterprise data doubles every 18 months. What best practices can businesses implement to ensure the quality of their data (e.g., reduce duplicate data and inaccurate information)?
Katherine Hamilton: Data should be viewed as a corporate asset. It has measurable value that is integral to achieving strategic objectives and gaining a competitive edge. However, for your data to really be an asset, it must be used while still fresh. To be consistently used, it needs to be complete and regularly refreshed. There are three basic steps to ensure data quality.
First, profile your data. You wouldn’t buy a house without first having it inspected. A qualified inspector will look at the foundation and identify other building flaws that could create a problem in the future. You want the same kind of information about your corporate data. Virtually all data quality profiler tools will provide counts on the percentages of fields that are populated, but for real insight you need to be able to view key data values as well. For example, are there numbers or symbols in fields where only text is appropriate? How many of your unique identifiers (customer number, account number, etc.) are not unique? This information can help you identify outliers, anomalies, and other questionable data points and direct you to your organization’s larger data quality issues.
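As a rough illustration of the profiling checks described above (population percentages, non-text characters in text fields, and identifiers that are not unique), here is a minimal Python sketch. The record layout, the field names, and the `profile` function are hypothetical and chosen only for this example; they are not taken from any particular profiling tool.

```python
from collections import Counter


def is_blank(value):
    """Treat None and whitespace-only strings as unpopulated."""
    return value is None or str(value).strip() == ""


def looks_like_text(value):
    """True if the value holds only letters plus common name punctuation."""
    return all(ch.isalpha() or ch in " .'-" for ch in str(value))


def profile(records, text_fields, id_field):
    """Return simple profiling statistics for a list of dict records."""
    total = len(records)
    field_names = sorted({name for record in records for name in record})

    # Percentage of records in which each field is populated.
    populated_pct = {
        name: 100.0 * sum(1 for r in records if not is_blank(r.get(name))) / total
        for name in field_names
    }

    # Values in text-only fields that contain digits or symbols.
    suspect_text = {
        name: [
            r[name]
            for r in records
            if not is_blank(r.get(name)) and not looks_like_text(r[name])
        ]
        for name in text_fields
    }

    # "Unique" identifiers that appear more than once.
    id_counts = Counter(
        r.get(id_field) for r in records if not is_blank(r.get(id_field))
    )
    duplicate_ids = {value: n for value, n in id_counts.items() if n > 1}

    return {
        "populated_pct": populated_pct,
        "suspect_text": suspect_text,
        "duplicate_ids": duplicate_ids,
    }


# Hypothetical sample data for illustration only.
records = [
    {"customer_id": "C001", "name": "Ann Lee", "phone": "203-555-0142"},
    {"customer_id": "C002", "name": "B0b Smith", "phone": ""},
    {"customer_id": "C001", "name": "Ann Lee", "phone": "203-555-0142"},
    {"customer_id": "C003", "name": "Raj Patel", "phone": None},
]
report = profile(records, text_fields=["name"], id_field="customer_id")
```

In this sample, the report shows the phone field populated in half of the records, flags "B0b Smith" as a suspect name, and reports customer number C001 as used twice.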
Next, embrace these four actions for data cleansing.
1. Format fields. You want consistent terms and formats across a given field. If you are relying on human beings to input the data (and you are), this is not happening automatically.
2. Parse components. Break down strings of data into multiple fields so you can more effectively standardize data elements with greater accuracy.
3. Check content. Some records include accurate information that is embedded in the wrong fields. Other fields may appear populated but are not accurate (for example, a phone number field that looks like: “111-111-1111”). Your data-cleansing process should identify and correct these anomalies so your data is fit for use.
4. Eliminate duplicates. Identify matches and eliminate duplicate records. Once your data is standardized, this can be done with a high degree of confidence.
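The four actions above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: records are dictionaries with hypothetical `name` and `phone` fields, phone numbers follow the 10-digit North American pattern, and duplicates are matched on an exact standardized key rather than with the fuzzy matching a commercial tool would apply.

```python
import re


def format_phone(raw):
    """Format fields: reduce a phone number to NNN-NNN-NNNN, or None if unusable."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def parse_name(full_name):
    """Parse components: split a full name into first and last name fields."""
    parts = (full_name or "").split()
    first = parts[0].title() if parts else ""
    last = parts[-1].title() if len(parts) > 1 else ""
    return {"first_name": first, "last_name": last}


def is_placeholder_phone(phone):
    """Check content: flag populated-but-fake values such as 111-111-1111."""
    digits = re.sub(r"\D", "", phone or "")
    return bool(digits) and len(set(digits)) == 1


def cleanse(records):
    """Apply the steps in order, then eliminate duplicates on the cleaned key."""
    cleaned, seen = [], set()
    for record in records:
        phone = format_phone(record.get("phone"))
        if phone is not None and is_placeholder_phone(phone):
            phone = None
        row = {**parse_name(record.get("name")), "phone": phone}
        key = (row["first_name"].lower(), row["last_name"].lower(), row["phone"])
        if key in seen:
            continue  # same person already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical sample data for illustration only.
raw = [
    {"name": "ann  LEE", "phone": "(203) 555-0142"},
    {"name": "Ann Lee", "phone": "1-203-555-0142"},
    {"name": "Raj Patel", "phone": "111-111-1111"},
]
clean = cleanse(raw)
```

The first two sample records only become recognizable as duplicates after formatting and parsing, which is why deduplication runs last.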
The third step to data quality is ongoing data maintenance. Data cleansing is not a one-time operation. Even the best data gets stale. (Consider that 11 percent of the U.S. population moves annually.) Also, customers have unprecedented access to their data (online, over the phone, through the mail, in the store or branch), so the opportunities for changes and mistakes to enter your system are immense.
We strongly recommend an ongoing maintenance program that includes both batch and real-time maintenance. In batch, run your data through the data cleansing steps regularly and you will be able to correct issues as they arise. Annual data cleansing is a minimal effort. A best practice is to perform this task at least quarterly. Pair batch with real-time data quality applications that validate data as it is entered, essentially serving as a “data quality firewall.”
Both processes have advantages that complement one another. Managing data quality at the point of entry requires speed and reliability on a transactional basis; batch processes allow for more thorough and complete cleansing. The data quality platform you choose should support both processes.
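To make the pairing concrete, the sketch below shows one way a point-of-entry check and a periodic batch audit could share the same rules. It is a simplified illustration, not a description of any vendor's product: the `CustomerStore` class, the field names, and the validation rules are all assumptions made for this example.

```python
import re

# Deliberately loose pattern: something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_entry(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not (record.get("name") or "").strip():
        problems.append("name is required")
    digits = re.sub(r"\D", "", record.get("phone") or "")
    if len(digits) != 10:
        problems.append("phone must contain 10 digits")
    elif len(set(digits)) == 1:
        problems.append("phone looks like a placeholder")
    if not EMAIL_PATTERN.match(record.get("email") or ""):
        problems.append("email is not well formed")
    return problems


class CustomerStore:
    """In-memory stand-in for a customer database (hypothetical)."""

    def __init__(self):
        self.rows = []

    def insert(self, record):
        """Real time: reject bad data before it reaches the store."""
        problems = validate_entry(record)
        if problems:
            raise ValueError("; ".join(problems))
        self.rows.append(record)

    def batch_audit(self):
        """Batch: re-check every stored row, since data goes stale over time."""
        findings = {}
        for index, row in enumerate(self.rows):
            problems = validate_entry(row)
            if problems:
                findings[index] = problems
        return findings


store = CustomerStore()
store.insert({"name": "Ann Lee", "phone": "203-555-0142", "email": "ann@example.com"})

try:
    store.insert({"name": "", "phone": "111-111-1111", "email": "not-an-email"})
except ValueError as error:
    rejected = str(error)
```

The entry check has to be fast because it runs on every transaction, while the batch audit can afford slower, more thorough rules.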
What role should data quality and governance play in the enterprise, and how are data quality and governance related?
If you consider data a corporate asset, then you will want to use it wisely. That is the role of data governance, and it is critical. Simply put, data governance is the collection of a corporation’s policies and practices that are essential to keeping data healthy and secure. These guidelines should be put in writing and must be easy to understand. They should also be reviewed regularly to ensure they still meet the needs of the business over time. Data governance practices will determine the business rules to which your data quality solution will align.
Who should be responsible for an organization's data quality? Why?
Frankly, it is everyone’s job to ensure data quality. For most companies, the responsibility lies within the data governance committee. The commitment to sound data quality and security practices must begin at the top of the organization and include stakeholders at every level. Best practices demonstrate that most data governance committees represent exactly this mix.
Data governance is an ongoing commitment. As a company’s needs change, its data governance policies must be reviewed to ensure alignment. On a day-to-day basis, many organizations are embracing the role of the data steward. This role began in the IT arena, but trends indicate that it is branching out as the level of accountability entrusted to data stewards increases. It is largely the data steward (also a member of the data governance team) who will determine the business rules for a company’s data quality platform.
It’s tough to sell data quality initiatives to management. After all, it's an expense that doesn't have an associated revenue stream. How can IT or business users (or both) make a successful pitch for such a project?
Our experience had been that it was usually an “event” that uncovered the desperate need for a data quality solution. Perhaps a company had invested millions in a CRM system and was disappointed with the results. The CRM solution did everything as promised, but the data that it processed was incomplete, outdated, and laden with duplicate records. In that scenario, the ROI was immediate and obvious, but what may be a bit harder to measure were the secondary and tertiary benefits. If you are a financial institution and one of your long-time, high-net-worth clients receives a prospectus asking him to open an account, does that instill trust or frustration?
What we are seeing now are a few trends that have created a more accepting environment among management.
- Companies and government agencies are striving to be customer-centric.
- Many companies, and certainly government agencies, see data quality solutions as a great way to detect fraud before it happens.
- Many companies are embarking upon master data management (MDM) initiatives. Their success relies completely upon clean data. After all, why bother to create a master file if the data is poor? PBBI is not an MDM vendor, but we work closely with them to ensure their customers yield the promised ROI.
- Today’s advanced and readily available technologies provide easy, flexible, and very affordable solutions.
How can businesses measure the effectiveness of their current data-governance initiatives?
A properly deployed data governance initiative will include tools that effectively measure the initiative and report summaries while providing alerts when conditions exceed thresholds or don’t meet minimum standards.
The key is determining what to measure and discerning the threshold values. When an alert is issued, you then have the ability to understand where in the process the conditions went wrong to trip the alert. This requires a feedback loop after the initial measurement.
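A minimal Python sketch of this measure, alert, and trace-back loop follows. The metric, the 5 percent threshold, and the `source` field that records where each row was entered are illustrative assumptions; a real initiative would take its metrics and threshold values from its own governance policy.

```python
from collections import Counter

# Hypothetical policy: alert when more than 5% of records lack a phone number.
THRESHOLDS = {"missing_phone_pct": 5.0}


def measure(records):
    """Compute the monitored metrics for a batch of records."""
    total = len(records)
    if total == 0:
        return {"missing_phone_pct": 0.0}
    missing = sum(1 for r in records if not r.get("phone"))
    return {"missing_phone_pct": 100.0 * missing / total}


def check_thresholds(metrics, thresholds):
    """Return an alert message for every metric that exceeds its threshold."""
    return [
        f"{name}: {value:.1f}% exceeds threshold of {thresholds[name]:.1f}%"
        for name, value in sorted(metrics.items())
        if value > thresholds[name]
    ]


def failures_by_source(records):
    """Feedback loop: count the failing records per point of entry."""
    return Counter(r.get("source", "unknown") for r in records if not r.get("phone"))


# Hypothetical sample data for illustration only.
records = [
    {"customer_id": "C1", "phone": "203-555-0142", "source": "web"},
    {"customer_id": "C2", "phone": "", "source": "call_center"},
    {"customer_id": "C3", "phone": "", "source": "call_center"},
    {"customer_id": "C4", "phone": "203-555-0199", "source": "branch"},
]
metrics = measure(records)
alerts = check_thresholds(metrics, THRESHOLDS)
sources = failures_by_source(records)
```

Here the alert fires because half of the sample records lack a phone number, and the breakdown by source points to the call center as the place to investigate.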
Both policies and tools are needed to discover where the problem occurred. For example, profiling tools can aid in discovering the source of bad information. Once the source is identified, the issue is reviewed against the policy to determine how to address it. Typically there are three basic options:
- Fix the source of the problem (it could be personnel, training, front-end software tools, or the legacy database)
- Change the policy and therefore the threshold value for an alert
- Change the policy and do not measure that particular circumstance.
Additionally, monitoring of the kind described here will improve the overall ROI of the data governance initiative.
What common mistakes do businesses make when managing their data and what steps can they take to avoid them?
The biggest mistake is assuming that a sound data quality initiative is a software issue. The reality is, as I mentioned earlier, that it requires a commitment from the most senior ranks because it can only succeed with a strong data governance program.
Other mistakes include investing in expensive BI, CRM, or ERP solutions without cleaning the data. When these vendors provide an ROI for their solutions, their assumption is that your data is clean. If it isn’t, you will never achieve the promised ROI.
Another scenario that is quite common is allowing IT to be the final decision maker regarding the data quality platform purchased. The end user must be considered. If using the solution is too hard, or requires a great deal of training, it won’t be as effective in keeping data clean.
What business goals can data-quality initiatives support and what benefits can businesses realize by actively managing their data?
This is my list:
- Master data management
- Post merger/acquisition customer integration/Single view of the customer
- Stronger CRM, ERP, or BI results
- Access to data to ensure informed decision-making
- Reduced operational costs
- More cost-effective customer communication
- Better customer service
- Stronger customer relationships
In a Gartner survey, 28 percent of companies responding said they have deployed SaaS-based data integration tools, and 24 percent had implemented SaaS-based data quality tools. What are the benefits of implementing a SaaS-based quality solution?
SaaS is an effective option. Among our customers, we typically see this to be the preferred option when the client wants to get up and running immediately. It is also far more cost-effective in a situation whereby the application is for a specific department--for example, the circulation department of a magazine.
We do see a resistance to SaaS among some customers. If the resistance stems from a reluctance to abandon their existing investments, it is worth noting that there are strategies for layering SaaS solutions onto a legacy capability--thus essentially enabling the organization to retain its prior investment while modernizing capabilities in a cost-efficient manner.
What challenges do companies face managing data, and how can better data management practices impact the power of other enterprise applications, including business intelligence, CRM, and ERP?
As I said, a successful data quality endeavor requires commitment at the highest management levels and needs to be supported with a data governance board. As guidelines are put forth, oftentimes changes in business process are required. These kinds of changes and “ownership” issues are perhaps the most difficult to navigate.
Another common misconception concerns data profiling. Just like data cleansing, it needs to be done continually. Strict monitoring must also be enforced. Software tools can assist in this, but it, too, is ultimately a management/business process issue.
From a technical point of view, many are challenged by data integration or data federation requirements. There are many options, including those offered by PBBI.
What products or services does Pitney Bowes Business Insight offer in the areas of data quality and data governance?
PBBI built the Customer Data Quality Platform (CDQP) specifically to meet the “ease-of-use” needs of the data steward and the business user. Because it is built on service-oriented architecture (SOA), IT believes it was designed for them. The solution has many unique benefits. It is built modularly, so you buy only the functionality you need. It is unsurpassed in its data cleansing, matching, consolidation, and data governance capabilities. Other modules can give you greater insight into your customers, such as geospatial and location-based data. If your enterprise can benefit from a more streamlined operation, better customer insight, and stronger customer relationships, we can help.