
5 Tips for Cleaning Your Dirty Data

With spring upon us, it's time to dive into "spring cleaning" your data by following these top 5 tips for leveraging the value of your dirty data.

By Drew Rockwell, CEO, Lavastorm Analytics

Dirty data -- data that is incomplete, outdated, or riddled with errors -- is a major problem for businesses looking to optimize and measure their performance. According to TDWI, dirty data is unusable data, and it costs organizations up to $600 billion each year.

The growing volume of data used and stored in organizations, along with an increasing number of disparate data stores, is leading to a critical dirty data problem. Spelling discrepancies, multiple account numbers, missing data, and data-value variations are much more likely to occur when a tremendous amount of data is spread out among multiple data stores. Along with costing a company billions, errors in data can also alienate customers and shareholders.
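To make those failure modes concrete, here is a minimal sketch in Python (the pandas library, sample records, and column names are illustrative assumptions, not drawn from any real system) that surfaces exactly these patterns -- duplicate account numbers, missing values, and spelling variations:

```python
# A minimal sketch of profiling a customer table for common dirty-data
# patterns. The data and column names are hypothetical.
import pandas as pd

records = pd.DataFrame({
    "account_id": ["A100", "A101", "A101", "A102"],      # duplicate account number
    "city":       ["Boston", "boston", "Bostn", None],   # spelling variants, missing value
    "balance":    [1200.0, 1200.0, None, -50.0],
})

# Duplicate identifiers: the same account appearing more than once.
dupes = records[records["account_id"].duplicated(keep=False)]
print("Duplicate accounts:\n", dupes)

# Missing data: count nulls per column.
print("Missing values per column:\n", records.isna().sum())

# Data-value variations: normalize case and whitespace before comparing.
normalized = records["city"].str.strip().str.lower()
print("Distinct city spellings:", normalized.dropna().unique())
```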

Given the growing impact of dirty data, gaining operational and financial insight across business processes -- with the necessary business logic and the ability to identify issues -- has never been more important. The problem is that most data analytics tools do not make complex analytic processes transparent and traceable.

With spring upon us, it's time for businesses to dive into "spring cleaning" their data by following these top 5 tips for leveraging the value of their dirty data.

Tip #1: Dedicate resources to maintaining data integrity

It takes more than one person to address all the data quality issues. Of course, having employees with statistical skills is important, and it is necessary to have a "data champion" with the knowledge to drive successful projects. However, good decisions require input from across the business, not only from a statistician. A lack of shared understanding among employees regarding the uses and value of data is one of the main causes of data errors. Fortunately, as analytics plays a larger role in many organizations, it is driving the evolution of flexible analytics tools to enable collaborative analysis between IT staff and business users.

Tip #2: Embed your analytics

Companies should shift or enhance their use of analytics, moving beyond building analytic models to "embedding" analytics into business operations to improve performance and ensure data accuracy. A key element of embedded analytic solutions is the use of built-in business rules that identify error and warning conditions, allowing the business to eliminate dirty data at the source and respond rapidly to changing situations and process anomalies. These solutions generate alarms that can automatically initiate workflow tasks, including the execution of data cleansing processes, shortening the time it takes to recognize, understand, and address data problems before they drive up downstream costs.
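As a rough illustration of the pattern -- a sketch, not any vendor's actual product -- a handful of embedded rules in Python might check each batch of data inline, raise an alarm when a rule fails, and hand the offending rows to a cleansing step automatically (the rule names, thresholds, and quarantine policy below are invented):

```python
# A sketch of "embedded" validation rules that fire alarms and trigger
# a cleansing step inline. Rule names and policies are hypothetical.
from typing import Callable

import pandas as pd

Rule = tuple[str, Callable[[pd.DataFrame], pd.Series]]

RULES: list[Rule] = [
    ("missing_account_id", lambda df: df["account_id"].isna()),
    ("negative_balance",   lambda df: df["balance"] < 0),
]

def cleanse(df: pd.DataFrame, bad: pd.Series) -> pd.DataFrame:
    # Placeholder cleansing policy: quarantine failing rows for review.
    return df[~bad]

def run_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    for name, check in RULES:
        bad = check(df)
        if bad.any():
            # In a real system this alarm might open a workflow ticket
            # or kick off an automated cleansing job.
            print(f"ALERT [{name}]: {bad.sum()} row(s) failed")
            df = cleanse(df, bad)
    return df
```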

Tip #3: Don't force an overarching schema

Next-generation analytics platforms don't impose overly rigid requirements on the structure of your data, so you won't have to develop an overarching schema and can therefore make better use of data that doesn't fit neatly into specific sets of rows and columns.
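As a minimal sketch of this idea (the sample records are invented), semi-structured data can be ingested and explored without declaring a schema first -- the fields that are actually present drive the analysis, rather than a predefined table layout:

```python
# A sketch of ingesting semi-structured records without forcing a
# single schema up front. The sample records are hypothetical.
import json

raw = [
    '{"account_id": "A100", "city": "Boston"}',
    '{"account_id": "A101", "phone": "555-0100", "tags": ["vip"]}',
    '{"account_id": "A102"}',
]

records = [json.loads(line) for line in raw]

# Discover the fields actually present instead of prescribing them.
fields = sorted({key for rec in records for key in rec})
print("Observed fields:", fields)

# Work with each record as-is; absent fields are simply absent.
for rec in records:
    print(rec.get("account_id"), "->", rec.get("city", "<no city>"))
```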

Tip #4: Provide visibility into the origin and history of the data

If you plan to use the insights from your data to convince executives in your organization that data issues exist and impact the quality of your decisions, bringing attention to your work -- and making the data discrepancies clear -- can make the difference between trust and skepticism. With a visual data analytic tool, you can accomplish this by showing the origin of your data and the steps taken to arrive at any given result. Analyzing data in a visual environment enables an iterative and collaborative approach to analytic design that promotes ease of use and accurate results that can be trusted.
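One lightweight way to picture this -- a sketch, not a particular tool's feature -- is a pipeline that logs every step it applies, so any final figure can be traced back to its origin. The file name, columns, and steps below are hypothetical:

```python
# A sketch of recording data lineage: each step logs its input and the
# operation applied, so results stay traceable and auditable.
import pandas as pd

lineage: list[str] = []

def step(description: str, df: pd.DataFrame) -> pd.DataFrame:
    # Record what was done and how many rows survived it.
    lineage.append(f"{description} -> {len(df)} rows")
    return df

df = step("load customers.csv", pd.DataFrame({
    "account_id": ["A100", "A101", None],
    "balance": [100.0, -5.0, 20.0],
}))
df = step("drop rows missing account_id", df.dropna(subset=["account_id"]))
df = step("remove negative balances", df[df["balance"] >= 0])

# The audit trail shows exactly how the final result was derived.
print("\n".join(lineage))
```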

Tip #5: Think beyond Excel

Excel is a very powerful business tool and can be effective for certain business needs, but the growing urgency for current, transparent, and accurate data analysis reveals some very real limitations. Excel leaves something to be desired when it comes to trust in analytic results because it's often impractical to trace each result through the analytical process to ensure its accuracy. Further, Excel does not enable automation or persistent auditing and cannot easily integrate large, disparate data sets -- something that is increasingly important in an era of Big Data.
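To illustrate the contrast -- a sketch with invented sources and column names, not a prescription -- a short script can integrate two disparate data sets in a way that is repeatable, version-controllable, and explicit about mismatches, where a spreadsheet would rely on manual copy-and-paste:

```python
# A sketch of the kind of repeatable, auditable integration that is
# awkward in a spreadsheet: two disparate sources joined by one script
# that can be rerun on every refresh. The data is hypothetical.
import pandas as pd

crm     = pd.DataFrame({"account_id": ["A100", "A101"], "city": ["Boston", "Austin"]})
billing = pd.DataFrame({"acct": ["A100", "A102"], "balance": [100.0, 75.0]})

# Reconcile differing key names, then join and flag mismatches.
billing = billing.rename(columns={"acct": "account_id"})
merged = crm.merge(billing, on="account_id", how="outer", indicator=True)

# Rows present in only one system are surfaced, not silently dropped.
print(merged[merged["_merge"] != "both"])
```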

A Final Word

A proactive approach should replace the "afterthought" crisis management approach to dirty data. With business collaboration and flexible analytics solutions, businesses will be able to successfully "spring clean" their dirty data and maintain a vigilant approach to data integrity going forward.

Drew Rockwell is the CEO of Lavastorm Analytics, a global business performance analytics company that enables companies to analyze, optimize, and control the performance of their business processes. You can contact the author at [email protected].
