Confidence in Analytics Begins with the Data
Organizations need to adjust data quality processes to fit the requirements of analytics rather than stick with what they have established for standard BI reporting.
- By David Stodder
- March 3, 2015
Analytics continues to be a hot topic as more organizations seek to derive greater competitive value from data. In pursuit of smarter decisions, users across organizations want to expand past the limits of spreadsheets, standard business intelligence tools, and applications that offer embedded reporting but not much for serious data analysis. Technology providers have been moving fast to provide analytics tools and platforms to meet heightened demand. Yet, technology can only take organizations so far. In the middle of all the frenzy is the data; without user confidence in the data, analytics will never deliver on its promises.
However, confidence in the data is different from an analytics perspective than it is from the perspective of BI reporting. In the BI and data warehousing realm, confidence in data comes from having strong processes for data quality, consistency, and completeness. BI reporting on financial data in particular requires a steady flow of high-quality data that is consistent and well-structured, including how it is presented in dashboards or other visualizations. Adding new data to such a process can be a big ordeal in part because of the high standards it must meet. Users of performance management systems will stop taking key performance indicators (KPIs) seriously if they lose confidence in the quality of sales, financial, or other "system of record" data behind the metrics.
Many types of analytics may not require meeting such a high bar for data quality. Customer analytics, for example, depends on integration of data sources that can be famously inconsistent. Organizations' internal customer data sources are often incomplete, conflicting, and incorrect. Common external sources from data services providers can have quality problems as well, including being outdated. In a 2012 TDWI Best Practices Report I wrote -- Customer Analytics in the Age of Social Media (Third Quarter 2012, page 33) -- TDWI Research found that only 10 percent of survey respondents considered their customer data to be of the highest quality: that is, with no fragmentation, duplication, or inconsistency. [Editor's note: The Best Practices Report is available for download at no charge, but a short, one-time TDWI registration may be required.]
Then as now, organizations have tried to remedy these problems by consolidating customer data into one source such as a single, enormous customer data warehouse -- a process that allows them to apply data profiling and other processes to improve data quality. A little over half (52 percent) of respondents to the 2012 survey said they were taking steps to consolidate records, link records from disparate sources, and purge duplicates. Yet, even as organizations take steps to consolidate customer data, sources are growing more diverse. Innovation in customer analytics often depends on access to new data from social media, click streams, and other less structured behavioral records created by customers. Hadoop files that might be holding this data can easily exceed the size of customer data warehouses. Unable to finish the creation of a single source, organizations could be frustrated in their pursuit of exacting data quality goals.
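The consolidation steps the survey respondents describe -- linking records from disparate sources and purging duplicates -- can be sketched in miniature. This is an illustrative example only; the field names, sample records, and the simple name-plus-email matching rule are assumptions, and real record linkage typically uses far more sophisticated fuzzy matching.

```python
def normalize(record):
    """Build a simple match key from name and email (illustrative rule)."""
    name = record.get("name", "").strip().lower()
    email = record.get("email", "").strip().lower()
    return (name, email)

def consolidate(*sources):
    """Merge records from several sources, keeping the first seen per key."""
    merged = {}
    for source in sources:
        for record in source:
            key = normalize(record)
            if key not in merged:
                merged[key] = dict(record)
            else:
                # Fill gaps in the surviving record from later duplicates.
                for field, value in record.items():
                    merged[key].setdefault(field, value)
    return list(merged.values())

# Two hypothetical sources holding the same customer in different forms.
crm = [{"name": "Ada Lovelace", "email": "ada@example.com"}]
web = [{"name": "ada lovelace ", "email": "ADA@example.com", "segment": "premium"}]
customers = consolidate(crm, web)
```

The point of the sketch is the shape of the process, not the matching rule: duplicates collapse to one record, and fields unique to a later source survive the purge.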
For many types of analytics, however, the purpose of accessing these data sources is exploratory -- pretty much the opposite of reporting. Analysts want to try different questions and approaches to learn what they don't know. Performance standards are different; analytics can often require many passes through different types of data to test models and look for patterns. Such requirements are one of the reasons behind in-memory computing's rise in popularity as a business-driven alternative both to the disk access bottleneck and to IT's strict data integration and quality requirements.
As organizations increase the amount of exploratory and other types of analytics -- and base important strategic decisions on the results of analytics -- how can they maintain user confidence in the data? Here are three practices that can be valuable:
Practice #1: Use analytics to learn about the data itself. As the volume and variety of data grow, organizations may need to implement more consistent methods, including the use of analytics, to increase their knowledge of the data and how elements may be related across sources. To meet regulatory requirements, this practice may be essential; organizations need to know where sensitive data is located and how it is being used.
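Data profiling of this kind can start very simply. The sketch below, with illustrative sample records and field names of my own invention, shows the flavor of Practice #1: running basic analytics over a data set to learn its completeness and cardinality before anyone builds models on it.

```python
from collections import Counter

def profile(records):
    """Report fill rate, distinct-value count, and most common value per field."""
    total = len(records)
    fields = {field for record in records for field in record}
    report = {}
    for field in sorted(fields):
        values = [r.get(field) for r in records if r.get(field) not in (None, "")]
        report[field] = {
            "fill_rate": len(values) / total,       # share of records populated
            "distinct": len(set(values)),           # rough cardinality
            "top_value": Counter(values).most_common(1)[0][0] if values else None,
        }
    return report

# Hypothetical customer records with a partially filled email field.
records = [
    {"country": "US", "email": "a@example.com"},
    {"country": "US", "email": ""},
    {"country": "DE", "email": "c@example.com"},
]
stats = profile(records)
```

A profile like this also supports the regulatory point in Practice #1: fields that turn out to hold sensitive values (emails, identifiers) are surfaced before the data is opened up for exploration.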
Practice #2: Set users' expectations about data quality. Make sure that when users access sources, they understand what they are getting in terms of data quality, consistency, and other factors. This practice is particularly important for nontechnical users such as executives who are not familiar with the data -- or even with common best practices for data analysis and reporting. You don't want executives to just grab a data point and use it without knowing the level of confidence they should have in the source.
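One lightweight way to set those expectations is to attach an explicit quality label to every source a user can reach. The catalog entries and tier names below are illustrative assumptions, not a standard; the idea is simply that no source reaches a user without a stated confidence level.

```python
# Hypothetical source catalog: each entry carries an explicit quality tier.
SOURCE_CATALOG = {
    "finance_dw": {"tier": "certified", "note": "governed system-of-record data"},
    "clickstream_raw": {"tier": "exploratory", "note": "unvalidated behavioral data"},
}

def describe(source_name):
    """Return a human-readable confidence note for a source, or a warning."""
    entry = SOURCE_CATALOG.get(source_name)
    if entry is None:
        return "UNKNOWN source: treat results as low-confidence"
    return f"{source_name}: {entry['tier']} -- {entry['note']}"
```

An executive who pulls a number from `clickstream_raw` then sees "exploratory -- unvalidated behavioral data" rather than an unlabeled figure that looks as trustworthy as the financial system of record.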
Practice #3: Amend data governance policies to fit analytics. Many organizations have made strides in improving data governance, but the policies are often pegged to standard BI reporting activities, not exploratory analytics. IT management should observe analytics processes so that they can adjust data governance policies appropriately.
Adjusting Quality Processes to Fit Analytics
Organizations have the potential to make great strides in improving business performance, customer interactions, and process efficiency by using analytics. However, if the analytics are unknowingly based on poor quality data, the result could be the opposite: poor decisions that put the organization in all kinds of jeopardy. Thus, organizations need to adjust data quality processes to fit the requirements of analytics rather than stick with what they have established for standard BI reporting.