Study Finds Organizations Struggle with Data Issues
Note: TDWI’s editors carefully choose vendor-issued press releases about new or upgraded products and services. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the vendor's statements.
O'Reilly, a valued source for insight-driven learning on technology and business, has released the results of its research into the state of data quality in 2020. The "O'Reilly State of Data Quality in 2020" report reveals concerns about data quality and uncertainty about how best to address those concerns in the enterprise.
Key findings include:
- There are too many data sources and little consistency: When asked to name the primary data quality issues they face, more than 60 percent of respondents cited “too many data sources and inconsistent data.” Next came “disorganized data stores and lack of metadata” (selected by 50 percent) and “poor data quality controls at data entry” (selected by 47 percent).
- Organizations are dealing with several data quality problems at the same time: Most respondents reported facing three or more data quality issues simultaneously. Over half (56 percent) reported at least four data quality issues, and 71 percent reported at least three.
- Data governance best practices are not being adhered to: Eighty percent of respondents say their organizations do not publish information about data provenance or data lineage, which -- along with robust metadata -- are essential tools for correctly diagnosing and resolving data quality issues.
- Few resources are currently available: Over four in ten respondents (44 percent) said that they had "too few resources available to address data quality issues."
- Use of machine learning (ML) and artificial intelligence (AI) to address data quality issues is growing: Almost half (48 percent) of respondents say they are now using data analysis, ML, or AI tools to address data quality issues. This should help ease the resource shortage because ML and AI can simplify and automate the tasks involved in discovering, profiling, and indexing data.
"These findings show the need for both better education and better data management and cataloging tools -- those that generate metadata and capture/manage data provenance and lineage," said Rachel Roumeliotis, vice president, content strategy for O'Reilly.
The full report is available online with no registration required.