Moving the Data Quality Needle Still a Challenge
Despite the growing need for data quality, organizations are stuck facing the same problems using the same tools and techniques as in years past.
- By Steve Swoyer
- June 27, 2016
Organizations are still using the same tools and techniques to deal (or not to deal, as the case may be) with the same issues, and many, if not most, of their problems are the ones they faced years ago. The upshot is that it's hard to move the data quality needle.
A recent data quality discussion at TDWI's Chicago conference was a case in point.
To be sure, participants addressed a range of cutting-edge issues, such as supporting self-service business intelligence (BI) and self-service analytics. The panel also devoted a significant amount of discussion to the problem of integrating and governing information from big data sources.
All the same, most participants could have been discussing the state of data quality circa 1996 or circa 2006. Now as then, legacy data sources -- and the people who tend to them -- pose significant quality and governance problems. Now as then, it's hard to get most of the people in an organization to really care about data quality.
Too many organizations are still behind the curve when it comes to effectively managing their data. Ad hoc, custom-coded, or manual tools and processes are common, and the goal of automated and reliable data ingest is still elusive.
"We're trying to merge the data coming from a legacy system into more modern technologies such as CRM; [we're] trying to merge promotions and campaigns with the actions that our customers take," one participant explained.
So far, so good. Nothing that can't be managed using a combination of process tweaks, technology solutions, and, of course, good old-fashioned human behavioral modification. Right? Wrong. "We have rules that are not applicable to the data we receive from the legacy systems. This matching process with business rules and the [quality of the] data coming from legacy systems ... is really what is challenging for us," this person said.
"Sometimes the effectiveness of a marketing campaign can be affected by as much as two to three percent because of this misalignment between data [and rules]," he added.
Philip Russom, senior director of research for data management with TDWI, observed that this isn't an uncommon problem. "Often with legacy platforms, the data quality problem is upstream," he said. This issue is compounded by the fact that most organizations are in a holding pattern with respect to their legacy investments. "Their philosophy is, 'I don't think we want to invest any more in these legacy systems.' In a lot of cases, in fact, they're just afraid to change anything."
In his case, the participant reported, the business is at least engaged and listening. The problem is that his group has little to no input into (or control over) the way data is managed on the source legacy systems. Even with the line of business on board, his team can't get the IT group that owns the legacy source data to relax its grip. At best, problems are addressed once they're identified on the downstream system.
This is a reactive approach to data quality that's far from ideal. "Our challenge is that ... because we don't know what the data looks like underneath [i.e., in the legacy system], it's something that we have to present [to the legacy IT group] every time there is an issue. Then we have to get them to correct the data. It's a constant challenge," he said.
Attendees also spoke on another, all-too-familiar theme: given the choice between short-term expediency and even the slightest inconvenience, most line-of-business people will opt for expediency each and every time.
One person cited the persistence of unsupervised information distribution: users extracting data from a warehouse or source system, dumping it into Excel, and emailing the spreadsheets to collaborators. In his organization, "Excel spreadsheets are a self-service BI tool. The primary BI tool is an email and an Excel spreadsheet," he lamented. "The system is collapsing under its own weight on a daily basis ... but nobody realizes it's a problem."
Another described an even more horrific scenario.
"We are probably 98 percent [an] ad hoc data warehouse with no pre-prep data. [There's] no data quality -- none. It's the way they've been doing it for decades, so that's just what the users are used to," he said, noting that this practice dates back to the 1980s and 1990s, specifically to his organization's use of end-user-oriented desktop BI software from the former Brio Technology Inc. (The former Hyperion Solutions Corp., itself acquired by Oracle in 2007, acquired Brio in 2003.)
The lesson is clear: introducing new data quality or governance restrictions in a previously unrestricted context is a very tough sell. "Especially with the old Brio tool, we let them have that capability, so that's what they grew up with, that's what they expect," this participant noted.
The discussion wasn't all doom and gloom. Participants were optimistic, not pessimistic, about their ability to accommodate and promote the self-service use case, for example. Although all agreed that big data poses daunting problems for data quality and governance, most said they aren't yet grappling with those issues -- thanks chiefly to the fact that few (if any) sane organizations are using big data platforms to support core financial reporting or traditional BI-analytics applications.
That said, the old problems aren't going anywhere. Another participant noted that line-of-business people are still mostly indifferent to the importance of data quality -- until there's a problem.
In such cases, he said, even though the business itself might be the source of the problem (data entry errors, inconsistent metadata definitions, chronic use of spreadmarts), the buck always seems to stop with IT. One solution is to compel -- by executive fiat, if necessary -- the business to take responsibility for the data that belongs to it, this participant argued.
Otherwise, he concluded, "they'll quite frequently try to pass the buck. They'll say 'Someone in BI screwed up, or someone in IT.' As soon as the business recognizes that they own their data, that changes the game, and then these things become important to them. Then it's 'I'm responsible for data quality in my business unit.'"
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at email@example.com.