Question and Answer: How Data Infrastructure Enhances Your Bottom Line
In a challenging economy, how can you extend the life of your information systems? One solution is to focus on data quality and consistency.
- By Linda L. Briggs
- August 26, 2009
In a challenging economy, how can you extend the life of your information systems? One solution is to focus on data quality and consistency, thus laying a solid foundation for more costly master data management and data governance projects you might undertake later.
In this interview, we talk with Martin Boyd, vice president of marketing at Silver Creek Systems, a firm focused on product data quality, or what it calls “product data mastering.” Boyd discusses a range of topics, from data quality to master data management (MDM) to data governance, and offers steps you can take to improve the effectiveness of your data infrastructure and deliver positive results straight to the bottom line.
BI This Week: In general terms, can you describe what is happening in the data quality market?
Martin Boyd: Despite the economy, the data quality market is still growing at a healthy rate. There are a number of drivers for this, from MDM to data governance, but in the end they all boil down to a need to treat data as a business asset. That means managing and maintaining it, and making sure it is available whenever and wherever it’s needed and in whatever format it’s needed. Whatever strategy you pick and however you label it, there is clearly an underlying need for clean, complete, and consistent data.
Something else that’s going on is that the market is maturing and broadening into new areas -- specifically, into new types of data. Whereas structured, relatively uniform data such as customer data has been the focus for many years, there is now a much greater focus on far less structured types of data, such as product data.
With this broadening focus, many companies are realizing that product data is different and that -- despite vendor claims to the contrary -- with data quality solutions, one size rarely fits all. The traditional data quality engines built around customer data can’t handle the complexity of typical product data. This is because product data is highly variable, lacks standards, and follows different rules in each category. That dictates a semantic approach to cleansing and standardizing product data.
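To make that contrast concrete, here is a minimal, hypothetical sketch (in Python, with invented rules and sample records -- not Silver Creek’s technology): a customer-data cleanser can assume one fixed format per field, while product descriptions need a separate set of rules per category before their attributes can even be recognized. Real semantic engines go well beyond simple patterns, but the shape of the problem is the same.

```python
# Hypothetical illustration: a one-format-per-field cleanser versus
# category-aware parsing of product descriptions. All rules are invented.
import re

# Customer-style cleansing assumes a single target format (e.g. phone numbers).
def clean_phone(raw: str) -> str:
    digits = re.sub(r"\D", "", raw)
    return f"({digits[0:3]}) {digits[3:6]}-{digits[6:10]}"

# Product descriptions vary by category, so each category needs its own rules.
CATEGORY_RULES = {
    "resistor": re.compile(
        r"(?P<resistance>[\d.]+\s*[KM]?)\s*OHM\s+(?P<power>[\d/]+W)\s+(?P<tolerance>\d+%)",
        re.IGNORECASE,
    ),
    "fastener": re.compile(
        r"(?P<type>BOLT|SCREW)\s+(?P<size>M\d+)\s*[Xx]\s*(?P<length>\d+MM)",
        re.IGNORECASE,
    ),
}

def parse_product(description: str, category: str) -> dict:
    """Extract structured attributes from a free-text product description."""
    match = CATEGORY_RULES[category].search(description)
    return match.groupdict() if match else {"unparsed": description}

print(clean_phone("555.867.5309"))                               # (555) 867-5309
print(parse_product("RES 10K OHM 1/4W 5% CARBON FILM", "resistor"))
# {'resistance': '10K', 'power': '1/4W', 'tolerance': '5%'}
print(parse_product("HEX BOLT M8 x 40MM ZINC", "fastener"))
# {'type': 'BOLT', 'size': 'M8', 'length': '40MM'}
```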
You said that MDM and data governance are driving issues. Are companies proceeding with those projects despite the economy?
Yes and no. On the one hand, they are still actively looking at MDM, and the requirement is certainly no less. On the other hand, the return on investment of MDM and related initiatives has always been hard to quantify and justify, so in this economy, many companies are looking twice before they commit.
It’s something like trying to decide whether to repair or replace a leaky roof. Either approach should stop the leak, and the new roof may offer architectural beauty, but any homeowner can tell you that it will cost many times more and will take much longer to plan and execute. In any case, if the roof is leaking you’ll have to do some tactical repairs immediately before further damage is done.
The same is true in business, where 80 percent of the value of the MDM system (that beautiful new roof) can be delivered by a short-term fix to data quality in existing systems (quickly fixing the leak) that probably takes 20 percent or less of the time and cost of the bigger replacement project.
Not to push the metaphor too far, but with the data quality leaks plugged, you can continue to live in the house and consider the best long-term solution, and how and when to implement it -- for example, when you have a little more money in your savings account!
In some cases, companies may really need to replace “the roof” immediately, but in many more cases, they can get a lot more life out of the same infrastructure with some well-placed data quality fixes.
Analysts put the cost of a typical MDM project at around $7 million, and say that 50 percent of the effort is in data remediation. That makes it easy to argue -- especially in this economy -- that a serious look at data quality is the right place to start, and that’s exactly what we’re seeing in terms of increased interest and demand in the market.
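Taking those figures at face value, the arithmetic behind the argument is straightforward. A back-of-the-envelope calculation (illustrative only, using the analyst figures above and the 80/20 framing from the roof analogy):

```python
# Illustrative cost comparison using the figures cited in the interview.
mdm_project_cost = 7_000_000     # analyst estimate for a typical MDM project
remediation_share = 0.50         # share of MDM effort spent on data remediation
dq_fix_cost_share = 0.20         # assumed cost of a tactical data quality fix
dq_fix_value_share = 0.80        # assumed share of the MDM value it delivers

remediation_cost = mdm_project_cost * remediation_share   # $3,500,000
dq_fix_cost = mdm_project_cost * dq_fix_cost_share         # $1,400,000

print(f"Data remediation within the MDM project: ${remediation_cost:,.0f}")
print(f"Targeted data quality fix instead: ${dq_fix_cost:,.0f} "
      f"for roughly {dq_fix_value_share:.0%} of the value")
```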
MDM and data quality seem to have a chicken-and-egg relationship. Which do you think comes first?
MDM and data quality are certainly related activities, but I think in most cases data quality should come before MDM. MDM relies on data quality, but data quality does not necessarily rely on MDM. In other words, it’s hard to imagine success in MDM without data quality, but data quality programs can certainly exist without MDM.
On the other hand, data quality gains may be short-lived without some form of MDM program to consolidate and lock in the initial data quality gains, helping to formalize them as part of the ongoing process.
Unfortunately, as fundamental as it clearly is to ongoing success, data quality is very often under-planned and under-funded -- which may be why so many MDM programs falter even in their initial stages.
If data quality is such a basic requirement, why does it often seem to get less planning focus?
Interesting question. I don’t think anyone would say data quality is unimportant. I think they just don’t realize how big a task it’s going to be -- for two reasons. First, before companies start the process, they probably don’t know what poor shape their data is in. That means putting a bit more effort into a realistic data assessment that can measure true “fitness for purpose,” based on their MDM or other use case scenarios.
Second, most firms don’t have the right tools to deal with data quality (or they assume they will get them as part of their MDM system purchase -- generally not a good assumption), so there may be something of a head-in-the-sand issue going on.
Often, good data quality is assumed to be a natural by-product of implementing MDM. Certainly, an MDM program can help maintain good data quality -- but only if it is there in the first place.
As the field of data quality matures, we’re starting to see sub-segments emerge, including a focus on product data. What’s driving that?
Essentially, customers and vendors alike are realizing that product data -- which includes all manner of assets, parts, spares, items for sale, and items for stock and their related manufacturer and supplier information -- is a whole different world. The data is less structured, comes in many categories with many rules and is infinitely variable. This is not at all what the traditional data quality vendors were designed for.
As more companies -- and vendors -- realize that, we’re seeing a much greater recognition that “data quality” isn’t just one item, but requires specific capabilities to handle specific problems. This wasn’t so true a couple of years ago, when data quality vendors had a tendency to say that they could do it all. Buyers are more experienced now and certainly more discerning, often because they’ve tried the “we do it all” vendors with poor results in certain areas.
Certainly, those vendors are usually good at handling customer data -- which was their primary design goal -- and other simple, structured data, but product data usually stumps them.
Product data presents different challenges than customer data. In what circumstances in particular is product data a big issue?
There are many. We see complex product data problems in almost every industry, but one example is online commerce, where product data must be changed from the many formats published by suppliers into the single, consistent format required for a Web site.
Another example is manufacturing, where large volumes of product are bought (sourcing and procurement), designed (product lifecycle management systems), manufactured (operational planning and maintenance, repair, and operations systems), and sold (requiring various forms of product data quality assurance and publishing).
Health care is my final example, because there is an enormous focus on reducing supply-chain costs by better identifying which products are being used, as well as which procedures are most cost-effective.
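To make the first of those examples concrete, here is a minimal, hypothetical sketch of the online-commerce case: each supplier publishes product data in its own layout, and a small adapter per feed maps it onto the single, consistent format the Web site requires. The supplier names, field names, and mapping rules below are all invented for illustration.

```python
# Hypothetical normalization of heterogeneous supplier feeds into one canonical format.
from typing import Callable

# One small adapter per supplier translates its layout into the canonical record.
SUPPLIER_ADAPTERS: dict[str, Callable[[dict], dict]] = {
    "supplier_a": lambda rec: {
        "sku": rec["ItemNo"],
        "name": rec["Desc"].title(),
        "price_usd": float(rec["Price"]),
        "in_stock": rec["Avail"] == "Y",
    },
    "supplier_b": lambda rec: {
        "sku": rec["part_number"],
        "name": rec["long_description"].title(),
        "price_usd": float(rec["unit_cost"]) * 1.25,   # cost-based feed: apply a markup
        "in_stock": int(rec["qty_on_hand"]) > 0,
    },
}

def normalize(supplier: str, record: dict) -> dict:
    """Convert one supplier record into the single format the Web site expects."""
    return SUPPLIER_ADAPTERS[supplier](record)

print(normalize("supplier_a",
                {"ItemNo": "A-100", "Desc": "CORDLESS DRILL 18V",
                 "Price": "89.99", "Avail": "Y"}))
print(normalize("supplier_b",
                {"part_number": "B-2200", "long_description": "cordless drill 18 volt",
                 "unit_cost": "72.00", "qty_on_hand": "0"}))
```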
What does Silver Creek Systems bring to this discussion?
Our technology was built from the ground up to handle extreme variability. It’s ideally suited for product data, although it can also be used for other areas. We use semantic-based technologies to recognize words and concepts in context, which virtually eliminates the recognition issues that bog down other approaches.
We also have built-in auto-learning technology to recognize data variations; semantic matching to identify exact, close, and alternate matches; survivorship to merge records; and a governance module to ensure the whole process is working properly as well as to remediate problem data.
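As a rough illustration of what matching and survivorship mean in practice -- a toy sketch, not Silver Creek’s actual implementation -- one might classify candidate record pairs as exact, close, or alternate matches and then merge duplicates into a single golden record by keeping the best-populated value for each field:

```python
# Toy matching and survivorship example with invented product records.
from difflib import SequenceMatcher

def match_level(a: dict, b: dict) -> str:
    """Classify two product records as exact, close, alternate, or no match."""
    if a["mfr_part"] == b["mfr_part"]:
        return "exact"
    similarity = SequenceMatcher(None, a["description"], b["description"]).ratio()
    if similarity > 0.9:
        return "close"
    if similarity > 0.7:
        return "alternate"        # e.g. a compatible or substitute part
    return "none"

def survive(records: list[dict]) -> dict:
    """Survivorship: build one golden record, keeping the most complete value per field."""
    golden = {}
    for field in records[0]:
        values = [r[field] for r in records if r.get(field)]
        golden[field] = max(values, key=len) if values else ""
    return golden

a = {"mfr_part": "XJ-100", "description": "18V cordless drill", "weight": ""}
b = {"mfr_part": "XJ-100", "description": "18 volt cordless drill kit", "weight": "1.6 kg"}

print(match_level(a, b))   # exact
print(survive([a, b]))     # richer description and the populated weight survive
```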