How Data Catalogs Accelerate Decisions, Boost Productivity
Delayed decisions mean delayed benefits. With data catalogs, everyone from data scientists to self-service BI users can benefit from useful information about their data.
By James E. Powell
November 15, 2018
Let's say you start a new data-driven project to answer a question or solve a business problem. Perhaps the goal is to understand a new form of customer churn. Even once you identify the nature of that churn, another form is always coming. What should your enterprise do to minimize this new churn?
The trouble is, it takes time to find relevant data, and the longer you delay a decision, the more likely it is your corrective action will come too late.
Data Catalogs Improve Data Access
In a recent webinar, "New Practices in Data Cataloging," Philip Russom, senior research director for data management at TDWI, noted that many enterprises want to be data-driven. To reach this goal, many types of users need to find just the right data for the task at hand, and find it quickly.
Finding the data is just the first step. "You want to look at various characteristics of the data to be sure it is in good technical condition and that other users have used it successfully. Eventually you form a data set that represents the profile of the type of customer you think is churning at the moment," Russom told Upside. "Can you do all this without reinventing the wheel? Can you profit from what your colleagues have done on similar projects?"
That's the beauty of data catalogs -- they hasten, target, and document the creation of new data sets for analytics and other data-driven projects.
Unlike technical metadata -- which tracks details such as table and field names and the data types they contain -- data cataloging keeps track of data according to the attributes you really care about. Is a table made up of customer, financial, or product data? If it's customer data, does a table or field contain information for support (as in a call center; sometimes customer churn is driven by dissatisfaction with products and services), or is it more appropriate for use by sales and marketing?
Where is that data managed, and where did it come from? Is it reliable, trusted, and of good quality? What have colleagues used in the past, and how well has it helped them solve their problems?
A data catalog can help you answer these questions and more.
Russom draws a clear distinction between metadata and data catalogs. "There are at least three types of metadata, and we need technical metadata for interfacing a wide range of applications. But that's not what a catalog does. The catalog goes beyond technical labels and describes business and process attributes that are a complement to technical metadata.
"You can catalog data by the kinds of things you really care about, such as its quality, trustworthiness, usefulness, prior successful use, or domain segment (e.g., customer data in a sales context, which differs from customer data in a support context). With this kind of cataloging in place, you can search, query, and browse the catalog in a more targeted way, to find just the right data quickly and accurately."
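To make the distinction concrete, here is a minimal sketch of what a catalog entry might look like when business attributes sit alongside technical metadata, and how a targeted search could use them. The class, the field names, the 0-to-1 quality scale, and the sample tables are all hypothetical illustrations, not the schema of any particular catalog product.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    # Technical metadata: what a schema or data dictionary already records.
    table: str
    columns: dict            # column name -> data type
    # Business and process attributes the catalog adds (hypothetical fields).
    subject: str             # e.g. "customer", "financial", "product"
    context: str             # domain segment, e.g. "sales" or "support"
    quality_score: float     # assumed scale: 0.0 (unusable) to 1.0 (excellent)
    trusted: bool = False


def search(catalog, subject, context, min_quality=0.0, trusted_only=False):
    """Targeted lookup by business attributes instead of by table name."""
    return [
        e for e in catalog
        if e.subject == subject
        and e.context == context
        and e.quality_score >= min_quality
        and (e.trusted or not trusted_only)
    ]


catalog = [
    CatalogEntry("crm.accounts", {"account_id": "int", "region": "varchar"},
                 "customer", "sales", 0.92, trusted=True),
    CatalogEntry("support.tickets", {"ticket_id": "int", "account_id": "int"},
                 "customer", "support", 0.81, trusted=True),
    CatalogEntry("legacy.cust_dump", {"id": "int", "notes": "text"},
                 "customer", "support", 0.35),
]

# Customer data in a support context, skipping low-quality tables.
hits = search(catalog, "customer", "support", min_quality=0.5)
print([e.table for e in hits])  # ['support.tickets']
```

The point of the sketch is that the query never mentions a table name: the user asks for "customer data in a support context of reasonable quality," and the catalog's business attributes do the narrowing.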
"Data cataloging has reached a new maturity level," he notes. "It's like a lot of things we see in data management today. It's something we've been dreaming about -- and the technologies and the user best practices are catching up and coming together. We can do cataloging today at a level we just weren't able to do in the recent past."
Another advantage of data catalogs, which tend to be centralized, is that they give an enterprise a single repository of these facts about data across all systems, and with it a unified view of data that is highly distributed physically. That helps foster collaboration among a wide range of users, including business and technical people working together as well as technical people cooperating on analytics projects.
A Bevy of Benefits
Imagine incorporating data quality metrics into a catalog. "As you're browsing data and considering its use for a data-driven project, you can also consider its level of quality. Sometimes the quality's so bad you just want to skip it. Sometimes the quality is so good that you'll want to spend more time with the data. That affects the level of trust that you have." It's all possible with a data catalog.
"Imagine being able to categorize things that are as intangible as trust, yet that is one of the things I see people doing with catalogs. Another thing I see people doing with catalogs is to catalog the data according to compliance sensitivity," such as for HIPAA and the GDPR.
Data catalogs also accommodate annotations. "If you work with data, you can put in notes telling people 'this was great for my application in customer churn' or 'I had these potholes when I was working with this data for these things and here are the workarounds I found.' That way you can share your experience and other people can learn from and leverage what you've experienced."
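The annotation idea can be sketched as a simple store of notes keyed by data set, so that anyone looking at a table sees what colleagues learned while using it. This is a minimal illustration under assumed names: `Annotation`, `AnnotationStore`, and the sample notes are hypothetical, not a real catalog API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Annotation:
    author: str
    project: str
    note: str
    written: date


class AnnotationStore:
    """Hypothetical crowdsourced notes attached to catalog entries."""

    def __init__(self):
        self._notes = defaultdict(list)  # table name -> list of Annotation

    def annotate(self, table, author, project, note, written=None):
        self._notes[table].append(
            Annotation(author, project, note, written or date.today()))

    def notes_for(self, table, project_keyword=None):
        """Newest notes first, optionally limited to related projects."""
        notes = self._notes.get(table, [])
        if project_keyword:
            notes = [a for a in notes
                     if project_keyword.lower() in a.project.lower()]
        return sorted(notes, key=lambda a: a.written, reverse=True)


store = AnnotationStore()
store.annotate("support.tickets", "ana", "Customer churn",
               "Great for churn modeling.", date(2018, 10, 2))
store.annotate("support.tickets", "raj", "Call-center staffing",
               "Timestamps are local time; convert to UTC first.",
               date(2018, 11, 1))

for a in store.notes_for("support.tickets", "churn"):
    print(a.author, "-", a.note)  # ana - Great for churn modeling.
```

Filtering notes by project keyword mirrors the use case in the quote: someone starting a churn project can jump straight to what earlier churn projects reported about the same data.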
Annotations can also assist you with finding uses for new data types. "It's funny how different types of location data can apply to different applications. Wouldn't it be nice to get some annotations and see what your colleagues are doing with location data?" Now you can.
Data lineage is increasingly important as data consumers frequently ask where the data in a report or analysis came from, how it might have been transformed, and whether there are other copies of this data they should be aware of. Lineage isn't really captured in technical metadata, but it can be captured in the catalog.
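One common way to represent lineage is as a graph in which each data set points to the sources it was built from, along with the transformation applied. The sketch below assumes that representation; the data set names and transformations are hypothetical. It walks upstream from a report to answer "where did this come from, and how was it transformed?"

```python
from collections import deque

# Hypothetical lineage: each data set maps to (source, transformation) pairs.
LINEAGE = {
    "report.churn_dashboard": [("mart.churn_features", "aggregate by month")],
    "mart.churn_features": [("crm.accounts", "join on account_id"),
                            ("support.tickets", "count tickets per account")],
    "crm.accounts": [],
    "support.tickets": [("legacy.ticket_export", "deduplicate")],
    "legacy.ticket_export": [],
}


def trace_upstream(dataset, lineage):
    """Breadth-first walk from a data set back to its original sources.

    Returns (source, target, transformation) steps in the order visited.
    """
    steps, seen, queue = [], {dataset}, deque([dataset])
    while queue:
        current = queue.popleft()
        for source, transform in lineage.get(current, []):
            steps.append((source, current, transform))
            if source not in seen:  # avoid revisiting shared upstream tables
                seen.add(source)
                queue.append(source)
    return steps


steps = trace_upstream("report.churn_dashboard", LINEAGE)
for source, target, transform in steps:
    print(f"{source} -> {target}: {transform}")
```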
More to Learn
In his presentation, Russom discusses the role of automation (which helps the cataloging process move a lot faster) and use cases in analytics, operations, governance, and compliance. He explains why catalogs are not just for new data sources such as big data from new devices. "Now it's for all the data, no matter if it's traditional enterprise data or new data from more modern platforms." Russom also explores how crowdsourcing can keep catalogs up to date.
From intelligent data discovery to self-service best practices and data preparation, data catalogs offer a host of new capabilities.
Russom concludes by explaining that "the catalog helps you develop a global inventory for all data and its uses. The cool thing about data cataloging is that it has really broad applicability. It's useful in analytics, in operations, and even compliance."
James E. Powell is the editorial director of TDWI, including research reports, the Business Intelligence Journal, and Upside newsletter.