The Six Cs of Trusted Data
By Philip Russom, Director, TDWI Research, Data Management
In my last column for TDWI Experts in BI, I defined “trusted data” as:
Data that is drawn from carefully selected sources, transformed in accordance with data’s intended use, and delivered in formats and time frames that are appropriate to specific consumers of reports and other manifestations of data.
These and other data properties assure that data is trustworthy from a technical viewpoint, as well as trusted by users who consume the data through reports and applications. Trust is important for data. Without trust, users may ignore supplied data and build their own data stores. Data in poor condition can lead to poor decisions.
In this column, I will drill deeper into how to achieve trusted data. A base assumption is that the problems resulting from non-trusted data can be avoided by following modern best practices in data integration, plus related disciplines such as data quality, data profiling, master data management, and metadata management.
The mere presence of these isn’t enough, of course. Solutions created with data management techniques—in order to produce trusted data—must focus on the data properties that are key to trust. In short, the data must be complete, current, consistent, clean, compliant, and collaborative.
As luck would have it, the six data properties that are key to trust all start with the letter C, which is why I call them “the six Cs of trusted data.” Let’s take a look at how each contributes to trusted data.
Complete data. This results from data integration techniques that produce consolidated data structures. For example, an enterprise data warehouse fosters trust, as the single—and complete—version of the truth for decision making. Likewise, the EDW provides a historic context for real-time data, and 360-degree views of customers give users confidence that they really know the customer.
Current data. A common question in BI is “How old is the data in this report?” With time-sensitive practices such as operational BI, fresh data is considered trustworthy, whereas stale data isn’t. As data is delivered faster and more frequently (leaving little time for data preparation), delivering current data that’s also consistent, clean, and compliant is a challenge.
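The question "How old is the data in this report?" can be answered mechanically if each data set records when it was last loaded. The following is a minimal sketch, not a prescribed method: the function names `data_age` and `is_current`, and the idea of a per-report maximum age, are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def data_age(last_loaded_at, now=None):
    """Return how long ago the data was last loaded, as a timedelta."""
    # Default to the current UTC time when the caller does not supply one.
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_loaded_at


def is_current(last_loaded_at, max_age, now=None):
    """Return True if the data is no older than the allowed maximum age."""
    return data_age(last_loaded_at, now) <= max_age
```

An operational BI dashboard might allow a `max_age` of a few minutes, while a monthly financial report could tolerate days. The threshold is a business decision, not a technical one.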
Consistent data. Consistency stems from consistently applying definitions of business entities, such as customers, products, and finances. Metadata management and master data management can improve consistency by documenting data’s origins and meanings. Without consistency, users don’t trust that data was sourced or aggregated properly, especially when data travels across multiple IT systems.
Clean data. This is typically the result of data quality techniques, such as standardization, verification, matching, and deduplication. Users’ perceptions of data’s quality are probably the biggest challenge to trust, which is why data quality techniques are critical. Quality decisions and operational excellence both depend on clean data.
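As a rough illustration of two of these techniques, the sketch below standardizes customer records and then removes duplicates. The record layout (`name` and `email` fields) and the choice of the e-mail address as the matching key are assumptions made for the example. Real data quality tools use far richer matching rules.

```python
def standardize(record):
    """Return a copy of the record with name and e-mail in a standard form."""
    return {
        # Collapse repeated whitespace and apply title case to the name.
        "name": " ".join(record["name"].split()).title(),
        # Trim surrounding whitespace and lowercase the e-mail address.
        "email": record["email"].strip().lower(),
    }


def deduplicate(records):
    """Standardize records and keep the first record seen for each e-mail."""
    unique = {}
    for record in records:
        clean = standardize(record)
        # setdefault keeps the existing entry if the e-mail was already seen.
        unique.setdefault(clean["email"], clean)
    return list(unique.values())
```

Standardizing before matching is what lets two differently typed versions of the same customer be recognized as one.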
Compliant data. Compliance regulations come from many sources. Some are external to your enterprise, including federal legislation and your partners. Others are internal, including your own standards for data architecture, quality, security, and privacy. Technical and business people need to trust that data has been accessed and distributed in accordance with multiple internal and external regulations. Achieving this level of trust may require a data governance board or similar organizational body.
Collaborative data. First and foremost, collaboration over data helps ensure that data management and business management goals are aligned. Cross-functional collaboration improves trust in cross-departmental data sharing. Collaboration can drive consensus in how business entities are defined in data. In many ways, the first five Cs are data properties, whereas the sixth one, collaboration, reaches across and unifies the other Cs. Collaboration is the “secret sauce” that adds trust to an EDW’s complete data, operational BI’s current data, data quality’s clean data, and data governance’s compliant data.
A Measuring Stick
In summary, let the six Cs of trusted data guide you. They are a measuring stick for both technical and business people, defining goals that data management staff must strive toward continuously. Yet the six Cs define business users’ requirements, too; satisfy these and data management solutions will be considered successful at delivering trustworthy data that users can feel confident about using.
Philip Russom is a research director at The Data Warehousing Institute (TDWI), where he oversees many of TDWI’s research-oriented publications, services, and events. Prior to joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research and Giga Information Group. He has also run his own business as a BI consultant and independent analyst, plus served as a contributing editor to leading data management magazines. You can reach him at firstname.lastname@example.org.
This article originally appeared in the TDWI Experts newsletter on December 16, 2010. For more information or to subscribe, visit tdwi.org/pages/publications/newsletters.