Governance Is About Trust, Not Just Regulatory Adherence
Improve governance and trust in the data with data catalogs, glossaries, and metadata repositories.
- By David Stodder
- March 5, 2021
Governance is a major priority for many reasons, but regulatory adherence is often at the top of the list. Organizations need to monitor sensitive data use to adhere to data privacy regulations and respond to governance audits. In TDWI Best Practices Report research, over half of organizations surveyed (57 percent) want to use a centralized data catalog, glossary, or metadata repository to address these challenges.
In a related survey question, TDWI learned that only 10 percent of organizations surveyed are very confident in their ability to use metadata and a data catalog to improve governance; 29 percent are somewhat confident and 53 percent are not confident (8 percent don't know). [Note: All research quoted in this column is from the Q3 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern Business Analytics.]
A key aspect of governance is monitoring data usage and lineage, which 43 percent want to improve by establishing a data catalog. Data lineage is about documenting and monitoring where data comes from and its journey through the organization: that is, who is responsible for sourcing the data, what applications use and share it, and how it has been transformed and enriched along the way.
Data lineage is helpful in compiling a data inventory, which most data privacy regulations require for customer and consumer data. Data lineage is also important to establish trust in analytics and visualizations such as dashboards; if users question the source of the data in predictive models or charts, they can use data lineage to track it down.
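To make the tracing idea concrete, here is a minimal Python sketch that treats lineage as a graph in which each asset lists the assets it reads from directly, then walks that graph from a dashboard back to its original sources and their owners. The asset names, owners, and the trace_sources function are hypothetical; a real catalog would harvest these relationships from pipelines and query logs rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets it reads from directly.
UPSTREAM = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["erp.orders"],
    "customers_clean": ["crm.customers"],
    "erp.orders": [],
    "crm.customers": [],
}

# Hypothetical owners responsible for sourcing each original dataset.
OWNERS = {"erp.orders": "finance-data-team", "crm.customers": "sales-ops"}


def trace_sources(asset, upstream):
    """Walk the graph breadth-first and return the original sources feeding `asset`."""
    seen = {asset}
    queue = deque([asset])
    sources = []
    while queue:
        node = queue.popleft()
        parents = upstream.get(node, [])
        if not parents and node != asset:
            sources.append(node)  # nothing further upstream: an original source
        for parent in parents:
            if parent not in seen:  # skip visited nodes so a cycle cannot loop forever
                seen.add(parent)
                queue.append(parent)
    return sorted(sources)


for source in trace_sources("sales_dashboard", UPSTREAM):
    print(source, "->", OWNERS.get(source, "unassigned"))
# crm.customers -> sales-ops
# erp.orders -> finance-data-team
```

The same walk, run over every asset, also yields the list of original sources that a data inventory would start from.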
Catalogs can maintain ongoing reports showing compliance with data privacy regulations. A modern, "active" data catalog can automate steps for tracking data lineage so organizations can respond to governance audits and users can learn about data sources. "Active" means that the system uses artificial intelligence, business rules, and automation to work without user intervention and to respond to users' interactions with the catalog.
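An ongoing compliance report can be as simple as checking every catalog entry flagged as holding personal data against a few required attributes. In the sketch below, the rule that such data needs an accountable owner and documented lineage is an assumption for illustration, as are the CatalogEntry fields and the compliance_report function; actual requirements depend on the regulation and the organization's own policies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    # Hypothetical minimal catalog record.
    name: str
    contains_personal_data: bool
    owner: Optional[str] = None
    lineage_documented: bool = False


def compliance_report(entries: List[CatalogEntry]) -> Dict[str, List[str]]:
    """Map each personal-data entry to its list of gaps (an empty list means none found)."""
    report = {}
    for entry in entries:
        if not entry.contains_personal_data:
            continue  # this sketch reports only on data flagged as personal
        gaps = []
        if entry.owner is None:
            gaps.append("no accountable owner")
        if not entry.lineage_documented:
            gaps.append("lineage not documented")
        report[entry.name] = gaps
    return report


catalog = [
    CatalogEntry("crm.customers", True, owner="sales-ops", lineage_documented=True),
    CatalogEntry("web.clickstream", True),
    CatalogEntry("erp.products", False),
]
for name, gaps in compliance_report(catalog).items():
    print(name, "OK" if not gaps else "; ".join(gaps))
# crm.customers OK
# web.clickstream no accountable owner; lineage not documented
```

Rerunning the report on a schedule, rather than only before an audit, is what turns it into the kind of ongoing record described above.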
As they evolve, active data catalogs can give users advice when certain data (or blending of data sources) may be sensitive and might raise governance or regulatory concerns. Using AI and automation in data catalogs is essential as organizations scale up to support thousands of users and their dashboards, data pipelines, and analytics.
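The advice an active catalog gives about blending can be approximated with business rules alone. This sketch assumes each dataset carries column-level tags and that a rule names a combination of tags that should trigger a warning when it appears in one blended result. The tags, the rule, and the advise function are hypothetical, and the AI-driven detection an active catalog would add is not modeled here.

```python
from typing import Iterable, List

# Hypothetical column-level tags a catalog might hold for each dataset.
TAGS = {
    "crm.customers": {"customer_id", "zip_code"},
    "hr.survey": {"birth_date", "gender"},
    "erp.orders": {"customer_id", "order_total"},
}

# Hypothetical business rules: tag combinations that deserve a warning
# when they appear together in one blended result.
RULES = [
    ({"zip_code", "birth_date", "gender"},
     "zip code, birth date, and gender together may allow re-identification"),
]


def advise(datasets: Iterable[str]) -> List[str]:
    """Return the warnings that apply to a proposed blend of datasets."""
    combined = set().union(*(TAGS[name] for name in datasets))
    return [message for required, message in RULES if required <= combined]


print(advise(["crm.customers", "erp.orders"]))  # []
print(advise(["crm.customers", "hr.survey"]))   # one re-identification warning
```

Neither crm.customers nor hr.survey triggers the rule on its own; the warning appears only when they are blended, which is the situation the paragraph above describes.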
Problems with Self-Service and Lack of Visibility
TDWI research finds that from a governance perspective, organizations surveyed are most challenged by the increase in self-service data access, analysis, and sharing (52 percent), followed by -- as we would expect -- lack of visibility into data-related activities (43 percent). These organizations appear to be struggling with not knowing what self-service users are doing with the data and with whom they may be sharing it.
Fewer organizations are facing challenges in their ability to control access and authenticate users (23 percent), which suggests that most organizations are confident they can protect the data stored in their systems. A somewhat higher percentage, however, is experiencing challenges in governing and securing data in motion across networks (29 percent).
In the midst of the coronavirus pandemic, with so many remote workers, data could be exposed if it moves across networks to endpoints outside firewalls. However, organizations can effectively control remote access to systems situated behind firewalls or protected by cloud platform security. A bigger governance and security concern may be data movement from on-premises systems to cloud data platforms and between multiple cloud platforms; 38 percent of research participants cite integrating on-premises and cloud-based data as one of their biggest challenges.
Data growth, diversity, and distribution make governance challenging. Data, of course, typically does not stay the same, come from the same sources, or even stay in one place, which can create unending challenges for governance and security. Just over one-third (35 percent) of research participants say growth in data volume, diversity, and speed is a challenge, and nearly as many (34 percent) say data distributed across on-premises and cloud platforms can be difficult to govern. These and other issues can drive up the costs and resources required to govern and secure data, which 35 percent of participants cite as a challenge.
Improving Trust in the Data
Beyond monitoring sensitive data and establishing rules and policies for data use, an important governance objective is to improve users' trust in the data. Users can then be confident in their reports, dashboards, and analytics because they know that the data meets the organization's governance standards for quality and is approved for the intended purpose.
TDWI asked research participants what actions their organizations have undertaken to improve users' trust in the data. The largest percentage say their organizations monitor data quality (63 percent), which can include a range of practices and technologies for overseeing data creation, use, maintenance, and sharing to make sure that data adheres to organizational standards. Data quality monitoring typically focuses on issues such as validity, completeness, consistency, and redundancy.
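Each of these four dimensions can be reduced to a simple ratio. The following sketch scores a small batch of customer records; the field names, the deliberately loose email pattern, the reference list of country codes, and the quality_metrics function are assumptions chosen for illustration, and production tools define such rules per data element.

```python
import re

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose format check
ALLOWED_COUNTRIES = {"US", "CA", "MX"}  # hypothetical reference list of codes


def quality_metrics(rows):
    """Score completeness, validity, consistency, and redundancy as ratios."""
    if not rows:
        raise ValueError("no rows to measure")
    n = len(rows)
    emails = [r["email"] for r in rows if r.get("email") is not None]
    ids = [r["id"] for r in rows]
    return {
        # share of rows with an email present
        "completeness": len(emails) / n,
        # share of present emails that match the expected format
        "validity": sum(bool(EMAIL.match(e)) for e in emails) / len(emails) if emails else 1.0,
        # share of rows whose country uses the agreed-upon code list
        "consistency": sum(r.get("country") in ALLOWED_COUNTRIES for r in rows) / n,
        # share of rows repeating an id already present (lower is better)
        "redundancy": 1 - len(set(ids)) / n,
    }


batch = [
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 3, "email": "not-an-email", "country": "usa"},
    {"id": 1, "email": "ana@example.com", "country": "US"},
]
print(quality_metrics(batch))
# completeness 0.75, validity 2/3, consistency 0.75, redundancy 0.25
```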
Organizations can use tools to set up notifications that alert administrators to problems during data ingestion and migration. They can also create metrics to measure whether data quality is improving over time and analyze what issues may be causing data quality to decline, such as lack of stewardship for self-service users or poor oversight of new data. Indeed, facilitating data stewardship and mentoring is the second most common action taken by organizations surveyed (53 percent).
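Alerts and trend metrics can be layered on top of per-batch quality scores such as those above. This hypothetical sketch flags an ingestion batch whose score falls below a threshold or below the previous batch, and compares recent averages with earlier ones to report whether quality is improving or declining. The threshold, window size, and function names are assumptions; a real tool would route the alerts to email or a messaging system instead of returning strings.

```python
def check_batch(batch_id, score, history, threshold=0.95):
    """Return alert messages for one ingestion batch, then record its score."""
    alerts = []
    if score < threshold:
        alerts.append(f"{batch_id}: score {score:.2f} is below threshold {threshold:.2f}")
    if history and score < history[-1]:
        alerts.append(f"{batch_id}: score fell from {history[-1]:.2f} to {score:.2f}")
    history.append(score)
    return alerts


def trend(history, window=3):
    """Compare the mean of the latest `window` scores with the window before it."""
    if len(history) < 2 * window:
        return "insufficient data"
    recent = sum(history[-window:]) / window
    prior = sum(history[-2 * window:-window]) / window
    if recent > prior:
        return "improving"
    if recent < prior:
        return "declining"
    return "flat"


history = []
for batch_id, score in [("b1", 0.97), ("b2", 0.96), ("b3", 0.98),
                        ("b4", 0.93), ("b5", 0.94), ("b6", 0.92)]:
    for alert in check_batch(batch_id, score, history):
        print(alert)
print(trend(history))  # declining
```

Comparing windowed averages rather than single batches keeps one unusually good or bad load from flipping the reported trend.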
Half of respondents say their organizations are validating new data sources (50 percent). One step that can improve trust in data validation is tracking data issues centrally so everyone can see what problems exist with certain sources, which helps teams address the issues at those sources. Another is making data validation part of workflow management so there is a record of whether validation checks were performed during loading and collection into a target system such as a data warehouse.
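Making validation part of the load workflow mainly means that every run leaves a record. The sketch below runs named checks before loading a batch, writes an audit entry whether or not the checks pass, logs each failure to a shared issue list, and loads only batches that pass. The check names, the in-memory logs, and load_with_validation are hypothetical stand-ins for a workflow tool, an issue tracker, and a data warehouse.

```python
from datetime import datetime, timezone

ISSUE_LOG = []  # hypothetical central issue list visible to every team
AUDIT_LOG = []  # hypothetical record of which checks ran on each load


def load_with_validation(source, rows, checks, target):
    """Run every check, record the run, and load the rows only if all checks pass."""
    results = {name: bool(check(rows)) for name, check in checks.items()}
    AUDIT_LOG.append({
        "source": source,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "results": results,
    })
    failed = [name for name, passed in results.items() if not passed]
    for name in failed:
        ISSUE_LOG.append({"source": source, "check": name})
    if failed:
        return False
    target.extend(rows)
    return True


checks = {
    "not_empty": lambda rows: len(rows) > 0,
    "ids_present": lambda rows: all(r.get("id") is not None for r in rows),
}
warehouse = []
load_with_validation("crm.customers", [{"id": 1}, {"id": 2}], checks, warehouse)  # True
load_with_validation("web.leads", [{"id": None}], checks, warehouse)  # False
print(len(warehouse), ISSUE_LOG)
```

Rejecting the whole batch on any failed check is one design choice; many pipelines instead quarantine the failing rows and load the rest.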
Almost as many survey respondents say their organizations are training users in governance and their responsibilities for data (45 percent) and setting clear expectations for responsible data use (42 percent), which are often duties of stewardship. About the same percentage of respondents use governance to increase users' confidence in the data (43 percent). Less common activities include documenting data provenance, which typically involves recording what influences or changes the data over time. Just 28 percent of organizations surveyed are doing this, which is about the same percentage (27 percent) TDWI found when we surveyed organizations about this in 2017.
Organizations may need more automated tools to make documentation easier and governance overall more streamlined. To build trust in data, organizations should set up a center of excellence (CoE) or competency center to strengthen governance, accountability for the quality of data sources, and workflow management. TDWI research shows that about a third (32 percent) of organizations surveyed are setting up a CoE or competency center.
Where to Focus
Here are two of our best recommendations for improving governance and trust in your data.
Improve data catalogs, glossaries, and metadata repositories. Having an easily accessible and up-to-date knowledge base about the data, its lineage, and its location is invaluable. It can shorten the time it takes people to find, prepare, and use data, reduce confusion about data quality and consistency, improve collaboration, and aid in governance. TDWI research finds that organizations are not fully satisfied with data catalogs, glossaries, and other metadata repositories (many do not even have one). Organizations should invest in modern, AI-infused technologies for establishing and improving these shared resources.
Make governance part of raising overall data quality and trust. As the number of business analytics users grows and workloads increase, governance can become more difficult. Organizations need to adhere to data privacy regulations and internal data-use policies. They can also use governance initiatives to improve the overall quality of data, analytics models, and visualizations. However, governance enforcement must be balanced with widespread interest in self-service business analytics. Organizations should use modern technologies and data stewardship practices to make governance effective but enforcement less obtrusive.