Achieving Scalable, Agile, and Comprehensive Data Management and Governance (Part 3 of 3)
In the last part of our three-part series, Kevin Bohan, director of product marketing with Denodo, and Liam Yu, product marketer with Hitachi Vantara, discuss the new Best Practices Report on achieving scalable, agile, and comprehensive data management and data governance.
- By Upside Staff
- October 6, 2023
In this recent “Speaking of Data” podcast, Denodo’s Kevin Bohan and Hitachi Vantara’s Liam Yu discussed the latest developments in data management and governance. Bohan is director of product marketing with Denodo, and Yu is a product marketer at Hitachi Vantara. [Editor’s note: Speaker quotations have been edited for length and clarity.]
“One of the big challenges I see with our customers,” Bohan began, “is managing data silos. People often think moving to the cloud creates one single location for data, but that’s not the case. For example, AWS alone has more than 15 different services, each designed for a different use case. Hybrid and distributed environments are going to be with us for a long time.”
Yu concurred, noting that data teams have often approached the problem of integrating data silos too narrowly; those solutions worked for a time but are no longer up to today’s much broader needs.
The key, Bohan explained, is to facilitate access to this distributed data by adding a logical abstraction layer on top of all these silos so it appears to users as though they’re connecting to a single source. In many cases, adding this abstraction layer is a simpler task than trying to create the ETL methods required to integrate the data into one location, he added, and can cut data delivery times by as much as 65%. However, Bohan cautioned that there will still be cases where data will need to be brought into a data warehouse or data lake before it can be analyzed.
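The idea of a logical abstraction layer can be sketched in a few lines. This is a minimal illustration, not Denodo's actual architecture or API: several physical silos register fetchers for the same logical dataset, and consumers query one interface as if the data lived in a single source.

```python
# Minimal sketch of a logical abstraction layer over data silos.
# All dataset names, fetchers, and the query interface are hypothetical.

class LogicalDataLayer:
    """Presents many physical sources as a single queryable view."""

    def __init__(self):
        self._sources = {}  # logical dataset name -> list of fetch functions

    def register(self, dataset, fetch_fn):
        """Map a logical dataset name to a source-specific fetcher."""
        self._sources.setdefault(dataset, []).append(fetch_fn)

    def query(self, dataset, predicate=lambda row: True):
        """Query one logical dataset; rows may come from several silos."""
        rows = []
        for fetch in self._sources.get(dataset, []):
            rows.extend(r for r in fetch() if predicate(r))
        return rows


# Two hypothetical silos both expose the logical "customers" dataset.
layer = LogicalDataLayer()
layer.register("customers", lambda: [{"id": 1, "region": "EU"}])
layer.register("customers", lambda: [{"id": 2, "region": "US"}])

# The caller never sees the silo boundary.
eu_customers = layer.query("customers", lambda r: r["region"] == "EU")
```

A real implementation would push predicates down to each source rather than filtering after the fetch, which is where the delivery-time savings Bohan mentions come from; this sketch only shows the single-view shape of the pattern.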
He went on to discuss two popular approaches to this abstraction layer: the data fabric and the data mesh. The data fabric, he said, is focused primarily on technology and infrastructure, while the data mesh is more about people and processes. He pointed out, however, that the differences matter less than how the two approaches, individually and together, can create value for your organization.
Another major challenge both Yu and Bohan agreed on was data governance of the new, distributed landscape.
“Organizations have been governing their data for a long time now,” Yu explained, “and are now working to maintain that, but at a lower cost and with less impact.” There are four key areas where this is happening:
- Establishing data quality standards. Organizations are taking the advances they’ve made and standardizing them across their data teams, their storage, and their management structure.
- Creating and enforcing security policies. As part of creating a data-driven culture, users receive training in how to handle data properly as well as tools to ensure data is secure. This also includes access control policies to govern who can read and write data.
- Implementing an auditing framework around data. This is where organizations learn how to properly identify data with respect to its usage, including the risks and opportunities of making it available to users.
- Leveraging automation. Automation is what helps organizations improve over manual processes and make them scalable, especially in the context of self-service BI and analytics.
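The automation point above can be made concrete with a small example. The following is an illustrative sketch only; the rule name and record shapes are invented for this article, but it shows how a manual quality review becomes a repeatable, scalable check that also produces an audit trail.

```python
# Hedged sketch: automating a simple data-quality rule so it scales
# beyond manual review. Field names and the report shape are illustrative.

def check_completeness(rows, required_fields):
    """Flag rows missing any required field -- one basic quality rule."""
    failures = []
    for i, row in enumerate(rows):
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            # Each failure doubles as an audit record of what was wrong.
            failures.append({"row": i, "missing": missing})
    return failures


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},  # fails the completeness rule
]
report = check_completeness(rows, ["id", "email"])
```

Run on a schedule against every dataset, a rule like this is what turns a one-time cleanup into a standing quality standard.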
“The distributed nature of the modern data environment makes governance a real headache,” Bohan added. “Maintaining consistent policies across the organization can be a challenge, especially when there are regulatory differences across state and national borders.” He explained that one of the advantages of the data mesh in this regard is that it allows domain experts with deep knowledge of the data to be the guiding force behind classifying data for governance purposes rather than a central IT or data team that lacks that expertise.
Bohan and Yu also agreed on another potential solution to problems of data access and security: data catalogs and metadata management.
Yu explained that another advantage of having businesspeople integrated into the process of managing data is that the resulting continuous process of classification keeps metadata and data catalogs up to date on the various important aspects of data such as who should have access, what level of sensitivity is appropriate, and so on.
“One of the highest levels of metadata management is ‘active metadata,’” Bohan noted. “Where regular metadata is just data about your data, active metadata is data about how people are accessing the data and how the components of your data stack interact with one another.” This active metadata, used in concert with other tools such as AI and recommendation engines, can improve how your entire environment operates, he explained.
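The distinction Bohan draws can be shown in miniature. In this hypothetical sketch (the names and record shapes are invented, not any vendor's API), access events are captured as active metadata, and a simple usage ranking stands in for the recommendation engines he mentions.

```python
# Illustrative sketch of "active metadata": recording how people access
# data, not just describing the data itself. All names are hypothetical.
from collections import Counter
from datetime import datetime, timezone

access_log = []  # active metadata: one record per access event


def track_access(user, dataset):
    """Capture who touched which dataset, and when."""
    access_log.append({
        "user": user,
        "dataset": dataset,
        "at": datetime.now(timezone.utc).isoformat(),
    })


def most_used_datasets(n=3):
    """A recommendation engine could rank datasets by real usage."""
    counts = Counter(event["dataset"] for event in access_log)
    return [name for name, _ in counts.most_common(n)]


track_access("ana", "sales_q3")
track_access("ben", "sales_q3")
track_access("ana", "churn_model_features")

top = most_used_datasets()  # "sales_q3" ranks first: two accesses
```

Regular metadata would stop at describing `sales_q3`; the access log is what lets the stack react to how the data is actually used.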
“When done right, a data catalog should make the experience of exploring, discovering, and accessing data simple,” Bohan continued. “It should be as easy as any online shopping experience.”
Yu went on to say that innovations in generative AI and LLMs are only increasing the importance of existing requirements for data quality and governance.
“These are a new class of applications that are consuming data in much more sophisticated ways. If they’re consuming incorrect or poor-quality data, your applications will start to deviate in unexpected and undesired ways.” This will put new demands on data managers for things such as real-time data.
[Editor’s note: This is the third and final part of a discussion about data management and governance. David Stodder, senior research director for business intelligence with TDWI, discusses his newest Best Practices Report, which covers how to achieve the best in data management and governance, in part 1. You can also download a free copy of the TDWI Best Practices Report.
Kevin Bohan will be speaking at TDWI's Virtual Summit, "Transforming Data Integration" (October 11-12, 2023).]