Why Data Operations Platforms Can't Survive without Governance
We explore the three pillars of governance: trust, interoperability, and collaboration.
- By Chris Kalima
- January 18, 2022
Data operations (DataOps) is a data management methodology that improves an organization's efficiency and profitability by applying agile techniques to the workflows that derive business value from data. It leverages a combination of processes and technologies to reduce data friction, improve business agility, and increase security, integrity, and reliability throughout the data pipeline. However, to achieve increased productivity, a DataOps system has to ensure that the data is governed throughout its life cycle as it moves through increasingly complex pipelines and analytics workflows.
Data governance is a related data management discipline that covers the people, processes, and technologies needed to ensure the availability, usability, integrity, and security of enterprise data, based on internal data standards and policies. Effective data governance ensures that data is consistent, trustworthy, and appropriately used by data consumers. It defines the organizational strategies, roles, and policies that determine who can take what action on what data, in which situations, using what methods. This governance framework is then operationalized and executed within the organization's data operations.
The three pillars of governance are trust, interoperability, and collaboration.
Best Practice #1: Ensure trust
Mechanisms for trust include but are not restricted to:
Access control: Different data types have different sensitivities that determine how and with whom the data can be shared. For example, open data can be shared widely while sensitive data is typically restricted to select roles in an organization. A DataOps environment that supports fine-grained data access controls ensures trust by making sure that data will only be accessed according to set policies.
Access restrictions can also apply to software and applications, many of which come from third-party providers. Sandboxing these programs so that they access only the data they are approved for, and share results only according to policy, also helps establish trust.
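As a rough illustration of the access rules described above, the following Python sketch checks a request against a policy before any data is released. Everything in it is an assumption made for the example: the sensitivity levels, the role names, the dataset names, and the application allowlist are hypothetical, and a production system would load such policies from a governed policy store rather than hard-code them.

```python
from dataclasses import dataclass

# Hypothetical sensitivity levels, ordered from least to most restricted.
SENSITIVITY_RANK = {"open": 0, "internal": 1, "sensitive": 2}

# Hypothetical mapping from role to the highest sensitivity that role may read.
ROLE_CLEARANCE = {
    "public": "open",
    "analyst": "internal",
    "compliance_officer": "sensitive",
}

# Hypothetical allowlist: a third-party application may read only the
# datasets it was explicitly approved for, regardless of sensitivity.
APP_ALLOWLIST = {"forecasting_app": {"weather_feed"}}


@dataclass(frozen=True)
class Dataset:
    name: str
    sensitivity: str  # one of the keys in SENSITIVITY_RANK


def can_read(role: str, dataset: Dataset) -> bool:
    """Return True only if the role's clearance covers the dataset's sensitivity."""
    clearance = ROLE_CLEARANCE.get(role)
    if clearance is None:
        return False  # unknown roles are denied by default
    return SENSITIVITY_RANK[clearance] >= SENSITIVITY_RANK[dataset.sensitivity]


def app_can_read(app: str, dataset: Dataset) -> bool:
    """Return True only if the application was approved for this dataset."""
    return dataset.name in APP_ALLOWLIST.get(app, set())
```

Both checks deny by default: a role or application that does not appear in the policy gets no access, which is usually the safer starting point for fine-grained controls.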
Authentication and identity management: Authentication and identity management are essential for data access control and a prerequisite for any system that helps organizations ensure trusted data operations. As data from Internet of Things (IoT) devices becomes more prevalent in DataOps environments, establishing the provenance of this data becomes increasingly important for trust. Extending authentication and identity management to the connected devices that contribute data is an essential element in establishing data provenance. A unified system brings together data, devices, and users under a single umbrella and increases overall data security resilience.
Audit trails: For all the partners in the ecosystem, including regulatory agencies and oversight groups, the ability to prove that certain actions have been performed will be a core requirement. Auditing capabilities will be essential to providing transparency to the appropriate stakeholders, which in turn promotes greater levels of trust.
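One way to make an audit trail provable rather than merely recorded is to chain the entries with hashes, so that later edits to any record can be detected. The sketch below is a minimal illustration of that idea in Python, not a description of any particular product. The record fields (`actor`, `action`, `resource`) are assumptions, and a real log would also carry timestamps and be stored where writers cannot rewrite it.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash a record deterministically by serializing it with sorted keys."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, actor: str, action: str, resource: str) -> dict:
    """Append an audit record whose hash also covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {"actor": actor, "action": action, "resource": resource, "prev_hash": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash and link; return False if any record was altered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

An auditor who holds only the most recent hash can rerun `verify` over the full log and confirm that no earlier record has been changed or removed.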
Best Practice #2: Enable data interoperability
Data interoperability addresses the ability of systems and services that create, exchange, and consume data to have clear, shared expectations for the contents, context, and meaning of that data. Unfortunately, many organizations today are plagued by data friction (the complexities that prevent data from being delivered when and where needed).
Implementing a data interoperability system includes:
- Data integration: Synthesizing data from disparate sources into a unified view and schema
- Data exchange: Taking data from one data source and transferring it to a target data source as accurately as possible
- Data usability: Enabling effective use of data once it's made available to data consumers
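To make the integration step concrete, the following Python sketch maps records from two sources with different field names and units into one shared schema. The source names, field names, and the watt-hour to kilowatt-hour conversion are all hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical per-source mappings from source field names to unified field names.
FIELD_MAPS = {
    "utility_a": {"meter": "meter_id", "kwh": "energy_kwh"},
    "utility_b": {"device_id": "meter_id", "energy_wh": "energy_kwh"},
}

# Hypothetical unit conversions, keyed by (source, unified field name).
CONVERTERS = {
    ("utility_b", "energy_kwh"): lambda wh: wh / 1000.0,  # Wh -> kWh
}


def to_unified(source: str, record: dict) -> dict:
    """Translate one source record into the unified schema."""
    unified = {"source": source}  # keep the origin so provenance is not lost
    for src_field, target_field in FIELD_MAPS[source].items():
        value = record[src_field]
        convert = CONVERTERS.get((source, target_field))
        unified[target_field] = convert(value) if convert else value
    return unified


# Usage: two differently shaped records end up with identical keys and units.
record_a = to_unified("utility_a", {"meter": "M-1", "kwh": 2.5})
record_b = to_unified("utility_b", {"device_id": "M-2", "energy_wh": 2500})
```

Keeping the mappings as data rather than code means a new source can be onboarded by adding an entry, which is one small way to reduce the data friction described above.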
Data interoperability addresses interactional problems that include organizational (i.e., business goals and processes), semantic (i.e., the meaning of the data), and technical (i.e., formats and protocols) elements. There are also social factors that affect the ability of stakeholders to collaborate and share data. These include discovering the necessary data, convincing data owners to share data by communicating the value of collaboration while addressing risks and concerns, and convincing data consumers to rely on shared data by communicating its authority, provenance, and quality.
Data catalogs and data curation greatly help data interoperability by providing data consumers with appropriate metadata, including descriptions, taxonomies, statistics, lineage, and quality. These features allow data consumers to quickly locate data of interest and determine if that data is appropriate for their needs.
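A catalog entry can be pictured as a small metadata record that consumers search before they ever touch the data. The Python sketch below is a simplified illustration under assumed names: the `CatalogEntry` fields, the 0.0 to 1.0 quality score, and the sample datasets are hypothetical and do not reflect any specific catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    description: str
    tags: set = field(default_factory=set)       # taxonomy terms
    lineage: list = field(default_factory=list)  # names of upstream datasets
    quality_score: float = 0.0                   # hypothetical score from 0.0 to 1.0


def search(catalog: list, tag: str, min_quality: float = 0.0) -> list:
    """Return names of entries that carry the tag and meet the quality threshold."""
    return [e.name for e in catalog if tag in e.tags and e.quality_score >= min_quality]


# Usage: locate energy datasets, then narrow to those above a quality threshold.
catalog = [
    CatalogEntry("meter_readings_raw", "Raw smart-meter readings", {"energy"}, [], 0.6),
    CatalogEntry("meter_readings_clean", "Validated readings", {"energy"},
                 ["meter_readings_raw"], 0.95),
    CatalogEntry("claims", "Insurance claims", {"insurance"}, [], 0.9),
]
trusted_energy = search(catalog, "energy", min_quality=0.9)
```

The `lineage` field is what lets a consumer trace `meter_readings_clean` back to its raw source and judge whether it is appropriate for their needs.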
Best Practice #3: Increase data collaboration
Data is an organization's intellectual property, so organizations tend to safeguard it as a source of competitive advantage. The fact that data contains proprietary information that companies need to strictly control hasn't changed, but the nature of business is changing, and companies can no longer operate as data silos. Businesses increasingly find themselves members of multiparty ecosystems that require collaboration with stakeholders outside their organization.
Still, it's difficult for companies to break down their data silos, which are entrenched in legacy systems and organizational habits. Because of this, governments are increasingly using legislation and regulation to encourage companies to share data, both to advance national objectives and to stimulate business ecosystems. Perhaps the best-known such initiative comes from the European Union (EU).
As part of its data strategy, the EU is pursuing what are called data spaces, which aim "to create a single market for data, where data from public bodies, private sector organizations, and citizens can be used safely and fairly for the common good." Data collaboration is not just a goal at the national and supranational level; at the state level, for example, the New York State Energy Research and Development Authority has established the Integrated Energy Data Resource program to encourage data collaboration in New York State's energy market.
Many companies will need to kick off their data collaboration efforts to ensure compliance with their relevant regulatory bodies, but we predict that the data infrastructure needed to support these efforts will also be useful in furthering their business efforts with a broader set of value-adding partners.
Organizations that adopt an agile and composable data architecture that provides trust, interoperability, and collaboration will not only be ready to participate in these emerging ecosystem economies but will be well positioned to thrive in them.
Chris Kalima is the VP of product management at Intertrust Technologies Corporation, where he is responsible for the company’s data platform products. A firm believer in the transformative power of data, Chris has spent the past five years working closely with large enterprise customers in the energy, insurance, and automotive industries, helping them launch new data-driven products and solutions. You can contact the author via email or LinkedIn.