Data Mesh Domains Explained
At the heart of a data mesh are data mesh domains, but what are they, exactly? We explain the basics you need to know.
- By Dmitrii Nabirukhin
- October 2, 2023
Many companies are running into the drawbacks of their centralized data warehouses and want to improve their data sharing capabilities and data quality. They want faster access to critical data and lower costs. One solution they may consider is a data mesh.
Moving to a data mesh primarily involves changing the operating model rather than implementing new technologies. It requires a fundamental rethinking of data management and supporting business processes within a company. It also introduces new concepts, such as data mesh domains.
I was involved in moving from a data warehouse to a data mesh at a tech corporation with more than 10,000 employees, and I learned that the most challenging part of data mesh adoption is reaching a clear understanding of what a data domain is at the business level. This article explains the concept of domains to ease your transition to a data mesh.
What Is a Domain?
A domain in a data mesh is a product that consists of data and access to it. As with any other product, it has its own market (which is the company itself) and is designed to meet the needs of clients (in this case, the employees). It also addresses the following:
- Resources: The components of the domain
- Value: The customer problem the data domain solves
- Clients: Who uses the domain data
- Owner: Who is responsible for domain development and management
- Reports: What information is used to evaluate the product’s effectiveness
Let’s explore each of these areas to understand the data mesh domain.
Besides the main product (data), the domain includes:
- Team: Several roles, such as a product manager, data analysts, data engineers, and data scientists, working on product development.
- Hardware: This differs from company to company and depends on the technologies used to build the data mesh. However, the basic setup is an isolated environment of storage and compute nodes that uses containerization.
- Knowledge: All the experience, skills, and information gathered throughout the product development process. It is collected, processed, and stored for future reference. For instance, source code is stored in a Git repository, the domain’s metadata is kept in a data catalog, and the technical documentation is maintained in a corporate wiki or document store, such as Confluence, SharePoint, or Google Drive.
There are two types of domains: source-aligned and consumer-aligned.
Source-aligned data domains are the building blocks of a data mesh and the primary data sources for all users in the data network. Each one consumes data from only one source system but serves any other domain in the data mesh that needs data from that source. The most important feature is that the data is provided as-is, without any modifications or improvements. This means that when data is replicated from the source system, its context is fully preserved.
This concept is not always easy for business teams to grasp, so an example helps. A business has “clients” and “prospects” and stores data about both in a CRM system. Let’s assume the CRM system stores all the data in one place and keeps clients and prospects in one database named “clients.” In the context of the CRM, both clients and prospects are identified using the single term “clients.” However, in the context of the business, clients remain “clients,” and prospects remain “prospects.” A source-aligned CRM domain preserves the CRM context, so it publishes both groups under the single term “clients.”
In contrast, consumer-aligned data domains don't read data directly from a source system; rather, they take data from source-aligned domains and other consumer-aligned domains and transform it to meet business requirements. Through this transformation, the data gains its own domain-specific context. Following the example above, the CRM data will be imported and transformed into a “customer analysis database.” As part of this, it may be enriched with data from other sources, such as purchase and payment history, to separate “clients” from “prospects.” It won’t be data as-is; it will be data suitable for business needs.
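The CRM example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layouts, the `build_customer_analysis` function, and the rule "anyone with at least one payment is a client" are hypothetical and stand in for whatever business logic a real consumer-aligned domain would apply.

```python
from dataclasses import dataclass

# Hypothetical rows as a source-aligned CRM domain would publish them: as-is,
# with clients and prospects mixed together under the CRM's term "clients".
crm_clients = [
    {"id": 1, "name": "Acme Ltd"},
    {"id": 2, "name": "Globex"},
    {"id": 3, "name": "Initech"},
]

# Hypothetical rows from a second source-aligned domain (payment history).
payments = [
    {"client_id": 1, "amount": 120.0},
    {"client_id": 1, "amount": 80.0},
    {"client_id": 3, "amount": 45.5},
]


@dataclass(frozen=True)
class CustomerRecord:
    id: int
    name: str
    segment: str       # "client" or "prospect" in the business sense
    total_paid: float


def build_customer_analysis(crm_rows, payment_rows):
    """Consumer-aligned transformation: add business context to CRM data."""
    totals = {}
    for payment in payment_rows:
        key = payment["client_id"]
        totals[key] = totals.get(key, 0.0) + payment["amount"]

    records = []
    for row in crm_rows:
        # Assumed rule: anyone with at least one payment is a client.
        segment = "client" if row["id"] in totals else "prospect"
        paid = totals.get(row["id"], 0.0)
        records.append(CustomerRecord(row["id"], row["name"], segment, paid))
    return records


customer_analysis = build_customer_analysis(crm_clients, payments)
for record in customer_analysis:
    print(record)
```

The source-aligned inputs are never modified; the business meaning of “client” and “prospect” exists only in the output of the consumer-aligned domain.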
There are two types of domain clients: employees and other technology products. Employees access domains and use data for different purposes (e.g., analyzing data in Excel or SQL).
Technology products (business intelligence systems, operational systems, and other consumer-aligned domains) also have access to domains. The process is the same as for employees: they read or extract data and modify it following the required business logic. Most often, this access takes the form of extract, transform, and load (ETL) processes.
The role of a domain owner is similar to that of a product owner. It includes communicating with users and managing the backlog, team, and resources so the domain can perform its function. The domain owner is also responsible for data governance. Because each domain is a part of the overall data mesh, data governance must be defined at the corporate level and followed by every domain owner.
There are two types of metrics for domains: common KPIs and specific ones.
Common KPIs include the general metrics that apply to every domain, such as total cost of product ownership and data quality. Specific KPIs focus on the effectiveness of business tasks and requirements assigned to an individual domain. For instance, if a domain goal is fraudulent operations detection, a metric for the task can be preventing or reducing monetary damages from these operations.
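The two kinds of metrics can be illustrated with a short Python sketch. Both functions and the sample records are hypothetical. Completeness of required fields is assumed here as one simple stand-in for data quality, and the fraud metric assumes each operation is already labeled as fraudulent and as blocked or not; a real domain would define its own measures.

```python
def data_quality_score(rows, required_fields):
    """Common KPI (assumed definition): share of rows in which every
    required field is present and non-empty."""
    if not rows:
        return 0.0
    complete = sum(
        1
        for row in rows
        if all(row.get(name) not in (None, "") for name in required_fields)
    )
    return complete / len(rows)


def prevented_fraud_damage(operations):
    """Specific KPI (assumed definition): total amount of fraudulent
    operations that the domain's detection helped block."""
    return sum(
        op["amount"] for op in operations if op["is_fraud"] and op["blocked"]
    )


# Hypothetical sample data for one domain.
rows = [
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": None},    # incomplete row
    {"id": 3, "amount": 250.0},
    {"id": 4, "amount": 40.0},
]
operations = [
    {"amount": 500.0, "is_fraud": True, "blocked": True},
    {"amount": 300.0, "is_fraud": True, "blocked": False},
    {"amount": 50.0, "is_fraud": False, "blocked": False},
]

quality = data_quality_score(rows, ["id", "amount"])
damage_prevented = prevented_fraud_damage(operations)
print(quality, damage_prevented)
```

The first function could be applied to any domain, which is what makes it a common KPI; the second only makes sense for a domain whose goal is fraud detection.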
Is Any Data Product a Domain?
The simple answer is no. To be a domain in a data mesh, a data product should comply with the company's data management principles. Every company has its own set of principles, which typically include:
- A domain client can find data. The data catalog contains a description of the domain’s data that users can access.
- A domain client has access to data. The domain supports different connection types, such as JDBC and APIs.
- A domain is responsible for the data quality. Automatic data quality control tools and incident management processes are implemented and maintained.
- A domain is responsible for data security. Data access policies are defined and configured following information security requirements.
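The four principles above can be read as a checklist. The following Python sketch shows one way to express that check; the `DataProduct` class, its fields, and the principle labels are hypothetical, and a real company would verify these conditions against its own catalog, quality tooling, and access policies rather than against plain lists.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor of a data product's governance metadata."""
    name: str
    catalog_description: str = ""
    connection_types: list = field(default_factory=list)
    quality_checks: list = field(default_factory=list)
    access_policies: list = field(default_factory=list)


def unmet_principles(product):
    """Return the labels of the principles the product does not satisfy."""
    checks = {
        "findable": bool(product.catalog_description.strip()),
        "accessible": len(product.connection_types) > 0,
        "quality-controlled": len(product.quality_checks) > 0,
        "secure": len(product.access_policies) > 0,
    }
    return [label for label, passed in checks.items() if not passed]


def is_domain(product):
    """A data product counts as a domain only if every principle is met."""
    return not unmet_principles(product)


crm_domain = DataProduct(
    name="crm",
    catalog_description="Clients and prospects replicated from the CRM.",
    connection_types=["JDBC", "API"],
    quality_checks=["row count matches source"],
    access_policies=["sales analysts: read-only"],
)
ad_hoc_extract = DataProduct(name="quarterly_export")

print(is_domain(crm_domain))
print(unmet_principles(ad_hoc_extract))
```

Returning the list of unmet principles, rather than a bare yes or no, gives the product owner a concrete to-do list for turning a data product into a domain.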
To create a data mesh within a company, several domains must be created and interconnected. Each domain enters into contracts with the other domains for the use of its data in the same way as suppliers commit to buyers. These contracts specify details of the agreement, such as who provides what data to whom, in what form, how often, and how the agreement can be changed. These contracts are a universal way to inform domain owners about the need for data and to protect against unexpected changes in its composition or structure.
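A contract between two domains can be sketched as a small data structure plus a check for changes that would break the consumer. This is a minimal illustration, not a standard format: the `DataContract` fields, the `breaking_changes` function, and the rule that removing a column or changing its type is breaking (while adding a column is not) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: who provides what to whom, in what form,
    how often, and with how much notice before changes."""
    provider: str
    consumer: str
    dataset: str
    schema: dict              # column name -> type name
    delivery: str             # e.g. "daily"
    change_notice_days: int


def breaking_changes(contract, proposed_schema):
    """List the changes in a proposed schema that would break the contract."""
    problems = []
    for column, col_type in contract.schema.items():
        if column not in proposed_schema:
            problems.append(f"column removed: {column}")
        elif proposed_schema[column] != col_type:
            problems.append(
                f"type changed: {column} {col_type} -> {proposed_schema[column]}"
            )
    return problems


contract = DataContract(
    provider="crm",
    consumer="customer_analysis",
    dataset="clients",
    schema={"id": "int", "name": "str", "created_at": "date"},
    delivery="daily",
    change_notice_days=30,
)

# The provider proposes a new column and a changed type.
proposed = {"id": "int", "name": "str", "created_at": "timestamp", "segment": "str"}
print(breaking_changes(contract, proposed))
```

Running a check like this before a provider changes its data is one way a contract can protect consumers against unexpected changes in composition or structure.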
The data mesh approach challenges the familiar way of seeing and managing data within a company. Its decentralized, domain-based view of data can make the transition from a data warehouse or data lake more complicated if it is not explained effectively to business decision-makers. Hopefully, with a clear understanding of domains, you are ready to start your data transformation.