What Is a Data Mesh? A Plain Language Guide to a Shifting Architecture

Data mesh arrived in the data engineering conversation around 2019, introduced by Zhamak Dehghani in a series of articles that diagnosed a specific problem with how large organizations manage data at scale. It generated an unusual amount of discussion, partly because the diagnosis was accurate and partly because the proposed solution was genuinely different from how most organizations had been thinking about data infrastructure.

It also generated a fair amount of confusion, because "data mesh" describes an organizational and architectural philosophy rather than a specific technology or tool. You can't install a data mesh. You can only adopt the principles it proposes and redesign your organization accordingly.

The problem data mesh is trying to solve is familiar to anyone who has worked in a large organization's data function. As organizations grow, their data tends to centralize. A central data engineering team becomes responsible for ingesting data from all the business domains, building pipelines, maintaining a central data warehouse or data lake, and serving data to anyone who needs it. This seems efficient. In practice, it creates a bottleneck.

The central team becomes the dependency for everything data-related across the organization. Business domains that generate data have no ownership of it and no accountability for its quality. Consumer teams that need data wait in a queue for the central team to build what they need. The central team, serving everyone and owned by no one in particular, has limited context about the specific needs and semantics of each domain. Quality suffers. Delivery is slow. Trust erodes.

Data mesh proposes a fundamentally different organizational model. Rather than centralizing data ownership in a dedicated data engineering function, it distributes ownership to the domain teams that generate and understand the data. The team that owns the sales process owns the sales data. The team that owns customer operations owns the customer data. Each domain team is responsible for making its data available to the rest of the organization as a data product, complete with quality guarantees, documentation, and defined interfaces.
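To make "data product" a little more concrete, here is a minimal sketch in Python of what a domain team might publish alongside its data. Everything in it is an assumption made for illustration: the `DataProduct` and `QualityGuarantee` classes, the field names, and the `sales.orders` example are hypothetical and not part of any data mesh standard or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityGuarantee:
    """One measurable promise the owning team makes to consumers (hypothetical)."""
    name: str
    threshold: float  # minimum acceptable score, e.g. 0.99 for 99% completeness


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its data."""
    name: str
    owner_team: str                 # the domain team accountable for this data
    description: str                # the documentation consumers read first
    output_schema: dict[str, str]   # column name -> type: the defined interface
    guarantees: tuple[QualityGuarantee, ...] = ()

    def unmet_guarantees(self, measured: dict[str, float]) -> list[str]:
        """Return the names of guarantees whose measured score is below threshold.

        A guarantee with no measurement at all is treated as unmet.
        """
        return [
            g.name
            for g in self.guarantees
            if measured.get(g.name, 0.0) < g.threshold
        ]


# Illustrative example: the sales domain describes its orders data.
sales_orders = DataProduct(
    name="sales.orders",
    owner_team="sales",
    description="One row per confirmed order, refreshed daily.",
    output_schema={
        "order_id": "string",
        "amount": "decimal",
        "ordered_at": "timestamp",
    },
    guarantees=(
        QualityGuarantee("completeness", 0.99),
        QualityGuarantee("freshness_score", 0.95),
    ),
)

# Made-up measurements, as a monitoring job might report them.
violations = sales_orders.unmet_guarantees(
    {"completeness": 0.995, "freshness_score": 0.90}
)
print(violations)  # ['freshness_score']
```

The point of the sketch is that ownership, documentation, interface, and quality promises travel together with the data, so a consumer in another domain can decide whether to rely on it without asking the owning team.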

This is the first principle of data mesh: domain ownership. It is also where most of the organizational difficulty lives, because it requires domain teams to take on responsibilities they haven't previously had, and it requires leadership to resist the centralizing instinct that tends to dominate data infrastructure decisions.

The second principle is data as a product, covered in more depth in a separate piece on this blog. The point here is that distributing ownership only works if the data each domain produces is genuinely usable by other domains. Without a product mindset, distributed ownership just means distributed chaos: every team doing things differently, with no interoperability and no quality guarantees.

The third principle is self-serve data infrastructure. If domain teams are responsible for their own data products, they need infrastructure that makes it feasible for them to do that without becoming data engineering experts. A self-serve platform provides the tooling, templates, and guardrails that let domain teams build, deploy, and monitor data products without having to solve the same infrastructure problems from scratch every time.

The fourth principle is federated computational governance. In a decentralized model, you still need consistency across domains on things that matter organizationally: data standards, security policies, compliance requirements, interoperability conventions. Federated governance means establishing those standards centrally while leaving the implementation of data products to the domains. The "computational" part means that, wherever possible, those standards are encoded as automated checks in the platform rather than enforced through manual review. It's the difference between mandating that all data products meet certain quality standards and mandating that all data products be built by a central team.
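One way to picture centrally defined standards applied to domain-built products is as a small set of shared policy functions that every data product is checked against. The sketch below is illustrative only: `ProductMetadata`, the three policies, and the `evaluate` helper are hypothetical names invented for this example, and a real platform would likely run checks like these in its deployment pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ProductMetadata:
    """Hypothetical metadata each domain submits with its data product."""
    name: str
    owner_team: str
    has_documentation: bool
    pii_columns: tuple[str, ...]      # columns the domain declares as personal data
    masked_columns: tuple[str, ...]   # columns that are masked before publication


# A policy inspects the metadata and returns a finding, or None if satisfied.
Policy = Callable[[ProductMetadata], Optional[str]]


def requires_owner(product: ProductMetadata) -> Optional[str]:
    return None if product.owner_team else "no owning team declared"


def requires_documentation(product: ProductMetadata) -> Optional[str]:
    return None if product.has_documentation else "documentation is missing"


def pii_must_be_masked(product: ProductMetadata) -> Optional[str]:
    unmasked = sorted(set(product.pii_columns) - set(product.masked_columns))
    if unmasked:
        return "unmasked PII columns: " + ", ".join(unmasked)
    return None


# Defined once by the governance group; applied to every domain's products.
GLOBAL_POLICIES: tuple[Policy, ...] = (
    requires_owner,
    requires_documentation,
    pii_must_be_masked,
)


def evaluate(
    product: ProductMetadata,
    policies: tuple[Policy, ...] = GLOBAL_POLICIES,
) -> list[str]:
    """Run every policy and collect the findings, in policy order."""
    findings = []
    for policy in policies:
        message = policy(product)
        if message is not None:
            findings.append(message)
    return findings


# Illustrative example: customer operations forgot to mask one column.
customers = ProductMetadata(
    name="customer_ops.customers",
    owner_team="customer-operations",
    has_documentation=True,
    pii_columns=("email", "phone"),
    masked_columns=("email",),
)

findings = evaluate(customers)
print(findings)  # ['unmasked PII columns: phone']
```

The policies say what every product must satisfy. They say nothing about how the domain builds the product, which is the division of responsibility the principle describes.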

Whether data mesh is the right approach for a given organization depends heavily on scale and organizational structure. For a small organization with a small data team and a manageable number of data sources, a centralized approach works fine and data mesh would add unnecessary overhead. The organizational model data mesh proposes makes sense when the centralized model has demonstrably broken down, which typically happens at a scale where the central team genuinely cannot keep up with the demand placed on it and where domain teams have enough technical capability to take on data ownership responsibly.

It's also worth being clear about what data mesh is not. It's not a specific technology stack. It's not a product you can buy. And it's not a quick fix for data quality or delivery problems that have organizational rather than technical causes. Organizations that adopt the data mesh label without the underlying organizational changes tend to find that they've added complexity without solving the underlying problem. Those changes are genuine domain ownership, real accountability for data product quality, and a self-serve platform that makes decentralization feasible.

The concept is valuable as a diagnosis and as a set of principles, even for organizations that don't fully adopt it. The questions it asks are good ones regardless of the architectural answer you arrive at: who owns this data, who is accountable for its quality, and how do consumers discover and trust it?