Master Data Management: The Problem of Having One Version of the Truth

Ask a large organization how many customers it has and you'll often get different answers depending on who you ask and which system they're looking at. The CRM says one number. The billing system says another. The data warehouse says a third. Each system is confident in its own answer. None of them agree.

This is not a hypothetical problem. It's one of the most common and most expensive data quality issues in enterprise organizations, and it has a name: the lack of a single version of the truth for master data.

Master data management (MDM) is the discipline built around solving it.

Master data refers to the core entities that an organization's business processes revolve around. Customers. Products. Suppliers. Employees. Locations. These are the nouns of the business, the things that transactions happen to and between. Unlike transactional data, which records events, master data describes the participants in those events. And unlike analytical data, which is typically read-heavy and historical, master data is actively used across operational systems in real time.

The problem arises because most organizations have accumulated many systems over time, each of which maintains its own version of these core entities. A customer might exist in a CRM, an ERP, an e-commerce platform, a support ticketing system, and a billing system simultaneously. Each system created its own record, possibly at a different time, possibly with different information, almost certainly with a different internal identifier. Over time, those records diverge. The customer's address gets updated in one system but not the others. Their name is spelled differently across systems. One system may even hold two separate records for them because they once called in under a slightly different name.

The downstream consequences are significant. Reporting that joins data across systems produces inconsistent results because the same customer looks like different customers depending on which identifier you use. Regulatory reporting becomes difficult when you can't definitively say how many unique customers or counterparties you have. Customer experience suffers when different parts of the organization have different information about the same person. AI and analytics initiatives built on top of this fragmented data inherit all of its inconsistencies.

MDM addresses this by establishing a system of record for master data: a single authoritative source that other systems either feed into or synchronize with. The process has three parts: identifying which records across systems refer to the same real-world entity (a step called entity resolution or deduplication), reconciling the differences between those records, and maintaining the resulting golden record as the authoritative version going forward.
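The reconciliation step is often expressed as survivorship rules, which decide which source value "survives" into the golden record. The sketch below assumes one simple rule (the most recently updated non-empty value wins) and assumes entity resolution has already grouped the records. `SourceRecord`, `build_golden_record`, and the system names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceRecord:
    system: str       # hypothetical source system name, e.g. "crm"
    source_id: str    # that system's internal identifier
    updated: date     # when the source last modified this record
    attributes: dict  # field name -> value; None or "" means missing


def build_golden_record(matched: list[SourceRecord]) -> dict:
    """Merge records that entity resolution has already linked to one entity.

    Assumed survivorship rule: for each field, keep the value from the most
    recently updated record that actually has a value for it.
    """
    golden = {}
    # Newest first, so the first non-empty value seen for a field wins.
    for rec in sorted(matched, key=lambda r: r.updated, reverse=True):
        for field, value in rec.attributes.items():
            if value not in (None, "") and field not in golden:
                golden[field] = value
    # Keep lineage back to the contributing source records.
    golden["_sources"] = sorted(f"{r.system}:{r.source_id}" for r in matched)
    return golden


records = [
    SourceRecord("crm", "C-102", date(2021, 3, 1),
                 {"name": "Jon Smith", "email": "jon@example.com", "address": "12 Oak St"}),
    SourceRecord("billing", "88231", date(2023, 7, 15),
                 {"name": "Jonathan Smith", "email": "", "address": "40 Elm Ave"}),
]
print(build_golden_record(records))
# {'name': 'Jonathan Smith', 'address': '40 Elm Ave', 'email': 'jon@example.com', '_sources': ['billing:88231', 'crm:C-102']}
```

Recency is only one possible rule. A newer value is not necessarily a more accurate one, which is why the choice of rule is a business decision as much as a technical one.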

Entity resolution is harder than it sounds. Records that refer to the same customer may have different names, different addresses, different phone numbers, and different internal IDs. Determining that they're the same person usually requires probabilistic matching: comparing fields across records and calculating a confidence score that two records refer to the same entity. Rules-based matching catches obvious duplicates. Machine learning-based matching handles the ambiguous cases that rule sets can't anticipate. Neither approach is perfect, and both require human review for pairs whose confidence score falls below the threshold for an automatic merge.
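A minimal sketch of a two-tier matcher, with its assumptions stated plainly: an exact match on normalized email stands in for the rule-based pass, and a weighted average of `difflib.SequenceMatcher` string similarities stands in for the probabilistic score. Real deployments often use purpose-built comparators or a trained model instead. The field weights and the `AUTO_MATCH` and `REVIEW` thresholds are hypothetical values; in practice they are tuned against labeled record pairs.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds; real values are tuned on labeled pairs.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}
AUTO_MATCH = 0.90   # at or above: merge automatically
REVIEW = 0.70       # between REVIEW and AUTO_MATCH: send to a data steward


def normalize(value: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match_score(r1: dict, r2: dict) -> float:
    """Confidence in [0, 1] that two records describe the same entity."""
    # Rule-based pass: identical email addresses count as a certain match.
    email1 = (r1.get("email") or "").strip().lower()
    email2 = (r2.get("email") or "").strip().lower()
    if email1 and email1 == email2:
        return 1.0
    # Probabilistic pass: weighted average of per-field string similarity,
    # using only fields present on both records.
    total = weight_sum = 0.0
    for field, weight in WEIGHTS.items():
        v1, v2 = r1.get(field), r2.get(field)
        if v1 and v2:
            total += weight * SequenceMatcher(None, normalize(v1), normalize(v2)).ratio()
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def classify(score: float) -> str:
    if score >= AUTO_MATCH:
        return "match"
    if score >= REVIEW:
        return "review"
    return "no match"


a = {"name": "Jon Smith", "address": "12 Oak St.", "phone": "555-0142"}
b = {"name": "Jonathan Smith", "address": "12 Oak Street", "phone": "555 0142"}
score = match_score(a, b)
print(round(score, 2), classify(score))
```

Pairs scoring between the two thresholds are the ones routed to human review. Raising `AUTO_MATCH` trades fewer false merges for a longer review queue.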

MDM implementations also have to address the question of architecture. A registry-style MDM system maintains a central index of golden records and their relationships to source system records without replacing the source systems themselves. A consolidation-style system pulls master data from source systems into a central hub and manages it there. A coexistence-style system maintains the golden record centrally while also synchronizing updates back to source systems. Each approach has different implications for how much the source systems need to change and how tightly the MDM system integrates with the rest of the architecture.
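To make the registry style concrete, here is a sketch of the cross-reference index at its core, assuming the registry mints golden identifiers itself and that matching has already decided which source records belong together. `MasterRegistry` and its methods are hypothetical names. The usage lines also show the reporting problem described earlier: three source identifiers, one customer.

```python
from collections import defaultdict
from typing import Optional


class MasterRegistry:
    """Hypothetical registry-style MDM index.

    Source systems keep owning their records; the registry stores only the
    mapping between (system, source_id) pairs and golden identifiers.
    """

    def __init__(self) -> None:
        self._golden_by_source = {}                 # (system, source_id) -> golden_id
        self._sources_by_golden = defaultdict(set)  # golden_id -> {(system, source_id)}
        self._next_id = 1

    def new_golden_id(self) -> str:
        golden_id = f"G{self._next_id:06d}"
        self._next_id += 1
        return golden_id

    def link(self, system: str, source_id: str, golden_id: str) -> None:
        """Point a source record at a golden record, replacing any earlier link."""
        key = (system, source_id)
        previous = self._golden_by_source.get(key)
        if previous is not None:
            self._sources_by_golden[previous].discard(key)
        self._golden_by_source[key] = golden_id
        self._sources_by_golden[golden_id].add(key)

    def resolve(self, system: str, source_id: str) -> Optional[str]:
        """Translate a source system's identifier into the golden identifier."""
        return self._golden_by_source.get((system, source_id))

    def cross_reference(self, golden_id: str) -> set:
        """All source records currently linked to one golden record."""
        return set(self._sources_by_golden.get(golden_id, set()))


registry = MasterRegistry()
customer = registry.new_golden_id()
source_keys = [("crm", "C-102"), ("billing", "88231"), ("support", "T-77")]
for system, source_id in source_keys:
    registry.link(system, source_id, customer)

print(len(source_keys))                                       # 3 source records
print(len({registry.resolve(s, i) for s, i in source_keys}))  # 1 customer
```

A consolidation or coexistence hub would additionally store the merged attribute values, and a coexistence hub would also publish changes back to the sources, but a mapping like this is typically still needed to tie each source record to its golden record.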

The organizational dimension of MDM is as important as the technical one. Maintaining a single version of the truth requires agreeing on what the truth is, which means agreeing on definitions, ownership, and processes for resolving conflicts when systems disagree. That agreement has to happen between business stakeholders, not just technical teams. Who owns the customer record? Which system is authoritative for a product's description? What happens when the CRM and the ERP have different addresses for the same supplier? These are governance questions, and MDM without governance behind it tends to produce a technically sophisticated system that nobody trusts or maintains.
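Once those governance questions are answered, the answers can be recorded as data rather than buried in code. One pattern, sketched here under assumed names, is a per-attribute source-precedence table: for each attribute, the systems the business has agreed to trust, in order. The attribute names, system names (including "pim"), and orderings in `SOURCE_PRECEDENCE` are hypothetical; the point is that a CRM-versus-ERP address conflict is settled by prior agreement rather than case by case.

```python
from typing import Optional, Tuple

# Hypothetical governance decisions: for each attribute, the systems the
# business agreed to treat as authoritative, highest precedence first.
SOURCE_PRECEDENCE = {
    "supplier_address": ["erp", "crm"],
    "product_description": ["pim", "erp", "ecommerce"],
    "customer_email": ["crm", "ecommerce", "support"],
}


def resolve_attribute(attribute: str, candidates: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, system) from the most trusted system that has a value.

    `candidates` maps system name -> that system's value for the attribute.
    (None, None) means no agreed source supplied a value, which in this sketch
    signals that the conflict should go to the attribute's data owner.
    """
    for system in SOURCE_PRECEDENCE.get(attribute, []):
        value = candidates.get(system)
        if value not in (None, ""):
            return value, system
    return None, None


# The CRM and the ERP disagree; the agreed precedence says the ERP wins.
print(resolve_attribute("supplier_address", {"crm": "12 Oak St", "erp": "40 Elm Ave"}))
# ('40 Elm Ave', 'erp')
```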

MDM is not a project with an end date. It's an ongoing capability. Master data changes constantly as customers move, products are updated, suppliers are acquired, and new systems are introduced. The discipline is in maintaining the golden record accurately over time, not just cleaning it up once.