Mastering Reference Data

Master data and reference data are not exactly the same. Understand and manage the difference and you can improve planning, business processes, reporting, and service delivery.

By Robert Rowe, Senior marketing manager for MDM, Software AG

When it comes to master data management (MDM) and reference data management (RDM), there remains disagreement about the distinction between master and reference data. One expert says that reference data is master data (or at least a subset of master data). Another says that reference and master data are two different things.

Listening to these expert discussions can induce a kind of deer-in-the-headlights paralysis among those responsible for enterprise-wide data management. This is problematic because instinctively we know that we have to manage this data regardless of what we call it.

The real difficulty is that we have to manage master and reference data differently, to some degree, and that gives rise to other questions that are hard to answer: If you've deployed tools to manage and master your master data, have you implicitly brought in the tools to also manage and master your reference data?

Let's step back and consider a practical distinction between master data and reference data, one that people who do not spend their lives doing research and analysis on such matters can use to improve operations within their organizations.

Working Definitions

Let's start with master data. What is it? Master data documents and describes the data that is part of your organization's core business processes. Its definitions are set within your organization and are not necessarily recognized by anyone outside it. One set of master data elements might be related to your customers: a customer number, a contact name and address, phone numbers, and shipping addresses, for example. Another set of master data elements might cover your products, and for any given product there may be a set of master data elements that identify component parts, each of which would have a unique part number. Other product attributes that would be mastered include the product description, packaging information, engineering documents, and images.

Reference data is different. In many (if not most) instances, reference data is created and defined externally by a governmental or independent agency or the International Organization for Standardization (ISO). Reference data elements have meaning and significance that is shared among many users, organizations, and companies. A ZIP code, for example, should be viewed as a reference data element. ZIP codes are defined by the postal service, and any organization that sends a package to Reston, Virginia, will use the same ZIP code -- 20190.

From this one example, you may begin to see a wide range of data elements in use throughout your enterprise that should be viewed as reference data elements: medical procedure and billing codes, time zones, industry codes (SIC and NAICS), airline flight schedules, currency codes, currency exchange rates, and much more. These data elements may be intimately tied to master data, but they are reference data elements and are changed less frequently. We obtain and use them differently, and certain aspects of them are managed differently.

Distinct Management Challenges

There are some similarities in the mastering and management of master and reference data, and there are also differences. For example, both require the application of security and data governance. Only specifically designated people should be authorized to contribute to, change, or delete this data. However, data cleansing operations, such as removing duplicates and merging records, are really focused on master data, not reference data, and mainly on the customer and product domains. Let's investigate some other differences in managing this data.

Data Creation or Sourcing

Fundamentally, creating or sourcing reference data is easy. Much of it is created and maintained by an external agency, and you can obtain it directly from its source. That's rarely even a manual process today; you can rely on a data feed from the agency that owns the data.

Thus, you could use an ISO exchange rate code in your accounts receivable system, for example, instead of a specific exchange rate. The ISO exchange rate code would always refer to the actual value of the exchange rate prevailing in the market at that moment. By deriving the exchange rate from the ISO exchange rate code, you eliminate the need to enter a new exchange rate every time the rate changes -- and your reference data for exchange rates is always up to date.

The ability to rely on an externally maintained reference data element, however, poses its own challenges. If a new currency is introduced, for example, or if one member of the Eurozone decides to abandon the Euro and return to a sovereign currency, the owning organization will update the exchange rate codes themselves. If you are relying on ISO currency codes and exchange rate codes, you'll need to version the ones in use for auditing purposes and then publish the updates to subscribing systems.

You'll want to be able to master and manage this updated reference data in a carefully planned and coordinated manner. A workflow process is critical, as is an approval process that considers your organization's policies about data governance. Should subscribing systems fall out of sync, there is a real danger that the organization will encounter inconsistencies that could affect everything from business planning and purchasing to manufacturing, financial reporting, and auditing.

A mastering tool that can version reference data is key. Once the previous data is versioned, the updated data can then be provided to subscribing systems simultaneously to ensure that the same version is being used throughout the enterprise. Additionally, storing previous versions assists with compliance reporting because the actual version used during a given reporting period can be easily produced.
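The versioning idea described above can be illustrated with a minimal Python sketch. It is not taken from any particular MDM or RDM product; the class names (ReferenceVersion, VersionedReferenceSet), the effective-date scheme, and the sample currency entries (including the made-up code "XYZ") are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ReferenceVersion:
    """One release of a reference data set (hypothetical structure)."""
    version: int
    effective_from: date
    codes: Dict[str, str]  # code -> description


class VersionedReferenceSet:
    """Keeps every published release so earlier ones can be produced for audits."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._versions: List[ReferenceVersion] = []

    def publish(self, effective_from: date, codes: Dict[str, str]) -> ReferenceVersion:
        # Releases are appended in chronological order; earlier ones stay untouched.
        if self._versions and effective_from <= self._versions[-1].effective_from:
            raise ValueError("new release must take effect after the previous one")
        release = ReferenceVersion(len(self._versions) + 1, effective_from, dict(codes))
        self._versions.append(release)
        return release

    def as_of(self, when: date) -> Optional[ReferenceVersion]:
        # Return the latest release that was already in effect on the given date.
        current = None
        for release in self._versions:
            if release.effective_from <= when:
                current = release
        return current


# Sample data is invented for illustration only.
currencies = VersionedReferenceSet("currency_codes")
currencies.publish(date(2020, 1, 1), {"USD": "US dollar", "EUR": "Euro"})
currencies.publish(
    date(2023, 1, 1),
    {"USD": "US dollar", "EUR": "Euro", "XYZ": "Hypothetical new currency"},
)

# Which release applied during a past reporting period?
audit_release = currencies.as_of(date(2021, 6, 30))
print(audit_release.version, sorted(audit_release.codes))
```

Because earlier releases are never overwritten, the set that was in effect for any reporting period can be retrieved by date, which is the compliance benefit the paragraph above describes.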

In addition to the externally sourced reference data we have so far described, organizations may also produce their own reference data elements, such as a set of mapping codes that references a supplier's part numbers and maps them to your internal part numbers. You might use the mapping codes to map the part numbers from several vendors to your own product codes so that parts from alternate vendors could be substituted if one vendor cannot deliver the quantity needed for a production cycle. Even though these mapping codes are created internally, they act as reference data elements in this scenario and must be managed with the same attention to governance, versioning, and subscription synchronization discussed previously with regard to externally generated reference data elements.
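As a rough illustration of such mapping codes, the Python sketch below maps vendor part numbers to internal part numbers and looks up substitutes. The vendor names, part numbers, and function names are hypothetical; a real mapping would live in the RDM system under the governance and versioning controls discussed above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

VendorPart = Tuple[str, str]  # (vendor name, vendor part number)

# Mapping codes: vendor part -> internal part number. All values are made up.
PART_MAP: Dict[VendorPart, str] = {
    ("VendorA", "A-1001"): "INT-500",
    ("VendorB", "B-77"): "INT-500",
    ("VendorA", "A-2002"): "INT-600",
}


def build_alternates(part_map: Dict[VendorPart, str]) -> Dict[str, List[VendorPart]]:
    """Invert the mapping: internal part number -> vendor parts that supply it."""
    alternates: Dict[str, List[VendorPart]] = defaultdict(list)
    for vendor_part, internal in part_map.items():
        alternates[internal].append(vendor_part)
    return alternates


def substitutes_for(vendor: str, vendor_part: str,
                    part_map: Dict[VendorPart, str]) -> List[VendorPart]:
    """Return the other vendor parts mapped to the same internal part number."""
    internal = part_map[(vendor, vendor_part)]
    return [vp for vp in build_alternates(part_map)[internal]
            if vp != (vendor, vendor_part)]


# If VendorA cannot deliver A-1001, which parts could stand in for it?
print(substitutes_for("VendorA", "A-1001", PART_MAP))
```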

Data creation or sourcing for master data is handled differently. In mergers and acquisitions, data is often acquired from legacy systems, cleansed, and mastered in the centralized MDM hub. New data may be created directly in the MDM system for three reasons: it provides a collaborative environment in which each department can contribute its part of the data, it provides a user-friendly authoring interface, and the data will end up there anyway. The MDM system then becomes the single source of truth, containing that "golden record" that applications and business processes all access.

Data Distribution

Reference data can also reside in an MDM or RDM system to be accessed there via pointers or "look ups" from other systems. That makes it easy to ensure that these systems are using the same reference data without the need to publish to subscribing systems. If the data element is updated, all systems see the updated information immediately.

This is not the only way to enable the consistent use of reference data. Some departments may want a local copy of the reference data. Others may be using legacy systems that cannot perform external lookups easily. To support these systems, your RDM system must be able to push mastered reference data out to the systems that will use them. As noted earlier, a critical aspect of this approach is that the dissemination of updated reference data must take place in a coordinated and synchronized manner so that all systems have access to the same release of reference data at the same time.
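One possible way to coordinate such a push is to stage the new release on every subscribing system first and switch over only when all of them have accepted it. The Python sketch below shows that idea under stated assumptions: the Subscriber class, its reachable flag, and the publish_release function are hypothetical stand-ins, not the interface of any real RDM tool.

```python
from typing import Dict, List, Optional, Tuple


class Subscriber:
    """Stand-in for a downstream system that keeps a local copy (hypothetical)."""

    def __init__(self, name: str, reachable: bool = True) -> None:
        self.name = name
        self.reachable = reachable
        self.active_version = 0
        self.active_codes: Dict[str, str] = {}
        self._staged: Optional[Tuple[int, Dict[str, str]]] = None

    def stage(self, version: int, codes: Dict[str, str]) -> bool:
        # Receive the release without switching to it yet.
        if not self.reachable:
            return False
        self._staged = (version, dict(codes))
        return True

    def activate(self) -> None:
        # Switch the local copy to the staged release.
        if self._staged is not None:
            self.active_version, self.active_codes = self._staged
            self._staged = None

    def discard(self) -> None:
        # Drop a staged release that will not be activated.
        self._staged = None


def publish_release(version: int, codes: Dict[str, str],
                    subscribers: List[Subscriber]) -> bool:
    """Stage everywhere first; activate only if every subscriber accepted."""
    staged: List[Subscriber] = []
    for subscriber in subscribers:
        if not subscriber.stage(version, codes):
            # One system could not take the release: nobody switches.
            for done in staged:
                done.discard()
            return False
        staged.append(subscriber)
    for subscriber in subscribers:
        subscriber.activate()
    return True


systems = [Subscriber("billing"), Subscriber("warehouse")]
published = publish_release(2, {"USD": "US dollar"}, systems)
print(published, [s.active_version for s in systems])
```

The design choice here is that a release no system has activated is preferable to one that only some systems are using, since partial rollout is exactly the out-of-sync condition the article warns about.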

Whether you use a publish-and-subscribe approach or a lookup approach really depends on the architecture of your organization and the needs of the groups and systems involved. Both approaches are viable as long as the approach you choose is undertaken in a well-considered and mindful manner.

For centralized master data architectures, the data in the repository is accessed through Web services, JDBC, or other means by the applications and processes that need the data at runtime. It is equally important that the MDM system be able to export data easily for downstream systems such as data warehousing and business intelligence systems. High-quality, cleansed data provides the basis for better forecasting and business decisions.

Developing a Master Plan

All this brings us back to consideration of the systems you use to master your data. You need a tool that can ensure the consistent use of master data throughout the enterprise -- and that's exactly what MDM tools are designed to do. You also need tools that can enable you to acquire, master, version, and distribute your reference data because it, too, must be used consistently throughout your enterprise.

Multi-domain MDM solutions typically are more oriented toward mastering reference data. This is because single-domain or template-driven MDM solutions may not provide the necessary flexibility for mastering comprehensive, internal reference data with all its potential domain overlaps.

That said, with the right MDM tools you can, in fact, manage both master data and reference data successfully. The tools should be able to ensure oversight and governance of master and reference data. For reference data, these tools also need to provide features for version management and auditing. It could be argued that any good MDM system should have these as well because the versioning of hierarchies is not uncommon, nor are regulatory compliance and auditing needs for master data.

Ultimately, well-managed reference data is a key contributor to operational efficiency and clarity of business insight. Any organization that expects accurate information from diverse organizational units and data sources needs reference data that has been mastered and managed for consistency across all the units and all the systems in use. Without well-mastered and well-managed reference data, there is real potential for reporting errors that can affect outcomes at every level, from the manufacturing floor to the board room. Choose your data mastering tool(s) wisely to ensure that they will suit your needs now and in the future.

Robert Rowe is senior marketing manager for MDM at Software AG, where he is responsible for master data management product marketing. Rowe has more than 20 years of experience in the software industry, where he has held positions in marketing, IT, engineering, training, and account management. You can contact the author at robert.rowe@softwareag.com.
