
What is a Customer?

Data Objects, Data Standards, and Master Data Management

You can get the most out of your data warehouse only if you have a uniform understanding of your data. A field such as "customer" must mean the same thing to all business units. We explain what data standards are and why they are key to your success.

I often pose a question to our clients: “What is a customer?” The typical response involves groans, eye-rolling, and heavy sighs, as most organizations have either endured endless meetings arguing over the precise meaning without ever reaching a conclusion, or have avoided even attempting to answer the question. The growing interest in centralized “source of truth” systems (i.e., master data management) forces the issue, though, since the component data sets that organizations want to consolidate and synchronize into a master data set have evolved within their own application areas over long periods of time. As a consequence, the semantics associated with those replicated data objects have also diverged.

The issue becomes more acute when organizations take a “data cleansing” approach to master data integration without considering the semantic implications. While data quality tools are important to the mechanical and operational processes of data merging, the absence of fundamental oversight of the entire consolidation process may scuttle the project’s success when data published back out to the contributing source applications no longer meets those applications’ needs. This happens when we inadvertently modify the semantics through the merging process, especially when the application’s end-clients have taken some shortcuts in their data management style.

Here is an example: in one customer database, the model allows for the storage of three addresses: a delivery address, a billing address, and an alternate contact address. The first two are populated with real addresses, but apparently the third address field was never used for addresses. By virtue of some business need, this third field became a repository for customer notes such as “Tax exempt,” “Pays in cash,” or other comments. In fact, closer inspection reveals similar notes embedded within the name fields as well. A straightforward name-and-address cleaning is likely to remove the extraneous data in the name fields and present the customer notes in the address fields as invalid addresses. In this case, cleansing the names and addresses for the master file will introduce flaws into the business process as that extraneous, yet meaningful, data is eliminated.
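The scenario above can be sketched in code. The following is a minimal illustration, not a production cleansing routine: the field names (`alt_address`, `customer_note`) and the address heuristic are assumptions made up for this example. The idea is to inspect the overloaded field before cleansing, so that free-text notes are moved to a field of their own rather than discarded as invalid addresses.

```python
import re

# Hypothetical heuristic: treat a value as an address only if it starts with
# a house number followed by text, or with a P.O. box. Real address
# validation is far more involved; this is only enough to show the idea.
ADDRESS_HINT = re.compile(r"^\s*(\d+\s+\S+|p\.?\s*o\.?\s*box\b)", re.IGNORECASE)


def split_address_field(value):
    """Return (address, note); at most one of the two is populated."""
    text = (value or "").strip()
    if not text:
        return None, None
    if ADDRESS_HINT.match(text):
        return text, None
    return None, text


def preserve_notes(records):
    """Move non-address content out of alt_address instead of deleting it."""
    cleaned = []
    for rec in records:
        address, note = split_address_field(rec.get("alt_address"))
        cleaned.append(
            {
                "customer_id": rec["customer_id"],
                "alt_address": address,
                "customer_note": note,
            }
        )
    return cleaned


# Sample data invented for the illustration.
records = [
    {"customer_id": 1, "alt_address": "Tax exempt"},
    {"customer_id": 2, "alt_address": "12 Main St, Springfield"},
    {"customer_id": 3, "alt_address": "Pays in cash"},
    {"customer_id": 4, "alt_address": ""},
]

for row in preserve_notes(records):
    print(row)
```

The design choice worth noting is that the note is preserved and relabeled rather than "cleaned" away, which keeps the business meaning available to the source application when the master data is published back.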

The issue here is not the mechanics of integration. Instead, it relates to the identification of relevant data “things” (i.e., objects, entities—choose your favorite term!) that are embedded within your data sets, whether they are defined explicitly or implicitly. Having identified those data objects (my favorite term), the goal is to determine the semantics assigned to those objects within each source application as a prelude to the integration process. And this is where data standards and data governance come in; these will help prevent the unending conflicts on data terms and their definitions.

A data standards process is an approach to synchronizing the various metadata aspects of shared or exchanged data objects. By formalizing the process of gaining consensus among the different participants, and enabling their active engagement in both defining and governing that process, we can evolve a collection of well-defined business terms, information object models, information exchange packages, and a means for mapping these shared object definitions into the models that are ingrained within the legacy environment. In addition, a data standards process can be used to help harmonize common business language terms and data elements to represent those terms as part of a master data management program.

What is a data standard? It is:

  • An agreement between parties on the definitions of common business terms and the ways those terms are named and represented in data;
  • A set of rules that may describe how data is stored, exchanged, formatted, or presented;
  • A set of policies and procedures for defining rules and reaching agreement.

But the essence of a standard is not just the consolidation of metadata, the formal description framework, or even the rules for definition and governance. Instead, it is the premise that:

  • The individual stakeholders desire to work together to develop a common language;
  • Participants have an opportunity to take part in the process by proposing new standards, evaluating proposals, and providing comments;
  • Most importantly, the participants agree that the defined and agreed-to standard is the single set of guidelines to be used for data integration and sharing.

The main issues introduced as part of a data standards program involve establishing practices that may have been ignored during the original legacy applications’ design and implementation. This leads to the following challenges:

  • Absence of clarity for object semantics. Relying on the implied meanings associated with business terms may be fine when a system is self-contained, but as soon as there is a need to compare values between two or more environments, subtle differences in meanings become magnified.
  • Ambiguity in definition. The ambiguity is typically aligned along application lines and, subsequently, departmental lines. Exposing ambiguity will encourage individuals to promote their own semantics to the exclusion of others. This plants the seeds for organizational conflict.
  • Lack of precision. People tend to be less than precise in everyday conversations, because humans can derive understanding through context. However, in an imprecise environment, it is difficult to resolve measurements and metrics into a unified view.
  • Variance in source systems. Aside from the semantics issues, implementation decisions may create reliance on application frameworks, leading to religious wars (e.g., .NET versus J2EE; XML versus flat data).
  • Flexibility of storage and exchange mechanisms. The multiple modes by which data is exchanged can expose conflicts in metadata descriptions when trying to create a seamless means for integration. This may mean creating adapters that can transform data objects between formats, such as between records in flat files and XML documents.
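The adapter idea in the last challenge can be sketched as follows. This is a minimal illustration using only the Python standard library; the tag names (`customers`, `customer`) and the sample fields are assumptions, and the sketch assumes every column name is also a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def flat_to_xml(flat_text, root_tag="customers", row_tag="customer"):
    """Convert comma-delimited records (with a header row) to an XML string."""
    reader = csv.DictReader(io.StringIO(flat_text))
    root = ET.Element(root_tag)
    for row in reader:
        item = ET.SubElement(root, row_tag)
        for field_name, value in row.items():
            # Assumes the column name is a legal XML tag name.
            ET.SubElement(item, field_name).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_flat(xml_text, fieldnames):
    """Convert the XML produced above back to comma-delimited records."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for item in root:
        writer.writerow(
            {name: item.findtext(name, default="") for name in fieldnames}
        )
    return out.getvalue()


# Sample data invented for the illustration.
flat = "customer_id,name\n1,Acme Corp\n2,Globex\n"
as_xml = flat_to_xml(flat)
round_trip = xml_to_flat(as_xml, ["customer_id", "name"])
print(as_xml)
print(round_trip)
```

Note that the adapter only translates structure. It does nothing to reconcile what each field means, which is why the metadata descriptions on both sides still need to agree.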

As an example of one of these challenges, consider value types and sizes, a question that arises when integrating data values from systems with different constraints into a master system. When merging product names, application A may use a fixed-size field, while application B may allow for variable-length names. When aggregating product data, it might be reasonable to adopt the fixed-size format to ensure that no subsequently shared data exceeds the storage space allowed within application A. On the other hand, it might be reasonable to adopt the variable-length format to accommodate application B. The issue occurs when the master copy of the data is published back out: either approach introduces inconsistencies between the master table and the applications.
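To make the field-size conflict concrete, here is a minimal sketch. The 20-character width for application A and the sample product names are assumptions chosen for illustration. It shows how a master record kept at variable length loses information when it is published back into a fixed-size field, and how such records can be identified beforehand.

```python
# Hypothetical fixed width of the product-name field in application A.
APP_A_NAME_WIDTH = 20


def publish_to_fixed_width(name, width=APP_A_NAME_WIDTH):
    """Pad or truncate a name the way a fixed-size field would store it."""
    return name[:width].ljust(width)


def find_lossy_names(master_names, width=APP_A_NAME_WIDTH):
    """List the master names that would be truncated in the fixed-size field."""
    return [name for name in master_names if len(name) > width]


# Sample data invented for the illustration.
master_names = ["Widget", "Extra-Long Industrial Widget Assembly"]

for name in find_lossy_names(master_names):
    stored = publish_to_fixed_width(name)
    print(f"master: {name!r} -> application A: {stored!r}")
```

A check like this does not resolve the conflict; it only surfaces it early, so that the participants can agree on a standard length or on a rule for abbreviating names.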

For the purposes of master data integration, the objective of a data standards approach is to identify all the objects and to reach consensus on their respective definitions, types, formats, and structures. While the approaches that different organizations take to reach consensus may differ, here are some suggestions for getting started:

  • Hold a free-form brainstorming session in which the terms in use are identified, documented, and made available for discussion of a more precise definition.
  • Perform a gap analysis to determine whether definitions already exist and, if not, what additional information is needed. At the same time, categorize the discovered terms within a logical hierarchy.
  • Research the information identified in the previous step to locate complete and acceptable metadata definitions and specifications, if they exist.
  • Seek consensus on each term by reviewing all of its distinct definitions and either accepting a definition, revising a definition, or creating a new one. As the participants reach agreement, document the definitions along with any supporting information that influenced the decision.
  • Assemble the definitions and supporting information in a metadata repository.
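The steps above can be sketched as a small in-memory registry. This is only an illustration of the bookkeeping involved, not a description of any particular metadata repository product; the class names, method names, and sample definitions are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TermEntry:
    """One business term with its competing and agreed definitions."""
    term: str
    candidates: Dict[str, str] = field(default_factory=dict)  # source -> definition
    agreed_definition: Optional[str] = None
    rationale: str = ""


class MetadataRepository:
    """Hypothetical registry that tracks terms through the consensus process."""

    def __init__(self):
        self._entries = {}

    def propose(self, term, source, definition):
        """Record the definition a given source application uses for a term."""
        entry = self._entries.setdefault(term.lower(), TermEntry(term))
        entry.candidates[source] = definition

    def agree(self, term, definition, rationale):
        """Record the agreed definition and the reasoning behind it."""
        entry = self._entries[term.lower()]
        entry.agreed_definition = definition
        entry.rationale = rationale

    def unresolved(self):
        """Return the terms that still lack an agreed definition."""
        return sorted(
            e.term for e in self._entries.values() if e.agreed_definition is None
        )


# Sample terms and definitions invented for the illustration.
repo = MetadataRepository()
repo.propose("customer", "billing", "A party that has been invoiced")
repo.propose("customer", "sales", "A party with an open or closed opportunity")
repo.propose("product", "catalog", "An item with a published price")
repo.agree(
    "customer",
    "A party that has purchased at least one product",
    "Covers both source definitions once prospects are excluded",
)
print(repo.unresolved())
```

Keeping the competing source definitions alongside the agreed one preserves the supporting information that influenced the decision, which is useful when mapping the shared definition back to each legacy model.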

There is great value in establishing a governance framework for developing master data semantics, and when embarking on a master data integration project, it is wise to incorporate a data standards program into the process.

David Loshin is the president of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions.

About the Author

David Loshin, president of Knowledge Integrity, Inc. (www.knowledge-integrity.com), is a recognized thought leader and expert consultant in the areas of analytics, big data, data governance, data quality, master data management, and business intelligence. Along with consulting on numerous data management projects over the past 15 years, David is a prolific author of books and papers on data management and business intelligence best practices. David is a frequent invited speaker at conferences and web seminars and on sponsored websites and channels, and he shares additional content at www.dataqualitybook.com.
