Five Key Elements Your Data Governance Business Glossary May Be Missing
The average data definition doesn't have all the answers to users' questions. To get more information, add one of these five key elements to your organization's data governance business glossary.
- By Jake Dolezal
- February 16, 2016
For data governance programs, the business glossary is a sacred text, the product of long hours of hard work and collaboration between data stewards and subject matter experts. Often much attention is given to the accuracy of business data terminology, and rightfully so. You have also likely worked painstakingly to keep your glossary consistent in form and formatting. When it comes to the content of your business glossary, however, accuracy and consistency are only two-thirds of the equation. The other third: are your definitions complete?
Today's data-savvy business users demand a greater understanding of the data they use. For example, business analysts may use an enterprise data warehouse, which is stocked full of derived attributes, conformed dimensions, and aggregated measures that barely resemble the original source data they are analyzing. Therefore, they must spend precious time trying to understand the data when they could be doing their analysis.
Most business glossaries follow a dictionary style in their form:
- Word: "This is a business term"
- Etymology: "This is the system, department, group, or team who defined it"
- Part of speech: "This is how and where it is used"
- Definition: "This is one or two lines of what it means"
- Synonyms: "These are any aliases, nicknames, or abbreviations by which it is known"
Unfortunately, this often falls short of answering the full gamut of business users' questions about the data they are working with and trying to understand. Many users need more information to reach a full understanding of the data they analyze and report on.
To bridge the understanding gap left by the average data definition, here are five key elements that may be missing from your organization's data governance business glossary:
1. A one-liner: Depending on how verbose your current definitions are, you may want to add a second, shorter definition: a one-liner that accompanies the full narrative. A one-liner is a concise response to a question such as, "What does 'Customer Type' mean?" Like an elevator pitch, it gives a businessperson a quick clarification of what is meant by a single term. This is separate from the full narrative of your business glossary definition, and the remaining four elements below are considerations to add to your full business term entries.
2. A real-life example: Humans gain understanding of an unclear meaning when given a concrete example to which they can easily relate. Create a new, short (two- or three-sentence) paragraph that starts with the phrase "For example." Give a classic or contemporary example of the data in use in a real-life scenario within your business. Think of the lowest common denominator across your entire user base, and start with that. There is no need to offer an overly complex example.
3. A list of all possible valid values: Another way to drive home the meaning of a data element is to list the known possible values of the data attribute. If you have a "Customer Type" attribute, stating the possible values of "In person, phone, online" is probably sufficient for an analytics user to understand the meaning and boundaries of the term. A list of valid values is also known as an enumeration list. Feel free to add the value of your data profiling here by providing some basic statistics and the frequency of these values within your system. Also, call out how new, unknown, blank, and null values are handled.
4. The rules and logic of how the data element was derived: Break down the barriers to understanding derivations of data by providing the business rules and logic that were used to group, categorize, summarize, and transform data flowing out of source systems. This could be a translation table or decision tree. Some very data-savvy business users and analysts would even like to see the raw SQL logic embedded in ETL and ELT jobs and scripts.
5. The full ancestry of a data element: Another particularly useful component of a complete business glossary entry is a full ancestry of a data element in terms of source-to-target, life cycle, relationships, and dependencies. Provide your analysts with a full data lineage from creation at the source to consumption by BI users. Think about a table that shows the system, database, table, and attribute each step of the way. Identify any derivations, aggregations, or other transformations as well. If your business glossary tool allows it, consider inserting a data flow diagram. As an added benefit for data governance, data stewards will appreciate this as an excellent resource for root-cause analysis and for backtracking a data quality issue.
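For teams that keep their glossary in a structured form, the five elements above could be captured roughly as follows. This is a minimal sketch in Python, not a feature of any particular glossary tool: the class names, field names, the `profile_values` helper, the derivation rule, and every system, database, and table name are hypothetical and chosen only for illustration. Only the "Customer Type" term and its three values come from the example above.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineageStep:
    """One row of a source-to-target lineage table."""
    system: str
    database: str
    table: str
    attribute: str
    transformation: str = "none"


@dataclass
class GlossaryEntry:
    """A glossary entry extended with the five elements."""
    term: str
    one_liner: str            # element 1: quick clarification
    definition: str           # the existing full narrative
    example: str              # element 2: starts with "For example"
    valid_values: List[str]   # element 3: enumeration list
    derivation_rule: str      # element 4: business rule or logic
    lineage: List[LineageStep] = field(default_factory=list)  # element 5


def profile_values(observed, valid_values):
    """Count observed values, bucketing blanks/nulls and unknowns separately."""
    counts = Counter()
    for value in observed:
        if value is None or str(value).strip() == "":
            counts["(blank or null)"] += 1
        elif value in valid_values:
            counts[value] += 1
        else:
            counts["(unknown)"] += 1
    return dict(counts)


# Hypothetical entry; all source names below are made up.
entry = GlossaryEntry(
    term="Customer Type",
    one_liner="The channel through which a customer does business with us.",
    definition="Full narrative definition goes here.",
    example="For example, a customer who orders by calling us is 'Phone'.",
    valid_values=["In person", "Phone", "Online"],
    derivation_rule="Mapped from the order channel code via a translation table.",
    lineage=[
        LineageStep("OrderSystem", "orders_db", "orders", "channel_cd"),
        LineageStep("Warehouse", "edw", "dim_customer", "customer_type",
                    transformation="translation table lookup"),
    ],
)

# Hypothetical sample of raw values pulled during data profiling.
observed = ["Online", "Phone", None, "", "Fax", "Online"]
profile = profile_values(observed, entry.valid_values)
print(profile)
```

Keeping the one-liner and the full definition as separate fields lets a tool show the short form by default and the narrative on request, while the lineage list maps directly onto the system, database, table, and attribute table described in the fifth element.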
I challenge you to review the completeness of your business glossary's content. For a data governance program, business users of data are its primary customers, and what better way to serve them than with a robust business glossary? Adding any of the five elements discussed here to your data definitions will save your business users time and infuse good data usage practices throughout the organization.
Dr. Jake Dolezal is practice leader of Analytics in Action at McKnight Consulting Group Global Services, where he is responsible for helping clients build programs around data and analytics. You can contact the author at email@example.com