TDWI Blog

TDWI Blog: Data 360

Rethinking the Data Dictionary

We know the internet has transformed the encyclopedia. It’s now transforming the dictionary. For those of us who create and maintain data dictionaries, the implications are worth considering.

Wikipedia unshackled the encyclopedia from its paper-based, expert-centric, retail publishing heritage. Now anyone can contribute an entry or suggest edits and anyone with a Web browser and internet connection can use Wikipedia free of charge. This free-spirited collaborative approach to gathering and synthesizing human information has created a resource that dwarfs any traditional encyclopedia. Today, more than 85,000 active contributors have worked on 14 million articles in 260 languages.

Erin McKean, a lexicographer, is now trying to do the same for dictionaries. A former editor of the New Oxford American Dictionary, McKean is co-founder of a new online dictionary called Wordnik that leverages the internet to redefine what a dictionary is and how it works.

First, McKean doesn’t believe a dictionary should be the final arbiter of words but rather a collector. To that end, Wordnik encourages people to submit new words to the online dictionary or add new definitions to existing words. Second, and most importantly for our discussion, she believes words only have real meaning in context. Therefore, her dictionary not only publishes standard definitions (from traditional dictionaries), including synonyms and antonyms, but adds a host of contextual information to make the words come to life.

Contextual Information

For instance, when you type the word “sough” (meaning a soft murmuring sound), you currently see 50 examples of how the word is used in sentences that have appeared in books or articles. You can also hear it pronounced (courtesy of the American Heritage Dictionary) and read detailed etymologies of the word. You can also see words that aren’t synonyms or antonyms but show up in the same sentences and provide valuable clues about its meaning. For example, words related to sough are whippoorwill, washing tub, and grooving.

Beyond soft context, Wordnik provides quantitative data. You can see a bubble chart that shows how often a word has been used each year going back to 1800, as well as statistics about punctuation applied to the word. Wordnik also links to Flickr images and tweets that contain the word so people can see how it is being used in modern-day parlance. People can also tag the word or add personal comments.

This rich contextual information turns the dictionary from a sterile arbiter of meaning into a sensuous, multidimensional exploration of culture and history through the vehicle of words. And perhaps best of all, it gets people excited about words.

Implications

So, what is the implication for our lowly data dictionaries?

I should point out that our data dictionaries have to be precise, even more so than traditional dictionaries, where a single word like “set” can have 33 definitions. Each of our terms can have only one precise meaning. They are the semantic gold standard for our organizations, the ultimate arbiter of meaning, and the basis for our shared vocabulary and language.

But does that mean our data dictionaries have to be dry, static appendages to our BI environments? Of course not. In fact, if we take a cue from McKean, we can transform the data dictionary into an active agent of organizational and cultural knowledge, which is something that’s been missing from our BI and data governance programs.

Rethinking the Data Dictionary. Think about it. When we define a data element (metric or dimension), let’s show related data elements and link to their definition pages. Let’s encourage people to rate the element and comment on what they like or don’t like about it and how it can be improved. Data owners and stewards can moderate these online discussions and track the ratings. Let’s also encourage people (both business and IT) to suggest new elements to add to the dictionary and provide definitions and contextual information.

In addition, let’s display statistics of the number of reports in which the element appears and who uses those reports the most. And then let’s link directly from the dictionary to reports or dashboards that display the element so people can see how it’s used in context. The more context we provide to the definitions and descriptions in our data dictionaries, the more useful and used they will become.
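To make the idea concrete, here is a minimal sketch in Python of what one entry in such a context-rich data dictionary might hold. It is only an illustration under my own assumptions: the class name DictionaryEntry, its fields, the 1-to-5 rating scale, and the sample element and report names are all hypothetical and do not come from any particular product or from Wordnik.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One data element (metric or dimension) plus its context. Hypothetical design."""

    name: str
    definition: str  # the single agreed-upon meaning
    related: set[str] = field(default_factory=set)  # names of related elements
    ratings: list[int] = field(default_factory=list)  # assumed 1-5 scale
    comments: list[tuple[str, str]] = field(default_factory=list)  # (author, text)
    # report name -> per-user view counts, so we can show where and by whom
    # the element is used
    report_views: dict[str, Counter] = field(default_factory=dict)

    def rate(self, stars: int) -> None:
        """Record one user's rating of the element."""
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings.append(stars)

    def average_rating(self) -> float | None:
        """Return the mean rating, or None if nobody has rated the element."""
        if not self.ratings:
            return None
        return sum(self.ratings) / len(self.ratings)

    def add_comment(self, author: str, text: str) -> None:
        """Store a comment for data owners and stewards to moderate."""
        self.comments.append((author, text))

    def record_view(self, report: str, user: str) -> None:
        """Note that a user opened a report in which the element appears."""
        self.report_views.setdefault(report, Counter())[user] += 1

    def report_count(self) -> int:
        """Number of distinct reports in which the element appears."""
        return len(self.report_views)

    def top_users(self, n: int = 3) -> list[tuple[str, int]]:
        """Users who view the element's reports most often, across all reports."""
        totals: Counter = Counter()
        for views in self.report_views.values():
            totals.update(views)
        return totals.most_common(n)


# Example usage with made-up element, report, and user names.
revenue = DictionaryEntry("net_revenue", "Gross revenue minus returns and discounts.")
revenue.related.add("gross_revenue")
revenue.rate(4)
revenue.rate(5)
revenue.add_comment("alice", "Please state whether tax is included.")
revenue.record_view("Monthly Sales Dashboard", "alice")
revenue.record_view("Monthly Sales Dashboard", "bob")
revenue.record_view("Regional Scorecard", "alice")
```

Note that the definition remains a single field, so the term still has one precise meaning. Everything else is context layered around it, and a wiki page or dictionary front end could render the related elements and report names as links.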

Wikis. Now you might be wondering how to incorporate all this context into a lowly data dictionary. Let’s take a cue from Wikipedia and use a wiki. In fact, many BI teams are already experimenting with wikis to collect metadata, foster collaboration, and improve communications.

Sean van der Linden of AT&T Interactive delivered a presentation last year at a TDWI BI Executive Summit in which he described the wiki templates his BI team uses to describe and define operational data sources, business processes, data elements, and reports. He also showed how the templates could be used to facilitate requirements gathering, project management, and governance processes.

As we all know, it’s a Herculean feat to create standard definitions for key data elements. But once you do, rather than publishing static descriptions of these data building blocks, consider following McKean’s example and creating an interactive metadata environment that provides rich context and a collaborative space to enhance communication.

Tell me what you think!

Posted by Wayne Eckerson on January 13, 2010

