Data Augmentation and Enhancement: Thinking Outside the Organization
Organizations must recognize that their data warehouses can be populated from data sources both inside and outside their enterprise.
- By Mike Schiff
- March 11, 2010
One of the major decisions behind any data warehouse implementation is the determination of its data content. Many organizations, especially those embarking on their first data warehouse project, concentrate on their own data without realizing that external data sources can also be used to populate the data warehouse.
Although an organization's databases contain data about its interactions with the business subjects it will analyze, to achieve a more complete view it may be desirable to augment this internal data with additional data from external sources. This is especially true when the data warehouse subject areas involve customers, prospects, or vendors.
There are many third-party sources of data. For example, governmental agencies, universities, and commercial companies collect and market demographic data. A wealth of consumer data -- including details ranging from ethnicity to preferences in magazines and automobiles -- is compiled from sources that include real estate deeds, marketing surveys, and warranty cards.
One source that almost everyone is aware of is the U.S. Census Bureau, which this year will collect data including occupant name, sex, date of birth, race, household relationship to the other occupants, whether the home is owned or rented, and whether there is a mortgage. Although data about individuals or individual households is not released, the census can provide a wealth of information, sometimes down to the geographic block level. This data could be used to identify geographic areas where a high percentage of home-owning senior citizens do not have mortgages.
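As a rough illustration of that last idea, the sketch below filters block-level aggregates for areas where most home-owning seniors are mortgage-free. The field names, the 60 percent threshold, and the counts are all hypothetical and invented for this example; they are not actual census fields or figures.

```python
# Hypothetical block-level aggregates. Field names and counts are
# illustrative assumptions, not real census data.
blocks = [
    {"block_id": "A", "senior_owner_households": 40, "senior_owners_no_mortgage": 30},
    {"block_id": "B", "senior_owner_households": 25, "senior_owners_no_mortgage": 5},
    {"block_id": "C", "senior_owner_households": 0, "senior_owners_no_mortgage": 0},
]


def mortgage_free_share(block):
    """Return the share of senior owner households without a mortgage,
    or None when the block has no senior owner households."""
    total = block["senior_owner_households"]
    if total == 0:
        return None
    return block["senior_owners_no_mortgage"] / total


def find_target_blocks(blocks, threshold=0.6):
    """Return IDs of blocks whose mortgage-free share meets the threshold."""
    targets = []
    for block in blocks:
        share = mortgage_free_share(block)
        if share is not None and share >= threshold:
            targets.append(block["block_id"])
    return targets


print(find_target_blocks(blocks))  # prints ['A']
```

Blocks with no qualifying households are skipped rather than treated as zero percent, so an empty area is not mistaken for a poor target.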
Examples of External Data
Third-party data can be appended to customer or prospect records, enabling organizations to better segment or refine their target prospect audiences. Examples of available data for individuals and households include home value, number and age of children, income, education, purchasing behavior, religion, marital status, occupation, and ethnicity. Available information about businesses includes SIC codes, revenue, and number of employees. Other attributes such as telephone number, e-mail address, and geocode (of home or business) apply to both individuals and organizations.
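A minimal sketch of such an append is shown here, assuming the external records have already been matched to internal customer IDs (in practice, matching on name and address is usually the hard part). All names, fields, and values are hypothetical.

```python
# Hypothetical internal customer records, keyed by customer ID.
customers = [
    {"customer_id": 1, "name": "A. Smith", "zip": "02101"},
    {"customer_id": 2, "name": "B. Jones", "zip": "60601"},
    {"customer_id": 3, "name": "C. Lee", "zip": "94105"},
]

# Hypothetical third-party attributes, assumed to be pre-matched
# to the same customer IDs. Customer 2 has no external match.
external = {
    1: {"home_value": 450000, "marital_status": "married"},
    3: {"home_value": 820000, "marital_status": "single"},
}


def append_external(customers, external, fields):
    """Left-join external attributes onto customer records.

    Every customer is kept; fields with no external match are None.
    """
    enriched = []
    for record in customers:
        extra = external.get(record["customer_id"], {})
        row = dict(record)  # copy so the internal record is not mutated
        for field in fields:
            row[field] = extra.get(field)
        enriched.append(row)
    return enriched


enriched = append_external(customers, external, ["home_value", "marital_status"])

# Segment on an appended attribute: customers with high-value homes.
high_value = [r["name"] for r in enriched if (r["home_value"] or 0) >= 500000]
print(high_value)  # prints ['C. Lee']
```

Keeping unmatched customers with empty attributes, rather than dropping them, preserves the internal data as the system of record and makes the external coverage gap visible.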
It is safe to assume that data such as property assessments, once available only by visiting individual town halls, is now readily available online from the towns themselves as well as from commercial third-party data aggregators. It is also likely that some, if not most, of the personal data that consumers supply when filling out product warranty registration cards has been sold to providers of marketing databases.
For instance, I recently purchased a small kitchen appliance and the warranty card (which I chose not to complete; the fine print stated "failure to return this card will not diminish your warranty rights") requested data including my date of birth, occupation, household income, the credit cards I use, my hobbies and interests, and whether I donate to charitable causes.
Although the above examples pertain to consumer and company subject areas, data is also available to augment other subject areas including health care and diseases (subject, of course, to HIPAA privacy rules), geographic areas (e.g., the census data), and publicly traded stocks and bonds.
The Bottom Line
Organizations must recognize that their data warehouses can be populated from data sources both inside and outside their enterprise. They need to think "outside the organization" to obtain external data that can be integrated with data from internal sources. When analyzing what data should be contained in their data warehouses, almost all organizations will benefit from asking their users one very simple question: "What additional data would be helpful to you that is not available from our own data sources?" and then determining if the data can be obtained from commercial or government sources.
By the way, think twice before filling out a product warranty registration card, especially questions that relate to personal information such as income. In most cases, furnishing such data is not a legal requirement for the warranty to be in force. Unless the vendor agrees not to share your data with anyone else, the data you supply may wind up being sold by third-party data providers.