Working Together to Build Your Data Model
Building a data model reveals insight into your organization's data for users and IT alike, but it won't happen unless both groups work as a team.
- By Mike Schiff
- October 21, 2009
Imagine trying to build your dream house without first creating a blueprint. The resulting structure would likely be an architectural nightmare with rooms that don’t fit together and electrical outlets, plumbing fixtures, and heating and air-conditioning vents that are don’t work or are not where you need them.
Similarly, a successful data warehousing effort should begin with the creation of a data model of the underlying data. In general (and certainly oversimplified!), data modeling is a multi-step process that involves identifying the entities for the subject area; discovering the relationships among the entities; determining the attributes required by the business and associated with the entities; identifying primary, secondary, and foreign keys; and ultimately translating entities, attributes, and relationships into tables, fields, and constraints to design the physical database.
However, it is important to recognize that data modeling is not merely a tool for facilitating the design of the physical database schema. Perhaps more important, it is a powerful vehicle for helping the user community better understand the underlying assumptions, business rules, and definitions behind their organization’s data.
For example, when modeling data relative to the subject area of supply chain and procurement, you'll quickly recognize the many-to-many relationship that exists between vendors and parts. That is, one vendor can supply several different parts and one part can be supplied by several different vendors.
Insight into Intersections
From an IT perspective this is resolved with an “intersection entity” and a resulting physical database table with a combined key of vendor number and part number. However from a user perspective, it also serves to call attention to the fact that a “quality rating code” is not necessarily an attribute of a vendor or a part. Rather it should be one of the attributes of the vendor/part intersection entity since a vendor will likely receive different quality ratings for different parts and the same part may have different quality ratings depending on which vendor supplied it.
Although many data modeling projects, especially those involving a new project team, start off with unfocused goals -- it's akin to attempting to discover how much metadata can fit on the head of a pin -- once the preliminaries are out of the way, these efforts can lead to many insights that may significantly benefit the organization.
Apples to Apples
For example, I have been involved in efforts that led to the discovery that the metrics being used to compare the performance of organizational units were defined differently for each of the units being compared. One metric involved measuring how long a service call took to complete. In one unit this was measured from the time the customer first called in the request for service, but in another unit the clock did not start until the service technician arrived at the customer's site. If an organization does not define its data consistently, it may find itself building a Tower of Babel rather than a data warehouse!
When building a data model, I have found it useful to involve a cross-section of the user organization rather than to place too much reliance on the department manager, who, even if he or she came up through the ranks, may not have an accurate view of the details and data needs relevant to how things are currently done. Having both managers and front-line workers in the data modeling sessions will usually lead to better identification and definitions of the relevant entities, attributes, and relationships that must be understood to create the data foundation for a successful data warehouse implementation.
Data models -- don’t attempt to build your data warehouse, or interpret your results, without one!
Michael A. Schiff is founder and principal analyst of MAS Strategies, which specializes in formulating effective data warehousing strategies. With more than four decades of industry experience as a developer, user, consultant, vendor, and industry analyst, Mike is an expert in developing, marketing, and implementing solutions that transform operational data into useful decision-enabling information.
His prior experience as an IT director and systems and programming manager provide him with a thorough understanding of the technical, business, and political issues that must be addressed for any successful implementation. With Bachelor and Master of Science degrees from MIT's Sloan School of Management and as a certified financial planner, Mike can address both the technical and financial aspects of data warehousing and business intelligence.