A Brief History of Data Modelling
What data models through the years and a gathering of cousins have in common.
- By Bill Inmon
- December 1, 2015
In the very earliest days of technology there was a saying: "We are talking with the users about the new system. Bring your coding pads."
The industry quickly learned that there was a lot more to building systems than the creation of hastily conceived code. One of the lessons learned was that coordinating systems required understanding data at an abstract level. Thus was born the data model.
Peter Chen introduced the ERD -- the entity relationship diagram. The ERD allowed data to be addressed from a high-level perspective. To many people, the ERD was the first attempt to understand data in an abstract manner, and it became the basis of design for operational systems and the data warehouse.
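To make the idea concrete, an ERD's building blocks -- entities with attributes, connected by relationships with a cardinality -- can be sketched in a few lines of code. This is a hypothetical illustration; the entity names, attributes, and `Relationship` structure here are the author of this sketch's assumptions, not anything from Chen's notation itself:

```python
from dataclasses import dataclass, field

# A minimal, hypothetical sketch of ERD concepts: an entity holds named
# attributes, and a relationship connects two entities with a cardinality.

@dataclass
class Entity:
    name: str
    attributes: list[str] = field(default_factory=list)

@dataclass
class Relationship:
    name: str
    left: Entity
    right: Entity
    cardinality: str  # e.g. "1:N" -- one customer places many orders

customer = Entity("Customer", ["customer_id", "name"])
order = Entity("Order", ["order_id", "customer_id", "order_date"])
places = Relationship("places", customer, order, "1:N")

print(f"{places.left.name} {places.name} {places.right.name} ({places.cardinality})")
```

The point of the abstraction is that the same diagram-level description can drive the design of many physical systems, which is exactly the coordinating role the data model came to play.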
Then with data warehouses came data marts. With data marts came Ralph Kimball and the dimensional model, which allowed organizations to rapidly build and rebuild data marts customized to the needs of a particular group of people, typically departments. With data marts, organizations could quickly produce reports that addressed the immediate needs of the organization.
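The dimensional model's shape -- a fact table of measurements keyed to descriptive dimension tables -- is what makes those departmental reports quick to produce. The following sketch is hypothetical (the table names, keys, and amounts are invented for illustration), showing the kind of join-and-group query a star schema supports:

```python
# A hypothetical star-schema sketch: one fact table keyed to two dimension
# tables, with a report produced by joining and grouping -- the kind of
# department-level query a data mart is built to answer.

dim_date = {1: {"month": "2015-11"}, 2: {"month": "2015-12"}}
dim_product = {10: {"category": "books"}, 11: {"category": "music"}}

fact_sales = [
    {"date_key": 1, "product_key": 10, "amount": 100.0},
    {"date_key": 2, "product_key": 10, "amount": 150.0},
    {"date_key": 2, "product_key": 11, "amount": 75.0},
]

# Report: total sales amount by month and product category.
report: dict[tuple[str, str], float] = {}
for row in fact_sales:
    key = (dim_date[row["date_key"]]["month"],
           dim_product[row["product_key"]]["category"])
    report[key] = report.get(key, 0.0) + row["amount"]

print(report)
```

Because dimensions carry all the descriptive context, a new report for a different department usually means a new grouping over the same fact table rather than a new system.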
Next came the data vault. Promulgated by Dan Linstedt and Hans Hultgren, the data vault approach extended the reach of the ERD model into the world of detailed integrity and lineage of data. Whole new applications could be built with the data vault.
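The data vault's emphasis on integrity and lineage comes from its separation of concerns: hubs hold business keys, links relate hubs, and satellites hold descriptive attributes stamped with load metadata. This sketch is a simplified, hypothetical illustration of that layering -- the field names and the MD5 hashing convention are assumptions for the example, not a prescribed implementation:

```python
import hashlib
from datetime import date

# A hypothetical data-vault sketch: the hub carries only the business key,
# while the satellite carries descriptive attributes plus load metadata
# (load date and record source), which is what supports lineage.

def hub_key(business_key: str) -> str:
    """Derive a surrogate key by hashing the business key."""
    return hashlib.md5(business_key.encode()).hexdigest()

hub_customer = {"hub_key": hub_key("CUST-001"), "business_key": "CUST-001"}

sat_customer = {
    "hub_key": hub_customer["hub_key"],
    "name": "Acme Corp",
    "load_date": date(2015, 12, 1).isoformat(),
    "record_source": "crm",  # lineage: which source system supplied the row
}
```

Because descriptive history accumulates in satellites without ever rewriting the hub, the model can answer not just "what is the data" but "where did it come from and when."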
Then came unstructured, textual data, with properties not found in other forms of data. The other forms of data modelling simply did not apply to text and unstructured data. A different form of data modelling was needed. That form of data modelling is the taxonomy and the ontology.
In truth, taxonomical classification of data has been around for a long time. History tells us that Aristotle and the Greek philosophers spent a lot of time studying taxonomies and ontologies. Attribution of taxonomical data modelling is far different from the attribution of the ERD model, the dimensional model, and the data vault. Those forms of data modelling occurred in modern times while taxonomies and ontologies reach across millennia and into civilizations unknown today.
Taken together, the different forms of data modelling have a great number of similarities among them. They are like a gathering of cousins. If you have ever sat and watched cousins play as children, you undoubtedly noticed their familial similarity. The cousins all have facial similarities, hair similarities, skin similarities, body build similarities, and so forth. That is because the cousins all have come from a closely related gene pool.
However, although there are similarities among the cousins, there are also distinctive differences. Each individual cousin has characteristics that belong only to the individual. For example, one distinctive characteristic of taxonomies is that the data on which the model is built cannot be changed. Stated differently, with an ERD, a dimensional model, or the data vault, if you find something wrong with the data, you go and change the data. With the taxonomy, if you find something wrong with the data, you go and change the taxonomy, not the data.
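That distinction can be shown in a few lines. In this hypothetical illustration (the words and categories are invented), a misclassification is corrected by editing the taxonomy while the source text is left untouched:

```python
# A hypothetical illustration of the point above: the classification comes
# from the taxonomy, so a wrong result is fixed by changing the taxonomy --
# the underlying text is never modified.

taxonomy = {"jaguar": "animal", "tesla": "company"}
text = "the jaguar accelerated past the tesla"

def classify(text: str, taxonomy: dict[str, str]) -> list[tuple[str, str]]:
    """Tag each word of the text that appears in the taxonomy."""
    return [(w, taxonomy[w]) for w in text.split() if w in taxonomy]

print(classify(text, taxonomy))

# Suppose "jaguar" in this context should be read as a car brand:
taxonomy["jaguar"] = "car brand"   # change the taxonomy...
print(classify(text, taxonomy))    # ...the text itself stays as it was
```

With an ERD, a dimensional model, or a data vault, the analogous fix would be an update to the data itself; here the data is immutable and the model absorbs the correction.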
Undoubtedly the world of data modelling will continue to evolve. It is doubtful that taxonomies and ontologies are the last stage in the evolution of the data model.
Bill Inmon has written 54 books published in 9 languages. Bill's company -- Forest Rim Technology -- reads textual narrative, disambiguates the text, and places the output in a standard database. Once in the standard database, the text can be analyzed using standard analytical tools such as Tableau, Qlikview, Concurrent Technologies, SAS, and many more analytical technologies. His latest book is Data Lake Architecture.