A Great Data Imbalance
There is no denying the significant business value associated with structured data, but what about unstructured data?
By Bill Inmon
Computers have been in commercial use since the early 1960s, and since that early beginning the world of technology has never stopped evolving. Like all participants in an evolution, the participants in this one -- the technologists -- are often too much a part of it to actually see it.
Consider the evolution occurring with data.
Since the earliest days of technology there has been data. The very first program operated on data, and today there are whole mature professions dedicated to the technology of data. There are data modelers. There are programmers. There are database administrators. There are data administrators. Each of these participants in the evolution of data will tell you that their profession has an important perspective and role to play in the life of the enterprise, but nearly all are operating in a strangely myopic world.
The truth is that 80 to 90 percent of corporate data is untouched by these people and their activities. Ironically, as important as these professions are, they have been operating on a distinct minority of the data and have not even touched the main body of data in the corporation. The professions of data management and technology have sprung up around the study and management of structured data, but structured data is only a small part of what exists in an enterprise.
Structured data is predictable and well behaved. You can place it in a standard database management system. In structured data, there are records, and in these records are neatly defined fields and keys and indexes and metadata. The same structure of data is repeated in multiple records over and over, typically holding data generated by a business transaction. Database administrators, data administrators, and data modelers can tell you all about this kind of data.
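To make the description above concrete, here is a minimal sketch of structured data using Python's built-in sqlite3 module. The table name, field names, and sample rows are hypothetical and chosen only for illustration; the point is that every record repeats the same fields, one field serves as the key, and an index and the metadata can be declared and inspected.

```python
import sqlite3


def build_orders_db():
    """Create an in-memory database holding a few transaction records.

    The 'orders' table and its columns are hypothetical examples.
    """
    conn = sqlite3.connect(":memory:")
    # Neatly defined fields, with order_id acting as the key.
    conn.execute(
        """
        CREATE TABLE orders (
            order_id     INTEGER PRIMARY KEY,
            customer_id  INTEGER NOT NULL,
            amount_cents INTEGER NOT NULL,
            order_date   TEXT    NOT NULL
        )
        """
    )
    # An index to speed up lookups by customer.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    # The same structure repeated record after record, one per transaction.
    rows = [
        (1, 101, 2500, "2015-03-01"),
        (2, 102, 9900, "2015-03-01"),
        (3, 101, 1250, "2015-03-02"),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def column_names(conn):
    """Read the table's metadata: the names of its fields, in order."""
    return [row[1] for row in conn.execute("PRAGMA table_info(orders)")]


def total_by_customer(conn):
    """Because the structure is predictable, analysis is a single query."""
    cursor = conn.execute(
        "SELECT customer_id, SUM(amount_cents) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return cursor.fetchall()
```

Calling total_by_customer(build_orders_db()) returns one total per customer. Nothing comparable can be asked of a contract or a call transcript until its content has been given some structure.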
There is no denying the significant business value associated with structured data. Indeed, the whole industry of business intelligence is dedicated to the proposition that structured data is important. For years, major and important corporate decisions have been based on analysis of structured data.
So what about unstructured data?
If enterprises have existed all these years without basing decisions on unstructured data, is it possible that the unstructured data of the enterprise simply is not very important? The answer is no: from a business value standpoint, a wealth of important information is locked up in unstructured data. Consider the unstructured data found in:
- Corporate contracts. How many executives know how many contracts of what kind are in effect? The truth is that nearly every executive has corporate contracts and has no idea what is contained in those contracts.
- Call center information. About all most enterprises know is how many calls they get per day and how long the calls last; they have no idea about the content of those calls. Isn't it important to know what customers are saying?
- Medical records. Much of the data found in medical records, including electronic medical records (EMRs), is in narrative form. This information has to be read manually in order to be analyzed and digested. Wouldn't it be useful to be able to scan and analyze medical records automatically rather than reading them by hand?
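As a rough illustration of what scanning narrative text automatically could look like, the sketch below searches a free-text note for a small vocabulary of terms and emits one structured row per hit. Everything in it is an assumption made for the example: the TERMS vocabulary, the extract_rows function, and the row layout are hypothetical. It is plain keyword matching, which does no disambiguation and does not handle negation such as "denies chest pain", so real text analysis has far more work to do.

```python
import re

# Hypothetical vocabulary mapping a term to a category.
TERMS = {
    "chest pain": "symptom",
    "shortness of breath": "symptom",
    "aspirin": "medication",
}


def extract_rows(record_id, narrative):
    """Turn free text into (record_id, category, term, position) rows.

    A naive sketch: whole-word, case-insensitive keyword matching only.
    """
    text = narrative.lower()
    rows = []
    for term, category in TERMS.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text):
            rows.append((record_id, category, term, match.start()))
    # Order the rows by where each term appears in the narrative.
    return sorted(rows, key=lambda row: row[3])


note = "Patient reports chest pain. Given aspirin; chest pain resolved."
rows = extract_rows(7, note)
```

Each resulting row has the same fields, so the output could be loaded into an ordinary database table and queried like any other structured data.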
This is just the tip of the iceberg.
There is no question about the tremendous value of information contained in unstructured format. Unstructured information has gone untapped not because it lacks business value, but because the world of computers is optimized to handle structured data.
But in an evolutionary fashion, that is all changing.
Bill Inmon has written 54 books published in 9 languages. Bill's company -- Forest Rim Technology -- reads textual narrative, disambiguates the text, and places the output in a standard database. Once in the standard database, the text can be analyzed using standard analytical tools such as Tableau, QlikView, Concurrent Technologies, SAS, and many others.