From Data to Information to Actionable Insight (Part 1 of 2)
Data without context lacks meaning and purpose, but how is data different from information? Defining your terms is the first step to more insightful decisions.
- By Barry Devlin
- November 7, 2016
A powerful, upstart competitor has just turned up in your home territory with a killer offering of product, pricing plan, and services. Who among your customers are likely to consider moving? Who will actually jump? Are they profitable ones you'd like to hang on to? If so, how and when can you intervene? What is your likelihood of success?
Answering such pressing questions is vital to your business. Depending on whom you ask, you may be told that you need to be data driven or even information informed. You may be told you need to trawl through the social media content and online interactions of your customers. Knowledge, or even wisdom, may be suggested as vital.
You Need Actionable Insight
However, if you don't know a data lake from an information warehouse, or a content store from a knowledge base, you're starting on the back foot. Furthermore, if you cannot do anything effective to address this problem, you have wasted your time and money storing and analyzing your data.
What you need might be called actionable insight. Unfortunately, this phrase -- like every other in the field -- has been so (mis-)appropriated by marketing materials that it is in danger of becoming meaningless. Nonetheless, it captures the essence of what you need.
Actionable is, I hope, self-explanatory. Insight implies an accurate, intuitive, and in-depth understanding of a situation. Let's start there.
Defining Our Terms
BI and analytics vendors talk a lot about data, information, knowledge, and insight without defining their words. It's easy to assume you understand these terms without discussion. After all, this isn't academic jargon: you use these words every day, right?
Consider this: you have an information technology department, whose main concern is databases and data stores, run by a chief information officer, who may have recently appointed a chief data officer responsible for analytics of externally sourced social media information and Internet of Things data.
Given all that, what's the actual difference between data and information?
Many IT people see data as the lowest level of the DIKW (data-information-knowledge-wisdom) pyramid, which dates back to a 1989 paper (see Notes) by organizational theorist and consultant Russell L. Ackoff. He proclaimed that "on average about forty percent of the human mind consists of data, thirty percent information, twenty percent knowledge, ten percent understanding, and virtually no wisdom." However, he offered no basis for these percentages. Even the levels and their order differ from one interpretation of the pyramid to another.
Raw Data Versus Added Context
The conventional distinction between data and information is that data is the raw numbers or facts and that adding specific context to data gives information. Under this definition, data, without context, lacks meaning and purpose. Why would you collect a datum -- a number or fact -- if you didn't know what it represented? If information represents a combination of facts and their context, then information is the fundamental material we collect and digitize.
Data, as raw numbers or facts, is a convenient representation of information for computers: in relational databases, for example, numbers are stored in columns separately from their context-setting names, the column headers. In a NoSQL data store, however, this separation no longer holds, nor is it true in less structured forms of information, such as text documents or audio or video files.
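The separation described above can be sketched in a few lines of Python. This is only an illustration: the column names and values are hypothetical, and a JSON string stands in for a record in a document-style NoSQL store.

```python
import json

# Relational style: the context (column names) is held apart from the values.
# Column names and values here are hypothetical, for illustration only.
columns = ("customer_id", "monthly_revenue_usd")
row = (1042, 236)

# On its own, the row is just raw values: nothing says what 236 represents.
# Pairing each value with its column header restores the context.
record = dict(zip(columns, row))

# Document style: each record carries its own field names,
# so the values and their context travel together.
document = json.dumps(record)

print(row)
print(document)
```

The tuple alone is data in the conventional sense. The document, with its field names embedded, is closer to what this article calls information.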
Consider the churn example at the start of this article. ARPU (Average Revenue per User) is a typical, if basic, measure of a customer's value and is derived from internal, transactional systems. If it were data in terms of the above definition, it would simply be a number, such as 236. Is that dollars or euros, per month or per year? In fact, because this is internally sourced data, the answers are obvious; the context of the operational systems provides them. This implicit context turns data directly into information, so calling it data rather than information is innocuous.
However, estimating the probability of churn is a different matter. Probability is actually data: a simple decimal number between 0 and 1. Calculating that number depends on complex algorithms that combine information about usage trends and social media comments by a customer as well as a customer's relationships with other customers.
You may know, for example, that this customer and her friends described your product as "sick" 20 times this week vs. 10 times last week. This is data. Unless you know the demographics of this group (context), you cannot know whether "sick" is positive, negative, or irrelevant here. In this case, distinguishing between data and information is vital, as it often is when you deal with externally sourced information.
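As a rough sketch of the "sick" example, the Python below shows the same counts producing a usable reading only when a demographic segment is supplied. The segment names and the mapping from segment to meaning are assumptions made up for illustration, not a real sentiment model.

```python
# Hypothetical mapping from audience segment to what "sick" signals.
# These segment names and meanings are assumptions for illustration.
SICK_MEANING = {
    "teen_slang": "positive",
    "general": "negative",
}

def interpret_mentions(weekly_counts, segment=None):
    """Return (change in mentions, sentiment) for the word "sick"."""
    last_week, this_week = weekly_counts
    change = this_week - last_week
    # Without a known segment (context), the sentiment cannot be determined.
    sentiment = SICK_MEANING.get(segment, "unknown")
    return change, sentiment

counts = (10, 20)  # mentions last week, mentions this week
print(interpret_mentions(counts))                # no context supplied
print(interpret_mentions(counts, "teen_slang"))  # context supplied
```

The counts are identical in both calls. Only the added context changes whether the rise in mentions can be read as good or bad news.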
Without context, the current obsession with collecting vast troves of big data in which to search for statistical correlations is at best aspirational; at worst, it promotes a myth that "contextless" data can offer answers to questions not yet asked nor even dreamed of.
Let's agree: it is information we're after. We need to be information informed.
In Part 2, we'll see how to move from information to insights, which are the basis for meaningful decision making and relevant action taking.
Notes
Ackoff, R. L. (1989). "From data to wisdom." Journal of Applied Systems Analysis, Volume 15, pp. 3-9.