5 Ways Comparisons Can Transform Data into Insight
Use these five comparison techniques to help your audience quickly grasp the important points in your visualizations.
- By Dan Gastineau, Andrew Roman Wells
- November 16, 2018
A number by itself is meaningful, but it does not tell the whole story. The number can tell you something about the world but not necessarily why it should matter. Is the number good or bad? Should it have been more or less? To answer those questions you need something to compare against. A number's value relative to something else is what makes it interesting.
For example, a trip to the moon is 240K miles. Sounds far away. If we tell you a trip to the moon is equivalent to 110 flights between Atlanta and L.A., then that certainly sounds far. If you're hitching a ride with Elon Musk to Mars, he might point out as you pass the moon that you're less than 1 percent of the way there. Now the distance to the moon seems trivial.
It's all about context, and comparisons provide a number with context. Comparisons transform raw data from something abstract to something relatable. They enable us to judge the value and relevance of a number. Data gives information about the world, but it needs a comparison framework to be useful. For the practitioner of data analytics and visualization, there are five key ways to get the most out of comparisons.
1. Know Your Objective
Which comparison should you use? It's simple: it depends!
From the perspective of a business, there may be a variety of goals, and each goal determines what type of comparison is needed.
If the company has promised investors year-over-year growth of 5 percent, then actual sales are much less important than the comparison of this year's sales to last year's. If your goal is to reduce the time to fulfill orders to three days, then how order time compares to that three-day target is the most important metric.
Knowing the objective tells you what type of comparison you should use. Once you know what to compare against, you are better positioned to make impactful decisions and generate value from your data.
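As a minimal sketch of the two objectives described above, the comparison can be reduced to "actual versus target." All figures and names here (`sales_last_year`, `fulfillment_days`, and so on) are hypothetical values invented for illustration, not data from a real company.

```python
def yoy_growth(current, prior):
    """Fractional growth of current over prior (0.05 means 5 percent)."""
    return (current - prior) / prior


def gap_to_target(actual, target):
    """Signed distance from the target; positive means actual is above target."""
    return actual - target


# Hypothetical figures, invented for illustration only.
sales_last_year = 1_000_000
sales_this_year = 1_040_000
growth_target = 0.05  # the 5 percent promised to investors

growth = yoy_growth(sales_this_year, sales_last_year)
growth_gap = gap_to_target(growth, growth_target)  # negative: growth fell short

fulfillment_days = 3.6          # assumed average order fulfillment time
fulfillment_target_days = 3.0   # the three-day goal
fulfillment_gap = gap_to_target(fulfillment_days, fulfillment_target_days)  # positive: too slow

print(f"YoY growth: {growth:.1%} vs. target {growth_target:.0%} (gap {growth_gap:+.1%})")
print(f"Fulfillment: {fulfillment_days} days vs. target {fulfillment_target_days} (gap {fulfillment_gap:+.1f} days)")
```

Note that the sign of the gap means different things for each objective: a negative gap is bad for growth, while a positive gap is bad for fulfillment time. The objective determines how the comparison should be read.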
2. Compare Apples to Apples
A great comparison enables you to judge the value of a number in isolation and allows members of a group to be compared with each other.
Here's an example of a comparison that at first seems like an insight but doesn't actually tell you much.
According to this chart, the East region appears to be performing the best. However, given that it's unlikely that the potential market size for each region is exactly the same, this doesn't reveal valuable insight about relative performance. Now let's reframe it with a better comparison.
With a metric such as market share that controls for different market conditions, the story has flipped to show the South as the top performer.
Market share is an effective comparison because it controls for the unique competitive landscape of each region while providing a common basis for comparison across multiple regions. Not only does it say something interesting about a region's performance in isolation -- it also ranks performance across regions.
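The flip described above can be illustrated with a short sketch. The regional sales and market-size figures are hypothetical, chosen only so that the East leads on raw sales while the South leads on market share; they are not the numbers behind the article's charts.

```python
# Hypothetical regional figures (same currency units for sales and market size).
regions = {
    "East":  {"sales": 500, "market_size": 5000},
    "West":  {"sales": 300, "market_size": 2000},
    "North": {"sales": 250, "market_size": 3125},
    "South": {"sales": 400, "market_size": 1600},
}


def market_share(sales, market_size):
    """Share of the region's total market captured by our sales."""
    return sales / market_size


# Rank regions two ways: by raw sales and by market share.
by_sales = sorted(regions, key=lambda r: regions[r]["sales"], reverse=True)
by_share = sorted(regions, key=lambda r: market_share(**regions[r]), reverse=True)

print("Ranked by raw sales: ", by_sales)
print("Ranked by market share:", by_share)
for name in by_share:
    print(f"  {name:<6} {market_share(**regions[name]):.0%}")
```

With these assumed numbers, the East ranks first on raw sales but third on market share, while the South moves from second to first.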
3. Control for Irrelevant Sources of Variation
Variation in data is often caused by things outside your control -- market conditions or weather, for example. Instead of letting irrelevant factors distract from your overall message, find comparisons that remove the noise and focus only on the data you can do something about.
One example of this is illustrated in the chart below.
At first glance you may note the dramatic increase in sales during the summer months and again in November. With only this view you might think no action is needed in those months. However, the shape of the trend suggests seasonality could also be a factor. To understand what's truly going on, we need to limit the data to what's in our control.
A simple way to do that here is to look at year-over-year percent change in sales.
Instead of a nice sales gain during the summer months, the year-over-year change tells a different story. After controlling for seasonality, it appears sales are actually getting worse in the summer months and action may be needed to fix the issue.
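A rough sketch of this calculation follows. The monthly figures are hypothetical and were chosen to mimic a seasonal summer peak; the point is only to show how comparing each month with the same month a year earlier removes the seasonal pattern.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical monthly sales, invented for illustration only.
last_year = [100, 105, 110, 120, 140, 180, 200, 190, 150, 120, 160, 130]
this_year = [104, 109, 115, 124, 141, 171, 186, 179, 153, 124, 166, 135]


def yoy_pct_change(current, prior):
    """Percent change of each month versus the same month one year earlier."""
    return [(c - p) / p * 100 for c, p in zip(current, prior)]


changes = yoy_pct_change(this_year, last_year)

for month, sales, pct in zip(MONTHS, this_year, changes):
    print(f"{month}: sales {sales:>4}  YoY {pct:+.1f}%")
```

In this made-up data, July has the highest raw sales of the year, yet its year-over-year change is negative, which is the pattern the raw trend line hides.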
4. Tell an Honest Story by Providing Appropriate Context
Mark Twain popularized the saying that there are three kinds of lies -- lies, damned lies, and statistics. No doubt the same could be said of data analytics and visualization.
Whether by intent or oversight, a visualization can easily misrepresent the truth by leaving out important context. Omitting certain details can sometimes completely change the conclusion of an analysis, even when the comparison metric is appropriate. This can happen in a number of ways, but one of the most common is in time series visuals. When looking for trends, where you start and stop the series can have a big impact on the shape of the trend line.
Let's look at an example of a trend line that at first glance shows a rosy picture but fails to tell the whole story.
Customer complaints appear to be trending in a good direction. However, adding years on each side of the chart tells a different story:
In this case, it wasn't the metric giving a misleading view but the lack of context. Simply adding a few years on either end of the data changes the story from something clearly good to something more ambiguous. It's sometimes tempting to force clarity on a data set by narrowing the context so much that it no longer tells an honest story. A better approach is to search for a different angle that allows for more transparency. Otherwise, accept that the data doesn't support a conclusive finding.
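One way to check how sensitive a trend is to its start and stop points is to fit the same trend line over a narrow and a wide window and compare the slopes. The sketch below uses an ordinary least-squares slope and hypothetical yearly complaint counts; the years, the counts, and the `trend` helper are assumptions made for illustration.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical customer complaints per year, invented for illustration only.
complaints = {
    2010: 310, 2011: 300, 2012: 420, 2013: 400, 2014: 370,
    2015: 340, 2016: 330, 2017: 390, 2018: 410,
}


def trend(start, end):
    """Slope (complaints per year) over the inclusive year range."""
    years = [y for y in sorted(complaints) if start <= y <= end]
    return slope(years, [complaints[y] for y in years])


narrow = trend(2012, 2016)  # the flattering window
wide = trend(2010, 2018)    # the same series with more context

print(f"2012-2016 trend: {narrow:+.1f} complaints per year")
print(f"2010-2018 trend: {wide:+.1f} complaints per year")
```

With these assumed counts, the narrow window shows complaints falling steadily, while the full series shows a slight rise. If the sign or size of the slope changes materially when the window widens, the narrow view is probably not telling the whole story.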
5. Determine Exception Thresholds
Meaningful comparisons can greatly increase the value of your data, but if you really want to maximize that value, figure out the point at which a number becomes exceptional enough that you need to take action.
You've probably seen the "Christmas tree chart" where every positive or negative gets a bold green or red. Although it's certainly helpful to encode meaning into a visual using color, the problem with giving everything a color is that it does nothing to draw the viewer's eyes to what's most important. A better approach is to determine a minimum or maximum threshold and highlight only the elements that exceed these limits. Doing so enables the viewers to immediately focus on the biggest outliers and more quickly turn their attention to deciding what type of action is needed (if any).
The two examples of product sales performance below illustrate the point.
In the left-hand chart, viewers must scan the whole list to compare each product. They must also determine which bars are big enough to be of interest. The cognitive load on the viewer is quite high with such a chart, particularly as the number of rows increases.
The chart on the right facilitates comprehension with a simple rank order and bold colors highlighting only the products exceeding your threshold. At a glance, the viewer can decide what to do about Product F.
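The same idea can be sketched without a charting tool: rank the products, then mark only those beyond a threshold. The product figures and the 10 percent threshold below are hypothetical, and a text marker stands in for the bold color a real chart would use.

```python
# Hypothetical year-over-year sales change per product, in percent.
product_changes = {
    "Product A": 4.0,
    "Product B": -2.5,
    "Product C": 6.5,
    "Product D": 1.0,
    "Product E": -4.0,
    "Product F": -18.0,
    "Product G": 3.0,
}

THRESHOLD = 10.0  # assumed: flag changes of 10 percent or more in either direction


def flag_exceptions(values, threshold):
    """Rank items from best to worst and mark those at or beyond the threshold."""
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return [(name, value, abs(value) >= threshold) for name, value in ranked]


rows = flag_exceptions(product_changes, THRESHOLD)

for name, value, flagged in rows:
    marker = "<-- exceeds threshold" if flagged else ""
    print(f"{name:<10} {value:+6.1f}%  {marker}")

exceptions = [name for name, _, flagged in rows if flagged]
```

Only the rows in `exceptions` would receive a highlight color; everything else stays neutral so the outlier stands out. Where to set the threshold is a business judgment that depends on the objective.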
A vast array of available tools can help you transform data of varying complexity into insight. There are no doubt many useful applications for some of the highly complicated machine-learning concepts and other data science models, but often basic concepts such as comparative analytics will be more than sufficient to give the insight needed to make better decisions.
Making comparisons is a simple way to bring clarity to your data that doesn't require significant time or expertise to execute. If you know your objective, make apples-to-apples comparisons, remove irrelevant noise, provide the right level of context, and determine exception thresholds, you will be well on your way to getting significant value from your data.