The Language of Data Visualization
Data visualization helps you communicate analytics results in pictures. It's become a language of images. We explore the elements and rules of the data visualization language so you can speak it better.
- By Dave Wells
- January 20, 2015
[Editor's note: Dave Wells is teaching a brand new course on data visualization fundamentals at the TDWI Conference in Las Vegas (February 22-27, 2015). In this article, he shares the basic elements of the data visualization language.]
Data visualization is at the heart of business analytics. It is the means by which we turn large amounts of complex data into understandable, insightful, and often compelling business communications. Data visualization has become a language -- the language of images -- that is on par with the language of words both written and spoken, and with the language of numbers and statistics.
The most talented business and data analysts communicate effectively with visualization. Unfortunately, most of us don't learn the skills of visual communication in school. We learn the language of words when taught reading and writing, and we learn the language of numbers in mathematics classes, but visual language is typically relegated to art classes, often perceived as "fluff" and a non-essential skill.
The language of words has components -- verbs, nouns, etc. -- and it has rules that we know as syntax and grammar. The language of numbers has components -- numerals and base systems such as decimal and hexadecimal -- and it has rules expressed as fundamental mathematical and statistical functions. It should not come as a surprise, then, that visual language also has components and rules. The components of visual communication include visual cues, coordinate systems, and scales that we combine to create graphs.
Visual cues include such things as:
- Placement: Where we locate things on a graph with attention to both position and proximity
- Lines: Length, angle, and direction of lines in a graph all directly influence visual perception
- Shapes: Area and volume of shapes in a graph (circles, squares, etc.) communicate relative size of the things they represent
- Color: Both hue (red, green, blue, and so on) and saturation (intensity) influence perceived importance of objects in a graph
Common coordinate systems include:
- Cartesian systems place data points on a two-dimensional grid defined by an X-axis and a Y-axis. Scatter plots and line graphs are examples of graphs using Cartesian coordinates.
- Polar systems place data points on a circular grid. The center of the circle is the base point or zero point. Each data point is placed by two values: its distance from the center along a radius, and its angle from 0 to 360 degrees around the circle. Radar graphs are a common example of graphs using polar coordinates.
- Geographic systems place data points on a geographic (two-dimensional) or geospatial (three-dimensional) map. Data-point values determine placement in a coordinate system based on latitude and longitude. Placement also includes altitude for three-dimensional plots. Heat maps drawn over a map are an example of using a geographic coordinate system.
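To make the relationship between polar and Cartesian placement concrete, here is a minimal sketch in Python. The function name `polar_to_cartesian` and the sample scores are hypothetical, chosen only for illustration; the conversion itself is standard trigonometry.

```python
import math

def polar_to_cartesian(radius, angle_degrees):
    """Convert a polar point (distance from center, angle in degrees) to (x, y)."""
    theta = math.radians(angle_degrees)
    return radius * math.cos(theta), radius * math.sin(theta)

# A radar graph with four axes spaced evenly around the circle.
# The scores are made-up values, one per axis.
scores = [3, 5, 2, 4]
points = [
    polar_to_cartesian(score, index * 360 / len(scores))
    for index, score in enumerate(scores)
]
```

Each score becomes a distance from the center, and the position of the score in the list becomes the angle. Connecting the resulting points in order would trace the familiar radar-graph outline.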
Scales define the association of data point values with graphical coordinates. Typical scales include linear, logarithmic, time, ordinal, and categorical.
Linear scales use equal divisions for equal values. The distance between 10 and 20 is exactly the same as the distance from 20 to 30, which is identical to the distance from 30 to 40, and so on. Linear scales are typical for many line graphs and scatter plots.
Logarithmic scales differ from linear scales in one significant way: equal distances on the scale represent equal multiples rather than equal differences. The Richter scale used to measure the strength of earthquakes is a base-10 logarithmic scale. A magnitude 2 earthquake has ten times the measured amplitude of a magnitude 1 quake, and magnitude 3 has ten times the amplitude of magnitude 2 (and therefore 100 times that of magnitude 1). Logarithmic scales make sense for common graph types such as line graphs and scatter plots when the range of values is exceptionally large.
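The difference between linear and logarithmic scales is easiest to see as two small mapping functions. The following is a sketch only: the names `linear_scale` and `log_scale`, and the idea of an axis measured in abstract "length" units, are assumptions made for this example and do not come from any particular charting tool.

```python
import math

def linear_scale(value, domain_min, domain_max, length):
    """Map a value to a position in [0, length].

    Equal differences between values produce equal distances on the axis.
    """
    return (value - domain_min) / (domain_max - domain_min) * length

def log_scale(value, domain_min, domain_max, length):
    """Map a value to a position in [0, length] on a base-10 logarithmic axis.

    Equal ratios between values produce equal distances on the axis.
    """
    if value <= 0 or domain_min <= 0:
        raise ValueError("logarithmic scales require positive values")
    span = math.log10(domain_max) - math.log10(domain_min)
    return (math.log10(value) - math.log10(domain_min)) / span * length
```

On an axis 300 units long that runs from 1 to 1,000, the linear scale places the value 10 at roughly position 2.7, crowded against the origin, while the logarithmic scale places it about one third of the way along. That is why a logarithmic scale helps when values span several orders of magnitude.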
Time scales are a special case of linear scales (and occasionally logarithmic scales) where the units are measures of elapsed time. Time-series visualizations are most frequently line graphs.
Ordinal scales are used when relative position of items is important but there is no mathematical basis to establish distance between units on the scale. The one-to-five star rating of products in consumer reviews uses an ordinal scale. Bar and column graphs are common ways to illustrate ordinal data.
Categorical scales, also known as nominal scales, count things in named categories. A graph showing units sold by product uses a categorical scale where the product names become the categories. Bar graphs, column graphs, and pie charts are normally based on categorical scales.
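A short sketch can show how ordinal and categorical data differ before anything is drawn. The product names and review values below are invented for illustration, and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical sales records: one product name per unit sold.
sales = ["Widget", "Gadget", "Widget", "Gizmo", "Widget", "Gadget"]

# Categorical scale: count items in each named category.
# The order of the categories carries no meaning.
units_by_product = Counter(sales)

# Ordinal scale: the order of the levels matters,
# but the distance between levels is not defined.
STAR_LEVELS = ["1 star", "2 stars", "3 stars", "4 stars", "5 stars"]
reviews = ["5 stars", "3 stars", "5 stars", "4 stars", "1 star"]
review_counts = Counter(reviews)

# Report counts in rating order, including levels that received no reviews.
ordered_counts = [(level, review_counts[level]) for level in STAR_LEVELS]
```

The categorical counts could be drawn as bars in any order, such as alphabetically or by size. The star ratings should keep their fixed order, which is why the list of levels is written out explicitly rather than derived from the data.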
We combine these components -- visual cues, coordinate systems, and scales -- to create graphs for data visualization. What about the rules of visualization -- where are the syntax and grammar? Where rules in written and spoken language focus on choosing the right words and assembling them in the right way, rules in visual language focus on choosing the right graphs and assembling their parts in the right way. Just as in choosing words, your choice of graphs begins with knowing what you want to communicate. Among the most frequent visual communications are:
- Comparisons to see differences and similarities of values. Common graphing choices include bar graphs, column graphs, and line graphs.
- Proportions to see the relative contributions of several values to a whole. Pie charts, bubble charts, and stacked bar graphs are common choices.
- Relationships to see correlations and associations among values. Scatter plots, multi-set bar graphs, radar graphs, and stacked area graphs work well to visualize relationships.
- Whole-part relationships to see the set of values that contribute to a whole. This is a convergence of proportion and relationship visuals where pie charts, stacked bar graphs, and tree maps are commonly used.
- Distribution to see how data values are spread across a scale. A histogram is the most common distribution visual. Bubble graphs and pictograms are also used. With a time scale, timelines are a useful distribution visual.
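As one illustration of the list above, the counting behind a histogram can be sketched in a few lines of Python. The function name `histogram`, its parameters, and the choice of equal-width bins are assumptions for this example; charting tools offer other binning strategies.

```python
def histogram(values, bin_count, low, high):
    """Count how many values fall in each of bin_count equal-width bins over [low, high].

    Values outside the range are ignored.
    """
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for value in values:
        if low <= value <= high:
            # A value exactly at the top of the range goes in the last bin.
            index = min(int((value - low) / width), bin_count - 1)
            counts[index] += 1
    return counts
```

For example, `histogram([1, 2, 2, 3, 9, 10], 5, 0, 10)` divides the range 0 to 10 into five bins of width 2 and returns `[1, 3, 0, 0, 2]`, showing at a glance that the values cluster at the low end with a small group at the high end.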
This is but a brief introduction to the language of images. We communicate many more things than the few listed here and use many chart types beyond those that I've mentioned. Join me at TDWI in Las Vegas to attend the debut of the new TDWI Data Visualization Fundamentals course and to learn more about the language of data visualization.
Dave Wells is actively involved in information management, business management, and the intersection of the two. As a consultant, he provides strategic guidance for business intelligence, performance management, and business analytics programs. He is the founder and director of community development for the Business Analytics Collaborative.