Q&A: Advanced Data Visualization: From Atomic Data to Big Data
A look at the current issues surrounding advanced visualization, from big data to who needs more powerful features.
- By James E. Powell
- July 1, 2014
[Editor’s note: Andrew Cardno is leading (with Stephen Brobst) a session about data visualization at the TDWI World Conference in Boston (July 20-25, 2014). The session, “Overcoming Information Overload with Best Practices in Data Visualization,” examines pitfalls in, and best practices for, BI visualization. In this question and answer session, we asked Andrew about key data visualization themes today’s enterprises are facing.]
BI This Week: How can the history of data visualization be relevant when computerization has made everything so much more advanced?
Andrew Cardno: Many of the greatest visualizations were produced in the 1800s and 1900s. During this time, although the volume produced was low, the visualizations were sometimes of extraordinary quality. Consider the 1869 thematic map of Napoleon’s ill-fated march on Moscow drawn by Charles Joseph Minard. Yale professor Edward Tufte, a recognized pioneer in data visualization, commented, “It may well be the best statistical graphic ever drawn.” [See note 1]
Counterintuitively, the advent of computerization has resulted in a flood of poor-quality visualization techniques. I would argue that history is extremely important: it lets us bring back great work from the past without reinventing the wheel, and then apply computerization to create visualizations of the highest value.
What is the difference between traditional and advanced data visualization?
Traditional data visualization includes visual methods such as line graphs, bar graphs, and box plots. A solid set of practices has developed around these traditional visualizations and how to apply them effectively. They remain an extremely important part of the data visualization world and should be included in almost any analytics project.
The combination of massive amounts of detailed data and the powerful graphics now standard on most devices has led to huge growth in the types of graphics available. Not all of these graphics are effective, but many are dramatic. Advanced data visualizations are the new graphics that are effective and that carry many times the data density or complexity of traditional data visualizations.
At what data volumes are advanced visualizations a better choice for displaying analytics?
Advanced data visualization can effectively show data volumes measured in thousands and tens of thousands of data points. Traditional data visualization typically shows data volumes measured in tens to hundreds of data points. Consider the canonical bar graph: a bar graph with 25 bars (25 data points) can be quite effective, while an advanced pivotal graphic can comfortably display 2,500 data points. If that pivotal visualization is animated over 10 months, it can show 25,000 data points.
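The data-density arithmetic above can be made concrete with a small sketch. It assumes, purely for illustration, that a pivotal graphic is a grid of cells (as in a heat map) with 50 rows by 50 columns, and that each animation frame is one month; the store and product labels are hypothetical stand-ins, not anything the interview specifies.

```python
from itertools import product


def pivot_cells(row_labels, col_labels, frames):
    """Enumerate every (frame, row, column) cell a grid-style graphic would draw."""
    return list(product(frames, row_labels, col_labels))


# Hypothetical dimensions for a 50 x 50 pivot grid.
stores = [f"store_{i}" for i in range(50)]
products = [f"product_{j}" for j in range(50)]
# Ten monthly animation frames.
months = [f"month_{m}" for m in range(1, 11)]

bar_graph_points = 25                                              # one value per bar
static_points = len(pivot_cells(stores, products, months[:1]))     # a single frame
animated_points = len(pivot_cells(stores, products, months))       # all ten frames

print(bar_graph_points, static_points, animated_points)
```

Each extra dimension multiplies rather than adds: two categorical axes turn 50 labels each into 2,500 cells, and time as a third dimension multiplies again by the number of frames.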
When deciding whether to apply advanced data visualization methods, a key consideration is the nature of the analytical question. If the question is about known patterns in the data (for example, show me the breakdown of revenue by customer segment), then a traditional method such as a bar graph is appropriate.
If the analytical question is about unknown patterns in the data (for example, discover the patterns in the individual customer behavioral changes over the same year), then an advanced data visualization should be considered. Generally speaking, advanced data visualization is aimed at finding the right questions to ask and traditional data visualization is aimed at answering or illustrating specific answers.
What is the relationship between predictive modeling and data visualization?
Modeling and data visualization work together effectively in many places. In his 1973 article in The American Statistician, Anscombe states that “Graphs are essential to a good statistical analysis.” [See note 2] In the article, Anscombe argues that we should always take the time to look at graphs as well as apply models; both contribute to our understanding. In an active business, we are often experimenting with different ideas; executed correctly, these experiments apply the scientific method to testing ideas in the real world.
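Anscombe’s point is usually illustrated with the four small data sets from that same 1973 article, now known as Anscombe’s quartet. The sketch below computes the summary statistics a model-only analysis would report for each set. The statistics come out nearly identical, yet plotting the four sets shows very different shapes (a linear trend, a curve, a line with one outlier, and a vertical cluster plus one extreme point). The helper name `summarize` is our own.

```python
from statistics import mean, variance

# Anscombe's (1973) quartet: sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, x variance, least-squares line, and Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return {"mean_x": mx, "mean_y": my, "var_x": variance(xs),
            "slope": slope, "intercept": intercept, "r": r}


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
              f"var_x={s['var_x']:.2f} y={s['intercept']:.2f}+{s['slope']:.3f}x "
              f"r={s['r']:.3f}")
```

Passing each pair to any scatter-plot tool is the step the numbers alone cannot replace.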
Consider the example of experimenting on groups of customers with different marketing offers. The art of developing new ideas or directions requires creativity, which is driven by an understanding of the data and insight into the patterns the marketer observes. Data visualization has a powerful role to play in providing the marketer with the insights and understanding to drive this creative process.
After the test-and-control marketing program is executed, the hypothesis is tested, and the success of the experiment becomes a matter of statistics. The results of the test can then feed into predictive models, either to model similar programs or to automate the execution of the initiatives.
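The interview does not say which statistical test is used, so the following is only one common choice: a two-sided, two-proportion z-test comparing the response rate of the customers who received the offer with that of a held-out control group. The campaign figures (520 of 10,000 versus 450 of 10,000) are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(responders_test, n_test, responders_control, n_control):
    """Two-sided z-test for a difference between two response rates.

    Returns (lift, z, p_value), where lift is the test rate minus the control rate.
    """
    p_test = responders_test / n_test
    p_control = responders_control / n_control
    # Pool both groups to estimate the rate under the null hypothesis of no difference.
    pooled = (responders_test + responders_control) / (n_test + n_control)
    se = sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / se
    # Standard normal CDF via the error function, doubled for a two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_test - p_control, z, p_value


# Hypothetical campaign: 10,000 customers got the new offer, 10,000 were held out.
lift, z, p = two_proportion_z_test(520, 10_000, 450, 10_000)
print(f"lift={lift:.3%} z={z:.2f} p={p:.4f}")
```

A small p-value suggests the lift is unlikely to be chance alone; the measured lift is then the kind of result that could feed a predictive model of similar programs.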
What is atomic-level data and why is it important in analytics?
Atomic-level data refers to data whose rows are not aggregated in any way. For example, a daily summary of revenue data by ZIP code is aggregated data and not atomic. Atomic data is extremely important in analytics because it lets the data be grouped in whatever way a question requires.
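A minimal sketch of why atomic rows matter, assuming a hypothetical transaction schema (`day`, `zip_code`, `customer_id`, `revenue`) that is not taken from the interview. The same atomic rows roll up into the daily-revenue-by-ZIP summary mentioned above, and also into revenue per customer. The reverse is impossible: once rows are summarized by day and ZIP, customer `c1`, who bought in two ZIP codes, can no longer be reassembled.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, NamedTuple


class Sale(NamedTuple):
    """One atomic row: a single transaction (hypothetical schema)."""
    day: str
    zip_code: str
    customer_id: str
    revenue: float


def group_revenue(rows: Iterable[Sale],
                  key: Callable[[Sale], Hashable]) -> Dict[Hashable, float]:
    """Sum revenue over atomic rows, grouped by any key function."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for row in rows:
        totals[key(row)] += row.revenue
    return dict(totals)


SALES = [
    Sale("2014-07-01", "02115", "c1", 40.0),
    Sale("2014-07-01", "02115", "c2", 25.0),
    Sale("2014-07-01", "02139", "c1", 10.0),
    Sale("2014-07-02", "02115", "c3", 55.0),
]

# The aggregated view: daily revenue by ZIP code.
by_day_zip = group_revenue(SALES, key=lambda s: (s.day, s.zip_code))
# A different question the atomic rows can still answer: revenue per customer.
by_customer = group_revenue(SALES, key=lambda s: s.customer_id)

print(by_day_zip)
print(by_customer)
```

Passing the grouping as a function mirrors the point in the text: with atomic data, the question decides the grouping, not the storage format.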
In the world of data visualization, the user is often able to explore vastly different aspects of the data at will. To achieve this goal, it is critically important to let the user query the data freely, without the constraints that non-atomic data imposes. In the world of advanced data visualization, the user may be displaying hundreds of thousands of data points on a single graphic, drawing on the depths of the atomic-level data.
The definition of atomic-level data is constantly evolving. Consider Web analytics data: it has moved from tap (or click) events to gesture movements. Each time a new, more detailed level of interaction becomes available and proves valuable, the definition of atomic moves to a finer level of data.
Do operational business people need data visualization? Can't they make their decisions based on a few key performance indicators and alerts?
In our hypercompetitive business environment, nearly all decisions have become data driven. In a competitive environment, the decision-makers with better understanding, insight, and judgment are going to be the winners. Consider two operational business people with similar experience and judgment: the one with better understanding and insight is going to win consistently in the competitive business landscape.
Consider supply chain managers at retailers. They work in a competitive environment where the unexpected happens continuously. Inventory levels shift in response to competitors’ marketing programs, because competitors see promotions the moment they are released and adjust their business in response.
In this world, where the unexpected has become the norm, the operational person is constantly looking outside their models to react to change and optimize the business in near real time. Data streams that were simply not available in the past are also now relevant to inventory. For example, trends in social media can have a dramatic impact on inventory levels, often in ways no one anticipated.
Finally, computer graphics are now extraordinary and no longer constrain the quality of the visualizations we can generate. In fact, they enable techniques such as heat maps and animations that were previously impractical in an operational environment.
Explain war room collaboration and why you think it is important to operational business.
We live in an incredibly dynamic world: lines between business units are blurring, technology is enabling initiatives that were undreamed of even two or three years ago, and competition is coming from areas we never anticipated. In such a world, the business silos that are (or were) often needed to operate effectively must come together to collaborate in new and even unplanned ways.
The war room is a powerful way to bring these business units together, enabling cooperation and change. The fantastic thing about visualization is that it doesn’t have a vernacular, so it accelerates the collaboration required to be competitive.
Notes
1. Tufte, Edward R. (2001) [1983], The Visual Display of Quantitative Information (2nd ed.), Cheshire, CT: Graphics Press.
2. Anscombe, F. J. (1973), “Graphs in Statistical Analysis,” The American Statistician 27 (1): 17–21. See http://www.sjsu.edu/faculty/gerstman/StatPrimer/anscombe1973.pdf