Q&A: Big Data Issues: Visualization, Streaming Data, and the Cloud
Big data's impact on data visualization, DBMS vendors and NoSQL, analysis of streaming data, and cloud BI.
- By James E. Powell
- October 9, 2012
Big data isn't standing still, and it has implications for data visualization paradigms, DBMS vendors, and cloud technology. We explore these topics with Marc Demarest, the CEO and principal of Noumenal, Inc. His keynote address at the TDWI World Conference in Orlando, Data Warehousing Investment Outlook: 2012–2020, will be delivered Monday, November 12, 2012.
BI This Week: We've heard you say in several TDWI forums over the past couple of years that you think there's a new visualization paradigm on the immediate horizon. What do you mean by "new"?
Marc Demarest: The visualization model employed in BI environments today is fundamentally based on hierarchies and business graphics: pie, line, and bar charts. For sure, we have a broader selection of business graphics than we had five years ago; if I see another misused heat map, I think I'll scream.
These environments have their purpose: information distribution to and for people whose scope of decision-making and situational awareness is fixed, whose jobs require them to understand the same fundamental information about the historical state of affairs, week in and week out.
We use the word "dashboard" a lot these days, and that has some resonance here. Today's BI visualization models are for the ordinary driver. They are essential, but they can confer little or no competitive advantage, because they are designed to structure the user's experience of the data, often in a draconian way.
What are Noumenal's clients looking for from their data? Opportunity, differentiation, advantage, superior understanding, and superior decision-making. They aren't looking for dashboards; they're looking for heads-up displays in what they hope -- or want -- to be high-performance organizations, where the drivers aren't ordinary, the roads are treacherous, and the value of crossing the finish line first is significant.
In a word, they want analytics rather than information distribution, even if the information distribution infrastructure is -- as they are all becoming -- social, in the sense that decision-makers can share knotty data problems with one another, and kibitz online about things.
Mark Madsen has a nice model for analytics, and I'll just steal it to make my point. Mark says there are four basic things you can do with analytic tools: predict, estimate, describe, and model. I think that's right -- it's both comprehensive and accurate in terms of what I am seeing organizations attempting to do.
The visualization techniques associated with prediction, estimation, description, and modeling are not the visualizations we traditionally associate with business graphics -- they're radically different. We may describe the outcomes of a modeling or a prediction exercise for "the masses" with business graphics, but you don't see statisticians or arithmetic modelers using 3D bar charts to explain things to themselves.
It's hard to know -- particularly because analytics are so closely controlled by statistical analysis tools at this point in the market's evolution -- what the next visual paradigm will be precisely, but my view has been, and continues to be, that next-generation visualization will take its cues from films and games.
Why "films and games"?
Films are the visualization paradigm, in my opinion, for the past. The film -- the past -- is already written; it can't be altered. What we can do, with large volumes of increasingly complete data, is visualize the past, along a time line, from one or another perspective.
We had, you may recall, a proto-version of this capability in first-generation BI tools: Forest & Trees called it "film-stripping," as I recall, and it was -- at that time (in the early 1990s) -- little more than a series of two-dimensional business graphics (like, say, a pie chart) arranged along a time line that played in an endless loop. We can do better now, whether the engine we use to visualize is one like Minecraft or one like the Halo engine.
Games are present- and future-oriented. Situational awareness breeds action in the present, which changes the possible futures available to me, the game-player. We have some really excellent gaming engines available today, both from commercial firms and from the open source community. These engines can implement complex models, orchestrate multiple actors with complex attributes, model equally complex interactions, and calculate outcomes. They are, fundamentally, well-suited for the kinds of work Madsen says we'll be doing when analytics are the norm, rather than the exception.
The trouble seems to be that the game engine designers think -- it's really a shame -- that their engines are only good for playing faux games, rather than playing the really interesting games of "my life" and "my business."
You've suggested, when you've been asked to forecast the next several years of the BI markets, that DBMS vendors will rise to the challenge of "big data" and the NoSQL movement. Why do you think that?
Two reasons. First, the merchant DBMS vendors are experiencing a flattening or decline in their DBMS revenue, in some cases due to their own ill-conceived pricing and/or support regimes. Having built their revenue model on the assumption of continued -- really, perennial -- growth in that revenue, they'll do what's necessary to get revenue kick-started again. Look, for example, at Oracle's 12c announcements.
Second, the NoSQL movement has, in my view, created the preconditions for a backlash of significant size and scope, mostly by declining to be specific about what NoSQL technologies are good for, and what they're not good for. I see plenty of evidence that hundreds and perhaps thousands of companies have chosen to use NoSQL database engines for inappropriate purposes -- text-based query and search seems to be a common misuse, based on what I am reading.
We've yet to see a "database revolution" that the merchant DBMS players have not co-opted and absorbed. I see no reason -- other than the economic argument that all software is becoming free -- to believe that this revolution is different. Even if it is true that all software is becoming free, that transition will take at least a generation to take hold because purchase decisions are ultimately made by groups of people, of a certain age, with particular expectations about what they get for the money they do (or don't) spend.
With nearly every other aspect of our economy working to reduce or eliminate human labor, I find it hard to believe that in software we're moving to a cost-shifting model in technology, where bits are free, and the money we used to spend on bits we now spend on people to maintain and enhance the bits-that-are-now-free.
Are you still bullish on streaming data?
As ever. I have economics on my side; sensor arrays are the best available solution to problems of human inspection, exploration, and identification costs. What's particularly interesting to me, at the moment, about how sensors (and therefore streaming data) are making inroads into our organizational and personal lives is the role of the public sector.
Traditionally, we think about public-sector agencies as lagging, rather than leading, the market, but I don't think that's the case where sensors are concerned. There's a pilot project, running in London at the moment, that exemplifies what's happening in a lot of metropolitan areas. Sensor arrays are being used to detect available parking spots in different parts of London and provide that information, in real time, to inbound drivers, on handheld devices and heads-up displays within vehicles. Dead simple sensing: is there a vehicle parked in this location or not? Immediate and significant benefit to rate-payers -- much more effective in-city navigation -- and probably immediate payback for police labor allocation because traffic wardens will be able to identify lawbreakers with ease, and with better coverage.
There are profound long-term effects, too: at scale, a view of the percolation network that is "parking in London" with all that it implies, including road reconstruction and traffic flow management, public safety and security, etc.
These kinds of projects are increasingly common in the public sector for one simple reason -- cities are now too expensive to maintain with human labor. We need machine labor instead, and if cities are the most complex machinery mankind builds, then the application of sensor technology to problems within cities will scale "down" quickly to other, less complex forms of machinery: companies, manufacturing and distribution processes, homes, and the lives they support.
What about BI in the cloud?
I'm afraid I'm one of those people who don't think BI in the cloud makes much sense for larger organizations. Certainly, organizations with the majority of their transactional portfolio "in the cloud" are going to consider BI in the cloud, but I think they're going to find that it's quite difficult in practice to source data from half a dozen cloud-based transactional applications and integrate those data sets on yet another cloud service provider's platform -- probably more difficult than integrating those data sets inside their own firewall.
The alternatives -- dealing with the stovepiped reporting tools those SaaS'd transactional applications provide, or trying to mash up those tools or pursue some other join-on-the-glass strategy -- aren't any more palatable in my view.
I have noticed that some of the cloud BI providers have toned down their make-pretty positioning and are focusing more on the data collection and integration issues; I hope that trend continues, because BI in the cloud could be the way we get out of this high-rate-of-failure problem that dogs the industry.
I remain skeptical. A few good public case studies, showing how a BI-in-the-cloud provider successfully integrated a half-dozen large data sets from SaaS'd and inside-the-firewall applications, and delivered those data sets in some integrated way to a few hundred users scattered across the planet and using different access methods (phones, tablets, laptops), would go a long way toward making me a believer.