The Three Vs of Big Data Analytics: VARIETY
Blog by Philip Russom
Research Director for Data Management, TDWI
This blog is the second in a series of three about the three Vs of big data analytics: data volume, variety, and velocity. You can read the first blog in the series online.
Data Type Variety as a Defining Attribute of Big Data
One of the things that makes big data big is that it's coming from a greater variety of sources than ever before. Many of the newer sources are on the Web (logs, click streams, and social media). Sure, user organizations have been collecting Web data for years. But for most organizations, it's been a kind of hoarding. We've seen other untapped big data collected and hoarded in the same way, such as RFID data from supply chain apps, text data from call center apps, semi-structured data from various insurance processes, and geospatial data in logistics. What's changed is that far more users are now analyzing big data instead of merely hoarding it. And the few organizations that were already analyzing it now do so at a more complex and sophisticated level. A related point is that big data isn't new, but leveraging it effectively for analytics is. (For more on that point, see my blog: The Intersection of Big Data and Advanced Analytics.)
But my real point for this blog is that the recent tapping of these sources means that so-called structured data (which previously held unchallenged hegemony in analytics) is now joined, both figuratively and literally, by unstructured data (text and human language) and semi-structured data (XML, RSS feeds). There's also data that's hard to categorize, such as data from audio, video, and other devices. Plus, multidimensional data can be drawn from a data warehouse to add historic context to big data. I hope you realize that's a far more eclectic mix of data types than analytics (or any other discipline within BI, for that matter) has ever seen. So, with big data, variety is just as big as volume. Plus, variety and volume tend to fuel each other.
To further support the point that big data is about variety, let's look at Hadoop. I managed to find a couple of users who've used Hadoop as an analytic database. Both said the same thing: Hadoop's scalability for big data volumes is impressive. But the real reason they're working with Hadoop is its ability to manage a very broad range of data types in its file system and to process analytic queries via MapReduce across many eccentric data types.
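To make the MapReduce point concrete, here is a minimal sketch of the map, shuffle, and reduce phases running over a deliberately mixed input. It is written in plain Python, not Hadoop's actual Java API. The input holds a CSV row, an XML fragment, a JSON document, and a line of call-center text. The record formats, field names, and customer IDs are hypothetical and exist only for illustration. Each map step parses its own data type into common (key, value) pairs, and that shared shape is what lets a single reduce step combine structured, semi-structured, and unstructured data.

```python
import json
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical mixed input: (source type, raw record). Formats and IDs are invented for illustration.
RECORDS = [
    ("csv", "1001,ACME,250.00"),  # structured: customer_id,name,amount
    ("xml", "<order><customer>1001</customer><amount>75.50</amount></order>"),  # semi-structured
    ("json", '{"customer": "1002", "amount": 40}'),  # semi-structured
    ("text", "Customer 1002 called to complain about order delays."),  # unstructured
]


def map_record(kind, raw):
    """Map phase: parse one record of any type into (customer_id, (tag, value)) pairs."""
    if kind == "csv":
        cust, _name, amount = raw.split(",")
        yield cust, ("spend", float(amount))
    elif kind == "xml":
        root = ET.fromstring(raw)
        yield root.findtext("customer"), ("spend", float(root.findtext("amount")))
    elif kind == "json":
        doc = json.loads(raw)
        yield str(doc["customer"]), ("spend", float(doc["amount"]))
    elif kind == "text":
        # Crude text extraction (an assumption): count each "customer <id>" mention.
        for cust in re.findall(r"[Cc]ustomer (\d+)", raw):
            yield cust, ("mention", 1)


def shuffle(pairs):
    """Shuffle phase: group all mapped values by key (Hadoop does this across nodes)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_customer(key, values):
    """Reduce phase: combine spend from structured sources with mentions from text."""
    spend = sum(v for tag, v in values if tag == "spend")
    mentions = sum(v for tag, v in values if tag == "mention")
    return {"customer": key, "spend": spend, "mentions": mentions}


def run(records):
    pairs = (pair for kind, raw in records for pair in map_record(kind, raw))
    return {key: reduce_customer(key, vals) for key, vals in shuffle(pairs).items()}


if __name__ == "__main__":
    for row in run(RECORDS).values():
        print(row)
```

Running `run(RECORDS)` produces one summary per customer ID that merges spending from the CSV, XML, and JSON records with mentions found in free text. In a real Hadoop job, the framework would distribute the map and reduce tasks and perform the shuffle across the cluster. Here a dictionary stands in for that step.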
Stay tuned for the third and final blog in this series, which will be titled The Three Vs of Big Data Analytics: VELOCITY.
NOTE -- Don’t miss TDWI’s Big Data Analytics Survey. Please share your opinions and experiences by taking this online survey.
Posted by Philip Russom, Ph.D. on June 14, 2011