The Three Vs of Big Data Analytics: VOLUME
Blog by Philip Russom
Research Director for Data Management, TDWI
I was recently on a group call with several other analysts where IBMers spelled out their definition of big data. They structured the definition by explaining big data’s primary attributes, namely data volume, data type variety, and the velocity of streams and other real-time data. I don’t necessarily agree with everything the IBMers said, but I must say that the three Vs of big data – volume, variety, and velocity – constitute a more comprehensive definition than I’ve heard elsewhere. In particular, the three Vs bust the myth that big data is only about data volume. Plus, the term “three Vs” is a catchy mnemonic. So I freely admit that I am shamelessly stealing the concept of the three Vs as a structure for my own definition of big data.
Note that IBMers didn’t consistently link big data with advanced analytics – but I will. This blog focuses on data volume, whereas other upcoming blogs will hit data type variety and data stream velocity.
Data Volume as a defining attribute of Big Data
It’s pretty obvious that data volume is the primary attribute of big data. With that in mind, some people have asked me for a definitive number quantifying the volume, a common question being: “Exactly how many terabytes constitute big data?” In some user interviews I’ve conducted lately, users have said that big data used to start at 3 terabytes, but now the bottom threshold is more like 10 terabytes. In a 2010 TDWI Technology Survey, a third of users surveyed said they will have 10 terabytes within three years. So 3 to 10 terabytes seems an accurate baseline – for now.
But there’s a catch. Note that my research isn’t about just any big data; it’s about big data collected specifically for analytics. So the numbers quoted above are only for analytic datasets – not all BI data stores and certainly not every bit and byte in an enterprise.
Here are some comments from the field that add more attributes to big data quantification. I asked one user how many terabytes he’s managing for analytics, and he said: “I don’t know, because I don’t have to worry about storage. IT provides it generously, and I tap it like crazy.” Another user said: “We don’t count terabytes. We count records. My analytic database for quality assurance alone has 3 billion records. There’s another 3 billion in other analytic databases.”
From this we see that big data is a moving target that’s growing, there are different units for quantifying it, and it varies with scope (e.g., analytics vs BI vs whole enterprise). In future blogs in this series, we’ll see that data variety and velocity are just as important as volume when it comes to defining big data. Please stay tuned for those blogs.
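To make the point about units concrete, here is a rough back-of-the-envelope sketch in Python that converts a record count into terabytes and compares the result with the 3 and 10 terabyte thresholds mentioned above. The 500-byte average record size and the function names are purely illustrative assumptions, not figures from the survey or the user interviews; real record sizes vary widely, so treat the output as an illustration of the arithmetic rather than an estimate of anyone's actual data volume.

```python
# Back-of-the-envelope conversion from record counts to terabytes.
# Assumption: decimal terabytes (10**12 bytes), not binary tebibytes.
BYTES_PER_TERABYTE = 10**12


def records_to_terabytes(record_count, avg_record_bytes):
    """Estimate storage volume in terabytes from a record count."""
    return record_count * avg_record_bytes / BYTES_PER_TERABYTE


def classify_volume(terabytes, low=3.0, high=10.0):
    """Place a volume relative to the 3 TB and 10 TB thresholds from the text."""
    if terabytes < low:
        return "below the older 3 TB threshold"
    if terabytes < high:
        return "between the 3 TB and 10 TB thresholds"
    return "at or above the 10 TB threshold"


if __name__ == "__main__":
    # 3 billion QA records plus 3 billion in other analytic databases (from the quote).
    total_records = 3_000_000_000 + 3_000_000_000
    # Hypothetical average record size; the source gives no such figure.
    assumed_record_bytes = 500
    tb = records_to_terabytes(total_records, assumed_record_bytes)
    print(f"{total_records:,} records at {assumed_record_bytes} bytes each "
          f"is about {tb:.1f} TB: {classify_volume(tb)}")
```

Changing the assumed record size moves the same 6 billion records across the thresholds, which is one reason a record count and a terabyte count are not interchangeable measures of "big."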
So, what do you think, folks? Let me know. Thanks!
Don’t miss TDWI’s Big Data Analytics Survey. Please share your opinions and experiences by taking the survey online: http://bit.ly/jxWh9N
Posted by Philip Russom, Ph.D. on June 9, 2011