Big Data is About to Get a Lot Bigger
Three critical factors are driving the exponential growth of data. Is "big" too small a word to describe what's happening?
By Bob Potter, Rocket Software
We all know the definition of big data: data so large and complex that it overwhelms conventional computer systems. We all know the three Vs: volume, velocity, and variety. The most important "V" in my opinion is volume, and let me tell you, we haven't seen anything yet!
There are three critical factors for the exponential growth of data:
- The phenomenon we now call the Internet of Things
- The explosive growth of data created by social media
- The improvements in database management technology
Factor #1: The Internet of Things
Let's explore the Internet of Things phenomenon first. Sensors, meters, biochips, transponders, controllers, appliances, wearables, and the like are projected to constitute 50 billion connected devices by 2020, generating 50 to 100 trillion data objects. Just a decade ago a terabyte was considered a lot of data; now we routinely talk about analyzing petabytes, where a petabyte is 10 to the 15th power bytes, or 1,000 terabytes. Tomorrow we will be talking about exabytes, where an exabyte is 10 to the 18th power bytes, or one million terabytes.
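For readers who like to see the arithmetic, here is a quick, purely illustrative sketch in Python of how those units stack up (the names and layout are mine, not anything standardized):

```python
# Illustrative only: decimal (SI) data units as used in this article.
units = {
    "terabyte (TB)": 10**12,
    "petabyte (PB)": 10**15,  # 1,000 terabytes
    "exabyte (EB)": 10**18,   # 1,000,000 terabytes
}

for name, size in units.items():
    exponent = len(str(size)) - 1  # the power of ten
    print(f"1 {name} = 10^{exponent} bytes = {size // 10**12:,} TB")
```

Running it prints the ladder the article climbs: a petabyte is 1,000 terabytes, and an exabyte is 1,000 petabytes, or a million terabytes.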
Tools for Internet of Things applications are popping up everywhere, and venture capitalists are pouring hundreds of millions of dollars into aspiring start-ups that help companies manage and analyze all this data. This is the new gold rush in the software industry: a vendor without an IoT strategy worries it's about to become extinct. But make no mistake about it: machines and devices are generating data that was not even conceivable 10 years ago, and companies want insight from that data to gain competitive advantage and, sometimes, to actually save lives.
Factor #2: Social Media
Interactive marketing spend is growing at 17 percent per year, and social media spend is growing at 34 percent. Last year, companies spent $55 billion on social media ads. What is truly mind-boggling is how my generation is leading the charge: the fastest-growing segment of Twitter users is the 55-to-64 age bracket, and that group also far outpaces the 16-to-24 age bracket in Facebook growth. Chances are your grandmother uses Facebook. But what does this mean for data?
Consider the numbers: 6.4 quintillion bytes of Internet data were created in 2014 alone; 90 percent of all Internet data was generated in the last two years; and 43 percent of the data created on personal social media accounts is gathered and analyzed. In general, structured and unstructured data is growing 60 percent annually, and as developing countries become more connected, don't expect that to slow down any time soon. Social media companies, digital media companies, and other enterprises in the interactive marketing business are expanding operations rapidly and buying commodity hardware at a voracious clip.
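A back-of-the-envelope calculation (illustrative only, and assuming that 60 percent rate holds constant) shows how brutally that compounding works:

```python
# Illustrative compounding: data volume growing 60 percent per year.
volume = 1.0  # starting volume in arbitrary units (say, petabytes)
for year in range(1, 6):
    volume *= 1.60
    print(f"After year {year}: {volume:.1f}x the original volume")

# At 60% annual growth, data roughly doubles every 18 months
# and exceeds 10x the original volume in five years (1.6**5 = ~10.5).
```

In other words, an organization holding one petabyte today would be managing more than ten petabytes five years from now without changing anything about its behavior.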
Factor #3: Improvements in Database Management Systems
The relational database hasn't changed much since it was invented in 1970. Most investment and innovation by vendors has been incremental, and newer models never lived up to the success of the relational model. Then document-oriented databases showed up about eight years ago, and JavaScript Object Notation (JSON) became an accepted standard. Suddenly products such as MongoDB and CouchDB were off to the races, adopted by independent software vendors and by large companies with Java development staffs. They introduced horizontal scaling along with large-file storage and processing. These products did not rely on SQL, but rather let the application itself define the queries.
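To make that concrete, here is a minimal sketch assuming a local MongoDB instance and its standard Python driver, pymongo; the database, collection, and field names are my own illustrative inventions:

```python
from pymongo import MongoClient

# Assumes a MongoDB server running on the default local port.
client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# The application defines the document's shape; no schema is declared up front.
orders.insert_one({
    "customer": "Acme Corp",
    "items": [{"sku": "A-100", "qty": 3}, {"sku": "B-200", "qty": 1}],
    "total": 149.95,
})

# The application likewise defines the query; no SQL is involved.
for doc in orders.find({"items.sku": "A-100"}):
    print(doc["customer"], doc["total"])
```

The query matches any order containing the given SKU, which illustrates the point: the query logic lives in the application's own data structures rather than in a SQL statement handed to the server.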
NoSQL is not where all the new innovation is, either. SQL database companies such as IBM, Microsoft, and Oracle have picked up the pace to accommodate large transactional database tables, and newcomers such as NuoDB have invented scale-out distributed SQL databases for cloud deployment. That start-up claims it can process one million new-order transactions per minute.
IBM continues to make huge improvements in its System z database engines, and the next change will most likely target large-table management as mainframe data continues to expand at a rapid rate. These improvements will take advantage of the newly released z13 mainframe, which comes with up to 10 TB of memory that IBM aptly calls Big Memory.
IBM is investing so much in big data that its annual information and analytics conference, Insight, carries the tag line "Are you ready for something big?" Oracle and Microsoft are similarly enhancing their database technology to meet their customers' growing appetite for transactional processing against ever-larger tables in what we consider traditional online transaction processing systems.
Any single one of these factors would accelerate the pace of data growth, but all three are occurring at the same time. What we consider big one year is nothing the next. We need a new way to describe big data. I don't know what word will go in front of data, but suffice it to say it won't be a word as humble as the word big.
Bob Potter is senior vice president and general manager of Rocket Software's business information/analytics business unit. He has spent 33 years in the software industry with start-ups, mid-size, and large public companies with a focus on BI and data analytics. You can contact him at [email protected].