Big Data Analytics: Frequently Asked Questions (FAQ)
Blog by Philip Russom
Research Director for Data Management, TDWI
What exactly is Big Data Analytics?
It’s two things: big data and the kind of analytics users want to do with big data. Let’s start with big data, then come back to analytics.
Users interviewed by TDWI state that data isn’t big until it breaks 10 TB. So that’s the low end of big data. And some user organizations have cached away hundreds of terabytes just for analytics. The size of big data is relative; hundreds of TBs isn’t new, but hundreds of TBs just for analytics is, at least for most user organizations.
Big Data is all about multi-terabyte datasets, right?
No, there’s more to it than that. Size aside, there are other ways to define big data. In particular, big data tends to be diverse, and it’s the diversity that drives up the data volume. For example, analytic methods that are on the rise need to correlate data points drawn from many sources, both in the enterprise and outside it. Furthermore, one of the new things about analytics is that it’s NOT just based on structured data, but on unstructured data (like human language text) and semi-structured data (like XML files, RSS feeds), and data derived from audio and video. Again, the diversity of data types drives up data volume.
Finally, big data can be defined by its velocity or speed. This may also be defined by the frequency of data generation. For example, think of the stream of data coming off of any kind of sensor, say thermometers sensing temperature, microphones listening for movement in a secure area, or video cameras scanning for a specific face in a crowd. With sensor data flying at you relentlessly in real time, data volumes get big in a hurry. Even more challenging, the analytics that go with streaming data have to make sense of the data and possibly take action—all in real time.
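To make the streaming case concrete, here is a minimal sketch of the kind of check an application might run against sensor readings as they arrive. It is an illustration only, not a method described by TDWI: the function name `detect_spikes`, the `window` and `threshold` parameters, and the sample temperatures are all assumptions made for this example.

```python
from collections import deque
from statistics import mean, stdev


def detect_spikes(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far outside the recent window.

    Hypothetical helper: `window` is how many recent readings to keep,
    `threshold` is how many standard deviations count as a spike.
    """
    recent = deque(maxlen=window)  # only the last `window` readings are held
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip the check when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value  # the "take action" step would go here
        recent.append(value)


# Made-up thermometer readings with one obvious outlier.
temps = [20.1, 20.3, 20.2, 20.4, 20.2, 20.3, 35.0, 20.2, 20.3]
alerts = list(detect_spikes(temps))
print(alerts)
```

Because the function consumes any iterable and keeps only a small fixed window in memory, the same loop could in principle sit on top of a live feed rather than a list; a production system would need far more than this, such as handling late or missing readings.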
Hence, big data is more than large datasets. It’s also about diverse data sources or data types (and these may be arriving at various speeds), plus the challenges of analyzing data in these demanding circumstances.
What kinds of analytics go with big data?
The kind of analytics applied to big data is often called “advanced analytics.” A better term would be “discovery analytics” because that’s what users are trying to accomplish. In other words, with big data analytics, the user is typically a business analyst who is trying to discover new business facts that no one in the enterprise knew before. To do that, you need large volumes of highly detailed data. And this is usually data that the enterprise has not tapped for analytics.
For example, in the middle of the recent economic recession, companies were constantly being hit by new forms of customer churn. To discover the root cause of the newest form of churn, a business analyst grabs several terabytes of detailed data drawn from operational applications to get a view of recent customer behaviors. He may mix that data with historic data from a data warehouse. Dozens of queries later, he’s discovered a new churn behavior in a subset of the customer base. With any luck, he’ll turn that information into an analytic model, with which the company can track and predict the new form of churn.
What kind of analytic tool does a business analyst need for the “discovery analytics” that’s common with big data?
Discovery analytics against big data can be enabled by different types of analytic tools, including those based on SQL queries, data mining, statistical analysis, fact clustering, data visualization, natural language processing, text analytics, artificial intelligence, and so on. It’s quite an arsenal of tool types, and savvy users get to know their analytic requirements first before deciding which tool type is appropriate to their needs.
Is big data a problem just to be managed (with its size, diversity, and speed) or is it an opportunity to be seized?
TDWI is currently running an Internet-based survey about big data analytics. An early extraction of survey data shows that only 30% of users responding to the survey are concerned about the technical challenges of collecting and managing big data. The vast majority, namely 70% of the users responding to the survey, say that big data is definitely an opportunity. That’s because through analysis the user organization can discover new facts about its customers, markets, partners, costs, and operations, then use that information for business advantage.
So, what do you think, folks? Let me know. Thanks!
Don’t miss TDWI’s Big Data Analytics Survey. Please share your opinions and experiences by taking the online survey.
Posted by Philip Russom, Ph.D. on June 21, 2011