What's Really Different about Big Data Analytics?
We explore several fundamental changes driven by big data analytics.
- By Fern Halper, Ph.D.
- April 30, 2013
Big data analytics can be viewed in two ways. On the one hand, organizations are using it to extend the analytics they already do, running it faster or more accurately to improve decision-making. For example, you might have a model that predicts customer churn. Perhaps that model took hours to run on your existing infrastructure. With a big data infrastructure in place, it might now take minutes.
On the other hand, organizations are using it to do new kinds of analysis. For example, think about smart meters collecting time-series data in real time in order to predict power outages. That's a new kind of analysis made possible by a new technology (the smart meter) combined with big data analytics. Extend this idea to include other sources of data (such as geospatial and environmental data) that could be used to determine the proper response to a storm or other natural disaster, and you start to see just how powerful big data analytics can be.
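To make the smart-meter idea a bit more concrete, here is a minimal Python sketch of one simple building block such a system might use: flagging a reading that deviates sharply from the meter's recent history. Note that this is plain anomaly detection with a rolling z-score, not an outage-prediction model; the function name, the 12-reading window, the three-standard-deviation threshold, and the sample readings are all illustrative assumptions.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(readings, window=12, threshold=3.0):
    """Return the indices of readings that differ from the mean of the
    preceding `window` readings by more than `threshold` standard deviations."""
    history = deque(maxlen=window)  # only the most recent `window` values are kept
    flagged = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)  # sample standard deviation
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                flagged.append(i)
        history.append(value)
    return flagged


if __name__ == "__main__":
    # Made-up voltage readings from one hypothetical meter: steady around 240 V,
    # then a sudden sag at index 12.
    readings = [240, 241, 239, 240, 242, 238, 240, 241, 239, 240, 241, 239, 180, 240]
    print(flag_anomalies(readings))  # → [12]
```

In a real deployment, a check like this would be one input among many (weather, grid topology, maintenance history) rather than a predictor on its own.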
When I was writing Big Data for Dummies, I spent a lot of time thinking about how big data analytics differs from the way organizations have traditionally approached analytics. Of course, the infrastructure you use to process and store the data will be different, and we spend a lot of time discussing those technologies in the book. Additionally, organizations will have to consider that the algorithms they used on traditional-sized data sets might need to be refactored, that is, restructured internally without changing their external behavior, so that they can run at big data scale. Finally, companies are realizing the importance of running analytics close to the sources of big data, so that the data doesn't have to be stored first and analyzed later. Running analytics closer to the data source improves "time to answer."
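As a small illustration of both points (an example of my own, not a method from the book), consider computing a mean and variance. A textbook version loads all the data, computes the mean, then makes a second pass for the deviations. The Python sketch below refactors that into a one-pass running summary (Welford's algorithm) that never stores the raw records, plus a merge step (the pairwise formula of Chan et al.) so that summaries computed separately, for example near different data sources or on different partitions, can be combined afterward. The class name and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean, and variance maintained incrementally (Welford),
    so raw records never need to be stored."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new value into the summary."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two partial summaries into a new one (Chan et al.)."""
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 with fewer than two values."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


if __name__ == "__main__":
    # Two hypothetical partitions of the same stream, summarized separately.
    left, right = RunningStats(), RunningStats()
    for x in [2, 4, 4, 4]:
        left.update(x)
    for x in [5, 5, 7, 9]:
        right.update(x)
    combined = left.merge(right)
    print(combined.n, combined.mean, combined.variance)
```

The external behavior (the mean and variance of all the values) is unchanged; only the internal structure differs, which is what makes the computation easy to push out toward the data.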
However, there are other fundamental changes that are driven by big data analytics. For example:
Big data analytics can be programmatic. At big data scale, it can become difficult to load and explore data manually. Therefore, it often becomes necessary to work with data programmatically, that is, through code. Most people have heard of programmatic stock trading. Big data technology will continue to push analytics in a programmatic direction. Programmatic advertising and marketing are good examples; they have been done before but can be extended in new ways using big data. Think about combining various data sources about you (social, CRM, e-commerce, etc.) with location data from your mobile phone to automatically target a promotion to you. That's a new example of programmatic analytics for big data.
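A minimal Python sketch of that kind of programmatic targeting follows. It assumes the social, CRM, and e-commerce data have already been merged into a per-customer set of interests, and that a location ping arrives as a latitude/longitude pair. The `Store` type, the `choose_promotions` function, the 1 km radius, and the sample stores and customer are hypothetical; the distance check uses the standard haversine formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Store:
    """A hypothetical store that can be the target of a promotion."""
    name: str
    category: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def choose_promotions(customer_id, lat, lon, interests, stores, radius_km=1.0):
    """Combine a customer's interests (assumed already merged from social, CRM,
    and e-commerce data) with a location ping; return nearby matching stores."""
    wanted = interests.get(customer_id, set())
    return [
        s.name
        for s in stores
        if s.category in wanted and haversine_km(lat, lon, s.lat, s.lon) <= radius_km
    ]


if __name__ == "__main__":
    stores = [
        Store("Downtown Running", "running", 41.8800, -87.6300),
        Store("Lakeview Books", "books", 41.9400, -87.6500),
    ]
    interests = {"c42": {"running", "books"}}
    # A ping roughly 140 m from the running store and several km from the bookstore.
    print(choose_promotions("c42", 41.8810, -87.6310, interests, stores))  # → ['Downtown Running']
```

The point is that no analyst is in the loop for an individual decision: the rule runs automatically every time a new location ping arrives.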
Big data analytics can be hypothesis-free. This is sometimes called data-driven analysis. In a hypothesis-driven analysis, you first define the problem and then gather data to validate or invalidate your hypothesis. In a hypothesis-free approach, you start with the data and see what it tells you. In other words, the data drives the analysis. This kind of analysis is becoming more prevalent with the advent of big data infrastructures and analytics, because you now have the compute power to let an algorithm iterate over the data. Hypothesis-free analysis is being explored to study cause and effect in biomedical research, and it's used in marketing and many other areas of the business. For example, you might explore customer data to find natural groupings of customers or to spot unusual patterns. Of course, it's important to examine the results of this kind of analysis very carefully to avoid misinterpreting them.
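One common way to look for natural groupings without a prior hypothesis is clustering. The Python sketch below implements basic k-means (Lloyd's algorithm); it is only one of many clustering techniques, and the two features (annual spend in thousands of dollars and visits per month) and the six customers are made up. It also assumes the number of clusters k is chosen up front and that the features are on comparable scales, which in practice usually means standardizing them first.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Basic k-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical customers: (annual spend in $k, visits per month)
    customers = [(1, 1), (1.5, 2), (2, 1.5), (10, 8), (11, 9), (10.5, 8.5)]
    centroids, labels = kmeans(customers, k=2)
    print(labels)
    print(sorted(centroids))
```

The algorithm will always return k groups, whether or not meaningful groups exist, which is exactly why the results of this kind of analysis need careful human examination.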
Big data analytics can use lots of attributes. This is, in some ways, tied to my hypothesis-free example. My point here is that in the past you might have used hundreds of attributes from multiple structured data sources to do your analysis. With big data analytics, you can work with many more. These attributes might come from sensor data, enterprise data, geospatial data, environmental data, social data, and so on. The data can be disparate in both variety and timing, and it can be mashed up and analyzed in multiple ways. This disparity requires newer technologies both to process the data (e.g., text analytics, voice analytics) and to analyze it.
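As one small illustration of mashing up data that differs in timing, the Python sketch below attaches to each meter reading the most recent weather observation taken at or before it, a so-called as-of join. The function name, the data, and the time units (minutes) are hypothetical, and both inputs are assumed to be sorted by timestamp.

```python
from bisect import bisect_right


def asof_join(readings, observations):
    """For each (time, value) reading, attach the latest observation whose
    time is <= the reading's time, or None if there is none yet.
    Both lists are assumed to be sorted by time."""
    obs_times = [t for t, _ in observations]
    joined = []
    for t, value in readings:
        i = bisect_right(obs_times, t)  # number of observations at or before t
        latest = observations[i - 1][1] if i > 0 else None
        joined.append((t, value, latest))
    return joined


if __name__ == "__main__":
    # Hypothetical data: irregular meter readings (minute, kWh) and
    # hourly temperature observations (minute, degrees C).
    readings = [(5, 2.1), (65, 2.4), (130, 3.9)]
    temperatures = [(0, 30.0), (60, 33.0), (120, 35.0)]
    print(asof_join(readings, temperatures))
    # → [(5, 2.1, 30.0), (65, 2.4, 33.0), (130, 3.9, 35.0)]
```

Matching on "latest value at or before" rather than on exact timestamps is a deliberate choice: sources sampled on different schedules rarely share timestamps, and looking only backward avoids leaking future information into an analysis.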
Most companies are still very early in their big data journey. Some are experimenting with big data analytics in isolation from the rest of their overall computing environment. They are typically deploying one or two big data analytics projects to understand the issues and the value of the technology. However, since it is a rare company that can afford to build an overall big data analysis capability from scratch, it makes sense to think about big data analytics as an ecosystem of technologies, people and processes. Companies will ultimately leverage existing infrastructure together with any new big data technologies they may utilize.
For more information about big data, I encourage you to pick up a copy of the just-released book Big Data for Dummies (Wiley, April 2013) that I co-wrote with Judith Hurwitz, Marcia Kaufman, and Alan Nugent. Additionally, consider attending the TDWI World Conference being held in Chicago the first week of May; its theme is "Big Data Tipping Point."