The Three Vs of Big Data Analytics: VELOCITY
Blog by Philip Russom
Research Director for Data Management, TDWI
In prior blogs, I’ve talked about how big data’s primary attribute is data volume. That’s pretty obvious. But it’s defined by other characteristics, too. For example, one of the things that makes big data so big is that it’s coming from a greater variety of sources than ever before. Now let’s look at the last of the three Vs of Big Data Analytics, namely data velocity.
Data Feed Velocity as a defining attribute of Big Data
Big data can be described by its velocity or speed. Or you may prefer to think of it as the frequency of data generation or frequency of data delivery. For example, think of the stream of data coming off of any kind of sensor, say thermometers sensing temperature, microphones listening for movement in a secure area, or video cameras scanning for a specific face in a crowd. This isn’t new; many firms have been collecting click stream data off of Web sites for years, using streaming data to make purchase recommendations to Web visitors. With sensor and Web data flying at you relentlessly in real time, data volumes get big in a hurry. Even more challenging, the analytics that go with streaming data have to make sense of the data and possibly take action—all in real time.
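To make "take action in real time" a little more concrete, here is a minimal Python sketch of one common way to watch a sensor feed: keep a short sliding window of recent readings and raise a flag when the window's average crosses a limit. The class name RollingMeanAlert, the window size, and the threshold are all illustrative assumptions, not something taken from a particular product.

```python
from collections import deque


class RollingMeanAlert:
    """Flags when the mean of the last `window_size` readings exceeds `threshold`.

    Hypothetical helper for illustration only.
    """

    def __init__(self, window_size: int, threshold: float):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # A bounded deque drops the oldest reading automatically.
        self.readings = deque(maxlen=window_size)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Add one reading; return True if the full window's mean is too high."""
        self.readings.append(value)
        window_full = len(self.readings) == self.readings.maxlen
        mean = sum(self.readings) / len(self.readings)
        return window_full and mean > self.threshold


# Example: temperature readings arriving one at a time.
monitor = RollingMeanAlert(window_size=3, threshold=30.0)
alerts = [monitor.observe(t) for t in [20, 25, 31, 35, 40]]
```

Because only the last few readings are held in memory, the cost per reading stays constant no matter how long the stream runs.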
So you don’t think this is all science fiction, allow me to share some of the use cases for high-velocity data feeds and streams that I’ve heard recently.
Here’s an unsubstantiated anecdote that someone told me: “There’s a cell service provider in Japan that collects GPS data from cell phone users. The cell provider collects the data in real time, and keeps track of which people are walking the furthest. Once a month, the cell provider gives an award to the walker who covered the greatest distance. In a way, cell phones are working like sensors to collect and analyze streaming big data.”
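I have no idea how that provider actually does it, but the idea is easy to sketch. The Python below keeps a running distance total per user as GPS fixes stream in, using the standard haversine formula for the distance between two latitude/longitude points. The names DistanceTracker, update, and leader are made up for illustration, and a real system would also have to deal with GPS noise and people riding trains.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class DistanceTracker:
    """Hypothetical accumulator: total distance covered per user."""

    def __init__(self):
        self.last_fix = {}                   # user_id -> (lat, lon) of previous fix
        self.total_km = defaultdict(float)   # user_id -> running distance

    def update(self, user_id, lat, lon):
        """Process one streamed GPS fix."""
        prev = self.last_fix.get(user_id)
        if prev is not None:
            self.total_km[user_id] += haversine_km(prev[0], prev[1], lat, lon)
        self.last_fix[user_id] = (lat, lon)

    def leader(self):
        """Return the user with the greatest distance so far, or None."""
        if not self.total_km:
            return None
        return max(self.total_km, key=self.total_km.get)


tracker = DistanceTracker()
for fix in [("a", 0.0, 0.0), ("b", 0.0, 0.0), ("a", 0.0, 1.0), ("b", 0.0, 0.5)]:
    tracker.update(*fix)
```

Note that only the previous fix and a single number per user are stored, so the raw GPS stream itself never has to be kept.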
I also heard a similar anecdote: “Imagine that I’m a consumer walking around downtown in a city, and I’m shopping. Now imagine letting a shopping service know where I am, plus maybe the kinds of goods I’m looking for. As I walk, the GPS coordinates could stream to the shopping service, and it could point me to stores that match my interests.”
A consultant who specializes in streaming data told me about some video and audio analytic applications he’s looking into: “Think about the algorithms that enable us to parse text and perform sentiment analysis, sometimes in real time. Very similar algorithms can parse video images to document and analyze changes in the thing that’s being imaged. Satellite images could monitor and analyze troop movements, a flood plain, cloud patterns, and grass fires. Or a video analysis system could monitor a sensitive or valuable facility, watching for possible intruders, then alert authorities in real time. You can implement similar applications with sound monitoring; one of my apps involves two thousand underground microphones to listen for movement in geologic formations. I hope it can eventually help predict earthquakes.”
Here’s a related user story about streaming big data that I heard recently: “You don’t need all of the streaming data. You just need the interesting pieces or just the one piece that identifies what you’re looking for. We’ve all seen video footage from the US military’s unmanned jet drones. A drone is processing several frames of video per second looking for shapes or light signatures that match its programming. For example, it might be looking for shapes that look like tanks or sun reflections that could come from metallic weapons. The drone deletes almost all of the frames, because they’re not of interest. And that helps avoid data glut that could choke the system.”
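The "keep only the interesting pieces" idea can be sketched as a simple streaming filter. In the Python below, frames are examined one at a time and anything that fails a test is dropped immediately, so it never accumulates. The frame layout (a dict with frame_id and peak_brightness) and the 0.9 cutoff are assumptions I invented to stand in for real image analysis.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def keep_interesting(stream: Iterable[T], is_interesting: Callable[[T], bool]) -> Iterator[T]:
    """Yield only the items that pass the test; everything else is discarded."""
    for item in stream:
        if is_interesting(item):
            yield item


def fake_frames():
    """Hypothetical stand-in for a video feed: ten frames, two with a bright glint."""
    for i in range(10):
        yield {"frame_id": i, "peak_brightness": 0.95 if i in (3, 7) else 0.2}


# Keep frames whose brightest spot might be a reflection worth a closer look.
kept = list(keep_interesting(fake_frames(), lambda f: f["peak_brightness"] > 0.9))
kept_ids = [f["frame_id"] for f in kept]
```

Because the filter is a generator, it handles one frame at a time and memory use does not grow with the length of the stream.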
A prominent Internet-based business told me a few weeks ago: “We load 200 gigabytes a day into our data warehouse. But that’s processed down from several terabytes of Web log and click-stream data. We mix this big data with data about our customers drawn from other touch points, then analyze it. Although the data is streaming, we collect the stream on disk, then process it down and analyze it overnight. Our next step is to process and analyze streaming big data in real time. We’re definitely a customer-oriented business, so understanding customers and serving them better is the goal of analytics. We just need to do it both after the fact in batch and – eventually – in real time.”
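As a rough illustration of "processing it down" in batch, the Python sketch below reduces raw click records to a much smaller summary of page views per customer and URL. The three-field log format (timestamp, customer ID, URL) and the function name summarize_clicks are assumptions for the example; the business quoted above did not describe its actual format or tooling.

```python
from collections import Counter
from typing import Iterable


def summarize_clicks(lines: Iterable[str]) -> Counter:
    """Collapse raw click records into counts per (customer_id, url).

    Assumes each line looks like: "<timestamp> <customer_id> <url>".
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed records rather than failing the whole batch
        _timestamp, customer_id, url = parts
        counts[(customer_id, url)] += 1
    return counts


raw_log = [
    "2011-06-17T09:00:01 c1 /home",
    "2011-06-17T09:00:05 c1 /home",
    "2011-06-17T09:00:09 c2 /cart",
    "garbled line",
]
summary = summarize_clicks(raw_log)
```

The function accepts any iterable of lines, so the same logic could read a file collected on disk overnight or, later, be fed from a live stream.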
So, what do you think, folks? Let me know. Thanks!
This blog is number 3 in a series of 3, all about the three Vs of big data analytics, namely data volume, variety, and velocity. You can read the first blog here. And you can read the second blog here.
Don’t miss TDWI’s Big Data Analytics Survey. Please share your opinions and experiences by taking the online survey.
Posted by Philip Russom, Ph.D. on June 17, 2011