Q&A: Power, Importance of Data Continue to Grow
IBM engineer Sam Lightstone, who works in next-generation analytics, talks about the growing power of data and where he sees big data and analytics headed.
- By Linda L. Briggs
- October 22, 2013
The power and importance of data will only continue to increase, says IBM distinguished engineer Sam Lightstone. In
part one of our interview, Lightstone discussed the challenges and opportunities big data offers. This week, we explore where big data and analytics are headed and discuss IBM's advanced technology for parallel in-memory columnar analytics.
BI This Week: What areas and industries in particular do you see taking advantage of big data -- as leaders in this area?
Sam Lightstone: It's focused on industries that are collecting and using lots of data. Retail is very big because every time a retailer sells something, they have another data point. If you're a big retailer, you're collecting data every single day with every single transaction. If you're in the financial industry, same thing -- you collect data with every financial transaction. A lot of this is tied to how many transactions are coming your way, which is loosely correlated with how many customers you have. There's lots of big data activity in the financial industry, the insurance industry, the retail industry, and so on. Manufacturers also have lots of data, so there are quite a few of them in this space as well.
When we talk about big data and analytics, privacy almost never comes up. Does privacy fit into this discussion? Is that a concern when we talk about data volumes and the way we're using data these days?
I think it's a huge consideration. To some extent, it's very much dependent on the companies that have the data to handle it properly. It's not so much about the technology itself. Technology for working with big data is a tool, it's a device. You can use it for good or evil. If people want to abuse it, I suppose they can. In my experience, one of the good attributes of large companies is that they can't afford to be caught mishandling data. Big companies usually put a lot of focus on security and privacy, and proper handling of any kind of private information. It's harder and probably a little bit less urgent on average for smaller companies, but that's not to say that small companies are not being diligent as well.
You work in next-generation analytics, so where do you see us headed with big data and analytics? Where might we be in a few years, or even sooner?
You're asking me a question about the future, but I'm going to give you an answer that begins with the past. There's the saying that those who cannot remember the past are condemned to repeat it. Certainly, there's something to be learned from the past. One of the things that strikes me is the profound change in our industry from the early 1980s until the late 1990s. There was a shift, but it almost seemed that nobody noticed or wrote about it. In the 1980s, you could buy a computer and, wow, it was expensive. It didn't matter if it was a computer for a large company or a personal computer. These things were very, very expensive.
Then there was all the software, which was comparatively inexpensive and could fit on a "floppy disk." Bill Gates supposedly once said that 640K of memory should be enough for anybody, right?
At that time, the paradigm was that the hardware is expensive. Treat it kindly. Preserve it. Data, on the other hand … was a cheap, temporary thing. Then the shift came. Per Moore's Law, computers essentially became obsolete every two years, so every two years, everybody started replacing their computers. Their data, however -- they found they couldn't live without it, so the data had to stick around. Everybody needs to keep it on persistent storage; you have to have a backup, and you have to have backups for the backup -- three copies of everything.
I see it as a massive transformation that went by without a lot of commentary, but changed everything, because data lives forever. What's continuing to evolve is not just that data lives forever, but that data is uber-powerful. The more data you can collect and understand, the more power you have to make important decisions and discover useful things.
As we look to the future, we're at a stage now where we understand the power of data, and we're looking for better and more powerful ways to collect data, analyze data, and make important decisions from data.
You said that "data lives forever." Will this incredibly steep increase in the amount of data stored worldwide continue?
Yes, and it's going to continue to increase, in part because we have more and more devices that we're interacting with that are collecting data, but also because of this realization that data is so powerful. Companies are making an ever-more aggressive effort to collect it and understand it. As I say, I believe most companies are trying to do that in an ethical way, but make no mistake about it -- it's much more deliberate than it ever has been before, and it's going to continue to be much more aggressive. In the past, people competed for technology share -- "buy my product" and so on -- but going forward, our companies and industries are going to be competing more and more for the data itself.
In the context of our discussion here, can you discuss IBM's BLU Acceleration, which you had a role in developing?
We're very, very excited about what we've done with DB2 version 10.5 with BLU Acceleration. It's a project that we've been working on for some time and have worked very hard to keep a secret from everyone, even most of our customers. That's because we knew we were on to something big. It's not really IBM's style to be quite so secretive, but I guess we put some extra diligence into it this time.
We've come up with this new technology that is, in one fell swoop, 20 times faster and hundreds of times easier to use, and uses a fraction of the disk space for the same dataset. Attributes such as speed, ease, and storage savings are all very valuable, and usually one comes at the cost of another. If you try to save on storage space, it's going to cost you some CPU cycles to make the data more compressed. If you want better performance, you're going to have to add complexity to the system.
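The storage-versus-CPU trade-off Lightstone describes can be seen in any columnar compression scheme. The following is a minimal, illustrative sketch -- not IBM's BLU implementation -- using simple run-length encoding, a common technique for column-organized data where neighboring values often repeat:

```python
def rle_encode(column):
    """Run-length encode a column: collapse repeated values into (value, count) pairs.

    Encoding spends CPU cycles to save storage -- the trade-off described above.
    """
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    """Expand (value, count) pairs back into the original column (also costs cycles)."""
    return [v for v, n in runs for _ in range(n)]

# A column with long runs of repeated values, as is typical in columnar layouts.
column = ["retail"] * 5000 + ["finance"] * 3000 + ["insurance"] * 2000
encoded = rle_encode(column)

print(len(column))   # 10000 values stored one by one
print(len(encoded))  # 3 runs -- a fraction of the storage
assert rle_decode(encoded) == column
```

The dataset names here are invented for illustration; the point is only that compressing a column trades CPU time at encode/decode for reduced storage, which is why improving speed, ease of use, and storage at once is unusual.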
Our ability to improve all three of these at once is a huge improvement. It's really made this a game-changer for us. We have many customers who are now seeing exactly the kind of speedups that we've seen here in the lab, and with exactly the same kinds of ease-of-use improvements. It's really gratifying to know that it works as well for them as it does for us.
It must be satisfying to be involved in something so cutting-edge.
It's an opportunity of a lifetime, I think. All of us who have worked on it, both in the research team and the development team, have the sense of having been part of something that's really a game-changer. We try to have a lot of fun working on it, but it's really gratifying to see our customers trying it out and getting so much business value from it as well.