TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

RESEARCH & RESOURCES

Q&A: Quantum Leap in Technology Addresses Data Issues

IBM distinguished engineer Sam Lightstone discusses the challenges and opportunities offered by big data.

By Linda L. Briggs
October 15, 2013

“With our latest offerings, IBM has taken a quantum leap forward in making [big] data manageable, successful and analyzable in a scalable fashion.” That’s IBM distinguished engineer Sam Lightstone discussing IBM’s new BLU Acceleration technology with BI This Week. Lightstone is a business intelligence architect for next-generation data analytics, working with IBM’s DB2 for Linux, UNIX, and Windows development team. He is the product architect for BLU Acceleration, IBM’s technology for parallel, vectorized, in-memory columnar analytics, a project he founded in collaboration with colleagues at IBM’s Almaden Research Lab.

An IBM master inventor with over 40 patents granted or pending, Lightstone has been widely published on the topic of self-managing database systems, and is co-author of five books, including a guide to software development professionalism, Making it Big in Software.

BI This Week: What are some of the constraints around big data today that are slowing companies down as they wrestle with large amounts of data?

Sam Lightstone: First, keep in mind as we use that term that there are many different kinds of big data.

That said, one class of problems is that the data is coming from many different locations and in many different formats. Managing those formats, those different streams of data, is complex, as is unifying it into some whole that is understandable -- not just by human beings but by computers, too. It’s an IT challenge, and a key problem especially in larger organizations -- the more disparate the location of the data, the bigger the challenge. So that’s one class of problems.

Another class of problems is the sheer volume of the data. We live in a world that is increasingly interconnected, and data is being generated from more and more locations. It used to be that just institutions had data. Now, everybody with a phone is generating data -- [along with] your laptops and computers; it’s not just backend servers. This incredible explosion of data continues at a phenomenal rate. The world is generating exabytes of data on a daily basis.

So there’s a problem of volume and how to manage it and leverage it. You have all this data -- how do you leverage it in a way to tell you something useful, to convert that data into useful information?

Dealing with data volume and turning it into useful information has multiple aspects as well. For example, there’s the analytic problem of turning data into useful information. There’s also the engineering question -- how can you do that quickly enough for humans? Nobody wants to wait two days to get an answer to a question; we live in a very impatient world.

Those are some key challenges in working with big data.

When you talk about data formats, is part of the challenge dealing with structured versus unstructured versus semi-structured data?

Yes, but it’s even deeper than that. Within any of those realms, there are many, many different formats. Even within structured data, there are many formats, and it gets more complex as you go from structured to semi-structured to completely unstructured. It’s a painful problem. You’re dealing with problems of conflicting data types, mismatched schemas, and mismatched topologies.

What we’ve done at IBM is, I think, very powerful, in the sense that we haven’t begun this quest to tackle big data by asking ourselves, “What are the interesting engineering problems? What are the interesting scientific problems?” There was a time in IBM’s history when we looked at things in that way.

What we do now -- and what we’re very proud of -- is we look at these problems in terms of, “What are the challenges that are important for our customers?” All of our technology really is focused on this theme: What is important for the customer, and what can we do to meet the challenges of our customers and society with all this data? That’s instead of: what is interesting and scientific to us as engineers?

In listening to your customers, do you find big data is a big challenge for them, or are they beginning to see it more as an opportunity -- that is, more a positive than a negative?

It’s definitely a huge opportunity; it’s also still rapidly emerging. We’re really just at the beginning of the curve in the evolution and adoption of the technology. It’s been around for a few years, but it’s still in its early stages -- like your cell phone or even your television. These things take years to evolve and mature -- it’s not just that the technology itself has to become better, but society has to wrap its head around what it wants to use it for.

When we as a society first invented cell phones, nobody talked about a mobile phone as a place to play backgammon or watch movies. It was a phone. Now actually calling someone is just one minor task your cell phone can do.

Would you say we’re at the point where we’re collecting and storing data effectively but not yet using it effectively? Do companies really know what to do with their data, or is the volume simply overwhelming?

Well, I think that IBM, with our latest offerings, has really taken a quantum leap forward in making this data manageable, successful, and analyzable in a scalable fashion -- in a way that its performance is enough for human beings and human reactions. That’s behind the IBM phrase, “analytics at the speed of thought.”

The key idea there is -- and I think this is one thing that engineers across the industry don’t necessarily internalize, but we’ve tried to stress with our engineering teams -- it’s not really about being 20 percent faster or 10 percent faster than your competitor. Certainly, there’s a point where it’s too slow, because human beings are not that patient and businesses can’t afford to wait. There’s another point at the other end of the spectrum where it’s so fast that making it any faster really doesn’t buy you anything.

If you think of it in that context, and consider things from the perception of human beings, it doesn’t really matter to a person once you can get a sub-second answer, for example. It really doesn’t matter if it’s a tenth of a second, a hundredth of a second, or a millionth of a second -- it’s all faster than your perceptions can acknowledge, so it’s all the same. However, it matters tremendously if it’s an hour versus 10 hours versus 100 hours. By focusing on this notion of making it manageable and scalable to the human being, to human perception, we create a technology that is really valuable and consumable by the marketplace.

That might be the first time I’ve heard someone say that something related to data delivery could actually be too fast.

Yes, that’s not to say that sub-second responses are always fast enough, because sometimes they’re not. A classic case in which they’re not is this: You may be able to process a small transaction or a banking transaction in a millisecond, but if there are a million people or 10 million people doing it all at once, it becomes a totally different situation. The scale and the concurrency -- the number of concurrent users that are pounding on the system -- are important considerations.

How much of a factor is speed for companies that are failing to use analytics in a useful way? Is a lack of speed really holding people back from using analytics?

Oh, yes. I think speed is a huge issue. It’s absolutely huge -- performance remains one of the main attributes of the technologies that we compete on. It’s not the only attribute. The ability to scale with the volume, the ability to handle all the different formats and so on -- to functionally do what customers need us to do -- is, and will always remain, a fundamental part of what we compete on. However, performance is almost always one of the main attributes, because the analysis of large amounts of data is time-consuming. It’s time-consuming even if you have 50 gigabytes, but if you have a terabyte, or 20 terabytes, or maybe hundreds of terabytes or even petabytes, then it starts getting really costly in terms of computation time. People are people, and human beings just don’t want to wait.
