Q&A: How Hadoop Blew Open the Door to Next Generation of Computing
A paper in the early 2000s about Google MapReduce helped to democratize distributed computing. A veteran of the industry discusses that time and the huge changes that continue to unfold today.
- By Linda L. Briggs
- June 30, 2015
The advent of the Internet of things calls for better ways to handle massive amounts of data. A Google paper about MapReduce in the early 2000s signaled a fundamental change in data architectures and distributed processing. In this interview, the second of two parts, Splice Machine CEO Monte Zweben, a long-time industry veteran, discusses how MapReduce, in his words, "broke open the big data world and democratized big data computing and distributed computing." [Editor's note: The first part of our interview can be found here.]
As we talk about the Internet of things, big data, and pending changes in how that data is managed and used, how important is the role of Hadoop?
Hadoop has changed everything. ... To give a little bit of history, I was sitting on an advisory board at Carnegie Mellon University back in the early 2000s. [We were looking at] a paper recently published by Google, the MapReduce paper. It was what preceded Hadoop. The paper was [about] a new computing paradigm -- the scale-out paradigm -- that Google was using. It really struck everyone in the computer science community in a very big way. ...
We've all basically struggled in computer science over the past 30 years to figure out a way to get many computers to work on a problem at once. People said if we could put a lot of computers together, we could solve much larger problems, but it turned out to be too hard. It turned out that you pretty much needed a Ph.D. in distributed systems in computer science in order to get computers to work together.
There were all these technical problems in getting computers to not "starve" each other -- meaning that one computer is waiting for something that the other one is producing, but that other computer is waiting for something that the first one is producing, and they lock up. There were all these technical problems like that in synchronizing machines to work together.
Then this paper came out and showed a way of avoiding all that and making it so that the average Java programmer could get hundreds or thousands of machines to work together.
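The model the paper introduced reduces the programmer's job to two functions, map and reduce, with the framework handling distribution, scheduling, and fault tolerance. A single-process Python sketch of the classic word-count example (the actual distribution across machines is elided; this only shows the shape of the programming model):

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # emit one (word, 1) pair per word occurrence
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # combine all values for one key into a final result
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts: {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

In the real system, the map calls run on the machines holding the input data and the shuffle moves intermediate pairs across the network, but the user writes only the two functions above.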
The open source community, realizing the importance of this, created Hadoop and replicated the Google infrastructure. They replicated the file system, the MapReduce computation engine, and a database called Bigtable [a distributed storage system for structured data], which became HBase.
That broke open the big data world and commoditized, or I should say democratized, big data computing and distributed computing. Now, suddenly, everyone could use hundreds or thousands of computers to attack problems.
You see that paper on MapReduce as a real turning point.
That is what I personally think broke it all open. It enabled programmers to take advantage of this massively disruptive architecture.
Now, what I see as the challenge today is this: IT is still in the dark, and it's in the dark because IT doesn't program computers; they use computers. They use platforms and architectures and databases. They're not in the business of developing software; they develop applications with software components.
I thought to myself (and my co-founders thought), how are we going to get the power of this distributed architecture -- which enables this huge distributed computing that will enable the Internet of things [and more] -- how do we get that out into the masses, into Fortune 500 and global 2000 companies and beyond?
Our view was that you had to deliver this power on something that everyone knew and understood, so what we're doing is bringing the power of Hadoop in the context of a relational database.
Everyone knows what a relational database is. The majority of applications built in the world are built on SQL in relational databases. What if the existing applications in the world, and the new ones that are going to be built, could be built in SQL but executed on Hadoop? How big could they be? How big could the datasets be, and how real-time could they be?
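The appeal of that approach is that plain, portable SQL stays the interface while the execution engine underneath changes. A small sketch using SQLite purely to illustrate the kind of standard SQL involved (the table and column names are invented for illustration; this is not Splice Machine's API, and a SQL-on-Hadoop engine would run the same statement across a cluster):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (device_id TEXT, reading REAL)")
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?)",
    [("d1", 20.5), ("d1", 21.0), ("d2", 19.0)],
)

# Ordinary ANSI-style SQL: nothing here is specific to one engine.
rows = conn.execute(
    "SELECT device_id, AVG(reading) AS avg_reading "
    "FROM sensor_readings GROUP BY device_id ORDER BY device_id"
).fetchall()
# rows: [('d1', 20.75), ('d2', 19.0)]
```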
That is what I think is going to enable that second generation of applications that we talked about earlier.
We're trying to tackle that problem -- which is democratizing the power of Hadoop, not just for the programmers of the world but for IT, and we're doing it in the context of a relational database management system.
Is Splice Machine the only one doing this -- building a relational database management system that works with Hadoop?
I have both a yes and no answer to that. Lots of people -- lots -- have realized they need to make SQL work on Hadoop. They've recognized the power of Hadoop, they have moved their data onto the Hadoop file system, but they don't want their people to have to write MapReduce jobs in Java. They'd rather access that data using SQL, so everyone jumped on that bandwagon to support analytics on Hadoop with SQL. There are plenty of people doing that.
However, nobody has tried to address that second-generation challenge -- to build a database that actually provides the properties that traditional relational database management systems provide. There's a computer science term for that; it's called ACID properties. It's a technical term that stands for atomicity, consistency, isolation, and durability. Suffice it to say that what ACID does is to enable concurrency. It enables multiple readers and writers of the database to keep the database consistent. That's what the first generation of applications on client-server machines required, and that's what Oracle and MySQL and Postgres and IBM DB2 and Microsoft SQL Server all provide. They provide the ACID properties for concurrency.
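The "A" in ACID, atomicity, is what keeps a failure midway through a multi-step change from corrupting data. A small illustration using SQLite, which is ACID-compliant (the accounts table and the transfer scenario are invented for illustration; any ACID database behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

try:
    # Step 1 of a two-step transfer: debit alice...
    conn.execute(
        "UPDATE accounts SET balance = balance - 80 WHERE name = 'alice'")
    # ...but the process "fails" before bob is ever credited.
    raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    conn.rollback()  # atomicity: the debit is undone along with the rest

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
# balances: {'alice': 100, 'bob': 50} -- no half-finished transfer survives
```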
Now, what happened with the Hadoop world is, no one did that. Splice Machine is the only company that we know of in the marketplace that is really pursuing that strategy: to truly power applications as you would with Oracle. Although there is a lot of effort to try to make Hadoop more accessible with SQL, we're uniquely differentiated in making Hadoop able to power real-time applications.