TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

RESEARCH & RESOURCES

The Mainstreaming of MapReduce

Is MapReduce ready for the mainstream, or is it best used in niche environments?

By Stephen Swoyer
November 18, 2009

Last September, a pair of analytic database specialists -- Aster Data Systems Inc. and Greenplum Software Inc. -- trumpeted the availability of native MapReduce implementations on their DBMS platforms that, they promised, would significantly accelerate performance for customers in several vertical markets. Recently, several other analytic database firms -- including Netezza Inc. and Teradata Corp. -- have followed suit, promising to introduce native MapReduce implementations of their own.

These are heady times for MapReduce advocates.

MapReduce promises to (drastically) simplify parallelizing certain kinds of queries or problems across a cluster -- particularly those that involve extremely large datasets, even ones that are petabytes in size. MapReduce advocates say this capacity suggests the scale of the problems adopters plan to tackle.
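To make the programming model concrete, here is a minimal single-machine sketch in Python. It is an illustration only, not any vendor's in-database implementation: the word-count task, the function names, and the sample data are all assumptions chosen for brevity, and threads stand in for the cluster nodes that a real framework would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(partition):
    """Map step: emit a (word, 1) pair for every word in one input partition."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(mapped_partitions):
    """Group emitted values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(partitions):
    # Each partition is mapped independently; threads stand in for cluster nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, partitions))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Hypothetical sample data, already split into two partitions.
partitions = [
    ["the quick brown fox", "the lazy dog"],
    ["the dog barks"],
]
counts = map_reduce(partitions)
```

The programmer writes only the map and reduce functions. Because each partition is mapped independently and each key is reduced independently, a framework is free to spread both steps across as many nodes as the data requires.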

The beauty of MapReduce -- in addition to its parallel processing -- is that it permits programmers to run queries against a database using any of several popular languages, including C++, C#, Java, and Python.

In this respect, MapReduce is similar to a facility like Microsoft Corp.'s Common Language Runtime, or CLR.

One obvious difference is that MapReduce is both highly parallelizable and intended for use primarily with very large datasets. CLR and similar facilities (Oracle, for example, supports CLR via its Database Extensions for .NET) are chiefly intended to let developers program against a DBMS in the language with which they're most comfortable. Their scope, then, is considerably more mundane.

MapReduce was popularized by Google Inc., which uses it to power its search technology. Not surprisingly, when Aster Data and Greenplum announced support for MapReduce (on the same day, no less), both companies sought to invite comparison with Google's MapReduce-powered search expertise.

The $64,000 question about MapReduce concerns the kinds of very large dataset problems for which it's best suited.

Skeptics -- principally, analytic database competitors that don't currently offer MapReduce implementations of their own -- like to question its applicability in the enterprise, at least for general-purpose data warehousing (DW) tasks.

"MapReduce is on the road map in a distant future for us. We still to date see almost no demand in the marketplace for it. It's become a marketing thing more than a desired customer feature thing," says David Ehrlich, CEO of analytic database specialist ParAccel Inc. At a time when competitors such as Netezza and Teradata have announced plans to support MapReduce, Ehrlich says he doesn't see the urgency.

"We are working with one customer right now where they were looking at a MapReduce approach to an implementation, and when the relational guys and the MapReduce guys finally got into the details of what they were trying to achieve, [they determined that] MapReduce was going to slow the performance down significantly."

MapReduce does have its uses, Ehrlich concedes -- and ParAccel probably will accommodate it at some point -- but he demurs as to when.

"MapReduce is a great approach to distributed computing for a lot of things, but if you have a workload that is especially friendly or honed or appropriate for a [typical] relational database environment, our belief is that you let the relational database environment do it. We haven't seen a lot of demand and we've seen very few environments around MapReduce where we think it would be a good answer for the customer."

There's still a clear sense in which the MapReduce use cases touted by proponents tend to skew toward specific conditions or environments.

Take, for example, Aster Data, which has introduced a software development kit (SDK) to support its MapReduce implementation.

Aster officials are predictably enthusiastic when it comes to MapReduce and its potential applicability, but the application examples they tout involve typical Big Data implementations -- e.g., Aster's MapReduce SDK introduces canned support for sequential path analysis and provides sample data sets for retail customers. The latter vertical is one of the textbook Big Data use cases for which MapReduce is frequently touted.

"[W]hat I've found is that there are so many different types of applications that you can actually leverage [MapReduce] for. It's not only pieces where you see engineers or super-techie people latching on to it, but also business people," said Shawn Kung, director of product management for Aster Data, in an interview earlier this year.

"The thing that's surprising and a testament to our field sales is that we've been in many engagements where we brought forward the SQL MapReduce and … the techies, they get it, but maybe the most surprising is the fact that once our field teams have actually translated that into business value, the business sponsors … when they see that they can get a 10,000 to 30,000 [times] improvement in the way that they do certain kinds of analytics, and that's going to reduce cycle time dramatically, they become champions."

Kung was more vague, however, when asked to describe the kinds of enterprise applications for which business sponsors, in particular, have latched on to MapReduce. He instead positioned the technology as a proposition that programmers and data management (DM) pros are still getting their heads around.

"As we develop a community of SQL MapReduce users, there's going to be more knowledge-sharing. Think of in-database MapReduce as sort of [like] the early days of Java," he explained. "You didn't see people suddenly knowing everything about Java. It took time, and now there's a rich ecosystem around that. In many ways, I see [MapReduce] as sort of [like Java in its] early days, but in the coming months and years I see in-database MapReduce really proliferating, sort of the way Java did with the Internet."

Kung's prognostication might sound too optimistic, but at least one industry watcher thinks there's something to it.

"All of these years we've had parallel data handling. Now we need parallel processing. The parallel data side has been mature and strong. It's like a bride waiting for her groom. Some day, this MapReduce will grow up to be an incredibly powerful processing system. That day, Teradata will be challenged in scalability, and we'll just love it. It'll be wonderful to have a workload that's equal to us," says Dan Graham, a senior marketing manager with Teradata.

Graham's company recently announced its own MapReduce strategy via the open source Hadoop project, so Graham and Teradata aren't unbiased. At the same time, Graham notes the emergence of a future crop of MapReduce-based applications that -- eventually -- will transform data warehousing.

"MapReduce as it sits today is embryonic. It's clumsy. It doesn't have tools. It has a lot of excitement and momentum. [It has] a lot of installs, [but] no two installs are the same," he says, predicting that "in the next five to seven years, these things will grow up." When that happens, Graham says, DW practitioners and programmers will have to reach a separate peace.

"The most important thing right now is the perception that [MapReduce] is a replacement or displacement of a data warehouse. It's what a lot of these fellows chant on the Web. Teradata is preaching a coexistence strategy, simply because we're not going to fight the Java aficionado for his pride and his work," he continues. "At some point, data warehouse pros and their programmer counterparts have to realize that each tool has its place. If you couple them, you can get tremendous competitive advantage. There's a lot of workloads that can be done in MapReduce today that would be better [done] in a data warehouse."

Conversely, Graham concedes, there are workloads -- such as ETL processing on extremely large data sets -- for which MapReduce seems tailor-made.

"The most common early use of it will be as an ETL system on steroids. If you think about [having] a parallel system out there gathering the data, transforming the data, and handing it [off] to Teradata to be loaded, this is great! We're finally finding our equal who can feed us data. We have mainframes that can't keep up with us," he explains.
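The gather-transform-hand-off flow Graham describes can be sketched in a few lines of Python. This is a hedged illustration, not Teradata's or Hadoop's actual tooling: the log format, the function names, and the shape of the output rows are hypothetical, and the map step runs sequentially here where a cluster would run it in parallel.

```python
from collections import defaultdict

# Hypothetical raw Web log lines in the form "user_id,page,bytes".
RAW_LOGS = [
    "u1,/home,512",
    "u2,/cart,2048",
    "u1,/cart,1024",
    "bad line",
]


def extract_and_transform(line):
    """Map step: parse one raw line into (user_id, bytes), or None if malformed."""
    parts = line.split(",")
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    user_id, _page, size = parts
    return user_id, int(size)


def aggregate(pairs):
    """Reduce step: accumulate hit count and total bytes per user."""
    totals = defaultdict(lambda: [0, 0])
    for user_id, size in pairs:
        totals[user_id][0] += 1
        totals[user_id][1] += size
    return totals


def build_load_batch(lines):
    """Produce rows (user_id, hits, total_bytes) ready to hand to a bulk loader."""
    pairs = [pair for pair in map(extract_and_transform, lines) if pair is not None]
    totals = aggregate(pairs)
    return sorted((user, hits, total) for user, (hits, total) in totals.items())


batch = build_load_batch(RAW_LOGS)
```

Malformed lines are dropped in the map step, so only clean, pre-aggregated rows reach the warehouse loader. Every parsing rule here is written by hand rather than configured in an ETL tool.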

The drawback, he points out, is that MapReduce-based ETL means a return to hand coding. "If you're a dot.com with 1,000 servers with Hadoop Web data on it, you don't have a choice. Go to your ELT vendors and ask them how to do that and they will probably step up."

Graham anticipates that MapReduce-powered ETL could account for up to 80 percent of Teradata's "snuggle-don't-struggle" coexistence strategy.
