TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

Informatica Takes on Big Data

Informatica's PowerCenter Big Data Edition is more than just a Hadoop DI tool: it even includes vanilla PowerCenter.

By Stephen Swoyer
November 13, 2012

At the Hadoop World conference in New York, Informatica Corp. announced a new big data-ready version of its PowerCenter data integration (DI) offering, the aptly named PowerCenter Big Data Edition.

Just as PowerCenter itself has evolved into more than just an ETL tool (it bundles data profiling, data cleansing, and other features), PowerCenter Big Data Edition might be called more than just a Hadoop DI tool: it even includes vanilla PowerCenter.

According to John Haddad, director of product marketing with Informatica, the new big data-ready version of PowerCenter bundles features such as data profiling, data cleansing, data parsing, and "sessionization" capabilities. PowerCenter Big Data Edition also includes a license for conventional PowerCenter, says Haddad; this permits customers to run ETL or DI jobs in whichever environment (Hadoop, or one or more large SMP boxes) is most appropriate to their requirements or workload characteristics.

"It includes the license and [the] capability to run traditional PowerCenter and scale it up on multiple CPUs like an SMP box or on a traditional grid infrastructure," he confirms. "You're not going to use Hadoop for all of your workloads; if you're doing a few gigabytes of structured data on a daily basis and you want it to be processed in near-real time, you would deploy that on a traditional grid infrastructure," Haddad continues. "If the next day, you have 10 terabytes of data and you need extra processing capacity, you can run that in Hadoop."

Accommodating Hadoop

Vendors are accommodating Hadoop in different ways. DI vendors, for example, tend to take one of two approaches.

Some vendors have gone "all-in" on Hadoop and MapReduce: this approach uses Hadoop's implementation of MapReduce to perform the processing associated with ETL workloads. Open source software (OSS) DI specialist Talend is an example of this approach.

Other vendors have employed an embrace-and-extend approach. DI offerings from vendors such as Pervasive Software Inc. and Syncsort Inc., for example, run at the node level across a Hadoop cluster. Rather than relying on MapReduce, they use their own libraries, so a Pervasive or Syncsort engine performs the ETL processing on each individual Hadoop node.

Informatica's approach is closer to Talend's, with a key difference. In the context of Hadoop, PowerCenter Big Data Edition, like Talend Open Studio for Big Data, uses MapReduce to do its ETL heavy lifting. However, customers also can run non-Hadoop workloads in conventional PowerCenter. (The Big Data version of Talend Open Studio does not include a license for conventional, i.e., non-Hadoop-powered, Talend ETL. If you buy Open Studio for Big Data, you're using MapReduce to do your ETL processing.)

"Hadoop is not for all types of workloads and we recognize that. In some ways, the Big Data Edition is elastic. Even if you're doing a big data project, you're clearly going to want [to involve] some of your more traditional [data] sources, too," says Haddad, who adds: "Don't you want one package that can do it all?"

Haddad and Informatica aren't necessarily insisting on an arbitrary distinction. Some critics allege that although MapReduce-powered ETL is a good fit for certain kinds of workloads, it makes for a comparatively poor general-purpose ETL tool.

"[MapReduce] is brute force parallelism. If you can easily segregate data to each node and not have to re-sync it for another operation [by, for example,] broadcasting all the data again -- then it's fast," said industry veteran Mark Madsen, a principal with information management consultancy Third Nature Inc., in an interview earlier this year.

The problem, Madsen drily noted, is that this isn't always doable.
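Madsen's distinction can be illustrated with a minimal, in-process sketch. It assumes Python lists stand in for cluster nodes, and the function names (partition, local_sum, run_job, share_of_total) are hypothetical; this is not Hadoop's API and not how PowerCenter or any other product mentioned here works internally. When records can be hash-partitioned by key, each "node" finishes its aggregation without seeing any other node's data. When an operation needs a global value, such as each record's share of a grand total, the partial results must be combined and broadcast back before any node can finish.

```python
from collections import defaultdict


def partition(records, num_nodes):
    """Hash-partition (key, value) records so every record for a key lands on one simulated node."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in records:
        nodes[hash(key) % num_nodes].append((key, value))
    return nodes


def local_sum(node_records):
    """Aggregate on a single simulated node; needs no data from any other node."""
    totals = defaultdict(int)
    for key, value in node_records:
        totals[key] += value
    return dict(totals)


def run_job(records, num_nodes=3):
    """The 'easy' case: one pass, no re-sync, because keys never span nodes."""
    results = {}
    for node in partition(records, num_nodes):
        results.update(local_sum(node))
    return results


def share_of_total(records, num_nodes=3):
    """The 'hard' case: a global value forces an extra combine-and-broadcast step."""
    nodes = partition(records, num_nodes)
    # Pass 1: each node computes a partial sum locally.
    partials = [sum(value for _, value in node) for node in nodes]
    # Re-sync: partials are combined and the grand total must reach every node.
    grand_total = sum(partials)
    # Pass 2: only now can each node finish its own records.
    return [(key, value / grand_total) for node in nodes for key, value in node]


if __name__ == "__main__":
    print(sorted(run_job([("a", 1), ("b", 2), ("a", 3)]).items()))  # [('a', 4), ('b', 2)]
```

In a real cluster, the extra step in `share_of_total` means moving data over the network between passes, which is the re-sync cost Madsen describes.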

Haddad acknowledges that most of Informatica's competitors market Hadoop- or Big Data-ready versions of their DI platforms. On the other hand, he insists, PowerCenter Big Data Edition supports both Hadoop MapReduce and conventional ETL. For this reason, and in view of the shortcomings of MapReduce-powered ETL for certain kinds of workloads, Informatica's is the more "flexible" approach, Haddad claims.

"As companies move more of their workloads to Hadoop, you don't want them to go back to the stones and knives of hand coding," he points out, "so we provide the ability to remove hand coding within Hadoop for ETL and things like that. We also make it possible for [customers] to design and build [DI jobs] once and deploy [them] anywhere: on a traditional grid or on Hadoop."