TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

RESEARCH & RESOURCES

Pivotal's Hadoop-based Data Management Stack Coming Rapidly into Focus

Hawq, Data Dispatch, and GemFire XD are just a few of the DM-related products Pivotal delivered in 2013 for its Pivotal HD Hadoop distribution.

By Stephen Swoyer
January 7, 2014

Early in 2013, EMC Corp. spin-off Pivotal started shipping Hawq, an implementation of its Greenplum massively parallel processing (MPP) DBMS for Hadoop. Pivotal positioned Hawq as the centerpiece of Pivotal HD, its proprietary Hadoop distribution.

At last year's Strata + Hadoop World conference, Pivotal flanked Hawq with two complementary offerings: Pivotal Data Dispatch (which it bills as a data discovery facility for Hadoop) and GemFire XD (an in-memory database technology first acquired by VMware Inc.).

Industry veteran Dave Menninger, head of business development and strategy with EMC Greenplum, describes GemFire XD as an in-memory database cache for Hawq, which -- like the Greenplum MPP database (and other MPP engines, for that matter) -- isn't a real-time database.

"Think of [GemFire] XD as in-memory database cache and Hadoop as the persistence for that information," Menninger explained. "We have capital markets that use this technology [because] it's for very low-latency rapid ingestion and analysis of information. The other key market besides financial services [is] communications -- telcos, for example -- [which] want to be able to capture streaming, real-time events, [such as] telemetry information, market data, and communications network information."

Data Dispatch, on the other hand, is designed to support a Hadoop "landing zone" use case. This describes a scenario in which an organization uses Hadoop as a landing, consolidation, and staging area for enterprise information. "This is technology for managing Hadoop-based data landing zones or data lakes. Organizations are inserting Hadoop as a collection point for all of the data that's being generated in their organizations. From that data landing zone or data lake, they're populating their data marts or data warehouses, or they're creating sandboxes, because you can typically collect that information in its raw form," he explained.

"We see the concept of a data landing zone emerging and Pivotal Data Dispatch is a tool for managing a collection of information into that landing zone and then the movement of that data through the data marts and data warehouses."

Data Dispatch also enables what Menninger called a "data lease" model: "You subscribe to or lease the information, and at the end of a lease, your access is terminated," he explained.
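To make the "data lease" idea concrete, here is a minimal sketch in Python of time-limited access that ends when the lease runs out. It is an illustration only, not Pivotal Data Dispatch's actual interface: the `DataLease` class, the `read_dataset` function, the dataset name, and the 30-day term are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataLease:
    """Hypothetical record of one subscriber's time-limited access to a dataset."""
    subscriber: str
    dataset: str
    granted_at: datetime
    duration: timedelta

    @property
    def expires_at(self) -> datetime:
        return self.granted_at + self.duration

    def is_active(self, now: datetime) -> bool:
        # Access is valid from the grant time up to, but not including, expiry.
        return self.granted_at <= now < self.expires_at


def read_dataset(lease: DataLease, now: datetime) -> str:
    """Allow a read only while the lease is active; otherwise refuse."""
    if not lease.is_active(now):
        raise PermissionError(
            f"lease on {lease.dataset!r} expired for {lease.subscriber!r}"
        )
    return f"{lease.subscriber} may read {lease.dataset}"


# Assumed example values: a 30-day lease on a sandbox dataset.
start = datetime(2014, 1, 7, 9, 0)
lease = DataLease("analyst", "sandbox/clickstream", start, timedelta(days=30))
```

In this sketch the current time is passed in explicitly, so the expiry rule can be checked without waiting for a real clock; a read attempted after `expires_at` raises `PermissionError`, mirroring the "access is terminated" behavior Menninger describes.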

Pivotal's DM Strategy Comes into Focus

Hawq, Data Dispatch, and GemFire XD are a few of several DM-related products Pivotal delivered in 2013 for its Pivotal HD Hadoop distribution. (Another important deliverable is Spring XD, which Pivotal bills as an application development framework for big data.) At the same time, Pivotal has tweaked its messaging: when it first launched HD, for example, it sought to trumpet Hawq's massive performance advantage (up to 600x faster) with respect to Hive, the SQL-like interpreter for Hadoop that suffers from poor performance relative to MPP RDBMS engines.

Contextually, this made sense: months earlier, Pivotal competitor Cloudera Inc. had announced a new interactive query facility (Impala) for Hadoop, and -- at the same Strata 2013 event at which Pivotal announced Hawq -- Hortonworks Inc. unveiled "Stinger," its effort to improve both the performance and the flexibility of standard Apache Hive.

At Strata + Hadoop World, however, Menninger struck a more pragmatic note, describing Hawq as (in effect) Hadoop made safe for data management (DM) practitioners.

He placed less emphasis on Hawq's performance -- which he dutifully described as "two orders of magnitude better than Hive" -- and much more on its DM feature set. Hive, Menninger claimed, currently lacks support for key SQL and RDBMS amenities -- including database transactions, subqueries, and materialized views.

By contrast, he pointed out, Hawq was arguably the first ACID-compliant, ANSI-standard SQL RDBMS implementation for Hadoop. Over the last 18 months, the open source software (OSS) Apache community and several commercial software vendors have invested a staggering amount of money and effort to help shore up Hadoop's DM feature set. In spite of this, Menninger claimed, Hawq remains the most SQL-savvy of Hadoop DBMS technologies.

This matters to enterprise IT organizations, he asserted. "If you're a more traditional enterprise, you have huge investments in SQL-based skills. Hawq is the best of both worlds. The data doesn't move. The data is in Hadoop. [HDFS] is the underlying storage mechanism. The results of your [MapReduce] analyses or just your data collection are immediately available to people with a SQL-based tool or SQL knowledge," he said.

"The fundamental issue with Hadoop is not its power, not its capability -- it's accessing the information that's in Hadoop. What Hawq does is it brings the entire world of SQL -- skills, knowledge, tools -- to the Hadoop community."
