TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing



Hadoop Usage Poised to Explode

Today, comparatively few enterprises are using HDFS, the distributed storage substrate of the Hadoop framework. That's quickly changing.

By Stephen Swoyer
April 23, 2013

Comparatively few enterprises are currently using the Hadoop Distributed File System (HDFS), the distributed storage substrate of the Hadoop framework. Only about 10 percent of organizations are using HDFS in production today, according to survey data from TDWI Research. That said, a staggering proportion of organizations expect to be using Hadoop. All told, nearly three-quarters (73 percent) of respondents have either deployed (10 percent) or expect to deploy (63 percent) HDFS in production.

Slightly more than a quarter (27 percent) of respondents say they don't have any HDFS deployment plans.

These are some of the more intriguing take-aways from Integrating Hadoop into Business Intelligence and Data Warehousing, a new report authored by Philip Russom, research director for data management with TDWI Research.

According to Russom, "Many business intelligence (BI) and data warehousing (DW) professionals are looking at HDFS, MapReduce, and other Hadoop technologies as ways to cost-effectively extend their existing BI/DW infrastructure. For example, many DW environments need a bigger and better data staging area, which HDFS can enable. Many BI programs need to embrace a broader range of analytic techniques, which MapReduce can do. Furthermore, very few BI and DW solutions as yet do anything serious with unstructured data, which a number of products in the Hadoop family can assist with."

The TDWI survey, based on a sample of 263 respondents, suggests that Hadoop adoption could ramp up very quickly: for example, more than one-quarter (28 percent) of respondents expect to be managing production deployments of HDFS in the next 12 months. Others expect their Hadoop deployments to come online more gradually: 24 months (13 percent), 36 months (10 percent), or more than three years (12 percent).
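The percentages above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of Russom's report: the variable names and the `approx_respondents` helper are illustrative assumptions, and because the published percentages are rounded, any headcount derived from the 263-respondent sample is only an estimate.

```python
# Percentages quoted in the article (share of all survey respondents).
deployed_now = 10
planned_by_horizon = {
    "within 12 months": 28,
    "within 24 months": 13,
    "within 36 months": 10,
    "more than three years": 12,
}
no_plans = 27
sample_size = 263  # respondents in the TDWI survey

# The four deployment horizons should add up to the "expect to deploy" share.
planned_total = sum(planned_by_horizon.values())

# Deployed plus planned gives the headline adoption figure.
deployed_or_planned = deployed_now + planned_total

# Adding respondents with no plans should account for the whole sample.
accounted_for = deployed_or_planned + no_plans


def approx_respondents(percent, n=sample_size):
    """Estimate a headcount from a rounded percentage (hypothetical helper)."""
    return round(n * percent / 100)


print(f"Expect to deploy: {planned_total} percent")
print(f"Deployed or expect to deploy: {deployed_or_planned} percent")
print(f"Accounted for: {accounted_for} percent")
print(f"Estimated production users today: about {approx_respondents(deployed_now)}")
```

Run as written, the sketch confirms that the four timelines account for the 63 percent who expect to deploy, and that the deployed, planned, and no-plans groups together cover all respondents.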

Not surprisingly, HDFS and MapReduce, the parallel processing counterpart to HDFS's distributed storage substrate, are today the two most used Hadoop technologies: just over two-thirds (67 percent) of Hadoop adopters use HDFS; an even bigger percentage -- 69 percent -- use MapReduce. That MapReduce edges out HDFS isn't surprising either, Russom explains, because some Hadoop vendors (e.g., MapR) implement proprietary file systems. MapReduce itself has been implemented in many contexts: for example, two prominent analytic database platforms (Teradata Corp.'s Aster Discovery and EMC Corp.'s Greenplum) have supported in-database MapReduce -- across their own MPP clusters -- for almost five years.

"The high MapReduce usage also explains why Java and R ranked fairly high in the survey; these programming languages are not Hadoop technologies per se, but are regularly used for the hand-coded logic that MapReduce executes," Russom writes.

"Likewise, Pig ranked high in the survey as a tool that enables developers to design logic -- for MapReduce execution -- without having to hand code it."

Outside of these, Russom and TDWI found that certain Hadoop technologies tend to be more popular than others -- at least among TDWI's core audience of BI and DW practitioners. For example, half of respondents plan to adopt Mahout (an open source machine learning library for Hadoop) within the next three years; 44 percent say the same about R, a programming and execution environment for statistical computing.

Finally, 42 percent plan to adopt ZooKeeper -- a fault-tolerant synchronization facility for distributed applications -- in the same three-year window.

HCatalog Adoption Lags

Only 40 percent of respondents said they plan to use HCatalog, which serves as a nominal metadata catalog for Hadoop. That is a high percentage, to be sure, but it is lower than might be expected given that many BI and DW tools use HCatalog to get structured information out of Hadoop.

"We do have support for HCatalog," says Rick Glick, vice president of technology and architecture with ParAccel Inc. According to Glick, HCatalog is the primary programmatic means by which ParAccel gets information out of Hadoop.

That said, he adds, HCatalog still isn't commonly used. "[HCatalog is] more [common] than what else is out there, [although] there's also the Hive catalog," he continues. "Most users tend to build something themselves to let them know [what they're storing in Hadoop]. Everybody throws data in there with an eye to using it somehow, or simply [as a means] to archive it with a way [i.e., a customer-specific schema] to get it out. Yes, sometimes people use HCatalog, but it's actually not commonly used." In most cases, Glick says, customers use a "brief schema definition of the files" in Hadoop in place of HCatalog.

If HCatalog's lagging adoption is a puzzle, that of other Hadoop technologies isn't.

For example, comparatively few BI or DW professionals expect to adopt Chukwa (4 percent) or Ambari (6 percent); the former focuses on large-scale log collection and analysis; the latter is a still-incubating Hadoop management project. Neither is an explicitly data management-oriented project. Over time, Russom expects that some laggards -- e.g., HCatalog, Ambari -- will likely see increased adoption.

"BI professionals are accustomed to DBMSs, and so they long for a Hadoop-wide metadata store and far better tools for HDFS administration and monitoring," he writes. "These user needs are being addressed by HCatalog and Ambari, respectively, and therefore TDWI expects both to become more popular."

Russom's 36-page report addresses many aspects of Hadoop adoption and deployment. You can download it at no cost from TDWI's website.
