Q&A: Big Data Meets Hadoop
Big data and Hadoop are popular tech terms, but what does the relationship of these two technologies mean for BI professionals?
- By James E. Powell
- June 26, 2012
Big data and Hadoop are popular tech terms, but what does their relationship mean for BI professionals? For answers, we turned to Paul Flach, vice president, enterprise analytics delivery at Stream Integration, who is leading two sessions on big data and Hadoop at the TDWI World Conference in San Diego (July 29-August 3, 2012).
Is Hadoop going to drastically change or even replace my data warehouse environment?
Hadoop is going to revolutionize the way you do analytics and your ability to deal with enormous volumes of data that previously were not accessible. However, it is not going to drastically change existing data warehouse (DW) architectures and certainly won't replace them.
Remember, RDBMSes have been evolving for more than four decades and have become highly sophisticated. Hadoop, although it is not pure brute force, does not yet offer the same sophistication. Hadoop should be seen as a means to complement the DW architecture by processing the data flows and analytics that are beyond DBMS capabilities in terms of variety, volume, velocity, and even veracity.
What are the main differences between relational database systems and MapReduce?
Both have their strengths and weaknesses. RDBMSes mostly depend on structured data with a known schema. MapReduce works best with unstructured data, although it can also handle structured data. In addition, MapReduce is read-oriented, with no update capability in its current form. RDBMSes are strong in transactional processing, whereas MapReduce is batch-oriented.
One other difference to consider is compression. Hadoop is very limited in its support for compression techniques, and when you are throwing around petabytes of data, you need compression.
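The batch-oriented, read-only processing style that distinguishes MapReduce from an RDBMS can be sketched in a few lines of plain Python. This is an illustrative toy only (no Hadoop involved): a word count, the canonical MapReduce example, showing the map, shuffle, and reduce phases as ordinary functions.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: read each input record and emit (key, value) pairs --
    # here, (word, 1) for every word in every line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle: group all values by key; Reduce: aggregate each group.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

# A tiny batch of unstructured log lines -- the whole batch is read,
# processed, and written out; nothing is updated in place.
logs = ["error timeout", "error retry", "ok"]
counts = reduce_phase(map_phase(logs))
print(counts["error"])  # 2
```

Note how the framework never modifies existing records; it only reads input and writes aggregated output, which is exactly why MapReduce suits batch analytics rather than transactional workloads.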
What are the 3 Vs of big data and how are they used to determine the right solution for my architecture?
The 3 Vs are volume, velocity, and variety.
We all know that data volume is growing geometrically, so we know what volume is. Velocity considers those time-sensitive processes such as fraud detection, where data is streaming in at a rapid rate and needs to be monitored in-stream in order to maximize its value. Finally, variety means that we are dealing with any type of data, from structured to unstructured such as text, sensor data, audio, click stream, log files, and others.
If you are dealing with data volume on its own, you do not necessarily have a “big data” problem. An MPP shared-nothing platform or appliance provides robust capability, and many provide the near-linear scalability required to handle today’s data volumes.
If you have both data volume and velocity, there are technologies, such as InfoSphere Streams from IBM, that are in-stream analytics platforms providing real-time distributed processing. The benefits of Hadoop are fully realized when you have all three Vs, in which case you have a “big data” problem. Again, one of the major benefits of Hadoop that distinguishes it from RDBMSes is that it solves the problem of data variety.
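The velocity dimension — monitoring data in-stream rather than after it lands in a warehouse — can be illustrated with a toy sketch. This is not InfoSphere Streams or any real streaming engine; it is a minimal, hypothetical stand-in for fraud-style detection over a sliding window of recent events.

```python
from collections import deque

def monitor(events, window=5, threshold=3):
    """Flag a key when it occurs `threshold` times within the last
    `window` events -- a toy stand-in for in-stream fraud detection,
    acting on each event as it arrives instead of after batch load."""
    recent = deque(maxlen=window)  # sliding window of recent events
    alerts = []
    for event in events:
        recent.append(event)
        if recent.count(event) >= threshold:
            alerts.append(event)
    return alerts

# Card "c42" transacts three times in quick succession -> flagged in-stream.
print(monitor(["c1", "c42", "c42", "c7", "c42"]))  # ['c42']
```

The point of the sketch is the architecture, not the detection rule: the decision is made while the data is still moving, which is what distinguishes velocity-driven platforms from batch-oriented Hadoop jobs.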
What is the best way to introduce Hadoop into my data warehouse architecture to get started?
A good introduction to Hadoop is to look at it as an extract, transform, and load (ETL) technology to complement your existing environment as a means to process Web logs, social data, text data, or machine-generated data. Remember, it will not replace your ETL architecture as a means to stream data directly into your reliable relational structures.
The outputs of a pure Apache Hadoop implementation will be stored in HBase, a column-oriented database. From there, those outputs can be further processed and stored in your SQL-based data warehouse.
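The transform-then-load pattern described above can be sketched in miniature. The sketch below makes simplifying assumptions: the log lines and their format are invented, the aggregation step stands in for what a Hadoop job would do at scale, and an in-memory sqlite3 database stands in for the SQL-based data warehouse (Hadoop and HBase themselves are omitted).

```python
import sqlite3
from collections import Counter

# Hypothetical raw web-log lines in the form: "timestamp page status".
raw_logs = [
    "2012-06-26T10:00 /home 200",
    "2012-06-26T10:01 /cart 500",
    "2012-06-26T10:02 /home 200",
]

# "Transform": the kind of aggregation a Hadoop ETL job would perform
# over terabytes of logs -- here, hit counts per page.
hits = Counter(line.split()[1] for line in raw_logs)

# "Load": write the summarized output into a relational structure
# (sqlite3 stands in for the real SQL-based data warehouse).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page_hits (page TEXT PRIMARY KEY, hits INTEGER)")
db.executemany("INSERT INTO page_hits VALUES (?, ?)", hits.items())
row = db.execute("SELECT hits FROM page_hits WHERE page = '/home'").fetchone()
print(row[0])  # 2
```

Only the small, aggregated result reaches the warehouse; the bulky raw logs stay in the Hadoop tier, which is the division of labor the answer above recommends.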
What are the dynamics at play that determine the effectiveness of my big data architecture?
When you are designing a big data system, you need to look at performance, fault-tolerance, and a flexible query interface.
Performance is perhaps the most obvious characteristic when designing a system and correlates directly with cost savings, especially where you can avoid expensive hardware upgrades. The movement in Hadoop implementations is toward low-cost or commodity hardware. The trade-off to this approach, however, is a higher potential for failure compared to high-end, reliable hardware and DBMS systems with built-in fault tolerance. As you scale out commodity hardware to meet your data volume, you increase the probability of failure.
From a user-interface point of view, you have to remember all the money you have invested in the SQL-based BI technologies that your organization is accustomed to. SQL is a standard that has given business analysts easy access to data through ODBC and JDBC connectivity, without having to deal with your database software directly. Your architecture must continue to be friendly to your analytical community that will continue to communicate through an SQL interface.
What is the biggest obstacle preventing the adoption of big data technologies in today's enterprise?
The biggest obstacle is the same as it has always been for BI in general: organizations are not developing analytical skills at the same rate that technology is developing. Big data technology has been most successfully implemented where the determination to produce sophisticated analytics has been driven by a highly skilled analytical community.
Google had an analytic problem to solve and they refused to be constrained by the 3 Vs, so they implemented MapReduce to solve that problem. If your organization is slow to adopt your self-service BI tool, you certainly don’t want to present Pig and Hive as a more user-friendly alternative.
What is the best investment you can make to develop this analytical culture?
Before you make an investment in any more technology, send your analysts to a community college-level statistics course. If we are going to turn the corner and enter this new era of “data science,” analytics must become as common as any other comprehension skill for the entire organization and can no longer be relegated to the scientists.