
Picking the Right Platform: Big Data or Traditional Warehouse?

Big data or the data warehouse? Pick the wrong platform for the wrong workload and you could find yourself racking up hundreds of thousands (or even millions) of dollars in extra costs.

The recent Teradata Partners Users Group (Partners) conference broke new ground in many ways -- not least in its framing of a question. This came courtesy of industry luminary Richard Winter, who outlined his vision of "TCOD" -- or "total cost of [big] data."

When, asked Winter, is it more sensible to use a big data platform, and when is the traditional data warehouse (DW) a superior option? It's the kind of question that simply would not have been asked at past Partners events.

Winter, a pioneer in research and analysis of very large database (VLDB) platforms, said picking the wrong platform for the wrong job could rack up hundreds of thousands -- or even hundreds of millions -- of dollars in unnecessary costs. In fact, he argued that misusing Hadoop for some types of decision support workloads could cost up to 2.8 times as much as running them on a data warehouse.

"[This] cost is mainly in the complex queries and analytics. The problem is that it's much more expensive to create these queries in Java MapReduce than it is in a data warehouse technical environment," Winter said. "Each technology has its sweet spot, [and] each sweet spot delivers huge savings to the customer. However, if you get outside that sweet spot, it goes the other way."

TCOD Explained

Winter's papers on VLDB deployment, scale, and cost issues were required reading in the 1990s and 2000s. In his Partners presentation, he framed the issue with characteristic succinctness.

"Under what circumstances, in fact, does Hadoop save you a lot of money, and under what circumstances does a data warehouse save you a lot of money?" he asked, adding that TCOD, unlike other costing metrics, comprises a "complex cost estimating problem."

"If you want to look at the total cost of a project, total cost in the IT sense, what do you look at?"

It's in this respect, Winter argued, that traditional costing measures are inadequate. For example, a metric such as total cost of ownership (TCO) attempts to account for the (acquisition) cost of a system plus its ongoing maintenance. Missing from this is a slew of other costs, such as (with respect to decision support) the cost of developing and maintaining ETL, analytical applications, queries, and analytics -- along with the cost of upgrading the system over five years.

TCOD also takes into account the paradoxical cost of platform success: the more you give users (in terms of analytical applications, queries, or analytics), the more they'll want; the cost of a successful decision support or analytical platform thus tends to increase over time. "Costs grow as users find ways to leverage [the] value of investment," Winter explained.

For this reason, he built a compound annual growth rate (CAGR) of 26 percent over five years into his TCOD calculus. Elsewhere, TCOD uses published list prices per terabyte, per system: for Hadoop, this is $1,000 per TB; for the data warehouse, Winter says he "averaged" the prices of three "widely used products" and also factored in the enterprise discount (usually 40 percent) most vendors offer. Salary information for full-time employees (FTEs) was sourced from indeed.com; project costs (which also take into account lines of code contributed on a per-FTE basis) were sourced from qsm.com, which maintains a database of metrics derived from more than 10,000 completed software projects.
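
To make the shape of that calculation concrete, here is a minimal sketch in Python of how TCOD's components might be rolled up. Only the $1,000-per-TB Hadoop price, the 26 percent growth rate, and the 40 percent enterprise discount come from Winter's talk; the data warehouse list price and the first-year development costs are placeholder assumptions, not figures from his model.

```python
# Illustrative TCOD roll-up: NOT Winter's actual model. Only the Hadoop
# $/TB, the 26 percent CAGR, and the 40 percent enterprise discount come
# from the presentation; everything marked "assumed" is a placeholder.

YEARS = 5
CAGR = 0.26  # annual growth in demand that Winter builds into TCOD

def five_year_total(first_year_cost: float) -> float:
    """Sum a recurring cost that grows 26 percent per year for five years."""
    return sum(first_year_cost * (1 + CAGR) ** year for year in range(YEARS))

volume_tb = 500
hadoop_system = volume_tb * 1_000            # published list price per TB
dw_system = volume_tb * 20_000 * (1 - 0.40)  # assumed list price, 40% discount

# Development and maintenance of ETL, queries, analytics, and analytic
# applications, priced from FTE salaries and per-project effort metrics.
first_year_dev = {"Hadoop": 5_000_000, "data warehouse": 1_500_000}  # assumed

for platform, system_cost in (("Hadoop", hadoop_system),
                              ("data warehouse", dw_system)):
    tcod = system_cost + five_year_total(first_year_dev[platform])
    print(f"{platform}: ${tcod:,.0f} over {YEARS} years")
```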

Winter found that a platform such as Hadoop is significantly less expensive than a DW for some workloads. He used the example of "data refining," the process by which manageable data sets are produced from the deluge of information generated by machines, sensors, applications, services, and so on. This casts Hadoop in the familiar "landing zone" role -- i.e., as an ingestion point for the landing and preparation of data for analysis. The cost economics of Hadoop are inversely related to those of the DW, Winter explained: Hadoop system costs tend to be drastically lower, while Hadoop development costs tend to be much higher. The upshot, Winter argued, is that for a data-refining or landing-zone use case with 500 TB of storage, the cost of storage in a DW is several times that of Hadoop, making Hadoop the significantly less expensive platform for these workloads.
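
That crossover can be sketched in a few lines. In the snippet below, only Hadoop's $1,000-per-TB price comes from the talk; the DW price per TB and the per-query development costs are assumed values chosen to illustrate the dynamic, not Winter's numbers. Hadoop wins while the workload is mostly storage and ingestion, and loses as complex query development comes to dominate:

```python
# Sketch of the inverse cost economics: cheap storage plus expensive
# development (Hadoop) versus expensive storage plus cheap development
# (DW). All values are assumed except Hadoop's $1,000/TB list price.

def platform_cost(tb, queries, price_per_tb, cost_per_query):
    return tb * price_per_tb + queries * cost_per_query

TB = 500
for queries in (0, 100, 1_000):
    hadoop = platform_cost(TB, queries, 1_000, cost_per_query=30_000)  # assumed
    dw = platform_cost(TB, queries, 12_000, cost_per_query=5_000)      # assumed
    winner = "Hadoop" if hadoop < dw else "data warehouse"
    print(f"{queries:>5} complex queries: Hadoop ${hadoop:,.0f} "
          f"vs. DW ${dw:,.0f} -> cheaper: {winner}")
```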

Winter's enterprise data warehouse (EDW) comparison found just the opposite. He used the example of an EDW in a large enterprise environment, with 25 FTEs producing 10 new distinct complex queries and one new distinct analytic per day. Annually, these FTEs produce 300,000 lines of new code for analytical applications. As in the data-refining case, the baseline data volume is 500 TB.

On either platform, the cost of this workload is staggering: Winter pegs the combined five-year cost of an MPP-powered DW at $265 million. That is high, to be sure, but it's a fraction of the cost of the Hadoop equivalent, which comes to approximately $740 million, or 2.8 times as much.
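
A quick back-of-the-envelope check confirms the multiplier Winter cites:

```python
dw_total = 265_000_000      # Winter's five-year figure for the MPP DW
hadoop_total = 740_000_000  # his figure for the Hadoop equivalent
print(f"Hadoop costs {hadoop_total / dw_total:.1f}x the data warehouse")  # 2.8x
```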

Winter added a few minor caveats. First, he said, TCOD works across a range of volumes: "My assumption was 500 TB [for the EDW use case]; at 50 TB, the savings is actually increasing with the data warehouse over Hadoop," he explained. "You can go way up in volume and still see a similar dynamic."

In addition, Winter's TCOD estimates don't take into account workload modeling and capacity management, which he says could produce different numbers. For the decision support use case, TCOD also uses vanilla Hive in place of new projects -- such as Impala -- which graft an interactive query facility onto Hadoop. Even though a technology such as Impala is faster than Hive, it's still considerably slower (as a query platform) than an MPP-powered data warehouse.

There's also the fact that some MPP data warehouse platforms -- such as Teradata -- incorporate advanced tuning and workload management features. These take years to develop and are mostly missing from Hadoop, Winter noted: "Simple queries aren't really completely equivalent on Hadoop and the data warehouse; if you have to do a lot of them on Hadoop, you get into issues of concurrency, and if you have to do a lot of concurrent work with different ... objectives, you get into workload management."

Winter wrapped up his presentation with a pragmatic assessment. "When you're launching a new initiative, creating a major new workload, bringing a major new source of data into your environment, you want to make an informed decision, taking into account along with other factors the total cost you're likely to see over time," he said. "This framework gives you a way to do that. The examples show that total cost is very sensitive to the choice of technology, so it's dangerous to think that all of your requirements are going to play out the same way in terms of cost."
