TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

RESEARCH & RESOURCES

What Big Data is Really About

Ignore the hype surrounding big data. What's really important is to learn about the new models for data processing that big data is bringing so you can plan rather than react.

By Mark Madsen
January 22, 2013

[Editor's note: Mark Madsen is leading several sessions at the TDWI World Conference in Las Vegas, February 17-22, 2013.]

Big data isn't hype, but it is being hyped. There is substance to the technology shift happening in the broader data management market of which both business intelligence and big data are a part. The real question to ask is "what's different?"

The constant drone about the "three Vs of big data" that we keep hearing in the media doesn't explain much. That definition takes big data literally. It explains it in terms of bigness (except when the data isn't big), variety (except when there isn't any variety in the data), and velocity (except when the data is processed in batch). Because you can have big data without any of the three Vs, the definition is empty.

The name implies big, but is the data actually big? Many people are using these technologies to process moderate volumes of data, perhaps in the same way we use ETL. Nor is "big" necessarily a euphemism for unstructured data. Much of that data is structured. It might be log files, but those logs record events, which are generally easy to map to a relational table. It may be text, variably structured data, or the simple rows and columns we're used to.
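The point that log events map readily to relational rows can be made concrete with a short sketch. It assumes web server access logs in the Apache Common Log Format (the article names no particular format). The functions `parse_line` and `load_events` and the `web_events` table are hypothetical names chosen for illustration, and an in-memory SQLite database stands in for any relational store.

```python
import re
import sqlite3

# Assumed input: Apache Common Log Format lines, i.e.
#   host ident authuser [timestamp] "METHOD path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Map one log event to a tuple matching the web_events columns, or None."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None  # malformed line; a real pipeline might route it elsewhere
    size = m.group("size")
    return (
        m.group("host"),
        m.group("ts"),
        m.group("method"),
        m.group("path"),
        int(m.group("status")),
        None if size == "-" else int(size),  # "-" means no byte count was logged
    )


def load_events(lines):
    """Parse log lines and load the well-formed ones into a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE web_events ("
        "host TEXT, ts TEXT, method TEXT, path TEXT, status INTEGER, bytes INTEGER)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO web_events VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [22/Jan/2013:10:15:32 -0800] "GET /index.html HTTP/1.1" 200 5120',
        '10.0.0.2 - - [22/Jan/2013:10:15:33 -0800] "POST /login HTTP/1.1" 302 -',
        "not a log line",
    ]
    conn = load_events(sample)
    # Once loaded, the events can be queried with ordinary SQL.
    for row in conn.execute(
        "SELECT status, COUNT(*) FROM web_events GROUP BY status ORDER BY status"
    ):
        print(row)  # prints (200, 1) then (302, 1)
```

Each event becomes one row with fixed columns, which is why much "big data" log processing looks like familiar ETL rather than anything exotic.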

The term implies that the shift is about data, but it's equally about technology. One assumption is that big data technology equals Hadoop. It's more than Hadoop. There are real-time data stores, processing and analytics engines, and streaming technologies for monitoring and processing data as it flows. Some are built on top of Hadoop or HDFS while others exist independently. What they usually share is an ability to be deployed in dynamically scalable configurations.

The reality is that big data is about new models for data processing. It isn't a specific type of data, a huge volume of data, or a specific technology. It's about applying new technologies to meet unfulfilled needs that (usually) can't be met by the traditional data warehouse architecture.

The areas where a data warehouse has difficulty are analytic processing, some types of data processing and transformation, and timeliness of development.

Some analytic processing is possible in SQL. Because of this, many database and analytic tool vendors say there's no need to change, or that the answer is to use a more scalable database. Depending on the scale, the types of data, the user concurrency, and the algorithm, this may be true. It's equally possible that one of these elements limits the use of a database, which pushes the data warehouse to the side: it becomes merely the source of data that has to be moved, transformed, and processed elsewhere.

There are areas of basic data processing that the data warehouse technology stack has trouble with. This is less a failure of the database than of the data integration tools and the architecture. At large scale, processing becomes slow or expensive (or both). If the data is text or has a complex data structure, the DI tools may be poorly suited to the work. We end up in a situation where both the processing and the storage may be mismatched with the tools we have available.

A constraint on the data warehouse today is its lack of agility: it responds slowly to change and to the need for rapidly cycling models and uses, as in much exploratory or experimental analytics. "Experimental" isn't restricted to scientific processing. A/B or multivariate testing of landing pages on a Web site is a form of experimentation. So are test marketing campaigns and staged product launches. It's a style of using information that means the data, models, and integrations need to be changed and updated rapidly, something the data warehouse was not designed for. It was designed for predictable, or at least bounded, uses that didn't change significantly.

This constraint is baked into the architecture. Models are static in the database, and both data integration and BI tools maintain mappings to the static database model. Each layer in the architecture is managed independently, usually by different people. The capabilities in one layer are generally not available to tools in the other layers. Any change requires coordination from top to bottom. All this, plus the need to model in advance, constrains the speed with which a data warehouse can react to change.

A data warehouse is good for storing the important data, and for delivery of information when the usage model is interactive query, as with BI tools and dashboards. It's less well suited to the read-write activities of exploratory analysis, the data-intensive processing of analytic models, and high-volume, low-latency, real-time workloads.

These are the usage models that various big data technologies were designed to address. They are designed more for these unmet needs than they are for the conventional workloads of BI. Because of this, they lack key features such as robust query support and good data management, to the point of sometimes lacking concepts such as metadata and schemas that are taken for granted in the data warehouse world. The features are missing because they aren't as important for non-BI uses.

Big data is about data processing and new usage models. As an architect or designer, it's helpful to look at what's different about the technologies available, the data being processed, and the uses. Big data isn't a replacement for data warehousing, nor is it an island to be maintained separately. It's part of the new IT environment in the same way data warehouses and BI were a new addition to the IT environment when there was only OLTP and batch reporting. We're still in the early stages of the market, making today the time to learn about the changes that are coming. Otherwise, you'll be reacting to changes instead of planning for them.
