Q&A: Deflating Big Data Myths

Does the term big data still have meaning? Well-known analyst and author Barry Devlin tackles that question and others, including the often-overlooked privacy concerns big data raises. "The combined total of all the data available on the Web and in private databases is so large and so interrelated," Devlin says, "that privacy becomes well-nigh impossible."

A founder of the data warehousing industry, Devlin speaks and writes widely on the topic, including as an associate editor of TDWI's Business Intelligence Journal. He has 30 years of experience in IT as an architect, consultant, manager, and software evangelist. His company, 9sight Consulting, provides consulting services to clients worldwide. His newest book is due out later this year.

BI This Week: Let's begin with a point you made in a recent article for BeyeNetwork: Is there a difference between big data and just data, period?

Barry Devlin: I've struggled with the term "big data" since I first came across it over three years ago. The problem is that "big" is a completely subjective term. Similarly, all the "v" words -- and there are now so many of them, including velocity, variety, volume, veracity, and so forth -- are entirely relative as well. How does a business know whether it has a big data requirement, problem, or opportunity? The short answer is: it doesn't.

In this confusion, the market has been deluged with big hype from vendors (there's another one of those "v" words!) who are labeling existing products as big data or NoSQL, and bolting Hadoop onto just about every conceivable platform. I've recently seen IDMS described as one of the earliest NoSQL databases! What about IMS?

Then there's the misconception that big data equals Hadoop. Of course, this isn't true, irrespective of how you try to define "big data." Hadoop is just an open source, parallel programming environment for commodity hardware platforms.
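To make that description concrete, here is a minimal, single-process Python sketch of the map-shuffle-reduce pattern that Hadoop distributes across a cluster of commodity machines. The function names and sample data are purely illustrative, not Hadoop's actual API.

from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) for every word; each input split is processed
    # independently, which is what makes the work easy to parallelize.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    # Shuffle: group values by key; Hadoop performs this step across the
    # network between mapper and reducer machines.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values; reducers also run in parallel.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is all data", "data is not Hadoop"]
print(reduce_phase(shuffle_phase(map_phase(lines))))
# {'big': 1, 'data': 3, 'is': 2, 'all': 1, 'not': 1, 'hadoop': 1}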

So what are customers doing? We found in a joint 9sight/EMA survey last year that they seem to be declaring many projects as big data that they would have undertaken anyway under some other label. My guess is that this is often for internal justification purposes -- big data is a topic executives have seen in the business press as a "must have" for success, so it's easier to get budget approval for a project with "big data" in the title. (This might sound cynical, but many IT shops work this way.)

My overall take on all this is to step back and try to rationalize the situation. The term "big data" cannot easily be defined. Some vendors are attaching it to a wide range of products, while others equate it with Hadoop. Clearly, customers have already adopted and adapted the term for their own purposes.

If I could stop people from using the phrase, I would. Since I cannot, I have taken the position that "big data is all data," and I try to move the discussion to three data types that have better-defined characteristics: human-sourced information, machine-generated data, and process-mediated data. (See my white paper "Big Data Zoo" for more details on these three information domains.)

That helps to explain the following point, also from one of your recent blog postings: "Many so-called big data projects have more to do with more traditional data types, i.e., [are] relationally structured, but are bigger or require faster access."

When WalMart was building the world's largest data warehouse back in the 1990s, they failed to notice they were undertaking a big data project. Why was that? Because the term had not yet jumped the "species boundary" from the world of physical science, where it was first used. Today, many similarly scoped projects using large volumes of traditional, relational, process-mediated data are labeled big data.

If we have big data, do we also have big analytics?

What the hype around big data has done is to create focus around statistical analysis of large volumes of information. Those of us who have been around a while (since back in the mid-1990s!) remember when this was called data mining. What has happened, in my view, is that advances in technology (processing, memory, and storage) have enabled at least two things: (1) broader use of statistical analysis techniques because of a much lower barrier to entry and (2) a move from sample-set to full-set analysis.
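As a toy illustration of that second shift, the following sketch (synthetic data and illustrative numbers, not a real workload) contrasts estimating a mean from a 1 percent sample with computing it over the full set; the drop in hardware costs is what made the latter routine.

import random

random.seed(42)
# A synthetic "population" of one million measurements.
population = [random.gauss(100, 15) for _ in range(1_000_000)]

# Sample-set analysis: the classical approach when full scans were too costly.
sample = random.sample(population, 10_000)  # a 1 percent sample
sample_mean = sum(sample) / len(sample)

# Full-set analysis: cheap processing and storage now make scanning everything
# feasible (in practice spread across a cluster rather than one process).
full_mean = sum(population) / len(population)

print(f"sample estimate: {sample_mean:.2f}, full-set answer: {full_mean:.2f}")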

Combined with the mushrooming of social media data (which is a component of human-sourced information), the opportunities for analytics have grown enormously. Certainly, there is value to be found there.

My concern is that statistical analysis is something of a "black art": it is extremely easy to draw invalid conclusions if one lacks basic statistical training. The suggestion that business users can self-serve if the tools are simple enough is worrying. Self-service BI is far less demanding of user skills, and even there we have seen cases where data is misused or used to draw incorrect conclusions. Businesses must move forward carefully in this area. Big data / big analytics is, I often joke, spreadsheets on steroids.
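One classic trap illustrates the point: the multiple-comparisons problem. Screen enough pure-noise "drivers" against an outcome and some will look significant by chance alone. The sketch below is self-contained and uses invented data; the 0.2 cutoff roughly corresponds to p < 0.05 at this sample size.

import random

random.seed(7)

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

n = 100  # observations
outcome = [random.gauss(0, 1) for _ in range(n)]

# 200 candidate "drivers" that are pure noise: none truly relates to the outcome.
noise_features = [[random.gauss(0, 1) for _ in range(n)] for _ in range(200)]

# An untrained user who screens everything and keeps the "strong" correlations
# will find spurious winners by chance alone -- roughly 5 percent of them.
correlations = [pearson(f, outcome) for f in noise_features]
spurious = [r for r in correlations if abs(r) > 0.2]
print(f"{len(spurious)} of 200 pure-noise features look 'significant'")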

On the positive side, the elevation of data scientists to guru status has driven universities to take notice and begin to train students in some of the basics of playing with data -- from how to gather and integrate it, through analysis techniques and limitations, to the ethical concerns around privacy and accountability, particularly around personal information.

Here's a basic question from your blog (in fact, I'm quoting you): "Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances?"

We do need to examine all these prior architectural and design decisions, which were made when technology was much less powerful, data sources were entirely internal, and business demands were simpler. The emerging "biz-tech ecosystem," as I call it, is a much more complex and deeply interconnected environment than existed in the 1980s, when data warehousing was invented. Some of the design decisions will still be valid, while others have outlived their usefulness. This is a topic I've been developing for a number of years now; it's a foundation for my new book "Business Unintelligence -- Via Analytics, Big Data, and Collaboration to Innovative Business Insight," which will be released later this year.

With all the talk about big data, there doesn't seem to be much attention paid to its downside and possible misuse. Can you discuss that?

Yes, this is a vital topic to which far too little attention is paid in the mainstream. Here are two quotes I use in my book:

Media that spies on and data-mines the public is destroying freedom of thought and only this generation, the last to grow up remembering the "old way," is positioned to save this, humanity's most precious freedom. -- Eben Moglen, professor of law and legal history at Columbia University and chairman of the Software Freedom Law Center, speaking at re:publica Berlin in May 2012

The Senate [passed] legislation ... granting the public the right to automatically display on their Facebook feeds what they're watching on Netflix. ... However, they [lawmakers] cut from the legislative package language requiring the authorities to get a warrant to read your e-mail or other data stored in the cloud. -- David Kravets, in Wired magazine, December 2012

The point here is that big data -- in the sense of the combined total of all the data available on the Web and in private databases -- is so large, and so interrelated, that privacy becomes well-nigh impossible. While researching my book, I came across two articles published in the New York Times in the same week in January 2013 that illustrate the point. The first describes how patient records -- transcribed and digitized from doctors' notes, made anonymous, and stored on the Web -- can be statistically mined to discover previously unknown side effects of, and interactions between, prescribed drugs. That's clearly useful and valuable work. The second article, three days later, revealed how easily a genetics researcher was able to identify five individuals and their extended families by combining publicly available information from the supposedly anonymous 1000 Genomes Project database, a commercial genealogy Web site, and Google.

The underlying genetic data is used in medical research to good effect, of course, but what are the possible consequences for those individuals thus identified? Certainly, we can imagine that insurance companies, governments, or other interested parties might well make potentially negative assessments based on their once-private genomes.
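For readers wondering how such re-identification works mechanically, here is a deliberately tiny sketch of the underlying linkage idea: joining an "anonymized" dataset to a public one on quasi-identifiers such as postcode, birth year, and sex. Every record below is invented; real attacks, like the genomics case above, simply combine far larger sources.

anonymized_study = [
    # (zip_code, birth_year, sex, condition) -- names removed, so "anonymous"
    ("02138", 1945, "F", "rare condition X"),
    ("02139", 1972, "M", "condition Y"),
]

public_register = [
    # (name, zip_code, birth_year, sex) -- e.g., a voter roll or genealogy site
    ("Alice Smith", "02138", 1945, "F"),
    ("Bob Jones",   "02139", 1972, "M"),
    ("Carol White", "02139", 1980, "F"),
]

for zip_code, year, sex, condition in anonymized_study:
    matches = [name for name, z, y, s in public_register
               if (z, y, s) == (zip_code, year, sex)]
    if len(matches) == 1:  # the quasi-identifiers are unique: privacy is gone
        print(f"{matches[0]} -> {condition}")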

Let me end by saying that the old phrase caveat emptor -- buyer beware -- is in need of updating. You no longer have to buy anything before you need to beware of what you are exposing of yourself. Your Google search history may be enough.
