Without Context, Big Data is Flat Data
It's not enough to know what your customers are doing. You need context to understand why customers do what they do.
- By Jeff Catlin
- March 1, 2016
Let's face it: these days, big data is big. It's in the press every day, but what exactly is big data? In truth, it can be anything from highly structured numerical content, such as the output of the Large Hadron Collider, to more mundane sources, such as Google search logs or all the tweets made in the last 30 days.
For our purposes, let's stick with the semi-structured or unstructured content sources of big data. The lack of structure in this sort of content makes it harder to understand what's going on in the data and spot the patterns and insights hiding within it (the hidden business value). To solve this problem, a variety of technologies have emerged, including "machine learning," "deep learning," and text mining. Before diving in and extolling the virtues of these technologies, let's take a step back and look at the problem we want big data to solve: understanding not just what is happening, but why it's happening (the context of the data).
Why is it so important to understand the "context" of the information we're dealing with? To put it in layman's terms, context is the flavor in food. We could live without a sense of taste and smell, but without them we'd miss out on what makes food worth eating. Similarly, we can mine big data and see that Apple and the iPhone are mentioned everywhere, but it's the context that tells us why iPhone users are such rabid loyalists (hint: it's the superior user experience). With context we gain some understanding of why iPhones are so popular.
How do we dig in and understand the context? We've already mentioned two of these technologies: machine learning and text analysis. Machine learning is everywhere these days, so it's almost certain you've heard the phrase, most likely associated with IBM Watson and the Jeopardy! TV show, where Watson beat two human champions in a general knowledge contest. What most people don't realize is that Watson is much more than simple machine learning.
The machine that beat those champions was a blend of technologies, and it's that mix of machine learning and text analysis that made Watson what it is. Machine learning is a wonderful technology for classifying and cataloging information (this is about baseball, that is a car review), but it's not very good at adding flavor to that classification. The sentence "I wish GM created a cool new sports car" is very different from "GM created a cool new sports car," and it's text analysis that lets us dig deeper and understand the flavor of the statement: the first is a desire for something, and the second is a positive statement about something that exists.
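The distinction can be sketched in a few lines of code. In this toy illustration (the categories, keyword lists, and desire-verb list are invented for the example), a bag-of-words classifier files both sentences under the same topic, while a simple lexical check for desire verbs is what tells a wish apart from a statement of fact:

```python
# Toy sketch: a bag-of-words topic classifier sees both sentences as
# "automotive"; only a closer look at the wording separates a wish
# from a statement. Keyword lists and categories are invented.
TOPIC_KEYWORDS = {
    "automotive": {"gm", "car", "sports"},
    "baseball": {"pitcher", "inning", "home", "run"},
}

DESIRE_VERBS = {"wish", "want", "hope"}

def classify_topic(sentence):
    words = set(sentence.lower().split())
    # Pick the topic whose keyword set overlaps the sentence the most.
    return max(TOPIC_KEYWORDS, key=lambda t: len(words & TOPIC_KEYWORDS[t]))

def is_desire(sentence):
    # A desire verb turns "GM created..." into a wish, not a fact.
    return any(w in DESIRE_VERBS for w in sentence.lower().split())

a = "I wish GM created a cool new sports car"
b = "GM created a cool new sports car"

print(classify_topic(a), classify_topic(b))  # automotive automotive
print(is_desire(a), is_desire(b))            # True False
```

Both sentences classify identically; only the desire check captures the difference in flavor.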
How do you extract context from the content (the flavor of the food)? As humans, we understand the difference between "I wish GM created" and "GM created." One is a desire and the other is an opinion about an actual thing. As a person who might be looking to buy a car, we would put more emphasis on the opinion than the desire. How do we get a machine to understand that these nearly identical sentences have very different contexts? Grammar parsing answers this need, allowing a machine to codify human understanding, particularly as it relates to things like sentiment. Let's look at the grammar parse of "I wish GM created a cool new car."
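The essential structure of such a parse can be sketched with a minimal hand-built tree (the role labels below are illustrative only; a real grammar parser would use its own label scheme). The key point is that "created" sits under "wish," so anything positive about the car is governed by a desire verb:

```python
# Hand-built sketch of a dependency-style parse of
# "I wish GM created a cool new car". Role labels are illustrative.
parse = {
    "word": "wish", "role": "root",            # governing verb: a desire
    "children": [
        {"word": "I", "role": "subject", "children": []},
        {"word": "created", "role": "complement",  # the wished-for event
         "children": [
             {"word": "GM", "role": "subject", "children": []},
             {"word": "car", "role": "object",
              "children": [
                  {"word": "a", "role": "determiner", "children": []},
                  {"word": "cool", "role": "modifier", "children": []},
                  {"word": "new", "role": "modifier", "children": []},
              ]},
         ]},
    ],
}

def show(node, depth=0):
    # Print the tree with indentation, e.g. "wish (root)".
    print("  " * depth + f"{node['word']} ({node['role']})")
    for child in node["children"]:
        show(child, depth + 1)

show(parse)
```

Because "cool" modifies "car" inside the complement of "wish," any sentiment it carries is attached to a desired event, not a real one.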
The correct grammar parse shows the beginnings of how we ascertain that this isn't really good news for GM because it represents a desire. With this parse we have rules for how things such as sentiment get attached to the entities described. "GM created a cool new car" is clearly good news for GM because cool is a positive word in the context of cars.
How do we then figure out that "wish" weakens the sentiment? It turns out that action words such as "wish" or "instructed" carry rules that modify the sentiment in the tree: "wish" weakens the sentiment to its right, while "instructed" moves the sentiment on its right to the instructing party on its left ("I"), so "I instructed GM to create a cool new car" would actually be good sentiment for me. It's this in-depth understanding of the text that lets us glean the context of a document and make better business decisions.
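These two rules can be sketched as a toy sentiment pass over flat token lists (the lexicon, the half-weight factor, and the rule mechanics are all invented for this sketch; a production system would apply such rules over a full parse tree):

```python
# Toy illustration of the two rules described above. The lexicon,
# weights, and rule behavior are invented for this sketch.
POSITIVE = {"cool": 1.0, "great": 1.0}
ENTITIES = {"GM", "I"}

def sentence_sentiment(tokens):
    """Attach sentiment to an entity, applying two toy rules:
    "wish" weakens (halves) sentiment appearing after it, and
    "instructed" redirects sentiment to the entity before it."""
    raw = sum(POSITIVE.get(t.lower(), 0.0) for t in tokens)
    lowered = [t.lower() for t in tokens]
    # Default target: the first named entity other than the speaker.
    target = next((t for t in tokens if t in ENTITIES and t != "I"), "I")
    if "instructed" in lowered:
        # Sentiment moves to the instructing party on the left.
        target = tokens[lowered.index("instructed") - 1]
    elif "wish" in lowered:
        # The event is only desired, so its sentiment is weakened.
        raw *= 0.5
    return target, raw

print(sentence_sentiment("GM created a cool new car".split()))           # ('GM', 1.0)
print(sentence_sentiment("I wish GM created a cool new car".split()))    # ('GM', 0.5)
print(sentence_sentiment("I instructed GM to create a cool new car".split()))  # ('I', 1.0)
```

The plain statement credits GM with full positive sentiment, the wish weakens it, and the instruction transfers it to the speaker, matching the behavior the rules describe.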
A big data collection of car aficionados is going to have information about "cool new sports cars", but without context we'll never understand whether they are talking about rumored production of new products or the recent release of a new Corvette. If we don't have the context of the discussion, we could easily make bad inferences about what the data is telling us and use this factually accurate yet incomplete picture to make really bad business decisions.
Without context, big data is flat data. That may even understate context's importance, because without context, big data insights can be bad data insights. With good context, however, you'll really understand the "what" and the "why" of these mountains of information so you can make insightful and reliable decisions. As you begin to consider what big data to leverage in your enterprise and how, ensure that, if it includes unstructured information, you select a technology that can mine this information deeply enough to lead you to reasoned and accurate decisions.
Jeff Catlin is the CEO of Lexalytics, the leader in cloud and on-prem text analytics solutions. With over 20 years of experience in the fields of search, classification and text analytics products and services, Jeff held senior management positions at Thomson Financial, Sovereign Hill Software and LightSpeed Software prior to founding Lexalytics. You may contact Jeff at jeff.catlin (at) lexalytics.com or sales (at) lexalytics.com.