TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing


RESEARCH & RESOURCES

Big Data Will Create More Accurate Predictions

Although advances in technology have dramatically improved our ability to analyze and mine vast amounts of data, we must remember that the quality of our predictions is directly dependent on the quality of our data.

By Mike Schiff
August 11, 2015

Early Analytic Techniques

One of the earliest examples of data analytics was gathering two data points, each pairing an independent variable (such as number of years of education) with a dependent variable (such as salary), and connecting them with a straight line. Another person's salary would then be predicted by seeing where that person's years of education fell on the line.

As new data points (or observations) were added, methods such as least-squares fitting would be used to derive a new straight line that best fit the observed data points. This technique was known as simple linear regression. When additional independent variables (such as age) were added, analytics evolved into multiple linear regression. Although numerous techniques have evolved to test the validity of the predicted results and derive new prediction algorithms, in many cases their capabilities were constrained by both the available data and the available computing power.
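The progression from a two-point line to a least-squares fit can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the function names and the education and salary figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read the predicted y off the fitted line."""
    return intercept + slope * x


# Hypothetical data: years of education -> annual salary.
# With exactly two points, least squares reduces to the line through them.
two_slope, two_intercept = fit_line([12, 16], [40_000, 60_000])

# Adding observations yields a new best-fit line (simple linear regression).
years = [12, 16, 18, 14]
salaries = [40_000, 60_000, 75_000, 47_000]
slope, intercept = fit_line(years, salaries)

print(predict(two_slope, two_intercept, 14))
print(predict(slope, intercept, 14))
```

With only two points the fitted line passes through both of them exactly; once more observations are added, the line generally passes through none of them and instead minimizes the total squared error.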

Technology Enhances Analytics Capabilities

Consider, for example, data mining. It wasn't that long ago when most data mining analyses used a relatively small subset of the available data to discover patterns and relationships. Along came parallel and distributed processing, solid state storage, and in-memory databases. Hardware costs decreased. For example, in the late 1960s memory cost approximately $1/byte and disk storage approximately $0.10/byte; today a megabyte of memory can be purchased for less than a penny and a gigabyte of storage for less than a nickel. Now, vast amounts of data can be quickly analyzed. This is truly a case of "cheaper, better, and faster!"
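To put those prices on a common footing, the short calculation below converts the late-1960s per-byte figures to per-megabyte and per-gigabyte costs. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), treats "less than a penny" and "less than a nickel" as upper bounds, and ignores inflation, so the resulting ratios are rough lower bounds rather than precise figures.

```python
BYTES_PER_MB = 10**6  # assumption: decimal megabyte
BYTES_PER_GB = 10**9  # assumption: decimal gigabyte

# Late-1960s prices quoted in the article, scaled up to MB and GB.
memory_1960s_per_mb = 1.00 * BYTES_PER_MB   # $1 per byte
disk_1960s_per_gb = 0.10 * BYTES_PER_GB     # $0.10 per byte

# 2015 prices quoted in the article, taken at their upper bounds.
memory_2015_per_mb = 0.01   # "less than a penny" per megabyte
disk_2015_per_gb = 0.05     # "less than a nickel" per gigabyte

# How many times cheaper each resource became (at least).
memory_factor = memory_1960s_per_mb / memory_2015_per_mb
disk_factor = disk_1960s_per_gb / disk_2015_per_gb

print(f"memory: at least {memory_factor:.0e} times cheaper")
print(f"disk:   at least {disk_factor:.0e} times cheaper")
```

Under these assumptions, memory became cheaper by a factor of roughly one hundred million and disk storage by a factor of roughly two billion, in nominal dollars.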

Another factor that has contributed to advances in predictive analytics is the emergence of new data storage structures such as the Hadoop Distributed File System (HDFS) and "NoSQL" databases. Although relational databases excel at organizing and processing structured data, non-relational data structures can be more appropriate for the vast amounts of semi-structured and unstructured data that organizations now want to analyze and mine in their attempts to discover new insights.

In addition to data generated by transaction processing systems, today's data sources include sensor data, social media data, and even voice data from call center interactions. Depending on the organization (e.g., industrial, healthcare, government), these insights might relate to customer behavior, medical treatments, or even potential terrorist activities. Our ability to process greater amounts and types of data increases the total number of underlying data points and provides a more complete view of the subject at hand, which can only serve to improve the accuracy of predictive analytics.

Data Quality is More Important than Ever

However, as our ability to generate better predictions continues to improve, we must recognize that big data comes with big responsibilities. We must not forget that the accuracy of these predictions is only as good as the accuracy of the underlying data. Garbage-in, garbage-out still applies today and always will. We must continue to take proactive steps to ensure the quality of our data lest our data lakes become polluted data swamps.

The ability to cost-effectively analyze vast amounts of data can certainly generate additional insights and better predictions. However, if the big data under analysis is of poor quality, it might instead produce erroneous predictions, lead to counter-productive decisions, and result in big data problems. The quality of the data we analyze is often more important than the quantity.
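A small, hypothetical example shows how little bad data it takes to distort a prediction. The sketch below fits the same least-squares line twice: once on clean records and once after a single salary has been keyed in with an extra zero. The data values and the helper name are invented for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical clean records: years of education -> annual salary.
years = [12, 14, 16, 18, 20]
clean_salaries = [40_000, 50_000, 60_000, 70_000, 80_000]

# Same records, but one salary was entered as 500,000 instead of 50,000.
dirty_salaries = [40_000, 500_000, 60_000, 70_000, 80_000]

clean_slope, clean_intercept = fit_line(years, clean_salaries)
dirty_slope, dirty_intercept = fit_line(years, dirty_salaries)

# Predicted salary for 16 years of education under each fit.
clean_prediction = clean_intercept + clean_slope * 16
dirty_prediction = dirty_intercept + dirty_slope * 16

print(f"clean fit: slope={clean_slope:.0f}, prediction={clean_prediction:.0f}")
print(f"dirty fit: slope={dirty_slope:.0f}, prediction={dirty_prediction:.0f}")
```

In this toy data set, one mistyped value out of five is enough to flip the sign of the fitted slope, so the model would suggest that more education lowers salary. Adding more records would dilute the error, but only if the additional records are themselves accurate.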
