TDWI BI & Analytics Glossary – What is a Data Lake?

A data lake is a repository that holds data in its raw, original form, whether structured, semi-structured, or unstructured, and makes it available for analysis. A data lake ingests data straight from data sources, without any cleansing, standardization, remodeling, or transformation. It enables ad hoc queries, data exploration, and discovery-oriented analytics because structure and data management can be applied on the fly at read time (schema on read), unlike traditional structured data storage, which requires a schema on write. Hadoop is one platform that is well suited to data lakes, as it stores many different data types and structures with a high level of scalability and availability. Because data is ingested early, operational data is captured and made available for analytics as soon as possible. Keeping the data in its raw state ensures that data analysts, data scientists, data warehouse (DW) professionals, and other users have ample raw material they can repurpose into many diverse data sets as needed.
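To make the contrast between applying structure at read time and requiring a schema on write more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the records in `RAW_EVENTS`, the field names, and the helper `read_with_schema` are all hypothetical, and an in-memory list stands in for the lake's storage layer.

```python
import json

# Hypothetical raw events, kept exactly as they arrived from the source
# (no cleansing, standardization, or remodeling before storage).
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.99", "ts": "2017-02-16"}',
    '{"user": "bo", "amount": 5, "channel": "web"}',
    '{"user": "cy", "note": "no amount recorded"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select fields and cast them per query.

    `schema` maps a field name to a casting function. Fields missing from
    a raw record come back as None instead of causing a rejection.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two different views over the same raw data, each defined only when queried.
sales_view = read_with_schema(RAW_EVENTS, {"user": str, "amount": float})
channel_view = read_with_schema(RAW_EVENTS, {"user": str, "channel": str})

# Total of the amounts that are present in the sales view.
total = sum(r["amount"] for r in sales_view if r["amount"] is not None)
print(round(total, 2))
```

In a schema-on-write store, the second and third records would typically have to be fixed or rejected at load time. In this sketch they are stored untouched, and each view decides for itself how to interpret them.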

Related Articles

Successful Data Lakes: A Growing Trend (February 16, 2017)
Once More into the Data Lake (January 31, 2017)
Data Lake Management Innovations (January 23, 2017)
Benefits of the Hadoop-Based Data Lake (November 3, 2016)
The Data Lake: What It Is, What It's For, Where It's Going (June 10, 2016)
Q&A: Understanding Data Lakes (February 3, 2016)

Research & Resources

TDWI Checklist Report | Open Cloud Data Storage: Four Best Practices for Maximizing Flexibility and Interoperability in the Enterprise Open Data Lakehouse

Digital Dialogue | Maximizing the Value of Your Data Lakehouse: How to Leverage a Data Catalog for Success

TDWI Checklist Report | Six Steps for Simplifying Data Lakehouse Integration to Reduce Latency and Enable Real-Time Reporting

TDWI Checklist Report | Modernizing Your Data Warehouse and Analytics: Key Pillars of a Data Lakehouse

TDWI Checklist Report | Seven Data Governance Best Practices for Your Data Warehouse and Data Lake

TDWI Insight Accelerator | Accelerating Your Data Lake Journey

White Papers

A Practical Guide to Data Lake Migration

Unleashing the Full Potential of Your Data Ecosystem

The Big Book of Data Engineering

Building the Data Lakehouse

Rise of the Data Lakehouse

Data, Analytics, and AI Governance

Webinars

Expert Panel: The Great Data Stack Reset: New Architectures, New Priorities in 2026

Expert Panel: The Evolving Landscape for Data Platforms, Tools, and Pipelines

Expert Panel: The Future of Data Architecture: Building for Scale, Speed, and AI

Lakehouse Analytics: Unifying Data for Modern AI

Developing Your Data Strategy and Foundation for Modern Data Management – Insights from a New TDWI Best Practices Report

Expert Panel: Data Management in the Age of AI
