TDWI
GLOSSARY


What is a data lake?

A data lake is a repository that holds data in its raw, original form, whether structured, semi-structured, or unstructured, and makes it available for analysis. A data lake ingests data straight from data sources, without any cleansing, standardization, remodeling, or transformation. It enables ad hoc queries, data exploration, and discovery-oriented analytics because data management and structure can be applied on the fly at runtime, unlike traditional structured data stores, which require a schema to be defined before data is written (schema on write). Hadoop is one platform that is well suited to data lakes, as it stores many different data types and structures with a high level of scalability and availability. Because data is ingested early, operational data is captured and made available for analytics as soon as possible. Because the data is kept in its raw state, data analysts, data scientists, data warehouse (DW) professionals, and other users have ample raw material they can repurpose into many diverse data sets as needed.
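
As an illustration of applying structure at runtime rather than at write time, here is a minimal Python sketch. It is not tied to Hadoop or any particular product; the sample records, the field names, and the helper `read_with_schema` are all hypothetical and exist only for this example. The raw lines are stored exactly as they arrived, and each analysis imposes its own structure when it reads them.

```python
import json

# Hypothetical raw events, kept exactly as they arrived from the source.
# Values are uncleaned strings and some fields are missing.
RAW_LINES = [
    '{"id": "1", "amount": "19.99", "country": "US"}',
    '{"id": "2", "amount": "5", "note": "gift"}',
    '{"id": "3", "country": "DE"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time (hypothetical helper).

    `schema` maps a field name to a function that casts the raw value.
    Fields missing from a record become None; the raw lines are not modified.
    """
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        yield row


# Two analyses impose different structures on the same raw data.
sales_schema = {"id": int, "amount": float}
geo_schema = {"id": int, "country": str}

sales_rows = list(read_with_schema(RAW_LINES, sales_schema))
geo_rows = list(read_with_schema(RAW_LINES, geo_schema))

for row in sales_rows:
    print(row)
```

Because nothing is discarded or reshaped at ingestion, a later analysis can define a third schema (for example, one that reads the `note` field) without reloading data from the source systems.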

