TDWI Checklist Report | Best Practices for Data Lake Management
October 12, 2016
A data lake ingests data in its raw, original state, straight from data sources, with little or no cleansing, standardization, remodeling, or transformation. These and other data management best practices can then be applied flexibly as diverse use cases demand.
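As a rough illustration of the ingest-raw, transform-later idea described above, the following minimal Python sketch stores records exactly as received and applies standardization only when a particular use case reads them. The names `raw_zone`, `ingest`, and `read_customers` are hypothetical and chosen for illustration. A real lake would persist to Hadoop storage rather than an in-memory list.

```python
import json

# Hypothetical in-memory stand-in for the lake's raw storage zone.
raw_zone = []

def ingest(record_text):
    """Persist the source record exactly as received, with no cleansing."""
    raw_zone.append(record_text)

def read_customers():
    """Apply standardization at read time, for one particular use case."""
    cleaned = []
    for text in raw_zone:
        record = json.loads(text)
        cleaned.append({
            "name": record["name"].strip().title(),
            "country": record.get("country", "unknown").upper(),
        })
    return cleaned

# Raw records arrive with inconsistent formatting and missing fields.
ingest('{"name": "  ada lovelace ", "country": "uk"}')
ingest('{"name": "GRACE HOPPER"}')
```

Because the raw text is left untouched, a different use case could later read the same records with different cleansing rules.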
Most data lakes are built atop Hadoop, which enables a data lake to capture, process, and repurpose a wide range of data types and structures with linear scalability and high availability.
The big data management layer ensures the speed, flexibility, and repeatability of the lake’s most prominent characteristics. These include continuous ingestion, persistence of detailed raw source data, integration of diverse data types at scale, self-service data prep as part of the data supply chain, operationalization of the data supply chain to feed data products, and management of data assets to maximize business value.
This report drills down into the details of how big data management best practices apply to the design and implementation of successful data lakes with Hadoop.