Planning for a Scalable Enterprise Data Lake
TDWI Speaker: David Loshin, President of Knowledge Integrity
Date: Tuesday, March 26, 2019
Time: 9:00 a.m. PT, 12:00 p.m. ET
In this webinar we will discuss a more modern view of the data lake and consider best practices for planning and implementing a scalable enterprise data lake. The flaws in early data lakes were often rooted in the expectations of data consumers who put a premium on self-service data analytics. However, with no data governance mechanisms, data lakes quickly became more of a glorified “dumping ground,” “data swamp,” or “beta lake” for organizational data.
In recent years, though, some innovations have allowed the data lake to evolve into an agile yet managed environment for accumulating shared data resources that can be optimally used for competitive advantage. Data lakes have evolved beyond the original on-premises concept based solely on Hadoop and now include virtually any distributed computing platform (Hadoop, Spark, EMR, serverless, etc.) and any storage mechanism (HDFS, S3, ADLS), either on-premises or in the cloud.
Modern enterprise data lakes provide access to a wide range of data objects, a large and elastic pool of computing power, and a variety of distributed storage mechanisms. Together these form a platform for advanced analytics that goes well beyond what was available in the more rigid, traditional enterprise data warehouse architectures.
Attendees of this webinar will learn about creating agile yet managed data lakes by:
- Simplifying ingestion into the data lake
- Classifying and organizing data lake assets
- Enabling high-speed querying of data lake information
- Orchestrating processes end to end
- Working through hybrid data lake issues and suggestions