Modernizing Data Lakes for Greater Reliability, Ease of Use, and Value
Webinar Speaker: David Stodder, Senior Director of Research for BI, TDWI
Date: Thursday, June 1, 2023
Time: 9:00 a.m. PT / 12:00 p.m. ET
With the rapid expansion of analytics and AI/ML and the vital importance of data insights to nearly all decision makers and applications, it is essential to match workloads with the right data design pattern and architecture with a focus on reliability, scalability, and data democratization. In the modern world, data sources and data sets are increasingly huge and varied, and users range from nontechnical people who need accurate reports and dashboards to data scientists launching AI/ML to uncover patterns, trends, and unexpected insights.
Many organizations have set up data lakes for ingesting a wide range of data types at a very large scale. TDWI research shows that organizations are taking advantage of the cloud for data lakes. However, data lakes can easily become enormous data swamps. Users struggle to find relevant data, organizations cannot keep track of what’s in the data lake, and performance problems make it difficult to gain timely or accurate insights.
Demand is growing for capabilities such as ACID transactions, time travel, and record-level updates and deletes, but organizations also face trade-offs between storage formats. Specifically, how do they choose between open source and commercial options to ensure longevity and ease of use while preventing lock-in? What is the best way forward with data lakes? More importantly, how can you ensure you have the right architectural pattern to match today’s and tomorrow’s data demands?
Join this TDWI Webinar to learn current practices and technology trends that will help you develop the best strategy for modernizing your data lake. We will discuss critical developments such as open table formats, which can increase the value of your data lake and keep it from becoming an impenetrable data swamp.
Topics we will cover include:
- Architectures and best practices for solving data lake challenges and ensuring trust in data
- Open table formats and why Apache Iceberg is an important emerging standard
- How to enable easier data transformation and analytics on the data lake
- How to modernize data architecture by unifying data lake and data warehouse
- Evaluating skill requirements for increasingly complex data and workloads
Guest Speaker
Parag Jain
Principal Architect, Field CTO
Snowflake
Parag is a big data and advanced analytics expert with over 17 years of experience, primarily focusing on data science, data engineering, and data lake workloads. He has led and worked with several technical and corporate teams, from developers to CTOs/CIOs, to build cost-effective data solutions using public/hybrid cloud infrastructure. He has also worked on many complex analytics solutions for descriptive, prescriptive, and predictive analytics, data pipelines, real-time messaging, event processing, and ETL/ELT data integration and data streaming. He is passionate about new technologies and the modern data cloud.
David Stodder