Automating the Open Data Lakehouse Using an Open Intelligent Metastore
TDWI Speaker: David Loshin, President of Knowledge Integrity
Date: Tuesday, May 24, 2022
Time: 12:00 p.m. PT, 3:00 p.m. ET
The innovative idea of a data lake holding a collection of data assets in their original or raw forms has influenced cloud-based data storage tactics. This is especially true in organizations wanting to empower a variety of data consumers with different levels of technical expertise. However, the absence of governance, coupled with limited data awareness, has diminished the utility of the data lake.
This webinar focuses on how coupling governance with an intelligent metastore can transform a data lake into a data lakehouse that can support the organization’s data workflows and analytics applications. Our discussion of open data lakehouse architecture will inform attendees about:
- Semantic tagging for data asset classification
- Empowering data collaboration in ways similar to shared developer code repositories
- Asserting a semantic layer providing virtual schemas
- Versioning of data lake data across time
- Data optimizations to automate common data management practices
- Improving data democratization while simplifying data utility
Webinar Series: Strategies for Democratized Cloud-based Analytics Using Open Lakehouse
The design of the traditional on-premises data warehouse is predicated on the presumption that data must be extracted from source systems, transformed into a format suited to an architecture built specifically for analytical queries, and loaded into a segregated data system isolated from the original sources. Over the past decades, data warehouse design has rigidly followed this conventional wisdom, resulting in an accumulation of technical debt that sometimes overwhelms the ecosystem for reporting and analytics.
As a platform, the cloud is positioned to change this. Fundamental aspects of cloud computing reduce or even eliminate technical dependencies that constrained the traditional data warehouse design, including, but not limited to:
- Decoupling of data from computing resources
- Effectively unlimited resource scalability
- Seamless data distribution
- Massively parallel computing
- Integrated data pipelines
It is no wonder that organizations are migrating their analytics workloads to the cloud. A naive approach, such as replicating an on-premises data warehouse in the cloud, may seem like an easy way to make the move. However, newer approaches such as data lakehouses built on an open foundation can provide a strategic architecture that takes better advantage of cloud services and technologies.
This series of three webinars examines the different aspects of the open data lakehouse and how it can support analytics performance and democratization while eliminating dependencies on proprietary data file structures, table formats, or system components.
Don’t miss any of the webinars in this special series!
April 27 - How Open Lakehouses for Query Execution and Performance Simplify Cloud Analytics
May 24 - Automating the Open Data Lakehouse Using an Open Intelligent Metastore
June 15 - Putting it all Together: Panel Discussion on Building Cloud Analytics Using an Open Lakehouse