Prerequisite: None
Modernizing to the cloud creates an opportunity to decouple data storage from computation, which increases flexibility in managing, sharing, and analyzing corporate data using both cloud-native data warehouses and cloud-based data lakes. However, in a modern analytics environment, neither the data warehouse nor the data lake alone can satisfy the expanding needs of a growing, diverse community of data consumers.
In this talk, David Loshin will explore a blended approach, referred to as a data lakehouse, that enables a data warehouse and a data lake to work in a complementary manner. A data lakehouse uses a harmonized semantic layer over structured, semistructured, and unstructured data to establish a foundation for cloud-based business intelligence and analytics. By blending the best of both a data warehouse and a data lake, the data lakehouse concept simplifies data management and governance across an increasingly hybrid landscape while giving downstream data consumers greater capability. A data lakehouse minimizes data copying, reduces data latency, and improves data reuse.
David Loshin will discuss:
- Benefits of separating data from computing resources
- Structured schemas and virtualization
- Integrated governance over management and access to data in object storage
- Support for query access to both structured and semistructured data
- Federated querying across a distributed environment