May 25, 2023
To stay competitive and thrive in a constantly changing environment, organizations are collecting and analyzing larger amounts of diverse data. As part of this process, they often realize that their current data warehouse and/or data lake is not sufficient for their needs.
In recent years, a new paradigm has emerged to address the deficiencies of both the data warehouse and the data lake: the data lakehouse. It combines the two, providing warehouse data structures and data management functions on low-cost platforms such as cloud object stores.
The lakehouse grew out of incremental technical advances in columnar storage types, data access patterns, cloud adoption, highly parallel computing orchestration, and indexing capabilities. Together, these advances have produced a set of previously unavailable platform capabilities and have blurred the distinction between the traditional data warehouse and the data lake. Lakehouse platforms support and manage large volumes of diverse data along with SQL, BI, AI, machine learning, and other advanced analytics on one common platform, typically in the cloud.
This TDWI Checklist Report examines what sets the data lakehouse apart from the data warehouse and the data lake, and it outlines the key pillars of the modern cloud data lakehouse. These pillars can serve as requirements for evaluating lakehouse platforms.
Sponsored by Databricks and Tableau, from Salesforce