Dremio Reduces Gap Between Data Lakes, Data Warehouses with Updated Dart Initiative
Dremio’s Dart Initiative creates new possibilities with cloud data lakehouses.
Note: TDWI’s editors carefully choose vendor-issued press releases about new or upgraded products and services. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the vendor's statements.
Dremio, a SQL lakehouse platform company, today marks the second delivery in its Dart Initiative, which enables customers to run mission-critical SQL workloads directly on a cloud data lake.
Dremio embarked on the Dart Initiative in June 2021 to help companies run a greater range of mission-critical BI workloads directly on a data lake, with the initial Dart Initiative release delivering faster performance and improved resource efficiency over previous Dremio versions. This subsequent Dart Initiative release introduces several more enhancements, including faster SQL expression processing compared to previous versions.
According to the 2020 Gartner Market Guide for Analytics Query Accelerators report, “Analytics query accelerators seek to shrink the performance impact of the zone of confusion. Put another way, they are trying to move the ‘line of good enough’ to the point where the data lake can provide sufficient optimization on the data to make it suitable for an increasing percentage of workloads.”
Here are some of the key innovations in the Fall 2021 release of the Dremio Dart Initiative.
Scale-out Metadata Collection and Storage
Achieving near-instantaneous query startup times has been a challenge for traditional query engines, which must perform work to parse, plan, and gather data set metadata for each query before it can be executed. In contrast, Dremio enables interactive performance directly on data lake storage by reducing the amount of computation required at runtime.
This release delivers near-real-time metadata refresh for data sets as they are persisted in the lake, ensuring users are accessing the most current version of data, and receiving timely visibility into recent schema and data changes. Dremio has achieved this data freshness by carefully refactoring metadata processing to become a parallel, executor-based process, with metadata now stored and managed in Apache Iceberg tables on data lake storage.
Beyond the benefits to scalability and data freshness, the enhanced metadata management approach enables Dremio to refresh metadata significantly faster than in prior versions and to govern those refreshes with the same workload management capabilities that apply to queries, such as engine routing, priority, and concurrency controls.
Hardware-Optimized Query Processing
Dremio is an in-memory engine powered by Apache Arrow, an open source columnar standard for in-memory computing that was co-created by Dremio. Gandiva, a component of Arrow, is an LLVM-based toolkit that enables vectorized execution directly on in-memory Arrow buffers. It generates code for evaluating SQL expressions that takes full advantage of the pipelining and SIMD capabilities of modern CPUs. This Dart Initiative release enables Dremio to accelerate expression processing rates, ultimately providing a performance increase for end users.
Expanded SQL Coverage and Data Lakehouse Support
The Summer 2021 Dart Initiative release empowered companies to run an even broader set of enterprise SQL workloads by expanding SQL coverage to include additional functions, operators, and SQL grammar constructs. The Fall 2021 release extends the SQL coverage introduced through the prior Dart release, with functions such as pivot/unpivot and filtered aggregates.
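For readers unfamiliar with filtered aggregates, the following is a minimal sketch of the SQL construct itself, not of Dremio's implementation. It assumes Python's bundled SQLite (version 3.30 or later, which supports the FILTER clause) as a stand-in engine; the orders table, its columns, and the revenue_summary helper are hypothetical names chosen for illustration.

```python
import sqlite3


def revenue_summary(rows):
    """Return (total, online_total, online_orders) using filtered aggregates.

    `rows` is an iterable of (channel, amount) tuples. The table and column
    names are illustrative assumptions, not taken from any Dremio schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (channel TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    # FILTER restricts the rows that feed one aggregate without affecting
    # the other aggregates computed in the same pass over the table.
    query = """
        SELECT SUM(amount),
               SUM(amount) FILTER (WHERE channel = 'online'),
               COUNT(*)    FILTER (WHERE channel = 'online')
        FROM orders
    """
    result = conn.execute(query).fetchone()
    conn.close()
    return result


sample_orders = [("online", 10.0), ("store", 5.0), ("online", 2.5)]
summary = revenue_summary(sample_orders)
```

A filtered aggregate lets one query return an overall total alongside totals for specific subsets, which would otherwise require CASE expressions or several separate queries.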
Aside from broadening the scope of SQL workloads, this Dart release also expands support for open source table formats. Table formats such as Apache Iceberg and Delta Lake enable users to perform inserts, updates, and deletes with transactional consistency, and to time travel, directly on data lake storage. These formats have surged in popularity because such features were previously supported only by data warehouses. With this release, companies can now run interactive BI workloads on both of the leading lakehouse table formats, Apache Iceberg and Delta Lake.
For more information, visit www.dremio.com.