CEO Perspective: Modern Architecture for Comprehensive BI and Analytics
How can modern enterprises scale to handle all of their data? Tomer Shiran, cofounder and CEO of Dremio, told Upside why he thinks it's all about the data lake.
- By James E. Powell
- January 28, 2020
Upside: What technology or methodology must be part of an enterprise's data strategy if it wants to be competitive today? Why?
Tomer Shiran: We believe enterprises need to accelerate their modernization towards next-generation data lakes, especially in the cloud, powered by storage-agnostic SQL engines. This new, open architecture leverages multiple new technologies to finally deliver true separation of compute and storage.
Data remains in the data lake storage environment, without being copied and ingested into a proprietary data warehouse (whether on-premises or cloud-native). That data can then be analyzed by multiple best-of-breed engines optimized for various use cases, all without traditional OLAP cubes or extracts. In this way, enterprises can explore and analyze all their data, at any time, and at any scale.
What one emerging technology are you most excited about and think has the greatest potential? What's so special about this technology?
ODBC and JDBC were introduced in the 1990s, in an era when data sets were much smaller than they are today. At Dremio, we helped develop a new open source project called Arrow Flight, a subproject within the Apache Arrow project, which is designed to replace ODBC and JDBC and enable applications to exchange data hundreds of times faster by eliminating serialization/deserialization and parallelizing data transfer. Apache Arrow is now a core component in the Python and R data science toolkits, so any data scientist can easily utilize Arrow Flight.
What is the single biggest challenge enterprises face today? How do most enterprises respond (and is it working)?
In our conversations with customers, we hear consistently that the single biggest challenge is actually getting value out of all of their data versus narrow slices of it -- and doing so in a timely fashion. The root cause of this challenge is the legacy analytics architecture, which is built upon numerous complex workarounds to deliver enough speed to data consumers. These workarounds include cubes, extracts, aggregation tables, and proprietary data warehouses.
We're seeing most enterprises respond by modernizing their data architectures towards next-generation data lakes, often built on low-cost, highly durable cloud object stores such as Amazon S3 and Microsoft Azure Data Lake Storage (ADLS). However, these enterprises often struggle because their data modernization efforts cause significant disruption to the hundreds (or even thousands) of data analysts, data scientists, and BI users who query that data every day. What these enterprises need is an analytics abstraction layer that allows them to modernize their data sources without requiring changes to existing queries.
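The idea of an abstraction layer that shields existing queries from changes to the underlying source can be illustrated with a plain SQL view. The following is a minimal sketch, not a description of how Dremio or any particular product implements it: it uses Python's built-in sqlite3 module as a stand-in for a query engine, and the table names (legacy_orders, lake_orders), the view name (orders), and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory database standing in for a query engine (an assumption for illustration).
conn = sqlite3.connect(":memory:")

# Hypothetical legacy source with its original column names.
conn.execute("CREATE TABLE legacy_orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO legacy_orders VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# The abstraction layer: analysts only ever query the view named "orders".
conn.execute("CREATE VIEW orders AS SELECT region, amount FROM legacy_orders")

# An existing analyst query that must keep working unchanged.
ANALYST_QUERY = (
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
)
before = conn.execute(ANALYST_QUERY).fetchall()

# Modernization: the data moves to a new table with different column names.
conn.execute("CREATE TABLE lake_orders (sales_region TEXT, amount_usd REAL)")
conn.execute("INSERT INTO lake_orders SELECT region, amount FROM legacy_orders")

# Repoint the view at the new source, mapping new column names to the old ones.
conn.execute("DROP VIEW orders")
conn.execute(
    "CREATE VIEW orders AS "
    "SELECT sales_region AS region, amount_usd AS amount FROM lake_orders"
)
conn.execute("DROP TABLE legacy_orders")

# The same query text runs against the new source and returns the same result.
after = conn.execute(ANALYST_QUERY).fetchall()
print(before == after)
```

Only the view definition changes during the migration; the analyst's query text is never touched, which is the property the answer above describes.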
Is there a new technology in data and analytics that is creating more challenges than most people realize? How should enterprises adjust their approach to it?
At first glance, next-generation proprietary cloud data warehouses seem like a good idea. After all, this new technology is built to benefit from the elastic scale of the cloud and to separate storage and compute within the data warehouse for independent (and more cost-effective) scaling. However, enterprises are finding that they're just as locked in to a proprietary format and legacy end-to-end analytics architecture as they ever were, and the elastic usage of resources is actually causing their costs to spiral out of control.
We believe enterprises should leapfrog the entire proprietary cloud data warehouse category and jump straight to a modern, open data lake architecture. Of course, proprietary data warehouses, cloud-native or otherwise, can still deliver value for some use cases. As a result, we recommend enterprises only use them where necessary and direct the rest of their resources toward modern data lake initiatives.
What initiative is your organization spending the most time/resources on today? In other words, what internal project is your enterprise focused on so that you (not your customers) benefit from your own data or business analytics?
We've developed our own internal data lake, which allows us to understand how our customers are engaging with us at a very detailed level. This includes their interactions with every function of the company, ranging from marketing and sales to product and support. Our internal data lake ultimately enables us to provide a superior customer experience.
Where do you see analytics and data management headed in 2020 and beyond? What's just over the horizon that we haven't heard much about yet?
The availability of nearly unlimited compute capacity in the public cloud is enabling new, high-performance analytics solutions that can deliver a lot of value. At the same time, the compute required to power those solutions is driving enterprise cloud costs through the roof. We believe that in 2020, enterprises will bring a lot more scrutiny to the cost efficiency of their analytics and data management solutions -- in fact, of all of their cloud solutions.
For example, we expect to see the emergence of monetary benchmarks to help enterprises compare price and performance for competitive offerings across many different product categories. In this new world, data analytics vendors will need to compete on more than just performance and scalability -- they will need to compete on cost efficiency.
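A monetary benchmark of this kind could be as simple as dividing what a run costs by the work it completes. The sketch below is one possible formulation, assuming a flat hourly price and a fixed query workload; the engine names, prices, runtimes, and query counts are hypothetical figures chosen for illustration, not measurements of any real product.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    name: str                # hypothetical engine label
    hourly_cost_usd: float   # assumed flat price of the compute used, per hour
    runtime_seconds: float   # wall-clock time to finish the workload
    queries_completed: int   # number of queries in the workload


def cost_per_query(run: BenchmarkRun) -> float:
    """Return the dollars spent per completed query for one benchmark run."""
    total_cost = run.hourly_cost_usd * run.runtime_seconds / 3600.0
    return total_cost / run.queries_completed


# Hypothetical runs: engine_a finishes twice as fast but on pricier compute.
runs = [
    BenchmarkRun("engine_a", hourly_cost_usd=16.0, runtime_seconds=1800.0, queries_completed=100),
    BenchmarkRun("engine_b", hourly_cost_usd=4.0, runtime_seconds=3600.0, queries_completed=100),
]

# Rank by cost efficiency rather than by raw speed.
ranked = sorted(runs, key=cost_per_query)
for run in ranked:
    print(run.name, cost_per_query(run))
```

With these made-up inputs, the faster engine is the more expensive one per query, which is why a price-and-performance metric can rank offerings differently than a performance-only benchmark.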
Describe your product/solution and the problem it solves for enterprises.
Dremio solves the problem of slow, inefficient analytics -- caused first by data consumers' heavy reliance on IT and data engineering to make data sets available in data lake storage, and caused second by slow and expensive queries against those data sets. Dremio's Data Lake Engine delivers lightning-fast query speed and a self-service semantic layer operating directly against open data lake storage. Our Data Lake Engine eliminates the need to move data to proprietary data warehouses or to create cubes, aggregation tables, or BI extracts. Dremio delivers flexibility and control for data architects, and self-service for data consumers.
About the Author
James E. Powell is the editorial director of TDWI, including research reports, the Business Intelligence Journal, and Upside newsletter. You can contact him via email here.