Q&A: Where Cloud Data Warehouses Are Headed
What characteristics do third-generation cloud data warehouses share?
- By James E. Powell
- May 10, 2019
What does it mean to be a third-generation data warehouse in the cloud? Actian’s SVP of Engineering, Emma McGrattan, gives us a preview of where the technology is headed and what benefits your enterprise can expect.
Your latest announcement about Actian Avalanche positions the solution as a third-generation cloud data warehouse. What characterized the first two generations?
Emma McGrattan: The first generation of cloud data warehouses is exemplified by solutions such as Amazon Redshift, which are products delivered in the cloud as pseudo-managed services. Consumers of these services must know which additional cloud infrastructure services they need to connect in order to build and manage a solution that meets their business needs.
First-generation cloud data warehouses are typically limited in their elasticity and don’t provide true cloud economics, where you pay only for what you use. Once you subscribe to the service, the underlying cloud infrastructure typically must always be on, so the meter is always running.
The second generation was architected for the cloud (e.g., Snowflake, which provides a cloud-native, fully managed cloud data warehouse). This generation provides the improved on-demand elasticity that cloud consumers expect, as well as the ability to pay only for the resources being used. The underlying cloud infrastructure is hidden from the user, who can focus on consuming the database as a service.
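To make the difference between the two billing models concrete, here is a back-of-the-envelope sketch. The hourly rate and usage pattern are hypothetical numbers chosen for illustration, not actual pricing for Redshift, Snowflake, Avalanche, or any other service. The only point is that an always-on meter bills every hour of the month, while a pay-per-use meter bills only the hours in which compute is active.

```python
# Hypothetical figures for illustration only; not real pricing for any vendor.
HOURLY_RATE = 4.00      # assumed cost per compute-hour, in dollars
HOURS_IN_MONTH = 730    # average hours in a month (8,760 hours / 12)


def always_on_cost(rate: float, hours_in_month: float) -> float:
    """Always-on model: infrastructure stays provisioned, so every hour is billed."""
    return rate * hours_in_month


def pay_per_use_cost(rate: float, active_hours: float) -> float:
    """Elastic model: compute is billed only while it is actually in use."""
    return rate * active_hours


# Assumed usage pattern: busy 8 hours a day on 22 business days.
active_hours = 8 * 22  # 176 hours

print(f"Always on:   ${always_on_cost(HOURLY_RATE, HOURS_IN_MONTH):,.2f}")  # → Always on:   $2,920.00
print(f"Pay per use: ${pay_per_use_cost(HOURLY_RATE, active_hours):,.2f}")  # → Pay per use: $704.00
```

Under these assumed numbers the idle hours account for most of the always-on bill, which is the "meter is always running" problem described above.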
In designing a third-generation cloud data warehouse, our motivation came from market demand for a service that addresses the limitations of first- and second-generation cloud data warehouses.
What do you consider the top three features of third-generation cloud DWs? What’s driving these features?
One of the main differentiators is our hybrid capability. Many organizations had a critical requirement that wasn’t being met: a service with an on-premises equivalent, so that the same technologies, skills, and applications can run on multiple cloud platforms and on premises. This matters especially in industries with regulatory compliance requirements for their data, such as financial services, healthcare, and pharma. They want to use the same technologies for their on-premises and cloud analytics needs, and they want to write queries and applications that seamlessly join on-premises and cloud-resident data.
Second, and equally important, is scalability in terms of data volumes, number of users on the system, and query complexity. Earlier generations struggled with large user volumes running mixed workloads at enterprise scale. To address this, a third-generation service must be suited for use cases where data volumes are high, query complexity varies, and the organization wants to give every decision maker real-time access to data.
Finally, a robust concurrency capability enables organizations to truly harness the data across every business function at scale with enterprise-grade reliability and security. Earlier cloud data warehouses experienced issues with concurrency; the latest third-generation solutions can accommodate hundreds of users querying the data in parallel.
One of the oft-touted benefits of having a data warehouse in the cloud is cost savings: a third party manages the infrastructure, the initial hardware investment is lower, elasticity absorbs temporary spikes in demand, you pay only for the ongoing resources you need, and someone else is responsible for regular data backups. Are there any additional cost savings you expect from the third generation of DWs?
Third-generation cloud data warehouses allow the same skills, technology, and applications to be used for both cloud and on-premises deployments, greatly reducing the staff required to administer hybrid deployments. In addition, they are self-tuning, which eliminates the need for expensive database tuning experts to create materialized views, cubes, and other complex database mechanisms that must be managed and maintained.
What about other data management issues such as compliance or security? Any benefits there?
Third-generation cloud data warehouses provide all of the advanced security capabilities of past generations, with the added benefit that data subject to regulatory compliance requirements can be retained on premises while still being part of a broader data ecosystem that encompasses both on-premises and cloud data assets. We have observed reluctance in financial services and healthcare to move certain data sets to the cloud.
What third-generation features does Actian provide?
The Actian Avalanche cloud data warehouse is designed to be a component of a broader cloud strategy and, as such, comes preconfigured with a set of connectors to popular SaaS solutions such as Salesforce, NetSuite, Workday, and ServiceNow, so data from those services can be fed in at the speed of the business.
I mentioned concurrency before. We architected Actian Avalanche to handle it, executing complex queries for large numbers of users against massive data volumes, much as legacy on-premises enterprise data warehouses do.
Third-generation cloud data warehouses will make it easier to maintain a hybrid environment, with some data in the cloud and some on premises. In the case of Actian Avalanche, blending data from on-premises and multiple cloud deployments is seamless. It’s also an autonomous cloud data warehouse: a self-managing environment that lets organizations focus less on administration and maintenance and more on business differentiation.
About the Author
James E. Powell is the editorial director of TDWI, including research reports, the Business Intelligence Journal, and Upside newsletter. You can contact him via email.