The MPP Data Warehouse Takes to the Cloud
MPP data warehouse services from both Snowflake Computing and Teradata will be available on AWS.
- By Stephen Swoyer
- November 17, 2015
[Editor's note: This article has been updated for clarity; it was originally posted on 11/17/15.]
Between them, Snowflake Computing Inc. and Teradata Corp. articulate two distinctly different visions of the data warehouse in the cloud. As of early next year, however, both Snowflake's and Teradata's data warehouse services will be available via Amazon Web Services, or AWS. Snowflake's data warehouse-in-the-cloud is available on AWS today; Teradata says its own data warehouse service will be available on AWS in Q1 of next year.
At some point next year, in fact, both Snowflake's and Teradata's massively parallel processing (MPP) data warehouse services will be available via AWS -- and that is, or will be, something. Amazon already offers its own MPP data warehouse-as-a-service, Redshift, based on technology it acquired from the former ParAccel Inc. Snowflake, too, currently offers customers an MPP option for AWS. Teradata's take on AWS is a trickier issue, however.
An MPP version of Teradata is coming to AWS, officials promise. Just when hasn't been determined yet, however. Officials instead sought to emphasize the unprecedented nature of Teradata's news.
“For years, there's been this tight integration between Teradata database software and the [Teradata hardware] platform. What we're announcing is that Teradata database will be available on the AWS Marketplace for production workloads, so this is the first time Teradata has made its database software available [separately] for production workloads in a production environment,” says Brian Wood, director of cloud marketing for Teradata.
This is strictly true, albeit with qualifications. In the late 1990s, for example, Teradata offered customers the option of deploying its database software on non-Teradata kit -- e.g., on uniprocessor Windows NT systems running on non-Teradata PC hardware. Teradata Database in this configuration was a uniprocessor, single-node-only play, which severely limited its scalability. Since 2009, Teradata has offered “Express” versions of its database for both Amazon's Elastic Compute Cloud (EC2) and VMware. Teradata Express is intended for non-production deployments, however.
By contrast, Teradata in AWS has the potential to be the real MPP deal -- or as real an MPP deal as the multi-tenant cloud environment will permit. When it launches in Q1 of next year, Teradata-in-AWS will support only single-node deployments. Multinode -- or MPP -- is coming, asserts Wood, who says that the MPP-on-AWS option will be available in “late-2016.” Still unresolved, however, is how many MPP nodes Teradata-on-AWS will eventually support. If Teradata's MPP-on-AWS option isn't crippled by node- or volume-size limitations, and if it likewise replicates Teradata's best-in-class workload management in AWS, it has the potential to be a formidable cloud competitor.
These are all big “ifs.” Truth is, a lot about Teradata's move to AWS is as yet undetermined: Teradata officials say they haven't yet set pricing, for example.
Nor has Teradata said whether customers will have the option of deploying across Amazon's InfiniBand-based high-performance computing (HPC) cloud. It has said that subscribers can pay as they go and self-provision, which means they can spin up (or spin down) AWS instances of Teradata Database as needed, Wood confirms.
The move to EC2 and multi-tenancy marks a departure for Teradata's cloud strategy. Previously, Teradata had offered a public cloud service with a dedicated hardware option. In the context of AWS, Teradata Cloud is a multi-tenant service.
This was something -- multi-tenancy, resource sharing -- that Teradata had previously eschewed. More than this: Teradata had attacked multi-tenancy as inimical to its value proposition. Why the about-face?
“There will be a spectrum of customers. There will be those who want maximum performance, maximum scalability, the fastest query, the most concurrency. That [level of performance] will continue to be in the engineered platforms, the EDW, the appliances,” Wood says.
“You can envision a middle ground, a managed environment, which is Teradata Cloud, a hosted type [of] offering. Again, purpose-built infrastructure, tightly integrated, optimized for just this [hosted cloud] use case. The other end of the spectrum would be a completely virtualized type of environment where maybe performance and scalability and capacity, [the importance of] that metric comes down, but [the need for] flexibility, elasticity goes up, especially flexibility in terms of integration with other on-the-fly data sources. That's where we see [Teradata in] AWS playing.”
AWS is just the beginning, too, Wood assures us. He wouldn't name names but said that other cloud environments or other prominent services could be in the mix as well.
“In our most recent earnings call, our CEO, Mike Koehler, talked about Teradata extending into public clouds and he's spent a fair amount of time kind of emphasizing that's the direction we're going,” he points out. “The AWS announcement will be the first of the various clouds.”
Snowflake is no stranger to AWS: it uses Amazon's Simple Storage Service (S3) as its storage substrate. Snowflake's Elastic Data Warehouse was built for the cloud, which means it was designed to exploit the advantages (and mitigate the constraints) of the cloud model, says Jon Bock, vice president of product marketing with Snowflake. Consequently, Snowflake's design permits users to dynamically expand and shrink compute clusters for queries, separate from storage.
Snowflake was in stealth mode in 2014 -- although it did do some promotional briefings -- and staged its official coming out this June. Even though Snowflake uses S3 as a storage substrate, it also uses solid-state drives (SSD) as cache; in this scheme, Snowflake pushes fetches to S3 -- which has indeterminate latency, i.e., anywhere from a few milliseconds or tens of milliseconds up to one second -- and moves data from S3 into its cached tier.
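The caching scheme described above resembles a read-through cache: a fast local tier answers repeat requests, and only misses go to the slow object store. The Python sketch below illustrates that general pattern only. It is not Snowflake's implementation, which the article does not detail. The class name `ReadThroughCache`, the least-recently-used eviction policy, and the in-memory dictionary standing in for S3 are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Hypothetical LRU cache in front of a slow object store.

    Illustrative only: a real SSD cache tier would persist data to disk
    and would likely use a more sophisticated eviction policy.
    """

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch = fetch          # slow path, e.g. a remote object read
        self._capacity = capacity    # maximum number of cached objects
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Cache miss: pay the remote latency once, then keep a local copy.
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value


# Assumption: a plain dict stands in for the remote object store.
remote_store = {"part-0": b"alpha", "part-1": b"beta", "part-2": b"gamma"}
cache = ReadThroughCache(fetch=remote_store.__getitem__, capacity=2)
cache.get("part-0")  # miss: fetched from the stand-in store
cache.get("part-0")  # hit: served from the local tier
```

In this sketch, the first read of an object pays the full fetch cost and later reads are served locally until the object is evicted, which is the behavior a cached tier in front of variable-latency storage is meant to provide.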
In this way, Bock says, Snowflake has tried to design around the constraints of the cloud, wherein multi-tenancy -- the virtualization and sharing of resources with other workloads, which inevitably entails resource contention -- is the rule. Designing for the cloud confers advantages, too. “A product differentiation that we had started working on a long time ago [has to do with] semi-structured data. Semi-structured data, JSON, Avro, and XML [files] were never really designed to be fitted into a relational database, but because we had an RDBMS that we built ourselves, we were able to build it to handle that data. We were able to build it to get the same performance [as a NoSQL platform] because the system understands the data natively,” Bock says.
Prior to Teradata's AWS announcement, Bock says, Snowflake didn't much run into Teradata as a cloud competitor. Nor does it encounter services such as Birst Inc. and GoodData Inc., which -- even though they market BI-platform-as-a-service offerings -- implement the equivalent of data warehouse architecture in the cloud.
“Vendors such as GoodData and Birst ... their target market is people who don't want to have a separate data warehouse, so they're targeting more of a mid-market scenario. We're for a scenario where you want the flexibility of a general-purpose data warehouse and the advantages that you get with [deploying that in] the cloud,” he says.
“You can scale [Snowflake] up, scale [it] down. In our environment, [scale-up/scale-down is] self-service: it's something you can do yourself, in seconds or minutes. In our environment, if you want a massive compute cluster, you can do that in minutes,” Bock explains.
With the dedicated hardware option for its Teradata Cloud service, Teradata offered performance and availability SLAs. Other cloud-based data warehouse-as-a-service offerings didn't -- and don't -- offer equivalent or commensurate SLAs. Snowflake, for its part, offers what Bock calls “availability” SLAs -- although he declines to compare or contrast availability SLAs in the context of the cloud with those in an on-premises environment. “We do have availability SLAs that customers can choose to buy into their contracts. For some customers, this matters a lot,” he notes.
“We don't have SLAs in the contract per se for performance, but we do give people the flexibility to apply more resources if they want or need to, so customers have the ability to apply more resources when and where they're needed in order to get additional performance.”