Teradata: Here, There, and Everywhere
For companies trying to balance the convenience of the cloud against on-premises performance advantages, Teradata Everywhere should be compelling.
- By Steve Swoyer
- October 28, 2016
Now as ever, beauty is in the eye of the beholder. Take the enhancements Teradata's touting in conjunction with its new "Teradata Everywhere" initiative.
The ability to shrink or expand a massively parallel processing (MPP) data warehouse without incurring significant downtime? A single MPP code base that spans on-premises environments, the managed and platform-as-a-service (PaaS) clouds, and virtual machines? For some beholders, stuff like this wouldn't merit a second glance.
For companies attempting to negotiate a phased shift to the cloud, or balancing the low cost and convenience of the cloud against the performance advantages of the on-premises model, this is nothing less than a godsend.
Shrinking or expanding a database with little or no downtime is a hard problem. Starting with a single RDBMS code base -- particularly an MPP RDBMS code base -- and optimizing for different contexts and compute paradigms is no less difficult.
Viewed from a certain perspective -- namely, that of Teradata's customers -- solving these problems is fantastic.
"From our competitive analysis, this is an industry first. Teradata is the first to bring the same exact code base across every one of these platforms," says Imad Birouty, Teradata's director of technical product marketing. "What that guarantees for customers is choice over time. If I want to migrate workloads from one platform to another ... this gives [me] a seamless and transparent way to do so."
Teradata Everywhere Means What It Says
Teradata plans to "make sure the Teradata Database and [its associated] analytic solutions are available on customers' terms where and when they need it," Birouty confirmed.
This is much harder than it might sound. It entails refactoring or porting the Teradata database to run in MPP configurations in:
- On-premises environments
- Teradata's own hosted cloud
- The Amazon Web Services (AWS) PaaS cloud
- Virtual machine instances
Teradata in the AWS cloud is no less of an MPP platform than Teradata in the on-premises data center. Okay, it's slightly less of an MPP platform: initially, Teradata will support MPP configurations of up to 32 nodes in VMware virtual machines and in AWS -- where X1 instances can support up to 128 virtual processors and 2 TB of physical memory. Support for up to 32 MPP nodes is coming to Microsoft's Azure platform, too. This is just a starting point, however, Birouty says.
"Up to 32 [MPP nodes] is really partly testing and partly what does [the cloud host's] infrastructure look like underneath? Tomorrow that's going to change. It's definitely not a limit of the Teradata database," he answered, in response to a question from an analyst.
Chris Twogood, vice president of product and services marketing, said Teradata has created bigger configurations in testing. "We're testing even now [at] 128 [nodes]; we're just not ready to release that for production [use]. We're going to go as fast as the infrastructure lets us get to."
The Challenge of Parallelism
As Teradata Technical Marketing Specialist Dan Graham told Upside in a separate conversation: parallelism -- i.e., the ability to simultaneously and efficiently distribute a workload across multiple compute nodes -- is a "nasty" technological problem. It gets even nastier in the multitenant cloud, where it's difficult, if not impossible, to control for the vicissitudes of compute, storage, and network performance.
"What everybody overlooks is that this parallel thing is nasty hard computer science all by itself. It doesn't matter if it's Hadoop ... Spark or whatever: we've been at it for 30 years and we're still improving our stuff. It's a never-ending battle. Parallelism really is truly exceptionally difficult."
It's even harder when -- as with Teradata Everywhere -- the goal is to support portability between and among contexts. If a customer wants to shift workloads running in an on-premises data warehouse to the cloud, "no one wants to rewrite an application just because they're moving from one platform [context] to another," Birouty says.
Teradata is focusing on enabling a predictable, consistent experience, irrespective of context: "If you have the same Teradata database running across these things, you want [similar] service-level agreements running across all of these things, too. Likewise, we're adopting cloud features, such as rapid system growth."
Quickly Shrink, Expand Data Warehouse Volumes
That last capability comes via Teradata's new MAPS architecture. MAPS supports table-level data distribution, which should permit a high degree of elasticity when scaling the Teradata database up or down.
This, too, is a very challenging problem.
In an MPP architecture, a data set is distributed across all of the nodes in the cluster, ideally in equal proportions, though not always. Most MPP database systems use a technique called hashing to distribute data efficiently across nodes. In this scheme, a DBA selects a distribution key (e.g., STORE_ID); the database runs each row's key value through a hash function, maps the resulting hash value to a node number, and stores the row on that node.
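To make the hashing scheme concrete, here is a minimal Python sketch. It assumes the simplest possible layout, in which a row's node is its key's hash taken modulo the node count; production MPP databases typically add further indirection, so treat this as an illustration rather than how Teradata or any other vendor actually does it. Apart from STORE_ID, which comes from the example above, the row layout and function names are hypothetical.

```python
import hashlib

def node_for(key, num_nodes):
    """Hash a distribution-key value and map the result to a node number (0..num_nodes-1)."""
    # SHA-256 gives a hash that is stable across processes
    # (Python's built-in hash() of a str is salted per run).
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def distribute(rows, key_column, num_nodes):
    """Place each row on the node its distribution-key hash points to."""
    placement = {n: [] for n in range(num_nodes)}
    for row in rows:
        placement[node_for(row[key_column], num_nodes)].append(row)
    return placement

# Hypothetical sales rows (200 stores, 5 rows each), distributed on STORE_ID across 4 nodes.
rows = [{"STORE_ID": store, "AMOUNT": 10.0} for store in range(1, 201) for _ in range(5)]
placement = distribute(rows, "STORE_ID", 4)
for node, node_rows in placement.items():
    print(f"node {node}: {len(node_rows)} rows")
```

Because placement depends only on the key value, every row for a given store lands on the same node, which is what lets a node answer queries about "its" stores without asking its neighbors.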
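Hash distribution can also produce a phenomenon called skew, which can kill performance in MPP environments. Because every row with the same key value hashes to the same node, a key with few distinct values (or a handful of very common ones) yields an unevenly distributed data set; in a worst-case scenario, a disproportionate volume of data is concentrated in just a few nodes, with other nodes practically unused. Since a parallel query finishes only when its slowest node does, those overloaded nodes set the pace for the whole system.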
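A quick way to see skew is to count rows per node for two candidate keys. In this sketch, the max-to-average "skew factor" is an illustrative metric chosen for simplicity, not a Teradata measure, and CUSTOMER_ID and REGION are hypothetical column names. The hashing is the same simple hash-modulo-node-count assumption used for illustration elsewhere.

```python
import hashlib

def node_for(key, num_nodes):
    """Stable hash of a distribution-key value, mapped to a node number."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def rows_per_node(keys, num_nodes):
    """Count how many rows land on each node when distributed on the given key values."""
    counts = [0] * num_nodes
    for key in keys:
        counts[node_for(key, num_nodes)] += 1
    return counts

def skew_factor(counts):
    """Busiest node's row count divided by the average; 1.0 means perfectly even."""
    average = sum(counts) / len(counts)
    return max(counts) / average

NODES = 8
# High-cardinality key: 10,000 rows, each with a distinct (hypothetical) CUSTOMER_ID.
even = rows_per_node(range(10_000), NODES)
# Low-cardinality key: 9,000 rows sharing just three (hypothetical) REGION codes.
skewed = rows_per_node(["EAST", "WEST", "CENTRAL"] * 3_000, NODES)

print("even:  ", even, f"skew factor {skew_factor(even):.2f}")
print("skewed:", skewed, f"skew factor {skew_factor(skewed):.2f}")
```

With only three distinct values, at most three of the eight nodes can receive any rows, so the skew factor is at least 8/3 (about 2.7) no matter how good the hash function is; the high-cardinality key lands close to 1.0.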
Whenever you shrink or expand an MPP database, you have to rehash and redistribute the data set, which can be an extremely time-consuming activity. With MAPS, Teradata promises a 90 percent reduction in downtime. In practice, Birouty says, the reduction is often much larger.
"We don't have just one hash map; we have multiple hash maps. Basically, we're saying conservatively above a 90 percent reduction in downtime when you expand the system," he says.
"This is conservative. The majority of the sites that I looked at were 98-99 percent reductions. I know those sound like fantastic or crazy numbers, but they are real."
Elasticity Compared with Cloud Computing
In a separate discussion, Birouty favorably compared Teradata's new elasticity with that of the cloud model. That comparison isn't quite accurate, however, because elasticity varies widely from one cloud service to the next.
Some cloud data warehouse environments -- such as Snowflake -- do claim to support rapid database expansion or reduction. Others, such as Amazon Redshift, don't. To expand or shrink a Redshift database, a subscriber must first spin up a new (larger or smaller) Redshift cluster and then populate it with the contents of the existing data set.
With MAPS, Teradata's on-premises database is actually more elastic than is the norm in the cloud.
This is just the tip of the Teradata Everywhere iceberg. Teradata announced several other intriguing products, features, and services at Partners 2016.
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about.