TDWI Upside - Where Data Means Business

Analysis: Scaling SAP HANA in the Cloud

If any workload can make the most out of lots of processing power and RAM, it's SAP's HANA, an in-memory database. Will HANA's performance characteristics translate to a cloud VM?

Which is the largest cloud virtual machine (VM) of them all? Does it even matter?

Amazon and Microsoft seem to think so. Last month, Redmond and SAP announced a partnership to certify SAP's HANA in-memory database for Microsoft's Azure cloud services platform. There's an indisputable logic to it. After all, HANA is an in-memory database. The more memory you can feed it, the better, right?

Before answering that question, let's (briefly) recap what's at stake here.

For a little over a year now, Redmond has trumpeted its Azure G-Series tier as the "largest VM available in the cloud" -- with support for memory configurations of up to 448 GB. That's a huge memory configuration.

At roughly the same time that Microsoft and SAP were coming together to promote HANA-on-Azure, Amazon -- which also supports HANA on its Amazon Web Services cloud platform -- upped the biggest-VM-in-the-cloud stakes significantly. It announced X1, a monster new VM configuration for its Elastic Compute Cloud (EC2) service. X1 VM instances can be configured with up to 2 TB of RAM and 128 virtual CPUs. (Microsoft's GS-Series VMs top out at 32 virtual CPUs.)

Not only is Amazon's X1 sizing more than four times as large as Microsoft's largest GS-Series sizing, but Amazon also spoiled Microsoft's potential HANA-on-Azure coup.
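The "more than four times" figure is easy to check. Here is a minimal Python sketch that uses only the sizing numbers quoted above; treating 2 TB as 2,048 GB (binary units) is an assumption, and the variable names are purely illustrative.

```python
# Sizing figures as quoted in the article.
X1_RAM_GB = 2 * 1024       # assumption: 2 TB taken as 2,048 GB (binary units)
G_SERIES_RAM_GB = 448
X1_VCPUS = 128
G_SERIES_VCPUS = 32

# How many times larger the X1 configuration is on each dimension.
ram_ratio = X1_RAM_GB / G_SERIES_RAM_GB
vcpu_ratio = X1_VCPUS / G_SERIES_VCPUS

print(f"RAM ratio:  {ram_ratio:.2f}x")
print(f"vCPU ratio: {vcpu_ratio:.2f}x")
```

On these numbers, the gap is somewhat wider for memory than for virtual CPUs.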

Does any of this even matter, or is it all so much posturing? After all, how much demand -- if any -- is there for a 2-TB single-VM instance? The jury's still out on that question, but if any workload can make the most out of 2 TB of RAM, virtualized or no, it's SAP's in-memory HANA database.

You might ask whether (or to what extent) the benefits of in-memory database technology will transfer to the cloud. That's a great question, as it happens. The answer is: about as well as the benefits of a massively parallel processing (MPP) database, InfiniBand connectivity, and other high-performance technologies.

The same features -- namely, virtualization and multi-tenancy -- that make the cloud an efficient and cost-effective context in which to run general-purpose enterprise workloads also make it a less-than-optimal performer for analytics workloads. In-memory, MPP, InfiniBand, and other technologies have the potential to mitigate these performance issues.

A data warehouse in the cloud probably won't ever be as responsive, reliable, or available as an on-premises system, but a massive VM -- complemented with MPP, InfiniBand, and SSD storage -- can help to close these gaps. That said, some of the features that make HANA amazingly fast in an on-premises environment might not translate quite so well to the cloud. Why is that?

To find out, let's take a look at just what makes in-memory database technology fly.

For starters, an in-memory database such as HANA doesn't just run entirely in physical memory (RAM). Instead, HANA is designed to exploit all of the memory in a system -- including the on-chip caches used by modern Intel and AMD CPUs. These consist of the Level-1 (L1), Level-2 (L2), and Level-3 (L3) caches that are integrated into the CPU package itself.

On-chip caches range in size from 32 KB (for L1, per core) to 8 MB or more (for L3, shared among all cores). In announcing X1, Jeff Barr, chief evangelist with Amazon Web Services, touted the large L3 cache (45 MB, shared among 18 cores) that's built into the Xeon E7 8880 v3 chips used to power X1 on AWS.

An in-memory database such as HANA will try to make as much use of on-chip caching as possible because the CPU can read from and write to its on-chip caches much more quickly than it can read from or write to physical RAM. To this end, HANA and similar in-memory technologies will "pin" chunks of data or queries in the on-chip cache. Instead of having to fetch and re-fetch data or queries from main memory, the CPU can retrieve them from the much faster local cache.
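To see why cache residency matters so much, consider a rough model of the expected cost of a single memory access. The sketch below is a simplified illustration, not a description of how HANA works internally. The function name, the hit rates, and the latencies (in nanoseconds) are all placeholder assumptions chosen only to show the shape of the effect; real values vary by processor and workload.

```python
def average_access_ns(levels, ram_ns):
    """Expected latency of one memory access, in nanoseconds.

    levels: list of (hit_rate, latency_ns) ordered L1, L2, L3, where
            hit_rate is the fraction of accesses reaching that level
            that are satisfied there.
    ram_ns: latency of main memory, which serves every access that
            misses all cache levels.
    """
    expected = 0.0
    reach = 1.0  # probability that an access gets as far as this level
    for hit_rate, latency_ns in levels:
        expected += reach * hit_rate * latency_ns
        reach *= 1.0 - hit_rate
    return expected + reach * ram_ns


# Placeholder latencies (assumptions, not measurements).
L1_NS, L2_NS, L3_NS, RAM_NS = 1.0, 4.0, 15.0, 90.0

# A workload that keeps its hot data in cache vs. one that mostly misses.
cache_friendly = average_access_ns(
    [(0.95, L1_NS), (0.80, L2_NS), (0.80, L3_NS)], RAM_NS)
cache_hostile = average_access_ns(
    [(0.50, L1_NS), (0.40, L2_NS), (0.30, L3_NS)], RAM_NS)

print(f"cache-friendly: {cache_friendly:.2f} ns per access")
print(f"cache-hostile:  {cache_hostile:.2f} ns per access")
```

Under these assumed numbers, the workload that misses the caches more often pays many times more per access, which is the gap an in-memory engine tries to avoid.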

This is how an in-memory database works in an on-premises data center, where (at least for applications such as HANA) workloads aren't virtualized. Imagine HANA is running on a single on-premises server that's outfitted with 32 processor cores and 448 GB of RAM. In this scheme, HANA has direct, unshared, non-abstracted access to the underlying hardware. The processor cores and memory that HANA sees map 1:1 to the physical resources of the underlying server.

In the generic cloud, this 1:1 mapping disappears. HANA shares access to virtualized processors and memory -- with virtualized L1, L2, and L3 caches.

Microsoft's G-Series and AWS' X1 tiers aren't "generic" cloud services, however. For example, if two organizations both spin up instances of HANA in dedicated X1 hosts, they won't be sharing space (more precisely, hardware) with one another.

They're each going to get their own dedicated VMs running on dedicated hardware. Their respective instances of HANA will still be running in a virtualized context, however. This means HANA will see virtual processors and virtual RAM. More to the point, HANA will pin SQL queries and chunks of data in virtual L1, L2, and L3 caches.

The good news is that hypervisors (the software layer that creates and runs VMs, with help from virtualization features in the hardware) are incredibly sophisticated. A full explanation of just how and why they're so sophisticated is a topic for another article.

To cite just one example: servers that support multiple processor packages -- say, a system that accepts four 18-core Intel Xeon chips -- use a technology called NUMA (short for "non-uniform memory access") to share and allocate memory and other resources between processor cores. The hypervisor can minutely control how virtual compute and memory resources are allocated in a NUMA configuration.

At a physical level, NUMA maps physical processor cores or sockets to physical memory banks. Core 0 would ideally write data to or fetch data from Memory Bank 0, which is "local" to it. In a virtual context, however, the guest operating system (or the hypervisor itself) could spin up new threads on Core 1 -- or on other non-local cores.

Because local and non-local threads don't share the same on-chip cache or the same main memory pages, this would result in drastically degraded performance. The hypervisor is smart enough to map both virtual processors and virtual memory to local physical resources, however. HANA is optimized for NUMA. It's especially optimized to run on very large NUMA systems -- even a virtualized system with, say, 128 processors and 2 TB of RAM. Because hypervisors are smart about managing and optimizing for NUMA, HANA should scale pretty well in a single large VM too. In fact, it should scale better in a single large VM than in multiple clustered VMs.
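The local-versus-remote distinction can be made concrete with a toy model. The Python sketch below assumes the four-socket, 18-core-per-socket server mentioned earlier, with one NUMA node (and one memory bank) per socket and cores numbered consecutively within each socket. Those layout details, and all of the function names, are assumptions for illustration; this is not how any particular hypervisor is implemented.

```python
SOCKETS = 4            # from the four-socket example above
CORES_PER_SOCKET = 18  # 18-core Xeon chips, as in the example


def node_of_core(core_id):
    """Return the NUMA node that owns a core (assumed: one node per socket)."""
    if not 0 <= core_id < SOCKETS * CORES_PER_SOCKET:
        raise ValueError(f"no such core: {core_id}")
    return core_id // CORES_PER_SOCKET


def is_local(core_id, memory_node):
    """True if a thread on core_id reads memory attached to its own node."""
    return node_of_core(core_id) == memory_node


def remote_fraction(placements):
    """Share of (core_id, memory_node) placements that cross NUMA nodes."""
    placements = list(placements)
    if not placements:
        return 0.0
    remote = sum(1 for core, node in placements if not is_local(core, node))
    return remote / len(placements)


thread_cores = [0, 17, 18, 71]

# Naive placement: every thread's data sits in memory bank 0.
naive = [(core, 0) for core in thread_cores]

# NUMA-aware placement: each thread's data sits in its own node's bank.
aware = [(core, node_of_core(core)) for core in thread_cores]

print("naive remote fraction:", remote_fraction(naive))
print("aware remote fraction:", remote_fraction(aware))
```

In this toy layout, the naive placement sends half of the threads across sockets for their data, while the NUMA-aware placement keeps every access local. That is the kind of mapping the article credits the hypervisor with handling.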

"Fewer large nodes should generally outperform a larger number of small nodes given an equivalent number of processors and memory," says Mark Madsen, a research analyst with IT strategy consultancy Third Nature Inc. In other words, a single 2-TB/128-processor server or VM will perform better than four separate servers, each with 32 processors and 512 GB of RAM -- provided the software running on said system can take advantage of it, Madsen concedes.

"The only confounding variable is when you saturate the [interconnect or] bus in the system. Pending that, the fewer things you have, the less interprocess latency there is in going from processor to RAM, processor to disk, and so on," he points out.

Therefore, VM size does matter -- at least for workloads such as HANA. Another salient point is that HANA's in-memory design will probably help to offset some of the performance constraints (especially with respect to analytics workloads) that are endemic to the cloud.
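Madsen's comparison of one large node with four smaller ones can be sketched with a deliberately crude model. The Python example below uses his figures (one 128-processor, 2-TB system versus four systems with 32 processors and 512 GB each). It assumes that data is spread evenly across nodes and accessed uniformly at random, so an access has to cross the interconnect whenever the data lives on another node. That access pattern, the `Layout` class, and its method names are all hypothetical, and the model ignores the bus-saturation caveat he raises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    nodes: int
    processors_per_node: int
    ram_gb_per_node: int

    @property
    def total_processors(self):
        return self.nodes * self.processors_per_node

    @property
    def total_ram_gb(self):
        return self.nodes * self.ram_gb_per_node

    def cross_node_fraction(self):
        # Assumption: data is spread evenly and accessed uniformly at
        # random, so (nodes - 1) out of every `nodes` accesses are remote.
        return (self.nodes - 1) / self.nodes


single = Layout(nodes=1, processors_per_node=128, ram_gb_per_node=2048)
cluster = Layout(nodes=4, processors_per_node=32, ram_gb_per_node=512)

for name, layout in (("single", single), ("cluster", cluster)):
    print(name, layout.total_processors, "processors,",
          layout.total_ram_gb, "GB RAM,",
          f"{layout.cross_node_fraction():.0%} of accesses cross nodes")
```

Both layouts have the same aggregate resources. Under the uniform-access assumption, only the cluster pays an interconnect hop, which is the latency Madsen describes.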

There's one final point. Because HANA is an in-memory database and it performs best with lots of compute and memory capacity, it's an extremely expensive technology to procure, deploy, and maintain. HANA in the cloud, whatever its performance deficit relative to on-premises HANA, is much cheaper and easier to spin up, deploy, and manage.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.

