TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

Analysis: Scaling SAP HANA in the Cloud

If any workload can make the most out of lots of processing power and RAM, it's SAP's HANA, an in-memory database. Will HANA's performance characteristics translate to a cloud VM?

By Steve Swoyer
July 7, 2016

Which is the largest cloud virtual machine (VM) of them all? Does it even matter?

Amazon and Microsoft seem to think so. Last month, Redmond and SAP announced a partnership to certify SAP's HANA in-memory database for Microsoft's Azure cloud services platform. There's an indisputable logic to it. After all, HANA is an in-memory database. The more memory you can feed it, the better, right?

Before answering that question, let's (briefly) recap what's at stake here.

For a little over a year now, Redmond has trumpeted its Azure G-Series tier as the "largest VM available in the cloud" -- with support for memory configurations of up to 448 GB. That's a huge memory configuration.

At roughly the same time that Microsoft and SAP were coming together to promote HANA-on-Azure, Amazon -- which also supports HANA on its Amazon Web Services cloud platform -- upped the biggest-VM-in-the-cloud stakes significantly. It announced X1, a monster new VM configuration for its Elastic Compute Cloud (EC2) service. X1 VM instances can be configured with up to 2 TB of RAM and 128 virtual CPUs. (Microsoft's GS-Series VMs top out at 32 virtual CPUs.)

Not only does Amazon's X1 offer more than four times the memory (and four times the virtual CPUs) of Microsoft's largest GS-Series sizing, but Amazon also spoiled Microsoft's potential HANA-on-Azure coup.
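As a quick sanity check on that sizing comparison, the ratios can be worked out directly. This is a minimal sketch that assumes "2 TB" means 2,048 GB; the constant names are illustrative only.

```python
# Published maximums cited above; 2 TB is assumed to mean 2,048 GB.
X1_MEMORY_GB = 2 * 1024
GS_MEMORY_GB = 448
X1_VCPUS = 128
GS_VCPUS = 32

memory_ratio = X1_MEMORY_GB / GS_MEMORY_GB
vcpu_ratio = X1_VCPUS / GS_VCPUS

print(f"memory: {memory_ratio:.2f}x, vCPUs: {vcpu_ratio:.2f}x")
```

Under that assumption, X1 works out to roughly 4.6 times the memory and exactly four times the virtual CPUs.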

Does any of this even matter, or is it all so much posturing? After all, how much demand -- if any -- is there for a 2-TB single-VM instance? The jury's still out on that question, but if any workload can make the most out of 2 TB of RAM, virtualized or no, it's SAP's in-memory HANA database.

You might ask whether (or to what extent) the benefits of in-memory database technology will transfer to the cloud. That's a great question, as it happens. The answer is: about as well as the benefits of a massively parallel processing (MPP) database, InfiniBand connectivity, and other high-performance technologies.

The same features -- namely, virtualization and multi-tenancy -- that make the cloud an efficient and cost-effective context in which to run general-purpose enterprise workloads also make it a less-than-optimal performer for analytics workloads. In-memory, MPP, InfiniBand, and other technologies have the potential to mitigate these performance issues.

A data warehouse in the cloud probably won't ever be as responsive, reliable, or available as an on-premises system, but a massive VM -- complemented with MPP, InfiniBand, and SSD storage -- can help to close these gaps. That said, some of the features that make HANA amazingly fast in an on-premises environment might not translate quite so well to the cloud. Why is that?

To find out, let's take a look at just what makes in-memory database technology fly.

For starters, an in-memory database such as HANA doesn't just run entirely in physical memory (RAM). Instead, HANA is designed to exploit all of the memory in a system -- including the on-chip caches used by modern Intel and AMD CPUs. These consist of the Level-1 (L1), Level-2 (L2), and Level-3 (L3) caches that are integrated into the CPU package itself.

On-chip caches range in size from 32 KB (for L1, per core) to 8 MB or more (for L3, shared among all cores). In announcing X1, Jeff Barr, chief evangelist with Amazon Web Services, touted the large L3 cache (45 MB, shared among 18 cores) that's built into the Xeon E7 8880 v3 chips used to power X1 on AWS.

An in-memory database such as HANA will try to make as much use of on-chip caching as possible because the CPU can read from and write to its on-chip caches much more quickly than it can read from or write to physical RAM. To this end, HANA and similar in-memory technologies will "pin" chunks of data or queries in the on-chip cache. Instead of having to fetch and re-fetch data or queries from main memory, the CPU can retrieve them from the much faster local cache.
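To see why cache residency matters so much, consider a back-of-the-envelope model of average memory access time. This is a sketch, not a measurement: the latencies and hit rates below are assumed round figures chosen only for illustration, and `average_access_ns` is a hypothetical helper, not part of HANA or any vendor tool.

```python
def average_access_ns(hit_rates, latencies_ns, ram_latency_ns):
    """Expected latency of one access through a cache hierarchy.

    hit_rates[i] is the chance that an access which reaches cache
    level i is served there; misses fall through to the next level
    and finally to RAM.
    """
    expected = 0.0
    reach = 1.0  # probability the access gets as far as this level
    for hit, latency in zip(hit_rates, latencies_ns):
        expected += reach * hit * latency
        reach *= 1.0 - hit
    return expected + reach * ram_latency_ns


# Assumed, illustrative latencies for L1, L2, L3, and RAM (nanoseconds).
CACHE_LATENCIES_NS = [1.0, 4.0, 15.0]
RAM_LATENCY_NS = 100.0

cache_friendly = average_access_ns([0.95, 0.80, 0.80], CACHE_LATENCIES_NS, RAM_LATENCY_NS)
cache_hostile = average_access_ns([0.50, 0.30, 0.30], CACHE_LATENCIES_NS, RAM_LATENCY_NS)

print(f"cache-friendly workload: {cache_friendly:.2f} ns per access")
print(f"cache-hostile workload:  {cache_hostile:.2f} ns per access")
```

Whatever the real numbers are on a given system, the shape of the result is the same: the more often data is found in cache, the less often the CPU pays the full cost of a trip to main memory.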

This is how an in-memory database works in an on-premises data center, where (at least for applications such as HANA) workloads aren't virtualized. Imagine HANA is running on a single on-premises server that's outfitted with 32 processor cores and 448 GB of RAM. In this scheme, HANA has direct, unshared, non-abstracted access to the underlying hardware. There's a 1:1 mapping between the processor cores and memory HANA sees and the physical resources of the underlying server.

In the generic cloud, this 1:1 mapping disappears. HANA shares access to virtualized processors and memory -- with virtualized L1, L2, and L3 caches.

Microsoft's G-Series and AWS' X1 tiers aren't "generic" cloud services, however. For example, if two organizations both spin up instances of HANA in dedicated X1 hosts, they won't be sharing space (more precisely, hardware) with one another.

They're each going to get their own dedicated VMs running on dedicated hardware. Their respective instances of HANA will still be running in a virtualized context, however. This means HANA will see virtual processors and virtual RAM. More to the point, HANA will pin SQL queries and chunks of data in virtual L1, L2, and L3 caches.

The good news is that hypervisors (the software layer that runs the VM, aided by virtualization support in the hardware) are incredibly sophisticated. A full explanation of just how and why they're so sophisticated is a topic for another article.

To cite just one example: servers that support multiple processor packages -- say, a system that accepts four 18-core Intel Xeon chips -- use a technology called NUMA (short for "non-uniform memory access") to share and allocate memory and other resources between processor cores. The hypervisor can minutely control how virtual compute and memory resources are allocated in a NUMA configuration.

At a physical level, NUMA maps physical processor cores or sockets to physical memory banks. Core 0 would ideally write data to or fetch data from Memory Bank 0, which is "local" to it. In a virtual context, however, the guest operating system (or the hypervisor itself) could spin up new threads on Core 1 -- or on other non-local cores.

Because local and non-local threads don't share the same on-chip cache or the same main memory pages, this would result in drastically degraded performance. The hypervisor is smart enough to map both virtual processors and virtual memory to local physical resources, however. HANA is optimized for NUMA. It's especially optimized to run on very large NUMA systems -- even a virtualized system with, say, 128 processors and 2 TB of RAM. Because hypervisors are smart about managing and optimizing for NUMA, HANA should scale pretty well in a single large VM too. In fact, it should scale better in a single large VM than in multiple clustered VMs.
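The difference between NUMA-aware and NUMA-oblivious placement can be sketched in a few lines. The example below borrows the four-socket, 18-core layout mentioned earlier; everything else (the `Placement` class, the two placement functions, and the simple one-memory-bank-per-socket model) is a hypothetical illustration, not how any particular hypervisor is implemented.

```python
from dataclasses import dataclass

SOCKETS = 4            # four processor packages, as in the example above
CORES_PER_SOCKET = 18  # 18 cores per package


def numa_node_of(core: int) -> int:
    """NUMA node (socket) that a physical core belongs to."""
    return core // CORES_PER_SOCKET


@dataclass(frozen=True)
class Placement:
    vcpu: int
    physical_core: int
    memory_node: int  # node whose memory bank backs this vCPU's pages

    @property
    def is_local(self) -> bool:
        return numa_node_of(self.physical_core) == self.memory_node


def numa_aware(vcpus: int) -> list[Placement]:
    """Back each vCPU with memory from the node its core sits on."""
    return [Placement(v, v, numa_node_of(v)) for v in range(vcpus)]


def numa_oblivious(vcpus: int) -> list[Placement]:
    """Allocate all memory from node 0, wherever the vCPU runs."""
    return [Placement(v, v, 0) for v in range(vcpus)]


def local_fraction(placements: list[Placement]) -> float:
    """Share of vCPUs whose memory is on their own NUMA node."""
    return sum(p.is_local for p in placements) / len(placements)


total_cores = SOCKETS * CORES_PER_SOCKET
print(f"NUMA-aware:     {local_fraction(numa_aware(total_cores)):.0%} local")
print(f"NUMA-oblivious: {local_fraction(numa_oblivious(total_cores)):.0%} local")
```

In this toy model, the oblivious scheme leaves only the vCPUs on the first socket with local memory; the rest reach across the interconnect for every access that misses cache.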

"Fewer large nodes should generally outperform a larger number of small nodes given an equivalent number of processors and memory," says Mark Madsen, a research analyst with IT strategy consultancy Third Nature Inc. In other words, a single 2-TB/128-processor server or VM will perform better than four separate servers, each with 32 processors and 512 GB of RAM -- provided the software running on said system can take advantage of it, Madsen concedes.

"The only confounding variable is when you saturate the [interconnect or] bus in the system. Pending that, the fewer things you have, the less interprocess latency there is in going from processor to RAM, processor to disk, and so on," he points out.

Therefore, VM size does matter -- at least for workloads such as HANA. Another salient point is that HANA's in-memory design will probably help to offset some of the performance constraints (especially with respect to analytics workloads) that are endemic to the cloud.
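Madsen's point about fewer, larger nodes can be illustrated with a toy model. The sketch below assumes data is spread evenly across nodes and that any access is equally likely to need data held by any node; real systems partition data to do better than this, so treat the figures as an upper-bound illustration. The function names are hypothetical.

```python
def split_evenly(total_processors: int, total_memory_gb: int, nodes: int):
    """Per-node resources when a fixed budget is divided across nodes."""
    return total_processors // nodes, total_memory_gb // nodes


def remote_fraction(nodes: int) -> float:
    """Share of accesses that must leave the local node.

    Toy assumption: data is spread evenly and every access is equally
    likely to need data held by any node.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return (nodes - 1) / nodes


for nodes in (1, 4):
    processors, memory_gb = split_evenly(128, 2048, nodes)
    print(
        f"{nodes} node(s): {processors} processors, {memory_gb} GB each, "
        f"{remote_fraction(nodes):.0%} of accesses cross the interconnect"
    )
```

With the same total of 128 processors and 2 TB of RAM, the single large node never leaves the box, whereas the four-node cluster sends most of its accesses over the network in this simplified picture.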

There's one final point. Because HANA is an in-memory database and it performs best with lots of compute and memory capacity, it's an extremely expensive technology to procure, deploy, and maintain. HANA in the cloud, whatever its performance deficit relative to on-premises HANA, is much cheaper and easier to spin up, deploy, and manage.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at [email protected].
