Will In-Memory Processing Accelerate Your Azure SQL Workloads?
Microsoft argues that in-memory processing can accelerate OLTP and OLAP workloads. This is true on premises. It's also true -- with a few caveats -- in the cloud.
Last month, Microsoft announced the general availability of in-memory processing capabilities for its Azure SQL database service.
In-memory support is officially available for Azure SQL's premium database tier, as well as for premium "elastic" pools. (Customers can subscribe to Azure SQL as a standalone service or via an elastic pool in which multiple databases share resources.) Microsoft argues that in-memory technologies can accelerate OLTP and OLAP workloads.
"They help to optimize the performance of transactional (OLTP), analytics (OLAP), as well as mixed workloads," writes Sreedhar Pelluru, a senior programmer with Microsoft, on the SQL Azure blog. "With these technologies[,] customers have been able to achieve up to 30X performance improvement for transaction processing and up to 100X performance improvement for analytical queries."
This is empirically true in an on-premises context. It's also true -- with a few caveats -- in the cloud.
Virtualization and Data Warehouse Workloads
The first caveat has to do with virtualization, which can significantly affect OLAP processing performance. Virtualization is both a strength and a weakness of the cloud model.
On the one hand, the virtual machines that power cloud database services such as Azure SQL can scale to very large configurations. On the other hand, they're still virtual operating systems and virtual hardware running atop underlying physical resources.
It's a trade-off. In a virtualized infrastructure, applications and/or workloads are still guaranteed virtual hardware resources -- e.g., a certain number of processors, amount of memory, and amount of storage.
In some hosting configurations, however, these resources aren't necessarily local -- i.e., in the same server, rack, or data center. This is especially true of cloud storage. Most conventional cloud database services use the equivalent of an in-cloud storage area network for persistence.
This matters because analytics workloads are highly iterative. At the beginning of each iteration, the database has to reload the entire data set and crunch it, writing the results to disk. This results in a massive number of reads from disk.
This is where in-memory is crucial. Instead of reading data from and writing data to a virtual network storage pool -- which may or may not be located in the same physical racks (or for that matter in the same data center) as the underlying server hardware -- the data is read from/written to memory. When the database finishes an iteration, it writes out the results and reads them again -- only it's doing so from RAM, which is much faster.
In the distributed cloud, this really can significantly boost performance.
Data Constrained by Node Size
There's a second caveat: the data set has to fit into the memory on a single Azure SQL server instance or node.
"The challenge with most public cloud vendors is that they leverage commodity hardware with limited memory per node. If you have smaller data sets that can fit into the memory of a cloud node, even compressed, then it should perform like a similar node on premises," explains Steve Dine, a principal with data management consultancy Datasource Consulting.
"Where it gets dicey is when you have larger data that won't fit into one node. In-memory databases are still constrained with scaling ... by relatively slow networks, even with high-speed connections in a rack on premises. The network speeds at cloud vendors ... are even slower," he says.
Microsoft Embraces an Azure SQL-First Development Model
An Azure SQL database has several in-memory processing facilities. These consist of in-memory OLTP, clustered columnstore indexes for analytical processing, nonclustered columnstore indexes for mixed OLTP and OLAP workloads, and combined OLTP and columnstore indexes.
Azure SQL database derives these in-memory technologies from on-premises SQL Server. Until now, new features and functions for these technologies usually appeared in SQL Server first.
That's going to change, Microsoft says. "[G]oing forward, new capabilities for these technologies are released on Azure SQL Database first, before making their way into the next release of SQL Server," writes Pelluru.
For more about in-memory technology and cloud computing, see:
Analysis: Scaling SAP HANA in the Cloud
SAP Optimizes BW for HANA, Readies BW for the AWS Cloud