SQL Server 2016 Could Be a Powerhouse Platform for Real-time Analytics
Yesterday, we learned that Microsoft’s SQL Server 2016 database will launch in just a little over four weeks, on June 1st. What do we know about this latest -- and possibly greatest -- edition of SQL Server?
- By Steve Swoyer
- May 3, 2016
With a launch of Microsoft’s SQL Server 2016 database in just under a month (it’s due June 1st), enterprises want to know what to expect from this new edition.
According to Microsoft, SQL Server 2016 will be a speedier platform than its predecessor, SQL Server 2014. Redmond claims it has tuned and tweaked SQL Server 2016 to process queries about 25 percent faster than prior versions. Microsoft says this improvement is independent of other SQL Server 2016-related speed-ups, such as the new Query Store (which can be used to optimize slower-performing queries) and support for updateable in-memory columnstore indexes. Factor in these and a slew of other improvements and some kinds of queries will execute significantly faster.
Speaking of which, Microsoft first introduced support for columnstore indexes in SQL Server 2012. The columnstore index can be used to accelerate many kinds of decision-support workloads. In its first incarnation (SQL Server 2012), the columnstore index was relatively limited: it supported comparatively few data types; it could not be clustered; it was limited to 1,024 columns.
Last and most important, it was read-only.
With SQL Server 2014, most of these limitations went away -- but not all of them. For starters, Microsoft introduced support for so-called “clustered” columnstore indexes, which could be updated. Unfortunately, non-clustered columnstore indexes still could not be updated.
SQL Server 2016 changes this. It’s now possible to update non-clustered columnstore indexes in addition to their clustered kith. What’s more, thanks to another new SQL Server 2016 enhancement, developers can create in-memory columnstore indexes on top of memory-optimized database tables.
This capability is actually a combination of two enabling SQL Server technologies: Apollo, SQL Server’s built-in columnstore engine, and Hekaton, its in-memory engine for OLTP workloads. Admittedly, neither of these engines is new with SQL Server 2016; however, what you can do with them is new, different, and hugely valuable. It’s possible to create updateable in-memory columnstore indexes on top of what are, in effect, in-memory database tables. In other words, it isn’t just possible but comparatively easy to analyze fresh OLTP transaction data in real-time. Voila: SQL Server is repositioned as a creditable platform for real-time analytical processing.
In addition, thanks to Microsoft’s new SQL Server R Services -- the fruit of its late-2014 acquisition of the former Revolution Analytics -- SQL Server 2016 will boast in-database support for R, the popular programming environment for statistical analysis. The upshot is that SQL Server 2016 -- even in its non-massively parallel processing (MPP), SMP-only version -- should be a more creditable platform for data warehousing and, especially, advanced analytical workloads.
How much more creditable? After all, prior versions of SQL Server haven’t been data warehousing or analytical slouches. SQL Server Parallel Data Warehouse (PDW) is an MPP database system, comparable (at least in terms of its parallel processing capabilities) with MPP database platforms from IBM Corp. (Netezza) and Teradata Corp. Similarly, Microsoft’s SQL Server Analysis Services (SSAS) is now almost two decades old. Its hundreds of integrated functions and algorithms give SQL Server a powerful built-in facility for data mining and analytical processing.
SQL Server Reporting Services is now 12 years old. It gives SQL Server a pixel-perfect reporting engine for production reporting. Finally, there’s SQL Server Integration Services, or SSIS, a useful and reasonably powerful ETL -- or, depending on what you’re doing, ELT -- engine for SQL Server.
What does in-database R bring to the table? Both more and less than you might think. On the plus side, it makes it possible for customers to run R code in the SQL Server database engine itself. This won’t necessarily permit R code to run faster than in other contexts, in spite of what breathless marketers like to claim.
Marketing-types routinely tout the benefits of in-database analytical processing. This isn’t a claim that’s specific to Microsoft; instead, it’s making a virtue of something close to a necessity. SQL Server and other database systems implement hundreds of in-database analytical functions and algorithms. They don't do this because running R or SAS code in the context of the database engine is faster than running it in a standalone system. (The reverse is typically the case.) They do it because the requisite data is already there -- i.e., in the database.
Running R code against data that’s already in the database is faster than (1) extracting data from said database, (2) running R code against this data in a separate context, and (3) loading the results back into the database. With SQL Server R Services, R processing can now be performed in situ, where the data lives. For advanced analytical applications, this could be a huge boon because with machine learning and other advanced analytical practices, the ability to rapidly iterate (i.e., to build and test prototypes or hypotheses) is key.
The goal of iteration is, in a sense, to fail faster: by rapidly testing and failing, you identify what doesn’t work -- and more quickly arrive at what does. In-database R eliminates the need to extract data from the database, process it separately in an R environment or workbench, and then load it back into the database. This has the effect of radically compressing your fail-faster cycles. You can iterate more quickly.
Now, imagine running in-database R analytics against updateable in-memory columnstore indexes derived from live (in-memory) OLTP tables. You have the ability to bring predictive analytics, decision automation, and other kinds of advanced analytics to bear against real-time data.
SQL Server 2016 packs plenty of additional features, too. For example, thanks to a feature that Microsoft calls “Stretch Database,” SQL Server 2016 will blur the lines between cloud and on-premises storage. Stretch Database is a scheme by which data that’s infrequently accessed is transparently moved into the Azure cloud. (This can include entire database tables or just portions of tables.) DBAs can manually configure Stretch thresholds or entrust their management to SQL Server. SQL Server’s query optimizer decomposes queries and passes the relevant sub-queries to Azure, which processes them (in the Azure cloud) and returns the results.
In SQL Server 2016, Microsoft’s also taking care of some (long-overdue) house-keeping.
Row-level security comes to SQL Server for the first time. The new SQL Server also supports in-database JSON, thus permitting it to store JSON objects en bloc -- i.e., as JSON objects -- or by sharding their contents into columns and rows. Both of these features have been available in competitive RDBMS platforms for several years.
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at firstname.lastname@example.org.