TDWI Upside - Where Data Means Business

Does Google's Cloud Spanner Database Cheat Space-Time?

Cloud Spanner, Google's globally distributed database, has been released. One expert says it even breaks the speed of light.

Cloud Spanner, Google's ACID-compliant, globally distributed database, is now available.

This could be a very big deal. On paper, Cloud Spanner cuts the Gordian knot: it enforces strict ACID compliance even when nodes are geographically distributed and the links between them are subject to high latency.

In fact, Google describes Spanner as a "global" database: nodes can be deployed in far-flung configurations spanning, for example, the 12,000 miles between Beijing and Buenos Aires. Not only can Spanner deployments scale across regions or continents, they can do so while enforcing ACID compliance.

Experts are impressed. They say Cloud Spanner's ability to scale in geographically distributed deployments, its strict enforcement of ACID safeguards, and its support for relational and polystructured data make it unlike any other database on the market.

"If you are claiming ACID compliance over a multi-region datacenter, then you are either going to have [a database] that's really slow -- or something that breaks the speed of light," says an expert in relational database internals with a prominent data sciences services firm who spoke on condition of anonymity.

"Spanner breaks the speed of light," the expert told Upside.

Cheating Space-Time

Latency has implications not just for ACID compliance but for the performance of any type of distributed system. In other words, Google's claim that Spanner enforces strict ACID compliance is only slightly more incredible than its claim that Spanner is able to scale performance in globally distributed deployments.

"High latency isn't just [an issue] for ACID compliance. In the first place, you want to have as low-latency a network as possible for any distributed system, but for ACID, there is also a per-transaction need for doing things across that network," this expert says. In order to address this need, most RDBMSs implement some variation of the two-phase commit protocol (2PC) to safeguard ACID transactions.

The problem is that 2PC is, by nature, a little chatty.

"[ACID-compliant databases] can't fire and forget: they have to wait to find out if something happened. This requires synchronous communication," he points out. "For this reason, [distributed RDBMSs] have low latency requirements. The problem is it takes about 10 ms for light to travel between Oregon and California. Across the United States, it's close to 50 ms."

Latency of this kind can and will cripple most distributed RDBMSs, this expert says.
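
To make the cost concrete, here is a minimal Python sketch of a textbook two-phase commit in which the only cost modeled is signal propagation delay. It is an illustration, not how any particular RDBMS implements 2PC. The names (Participant, two_phase_commit), the distances, and the assumption that light in fiber travels at roughly two-thirds of its vacuum speed over a straight-line path are all illustrative choices, not figures from the article or from Google.

```python
from dataclasses import dataclass

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_FACTOR = 2 / 3               # assumption: light in fiber moves at ~2/3 c


def one_way_delay_ms(distance_km: float) -> float:
    """Propagation delay over an idealized straight-line fiber path."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR) * 1000.0


@dataclass
class Participant:
    name: str
    distance_km: float          # hypothetical distance from the coordinator
    will_vote_yes: bool = True  # whether this node can commit its part


def two_phase_commit(participants):
    """Return (committed, elapsed_ms) for one transaction.

    Messages go to all participants in parallel, so each phase costs one
    round trip to the most distant participant.
    """
    slowest_round_trip = max(
        2 * one_way_delay_ms(p.distance_km) for p in participants
    )

    # Phase 1: send PREPARE, then block until every vote has come back.
    elapsed_ms = slowest_round_trip
    committed = all(p.will_vote_yes for p in participants)

    # Phase 2: send COMMIT or ABORT, then block until every ack is back.
    elapsed_ms += slowest_round_trip
    return committed, elapsed_ms


if __name__ == "__main__":
    nodes = [
        Participant("nearby", 1_000.0),
        Participant("far_side_of_planet", 19_300.0),  # roughly 12,000 miles
    ]
    committed, elapsed_ms = two_phase_commit(nodes)
    print(f"committed={committed}, elapsed={elapsed_ms:.0f} ms")
```

Under these assumptions, a single transaction that includes a participant roughly 12,000 miles away spends close to 400 ms waiting on the network alone, before any disk or CPU time is counted. A single "no" vote from any participant aborts the whole transaction.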

How Spanner Is Different

For one thing, Spanner's "hardware" requirements are a bit unusual: in addition to the requisite compute, storage, and network resources, Google says it requires atomic clocks and GPS receivers.

"We've carefully designed Cloud Spanner to meet customer requirements for enterprise databases -- including ANSI 2011 SQL support, ACID transactions, 99.999 [percent] availability and strong consistency -- without compromising latency," writes Dominic Preuss, Google's Spanner project manager, on Google's Cloud Platform Blog.

"As a combined software/hardware solution that includes atomic clocks and GPS receivers across Google's global network, Cloud Spanner also offers additional accuracy, reliability and performance in the form of a fully-managed cloud database service," he adds.

For those who can follow it, Google's seminal 2012 paper ("Spanner: Google's Globally-Distributed Database") describes how Spanner does what it does. The RDBMS expert says that even though he understands what Google is saying about how Spanner scales and maintains strong, ACID-like consistency, he still doesn't quite understand how it cheats space-time. "I get how this allows them to resolve transactions, but it definitely doesn't make the data available on the other side of the planet faster than the speed of light," he points out.

"The transaction can complete locally, and it will be honored globally, but the data doesn't magically move to the other place [e.g., from Beijing to Buenos Aires] as a result."

No More Back to BASE-ics?

Most distributed databases settle for something looser than ACID consistency. Technically, they are eventually consistent databases: in place of ACID's guarantees of atomic, consistent, isolated, and durable transactions, they offer what has come to be called "BASE": basically available, soft state, eventual consistency. In the BASE model, conflicting updates are eventually reconciled across nodes. This may happen almost instantly -- e.g., if nodes are colocated with one another and communicating via high-speed transport -- or it may take much longer.
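
A minimal Python sketch of what eventual reconciliation can look like follows. It uses a last-writer-wins rule, which is only one of several reconciliation strategies and is chosen here purely for brevity. The Replica class, the reconcile function, and the sample keys and values are all hypothetical.

```python
class Replica:
    """One node in a toy eventually consistent key-value store."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last-writer-wins: keep the entry with the newest timestamp.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry is not None else None

    def sync_from(self, other: "Replica") -> None:
        # Pull the other replica's entries; newer timestamps overwrite.
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


def reconcile(a: Replica, b: Replica) -> None:
    """Exchange state in both directions so the replicas converge."""
    a.sync_from(b)
    b.sync_from(a)


if __name__ == "__main__":
    beijing = Replica("beijing")
    buenos_aires = Replica("buenos_aires")

    # Each replica accepts a local write without waiting for the other.
    beijing.write("balance", 100, timestamp=1)
    buenos_aires.write("balance", 80, timestamp=2)

    # Until reconciliation runs, readers can see different answers.
    print(beijing.read("balance"), buenos_aires.read("balance"))

    reconcile(beijing, buenos_aires)
    print(beijing.read("balance"), buenos_aires.read("balance"))
```

Between the two local writes and the call to reconcile, a reader in Beijing and a reader in Buenos Aires see different balances. That window of disagreement is exactly what ACID semantics rule out.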

So far, Spanner is the only database of its kind, says Mark Madsen, a research analyst with information management consultancy Third Nature. Madsen has been talking about Spanner since Google's research team first published about it in 2012. It was that exciting, he says.

Now that Cloud Spanner is generally available, Madsen seems no less enthused. "Just having a single-domain ACID-compliant SQL database is a big thing," he argues.

"Spanner stands to be the Redshift of the market. If I was betting a long-term strategic winner in databases, this would be it, assuming it all works and the hard edges aren't too hard."
