Revealing the Mysteries of Enterprise NoSQL
How is NoSQL different from other databases? What about scalability and reliability of databases, and how quickly can they come online? We take a close look at several important aspects of today's database technology.
- By James E. Powell
- August 13, 2013
How is NoSQL different from other databases? What about scalability and reliability of databases, and how quickly can they come online? To learn more about the ever-growing importance of database technologies for BI professionals and IT alike, we turned to Sam Bisbee, director of technical business development at Cloudant, a company that offers a distributed database service.
BI This Week: What is the biggest difference between traditional, relational databases and NoSQL databases that IT professionals have the most difficulty understanding?
Sam Bisbee: Relational databases were designed in the 1970s based on IBM researcher Ted Codd's paper on the relational model, published in 1970. Those databases were designed for a few people using them in a central location, a model that began to degrade during the dot-com bubble of the late 1990s. This launched a new school of database design that was spearheaded by respected technical companies such as Google and Amazon, and is at the heart of today's NoSQL databases.
Today's Web and mobile applications can have millions of concurrent users spread around the globe, and their needs differ depending on the application. For old-school ERP systems, data consistency is critically important -- meaning it's more important that data reflects the latest changes, even if you have to wait a long time for it. For applications such as mobile gaming -- where data can be stored locally on mobile devices and then synced with centralized (or even distributed) databases -- your data must be instantly available and updatable for the large number of people who need it. In those cases, the user experience makes speed more critical than consistency.
"Eventual consistency" is a term associated with many NoSQL databases, and it might seem like a dirty word in the IT world (at least, if you talk to the big relational database vendors). It doesn't mean your database needs 15 minutes to reflect the latest changes. With all the fiber in the ground in the U.S. and investments in infrastructure, these changes may only take a matter of seconds to propagate, even for a large distributed database system that spans hundreds of nodes across tens to hundreds of data centers. It's also worth re-evaluating the assumption that for NoSQL we have to dismiss the idea of strong consistency altogether. Not all NoSQL products are eventually consistent, but it is telling that so many have chosen to optimize for that use case.
How can IT professionals ensure database scalability and reliability when applications have enormous surges of activity?
Database sharding and horizontal scaling address scalability and reliability issues, much in the way people have been scaling application and stateless services for years. At Cloudant, we deliver a NoSQL database as a managed service. If we need to add more servers, then we do, rebalancing the shards across the new servers to alleviate load. Even though monitoring and recovery are automated in our case, we still cycle shifts 24x7 to cover support and operations for our customers. That is a huge relief for customers who have experienced having to play DBA in 24x7 shifts, both for their costs and their culture.
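To make the idea of rebalancing shards across new servers concrete, here is a minimal Python sketch of one common technique, consistent hashing. It is an illustration only, not a description of how Cloudant assigns shards; the class name `HashRing`, the server names, and the choice of 64 virtual points per server are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each server owns several virtual points."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        """Place the server's virtual points on the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def server_for(self, doc_id: str) -> str:
        """Return the owner of the first ring point after the key's hash."""
        idx = bisect.bisect(self._points, _hash(doc_id)) % len(self._points)
        return self._owner[self._points[idx]]


doc_ids = [f"doc-{n}" for n in range(1000)]
ring = HashRing(["db1", "db2", "db3"])
before = {d: ring.server_for(d) for d in doc_ids}

ring.add_server("db4")  # scale out when load surges
after = {d: ring.server_for(d) for d in doc_ids}

# Only documents that now belong to the new server change owners.
moved = [d for d in doc_ids if before[d] != after[d]]
print(f"{len(moved)} of {len(doc_ids)} documents moved to db4")
```

The point of the design is that adding a server relocates only the documents the new server takes over; everything else stays where it was, which keeps rebalancing cheap while the cluster is under load.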
Implementing support rotations probably isn't a popular notion among DBAs. Most enterprises will have the budget and personnel to implement support rotations, but if your team is smaller and you're doing in-house database administration, that's going to be on you, and you'll be losing sleep over it.
How quickly can databases delivered as a service come online?
Our multi-tenant service will get you running on Cloudant in under a minute. The dedicated Cloudant service that requires a new cluster of bare metal servers can be spun up as fast as a few hours. We have done a lot of work with our IaaS partners such as Rackspace and SoftLayer to optimize our build process.
A more important consideration is how quickly databases can scale when surges in demand require it. It's a critical question to ask any database provider.
Some database providers have tried to address reactive scaling by over-provisioning a buffer of spare servers in production. This scenario quickly becomes expensive and untenable. Even bursting into virtual or public cloud environments is tricky -- that promise was made and delivered on for stateless application servers, but to date the cloud has largely failed databases.
Simply put, this is a hard problem and IT departments need to decide whether building a data layer is in their organization's core competency.
How easy is it for IT people to import and export data from different sources?
JSON document data stores are supported with parsers for many languages. Most or all languages have a JSON library available that will easily map its native constructs into JSON. In addition, importing data that's in a markup language like XML is a natural fit.
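As a rough illustration of both points, the Python sketch below uses only the standard library: `json` to map native dictionaries and lists to JSON, and `xml.etree.ElementTree` to turn a small XML fragment into a JSON-ready structure. The sample data and the helper `element_to_dict` are hypothetical, and the conversion is deliberately naive (it ignores namespaces and drops text that is mixed with child elements).

```python
import json
import xml.etree.ElementTree as ET

# Native Python constructs map directly onto JSON objects, arrays and scalars.
player = {"name": "ada", "level": 12, "items": ["sword", "shield"], "active": True}
encoded = json.dumps(player, sort_keys=True)
decoded = json.loads(encoded)


def element_to_dict(elem):
    """Naive XML-to-dict conversion.

    Attributes and child elements become keys, repeated child tags become
    lists, and leaf elements without attributes become stripped strings.
    """
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    node = dict(elem.attrib)
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tag: collect the values in a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


xml_source = """
<player id="42">
  <name>ada</name>
  <item>sword</item>
  <item>shield</item>
</player>
""".strip()

root = ET.fromstring(xml_source)
doc = {root.tag: element_to_dict(root)}
print(json.dumps(doc, indent=2))
```

Note that XML carries no type information, so the `id` attribute arrives as the string "42"; any conversion to numbers or dates is a decision the import script has to make explicitly.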
You can also write scripts to convert relational data to JSON. The trick in your NoSQL database of choice thus becomes a matter of capturing the relationships between that data. That's the heavy lifting we have traditionally relied on relational databases to handle for us, but with NoSQL, it requires more up-front thought. The point isn't to dump a bunch of data in there and assume you'll be able to fish it out later because it's NoSQL.
If you are an IT person working with your DBAs to convert a relational database into a NoSQL document store, the biggest area to focus their data-modeling minds on will be joins. If your company runs important queries that rely on multiple join clauses, that's the first place to look to determine how to build structure around your JSON documents. Document stores such as CouchDB and MongoDB have no concept of performing joins or enforcing foreign key constraints. With document stores, you can nest an arbitrary number of key-value pairs within a JSON document -- you are no longer restricted to two-dimensional data structures such as rows and columns. It makes sense to let your current set of queries dictate -- in some part -- how you structure your data. It will go a long way in performance tuning down the road.
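The sketch below shows one way a conversion script might let an existing join dictate the document shape. It uses Python's built-in `sqlite3` module with an in-memory database; the `customers` and `orders` tables, their columns, and the `customer:<id>` document ID scheme are all invented for the example, not taken from any particular product.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT,
        qty INTEGER
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES
        (10, 1, 'widget', 3), (11, 1, 'gear', 1), (12, 2, 'bolt', 50);
""")

# The join the application already runs; its result shape drives the document.
rows = conn.execute("""
    SELECT c.id, c.name, o.id, o.item, o.qty
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""")

docs = {}
for cust_id, name, order_id, item, qty in rows:
    # One document per customer; the joined rows are nested inside it.
    doc = docs.setdefault(
        cust_id, {"_id": f"customer:{cust_id}", "name": name, "orders": []}
    )
    doc["orders"].append({"order_id": order_id, "item": item, "qty": qty})

documents = list(docs.values())
print(json.dumps(documents, indent=2))
```

Because the orders are embedded in the customer document, the query that used to need a join becomes a single document read. The trade-off is that an inner join like this one silently omits customers with no orders, and a query that starts from orders rather than customers would want a different document shape.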
What best practices can you recommend to make importing and exporting easier?
There are several ways to move data into and out of cloud databases, so it really depends on what you are trying to accomplish. However, most should support a RESTful HTTP API, so the interface will be familiar to anyone who has used a Web service or an API before. If not, learning a little about RESTful Web services will help DBAs dealing with these new databases.
In addition to Cloudant's RESTful API, there is, of course, the operational API for normal database transactions, where you can move single pieces or batches of data in a familiar way. This also supports the storage of attachments, or binary blobs, in addition to your JSON operational data. Other NoSQL document stores should offer similar means.
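For readers who have not worked with this style of API, here is a minimal Python sketch of what single-document and batch operations look like over HTTP. It assumes a CouchDB-compatible endpoint, where batches are posted to `_bulk_docs`; the base URL, database name, and helper function names are placeholders, authentication is omitted, and the requests are only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical account URL; substitute your own endpoint and credentials.
BASE_URL = "https://example-account.example.com"


def bulk_insert_request(db, docs):
    """Build a POST that writes a batch of documents in one round trip."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_request(db, doc_id):
    """Build a GET for a single document, escaping the ID for use in a URL."""
    quoted = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(url=f"{BASE_URL}/{db}/{quoted}", method="GET")


req = bulk_insert_request("scores", [{"_id": "a", "points": 1}, {"_id": "b", "points": 7}])
print(req.get_method(), req.full_url)

# To actually send it against a real server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Batching matters for imports: one request carrying a few hundred documents avoids paying HTTP round-trip latency for every row.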
One of the most interesting features for importing and exporting data on CouchDB is the replicator system database. It allows you to easily move data between servers that support the CouchDB replication protocol (Apache CouchDB, Cloudant, PouchDB, TouchDB) over HTTP, in both directions.
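In CouchDB, a replication is requested by saving a small JSON document in the `_replicator` database. The Python sketch below only builds such documents; the two URLs and the helper name `replication_doc` are placeholders, and credentials are left out. Moving data in both directions simply means one document per direction.

```python
import json


def replication_doc(source_url, target_url, continuous=False, create_target=False):
    """Build a replication document for the _replicator database.

    continuous keeps the replication running as new changes arrive;
    create_target asks the server to create the target database if missing.
    """
    return {
        "source": source_url,
        "target": target_url,
        "continuous": continuous,
        "create_target": create_target,
    }


# Hypothetical endpoints: an in-house CouchDB server and a hosted account.
LOCAL = "http://couch.internal.example:5984/inventory"
HOSTED = "https://example-account.example.com/inventory"

push = replication_doc(LOCAL, HOSTED, continuous=True, create_target=True)
pull = replication_doc(HOSTED, LOCAL, continuous=True)

# Each document would be saved with an HTTP POST to <server>/_replicator.
print(json.dumps([push, pull], indent=2))
```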
What products or services does Cloudant offer that are relevant to this discussion?
Cloudant offers a globally distributed database-as-a-service for loading, storing, analyzing, and distributing application data for large and fast-growing Web and mobile applications. We deliver a managed service that helps developers eliminate the delays and costs of in-house database administration. People want to build products, not maintain databases. We help programmers and enterprises accelerate time-to-market. That's the real exciting part for us.