
NoSQL Data Structures: One Structure Does Not Fit All

The traditional relational database is not the only answer to every database application. It's time to gain experience with non-relational (and non-legacy) NoSQL and NewSQL database technologies.

With the growing importance of big data, and with organizations wanting to collect and analyze orders of magnitude more data than they could economically have handled just a few years ago, interest in what are often referred to as NoSQL databases is growing rapidly. Many organizations have adopted classic, proven relational products such as IBM DB2, Microsoft SQL Server, Oracle Database, and Sybase ASE as their database standard. However, they have yet to decide which NoSQL database (if any) to deploy and adopt as their standard for analyzing "big data."

Prior to the rise of relational databases, the market was dominated by non-relational legacy database products such as IBM's IMS, Software AG's Adabas, Cincom's Total, and Cullinane's IDMS (now owned by CA), as well as by file structures such as ISAM and VSAM. Relational databases were first touted as best suited for analytics, while earlier products such as IMS were still targeted at transaction-intensive operational environments. Today, however, relational databases are for the most part considered general-purpose technology suited to both transaction processing and analysis (exceptions include column-based relational offerings and products such as Teradata).

One benefit of a relational database is that it is likely, for the most part, to conform to a recognized ANSI SQL standard. Although most database vendors offer extensions to the standard as a point of differentiation (as well as a way to lock in their installed base!), organizations whose SQL code conforms to the standard should be able to migrate from one relational product to another. This is not the case with NoSQL data structures. That said, NoSQL data structures offer many benefits, including flexible schemas that allow new fields to be added dynamically.

Of particular importance is NoSQL's ability to quickly process vast amounts of structured and unstructured data. With minimal administrative effort, a NoSQL database can add servers and scale out, partitioning (sharding) its data across massively parallel distributed configurations to meet increased processing demands.
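To make the idea of sharding concrete, here is a minimal sketch of hash-based partitioning, one common way (though not the only one) that a distributed store decides which server owns a key. It assumes plain in-memory dictionaries standing in for servers; `shard_for` and `ShardedStore` are hypothetical names for illustration, not any product's API.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    MD5 is used only because it is deterministic across processes
    (Python's built-in hash() of str is randomized per run), not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key/value store that spreads keys across in-memory 'servers'.

    Each dict in self.shards stands in for one server in a cluster.
    """

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        # Route the write to the one shard that owns this key.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Reads are routed the same way, so only one shard is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"customer:{i}", {"id": i})

print([len(shard) for shard in store.shards])  # counts should be roughly even
print(store.get("customer:42"))  # {'id': 42}
```

One design caveat: with simple modulo placement, changing the number of shards remaps most keys, so production systems often use consistent hashing or range partitioning to limit how much data moves when servers are added.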

NoSQL data structures also typically relax the ACID (atomicity, consistency, isolation, durability) properties that are a fundamental component of almost all relational databases. However, a database that is not ACID-compliant runs the risk that, at any point in time, the results of updates to the data may be inconsistent and therefore suspect. Recognizing this concern, some NoSQL vendors claim that their products have ACID properties that can be enabled, albeit with a performance hit. In fact, the desire to scale out, maintain strict ACID compliance in update-intensive applications, and fully support SQL has led to the creation of a new class of distributed relational offerings, appropriately named NewSQL.
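The risk of inconsistent results can be illustrated with a toy model of asynchronous replication, one common way distributed NoSQL systems trade strict consistency for scale and availability. In this sketch (an assumption for illustration, not any vendor's actual mechanism; `AsyncReplicatedStore` is a hypothetical name), a write is acknowledged as soon as the primary copy has it, and a read served by a replica can return stale data until replication catches up.

```python
from collections import deque


class AsyncReplicatedStore:
    """Toy model of asynchronous (eventually consistent) replication.

    Writes are acknowledged as soon as the primary has them; replicas are
    updated later, when replicate() runs. All names here are hypothetical.
    """

    def __init__(self, num_replicas: int = 2) -> None:
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = deque()  # writes not yet shipped to the replicas

    def write(self, key, value) -> None:
        self.primary[key] = value          # acknowledged immediately
        self.pending.append((key, value))  # queued for the replicas

    def read_from_replica(self, index: int, key):
        # A replica may not have seen recent writes: this read can be stale.
        return self.replicas[index].get(key)

    def replicate(self) -> None:
        """Ship every pending write to every replica, in write order."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


store = AsyncReplicatedStore()
store.write("balance:42", 100)
print(store.read_from_replica(0, "balance:42"))  # None: replica is behind
store.replicate()
print(store.read_from_replica(0, "balance:42"))  # 100: replica has caught up
```

A system that enables stricter guarantees would, roughly speaking, wait for replicas (or a quorum of them) before acknowledging the write, which is where the performance hit comes from.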

One concern I have with NoSQL data structures is that there is no well-defined standard, other than the fact that they originally did not use SQL as their interface. Even that is no longer universally true; many now take NoSQL to mean "not only SQL." Well over 100 offerings describe themselves as NoSQL, and the majority fall into categories that include key/value, big table, document, and graph.
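To make these categories concrete, the sketch below models the same small customer-and-order dataset four ways, using plain Python dictionaries and lists as stand-ins for each storage model. It is illustrative only, under that assumption; it is not the API or on-disk format of any particular product, and the sample names and helper `neighbors` are hypothetical. Note that each shape favors a different kind of query, which is part of why moving data between categories is hard.

```python
import json

# Key/value: the store sees only an opaque value under a single key.
kv_store = {
    "customer:1": json.dumps({"name": "Ada", "city": "London"}),
}
profile = json.loads(kv_store["customer:1"])  # the application decodes the value

# Big table (wide column): row key -> column family -> column -> value.
# Rows need not share the same columns.
big_table = {
    "customer:1": {
        "profile": {"name": "Ada", "city": "London"},
        "orders": {"order:100": "2 items", "order:101": "1 item"},
    },
    "customer:2": {
        "profile": {"name": "Grace"},  # no city column in this row
    },
}

# Document: nested, self-describing records; fields can vary per document.
documents = [
    {"_id": 1, "name": "Ada", "city": "London",
     "orders": [{"id": 100, "items": 2}, {"id": 101, "items": 1}]},
    {"_id": 2, "name": "Grace", "loyalty_tier": "gold"},  # new field, no schema change
]
londoners = [d["name"] for d in documents if d.get("city") == "London"]

# Graph: nodes plus labeled edges; queries follow relationships.
nodes = {
    "c1": {"type": "customer", "name": "Ada"},
    "o100": {"type": "order", "items": 2},
    "o101": {"type": "order", "items": 1},
}
edges = [("c1", "PLACED", "o100"), ("c1", "PLACED", "o101")]


def neighbors(node_id: str, relation: str) -> list:
    """Return the targets of edges with the given label leaving node_id."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]


print(profile["city"])                            # London
print(sorted(big_table["customer:1"]["orders"]))  # ['order:100', 'order:101']
print(londoners)                                  # ['Ada']
print(neighbors("c1", "PLACED"))                  # ['o100', 'o101']
```

The "big table" category is also commonly called wide-column.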

Unfortunately, migrating from one category to another (e.g., from document to graph) is far from trivial. My advice to any organization deciding which NoSQL database to deploy is not to choose a single database, but to assume that, over time, it will use several NoSQL categories and perhaps even deploy multiple products within each category. The specific choice for each application should be based on a product's suitability for that application.

In any organization, relational, NoSQL, and NewSQL data structures should not be treated as mutually exclusive alternatives. I have long advocated that organizations plan an overall data warehouse architecture rather than attempt to build a single enterprise data warehouse. In fact, for an enterprise of reasonable size, I consider the effort to build a single enterprise data warehouse akin to attempting to boil the ocean.

With regard to relational, NoSQL, and NewSQL database offerings, I advocate that organizations consider, and perhaps deploy, products from each category. For example, an organization might use a NewSQL database for a transaction-intensive order processing system, archive data to a Hadoop Distributed File System (HDFS) data store, and then use MapReduce (or one of the many emerging SQL-like front ends) to extract data for populating a relational data warehouse, data mart, or analytic appliance.
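The MapReduce step in that example can be sketched as a single-process simulation of the programming model: a map function emits key/value pairs, a shuffle groups them by key, and a reduce function aggregates each group into a row that could then be loaded into a warehouse. This is an assumption-laden illustration, not Hadoop's Java API; the records, field names, and function names are hypothetical. In Hadoop, the map and reduce phases run in parallel across many nodes and the framework itself performs the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical archived order records, standing in for files on HDFS.
ARCHIVED_ORDERS = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 75.5},
    {"order_id": 3, "region": "EMEA", "amount": 30.0},
    {"order_id": 4, "region": "AMER", "amount": 200.0},
    {"order_id": 5, "region": "APAC", "amount": 24.5},
]


def map_phase(record: dict) -> Iterator[Tuple[str, float]]:
    """Emit one (key, value) pair per input record: (region, amount)."""
    yield record["region"], record["amount"]


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group values by key (Hadoop does this between map and reduce)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[float]) -> Tuple[str, float, int]:
    """Aggregate one key's values into a warehouse-ready row."""
    return key, sum(values), len(values)


def run_job(records: Iterable[dict]) -> List[Tuple[str, float, int]]:
    """Run map, shuffle, and reduce in sequence; return rows sorted by key."""
    pairs = (pair for record in records for pair in map_phase(record))
    return sorted(reduce_phase(k, v) for k, v in shuffle(pairs).items())


for row in run_job(ARCHIVED_ORDERS):
    print(row)
# ('AMER', 200.0, 1)
# ('APAC', 100.0, 2)
# ('EMEA', 150.0, 2)
```

Each output row (region, total amount, order count) is the kind of summarized record an extract job might hand to a relational data mart.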

If your organization currently has limited NoSQL or NewSQL experience, I highly recommend gaining experience quickly with non-mission-critical prototype applications. Although I do not expect to see the demise of relational databases that some others are predicting, I believe that the traditional relational database is not the only answer to every database application. Now is the time for organizations to gain experience with non-relational (and non-legacy) NoSQL and NewSQL database technologies.
