Five Database Requirements for Digital Transformation
A cloud database must be designed and developed from the ground up to have these five critical characteristics.
- By Chris Anderson
- March 11, 2019
Data is the lifeblood of modern businesses. Some call data the newest "natural resource" or the "new oil." Business executives across every industry have come to realize that data is critical to running their organizations effectively and efficiently. They have also come to realize that older methods of managing data are no longer sufficient. The volume of data companies must deal with, the speed at which new data is generated and new data sources are added to their analyses, and the many different types of data in use have overwhelmed the previous generation of data management tools.
Digital transformation is a competitive requirement for companies. Originally, digital transformation meant moving from analog to digital systems. Most companies have already done that, and their operations are primarily digital. Today, the term digital transformation has evolved to mean something more. From an IT standpoint, it refers to the continuous modification of IT systems to take full advantage of the state-of-the-art in technology. From a business perspective, it means delivering more engaging customer experiences -- where technology plays a key role.
Digital transformation has become a priority for many enterprises as a way to avoid becoming irrelevant by another brand's digital disruption. We have all read the stories about how Uber, Airbnb, Netflix, and others have used emerging technologies to create new business models and disrupt their respective markets. Companies can either face disruption from a new or existing competitor or transform themselves to stay relevant and grow market share -- and perhaps cause some disruption of their own.
The primary driving force for such transformation today is the cloud. Yet some aspects of IT, such as databases, remain woefully outdated. Many enterprises continue to use the same database they've relied on for the past 10 to 20 years, trying to morph it into a modern tool through new deployment models such as the cloud. That has proven to be a losing proposition -- what is needed is a database designed and developed from the ground up to have several critical characteristics, including:
- Global scale, cloud-readiness
- Mainframe-class consistency and reliability
- Operational simplicity
- Flexible development
- Uncompromising security
Global Scale, Cloud-Readiness
One of the biggest transformational challenges facing enterprises today is the movement from on-premises computing to cloud computing.
Consider traditional relational database systems. They were great for running a company's financial systems -- think of SAP or Oracle Applications ERP systems running on an Oracle or IBM database on a mainframe located at the company's headquarters. They worked well when all of the interactions with those systems were coming from employees at headquarters. Today's enterprises have gone global, with employees, partners, and (particularly) customers accessing systems around the globe.
The latest phase of digital transformation centers on the cloud. Companies are moving from centralized or regional data architectures to truly global architectures, and the foundation for those is the cloud -- whether public, private, or hybrid. Successfully implementing a cloud architecture is not about taking centralized applications and putting them on AWS, GCP, or Azure. Applications and databases designed for on-premises use do not migrate easily to the cloud. The apps and databases need to be designed and written from the ground up to run in the cloud.
Cloud computing is about virtualization and orchestration of services. It's increasingly about microservices and serverless operations. Those are extremely difficult constructs to add to software that was originally designed to run on a single machine or single cluster of servers. However, in this new phase of digital transformation, companies are realizing that they (and their IT budgets) cannot build and maintain systems to keep up with the growing global demand for their products and services, and they see value in relying on cloud professionals like AWS, Google, and Microsoft to provide them with robust, resilient, reliable platforms for those products and services. Gartner believes that by 2023, 75 percent of all databases will be on a cloud platform (see Critical Capabilities for Operational Database Management Systems).
Many enterprises, however, are hesitant to put all their eggs in one cloud provider's basket for fear of lock-in and of falling victim to extortionate pricing. The ability to run multicloud is a requirement for them -- if one vendor becomes too difficult to deal with, they can move their application and data to another. That means choosing not Amazon-, Google-, or Microsoft-only products and services but rather products and services that run on (and across) Amazon, Google, Microsoft, and their own private clouds.
This also implies that the database itself should be able to run across multiple locations (regions or availability zones) at the same time. To support a growing global audience with maximum performance (and minimum latency), it should be able to serve both reads and writes in every location. The requirement for global scale and cloud readiness thus leads to distributed databases with multimaster or masterless architectures.
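The latency argument for multimaster architectures can be made concrete with a toy calculation. This is an illustrative sketch, not any vendor's API; the region names and latency figures are invented for the example.

```python
# Hypothetical one-way latencies (ms) from a client region to a database region.
LATENCY_MS = {
    ("us", "us"): 5,  ("us", "eu"): 80,  ("us", "ap"): 120,
    ("eu", "us"): 80, ("eu", "eu"): 5,   ("eu", "ap"): 160,
    ("ap", "us"): 120, ("ap", "eu"): 160, ("ap", "ap"): 5,
}

def single_master_write(client_region: str, master: str = "us") -> int:
    """Single-master design: every write must travel to the master region."""
    return LATENCY_MS[(client_region, master)]

def multimaster_write(client_region: str) -> int:
    """Multimaster design: any region accepts writes, so use the nearest one."""
    return min(LATENCY_MS[(client_region, db)] for db in ("us", "eu", "ap"))

for region in ("us", "eu", "ap"):
    print(region, single_master_write(region), multimaster_write(region))
```

With a single master in the US, a client in Asia pays the full cross-Pacific round trip on every write; with a multimaster design it writes locally.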
Mainframe-Class Consistency and Reliability
Data must be consistent to be trustworthy. When you retrieve a piece of data from your database, can you be sure it is correct, or does the value depend on which node or cluster you queried? Databases that provide only eventual consistency cannot guarantee that degree of correctness (and thus the trustworthiness of the data), and they leave developers mired in uncertainty, guessing what's in the database. Eventual consistency isn't suited to critical workloads; strong consistency is required for data integrity. Newer systems such as FaunaDB (from the company I work for) and Google Spanner use the latest advances in computer science to offer strong, worldwide consistency in the cloud.
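The stale-read problem behind eventual consistency can be shown with a toy two-replica store. This is a teaching sketch with no real database involved; all class and variable names are invented for the illustration.

```python
# Toy illustration: why eventually consistent reads can return stale values.
class EventuallyConsistentStore:
    """Writes land on one replica; replication to the other is deferred."""

    def __init__(self):
        self.replicas = [{}, {}]
        self.pending = []  # replication log not yet applied to replica 1

    def write(self, key, value):
        self.replicas[0][key] = value
        self.pending.append((key, value))  # replica 1 catches up later

    def read(self, key, replica):
        return self.replicas[replica].get(key)

    def replicate(self):
        for key, value in self.pending:
            self.replicas[1][key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("balance", 100)

# Which answer you get depends on which replica you happen to hit:
fresh = store.read("balance", replica=0)  # 100
stale = store.read("balance", replica=1)  # None -- the write hasn't arrived yet

# A strongly consistent system only acknowledges a write once enough replicas
# have it that every subsequent read is guaranteed to observe it:
store.replicate()
assert store.read("balance", replica=1) == 100
```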
Operational systems need to stay up and running, and if they fail, they must recover quickly and gracefully. Those attributes depend on good design as well as maturity in the implementation and testing of the system. Many of today's transactions are still handled on mainframes because companies need their operational systems to stay up and running, and until recently cloud-based distributed databases could not handle transactions reliably. We have seen this with some of our own customers -- companies that continued to run centralized applications on mainframes until just recently, simply because they hadn't been able to find any distributed system that could provide the reliability and data integrity they required.
Trustworthiness of systems and data is becoming critical to the use of those systems and that data. To extend the "data is the new oil" metaphor, refined data that is found in databases is like gasoline that runs engines, and we know what happens when you add low-quality gasoline to your gas-powered car -- your engine doesn't run well, if at all. The system (the engine itself) has to be trusted to keep running, and the gas (the data) needs to be trusted as well to make sure the engine reliably and consistently does what it is supposed to do.
Operational Simplicity
Another requirement, a very practical one, is simplicity. Just as enterprises have realized it is better to rely on cloud providers for the complex global IT delivery mechanism, they have also realized that their application developers are not database experts and shouldn't need to write code for core functions that a database management system ought to handle. Among those functions is consistency -- companies using Cassandra, for example, must write specialized application code on top of it when striving for consistency. That is not simple.
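The burden Cassandra-style tunable consistency puts on applications boils down to quorum arithmetic that the developer, not the database, must get right. A minimal sketch of the rule (the function name is illustrative, not the Cassandra driver API):

```python
# Quorum arithmetic: with N replicas, a read touching R replicas is guaranteed
# to overlap a write acknowledged by W replicas only when R + W > N.
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True when every read quorum must intersect every write quorum."""
    return r + w > n

# Typical settings for N = 3 replicas:
assert is_strongly_consistent(3, 2, 2)      # QUORUM reads + QUORUM writes: safe
assert not is_strongly_consistent(3, 1, 1)  # ONE/ONE: fast, but reads may be stale
```

Pick the levels wrong on a single hot code path and reads can silently go stale -- exactly the kind of correctness logic an application team shouldn't have to own.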
Managing databases is another task that should be simple -- yet keeping a database up and running is too often a formidable job that takes a team of DBAs. Data replication, data backup, database restarts, data restores -- all are key requirements of any enterprise-class DBMS and, for simplicity's sake, should happen as automatically and seamlessly as possible. In many systems, they do not. This is why developers are flocking to fully managed serverless databases such as FaunaDB and DynamoDB.
The cloud adds another chaotic dimension to database operations. A cloud database must ensure availability and consistency even in the shifting winds of the cloud: clock skew, virtual machine pauses, and unplanned outages. Legacy database management systems were built for another era, and few of the databases purpose-built for the cloud have made operational simplicity a priority. Hyperscale providers like Google and Amazon have little need to simplify infrastructure management, since they can always find an expert.
Flexible Development
Successful companies adapt to changing business conditions quickly. They can update their customer applications to offer new, compelling features more quickly than their competitors. They can analyze data in real time to provide direction for changes in strategy, product, or services. They can improve existing systems in days or hours instead of the months or years it used to take.
Systems need to perform well under existing business conditions, but they must also be adaptable enough to let companies change what they do and how they do it, quickly and easily. Data flexibility is the number-one driver for adopting new databases. A database that can handle flexible data (and impose constraints when necessary) allows developers to focus on the most important business logic.
The cloud transformation is part of that. Instead of buying a data center's worth of servers and network equipment (and amortizing it over a decade), companies can now sign up for new services from a cloud vendor, have a brand-new data infrastructure fully implemented in a few minutes, and change it every day if they wish.
Applications need agility, too, which includes the way they access and manipulate data in a database. Perhaps you need a relational model for one application but a graph model makes more sense for another application. If your database has the flexibility to support both types of queries, you have the agility to change how you deal with your data at the application level (which can be fast) rather than at the database level (which can be a long, slow, and complex process).
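The relational-versus-graph point can be made with a small sketch: the same dataset answered two ways. This is illustrative plain Python, not any particular database's query language; the employee data is invented.

```python
# One dataset, two query styles: a relational-style self-join and a
# graph-style traversal over "employee reports to manager" edges.
employees = [
    {"id": 1, "name": "Ada", "manager_id": None},
    {"id": 2, "name": "Grace", "manager_id": 1},
    {"id": 3, "name": "Alan", "manager_id": 2},
]
by_id = {e["id"]: e for e in employees}

# Relational view: join the table to itself to pair employees with managers.
pairs = [(e["name"], by_id[e["manager_id"]]["name"])
         for e in employees if e["manager_id"] is not None]

# Graph view: walk manager edges to compute a full chain of command.
def chain_of_command(emp_id):
    chain = []
    manager_id = by_id[emp_id]["manager_id"]
    while manager_id is not None:
        chain.append(by_id[manager_id]["name"])
        manager_id = by_id[manager_id]["manager_id"]
    return chain
```

A database that supports both query styles over the same records lets the application pick whichever model fits each feature, without a data migration.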
Uncompromising Security
Database security breaches continue to make front-page news. Most of these breaches stem from lapses in judgment or procedure rather than from structural flaws in the software -- unless you consider a default setting of no security to be a structural flaw. Some databases were designed in a different age, when they were strictly internal-use tools, so the provider often left it as an exercise for the user to implement strong security if they wanted it.
For example, we hear about a new data breach caused by insecure MongoDB configurations almost every day. The companies that deployed those databases pay the price when bad actors discover their sites and hold the data for ransom. Security needs to be strong and automatic, not an afterthought that can be overlooked in the rush to get a system into production.
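The MongoDB breaches in question typically trace back to two settings that older deployments left open by default. A hardened excerpt of a `mongod.conf` (MongoDB's YAML configuration format) that closes both:

```yaml
# mongod.conf excerpt: the two settings behind most exposed-MongoDB incidents.
security:
  authorization: enabled   # require authenticated, role-based access
net:
  bindIp: 127.0.0.1        # listen on localhost only, not public interfaces
```

Newer MongoDB releases bind to localhost by default, which is exactly the direction the article argues for: security that is automatic rather than opt-in.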
Digital transformation is a constant today, and there are certain capabilities data systems should possess to make the current phase of that transformation as seamless as possible. Cloud-native design, global scalability, simplicity in both operations and application development, system maturity with the consistency and reliability that come with it, and (last but certainly not least) security are all critical for today's successful data-driven companies.