How Big Databases on Demand Are Paving the Way for Analytics on Demand
Cloud computing plus a new generation of big data analytics DBMSes are enabling big databases on demand.
By Mike Lamble, President, XtremeData, Inc.
Data has become a pervasive and abundant raw resource that yields competitive advantages. Management wants more insights, delivered faster and drawn more deeply from ever-larger data sets. Organizations have to move fast and use data in ways never imagined: pose questions that have never been asked, provide answers faster, add and analyze new data sources in minutes rather than months, and experiment with full big data sets rather than samples.
Unfortunately, building big databases -- from hundreds of gigabytes to hundreds of terabytes or more -- typically requires lead times of weeks or months and capital outlays that often run to seven or eight figures. At the bottom of the Maslow-style value pyramid of big data analytics are the computing equipment and a database management system (DBMS). Business domain-specific statistical models sit at the pyramid's pinnacle, and data warehouses and data marts are somewhere in the middle. Although the data infrastructure adds the least competitive differentiation, it accounts for as much as 50 percent of the "cost per answer" and "time to answer."
A Better Way
The combination of cloud computing and a new generation of big data analytics DBMSes is enabling big databases on demand. This means that enterprise-class databases for data warehouses, data marts, and analytic sandboxes can be set up in hours and populated and put to use within hours or days. They can be decommissioned even more quickly, so users pay only for what they use. Cloud computing allows customers to procure capacity on demand and expand elastically. Most big data analytics DBMSes do not yet scale in cloud environments, but some newer solutions are breaking this barrier.
For analytics organizations, this means that data scientists can create and tear down multi-terabyte analytics databases through a point-and-click user interface, without specialized database skills. End users can turn the database on or off to control usage costs, and they can add or remove computing resources to match business workloads and meet service-level requirements. Query performance can be kept fast and predictable by expanding or contracting those resources, and internal IT's role in managing infrastructure will diminish.
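To make the pay-per-use point concrete, the sketch below compares an always-on cluster with one that is switched off outside business hours and scaled out only for a month-end burst. It assumes a simple cost model of node-hours consumed times a flat hourly rate; the rate, node counts, and schedules are hypothetical and do not come from any cloud provider or vendor price list.

```python
from dataclasses import dataclass


@dataclass
class UsageWindow:
    """A period during which the database runs on a given number of nodes."""
    hours: float
    nodes: int


def usage_cost(windows: list[UsageWindow], rate_per_node_hour: float) -> float:
    """Pay-per-use cost: total node-hours actually consumed times the hourly rate."""
    return sum(w.hours * w.nodes for w in windows) * rate_per_node_hour


# Hypothetical rate in dollars per node-hour (an assumption, not a quoted price).
RATE = 2.50

# Always on: 8 nodes running every hour of a 30-day month.
always_on = [UsageWindow(hours=24 * 30, nodes=8)]

# On demand: 8 nodes for 10 hours on each of 22 working days,
# plus a month-end burst scaled out to 16 nodes for 12 hours; off otherwise.
on_demand = [
    UsageWindow(hours=10 * 22, nodes=8),
    UsageWindow(hours=12, nodes=16),
]

print(f"always on: ${usage_cost(always_on, RATE):,.2f}")
print(f"on demand: ${usage_cost(on_demand, RATE):,.2f}")
```

Under these assumed numbers the on-demand schedule consumes 1,952 node-hours instead of 5,760; the actual savings depend entirely on real prices and workload patterns.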
Equally important is the fact that big databases on demand are considerably less expensive than adding large-scale analytics capacity to in-house data centers. Savings result from economies of scale that cloud providers achieve through resource pooling and sharing and from price competition among cloud providers.
The Next Big Thing: Analytics Database-as-a-Service
Over the last two decades, massively parallel processing (MPP) DBMSes have emerged as the platform of choice for enterprise-class big data analytics because these systems can scale to deliver consistent query performance as user counts and data volumes grow. More recently, data warehouse appliances (proprietary hardware coupled with MPP DBMSes) have emerged as a cost-effective MPP solution. However, big databases on demand can supersede appliances as a "better, cheaper, faster" alternative for many applications.
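The reason MPP systems scale is that work is split across independent nodes. The following is a minimal sketch of that shared-nothing idea, not any particular vendor's engine: rows are hash-distributed by key across a hypothetical number of nodes, each node aggregates only its own slice, and a coordinator merges the partial results. Threads stand in for separate machines here, and the function names and sample data are illustrative assumptions.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def node_for(key: str, num_nodes: int) -> int:
    """Stable hash (CRC-32) so the same key always lands on the same node."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows: list[tuple[str, float]], num_nodes: int) -> list[list[tuple[str, float]]]:
    """Hash-partition rows so each node owns a disjoint subset of keys."""
    slices: list[list[tuple[str, float]]] = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        slices[node_for(key, num_nodes)].append((key, amount))
    return slices


def local_sum(slice_rows: list[tuple[str, float]]) -> Counter:
    """Each node computes SUM(amount) GROUP BY key over its own slice only."""
    totals: Counter = Counter()
    for key, amount in slice_rows:
        totals[key] += amount
    return totals


def mpp_sum_by_key(rows: list[tuple[str, float]], num_nodes: int = 4) -> dict:
    """Scatter rows to nodes, aggregate in parallel, then merge on a coordinator."""
    slices = distribute(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, slices))
    result: Counter = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds values for matching keys
    return dict(result)


if __name__ == "__main__":
    sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
    print(mpp_sum_by_key(sales))
```

Because each node scans only its own share of the data, adding nodes shrinks the per-node work; in a real MPP DBMS the nodes are separate servers with their own storage, which is what lets query times stay steady as data grows.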
This is only the beginning. Once big analytics databases are available in public clouds, look for the database-as-a-service (DBaaS) model, a delivery model already taking hold for transaction databases, to extend to analytics. In the DBaaS model, a service provider owns and maintains the database and ensures continuous availability and service-level compliance for its customers. The value proposition is better and faster service at a lower cost than in-house alternatives.
Joe Emison, CTO and co-founder of Buildfax, a progressive implementer of cloud-centric data solutions, puts it this way: “Our core competency is knowing what data to generate, store, and analyze; we do not need or want to be experts in deploying, maintaining, and scaling large-scale analytical database hardware and software. The benefit in DBaaS is that we can avoid hiring or contracting talent and paying for large capital expenditures around establishing an analytical database.”
Among the fastest-growing services at Amazon Web Services (AWS) and Google Cloud are their DBaaS offerings. Several early-stage companies offer DBaaS exclusively, and platform-as-a-service providers offer it as well. So far, these offerings focus mostly on transaction-oriented databases for operational needs, and all of them are built on an underlying database engine, most often MySQL or Oracle. Now that a new class of MPP DBMS is scaling elastically in cloud environments and big analytic databases on demand are a reality, it won't be long before cloud and platform-as-a-service providers wrap their services around these databases and new companies emerge offering analytics DBaaS.
What About Hadoop?
Today, most big data-driven analytics applications are delivered via SQL database engines rather than Hadoop (aka NoSQL) solutions. Even so, Hadoop's footprint in data collection, staging, and ETL is growing rapidly, especially for new data types such as Web-, device-, and machine-generated data. It is redefining customer expectations for scalability, resiliency, and affordability.
However, Hadoop solutions are still catching up in query performance, SQL support, and interoperability with widely used BI tools, gaps that leading Hadoop distributors are working to close. The new generation of analytics DBMSes -- those that can support big databases on demand -- will need to integrate well with Hadoop, supporting streaming flows of data to and from the Hadoop Distributed File System (HDFS).
Big databases on demand face frequently noted adoption hurdles, particularly at larger, established companies as opposed to those growing up in the cloud. One is the time and cost of moving data from its current location(s) into a cloud. Data security and privacy concerns are another common barrier to hosting enterprise data in the cloud. Finally, run times and query performance are less predictable on most clouds, public or private, than on "bare metal."
Leading cloud providers are working overtime to address these issues. A variety of high-speed connectivity options are being introduced to couple in-house data centers more tightly with clouds. New capabilities and assurance levels for system performance and data security are also being introduced, and more are sure to come.
Big databases on demand are a promising innovation in the enterprise information management landscape. They are enabled by the marriage of cloud computing and next-generation analytics DBMSes, which are radically scalable and auto-optimizing. Not only does this development reduce costs and compress cycle times for big data analytics, it also gives business-oriented BI professionals a self-service capability.
Mike Lamble is president of XtremeData, Inc., providers of a high-performance DBMS for big data analytics deployable in the cloud and on-premise.