TDWI Checklist Report | Evolving Big Data into a Mature Platform

January 12, 2015

Big data technologies have rapidly evolved to address the absorption, organization, and analysis of growing volumes of different types of structured and unstructured data. Volume growth continues to accelerate, spurring C-level executives to drive the exploration and rapid adoption of big data technologies.

This trend is borne out in a recent TDWI Best Practices Report on managing big data. According to the report, 40 percent of surveyed respondents noted that big data activities are underway; half of those are committed or deployed projects, and some are relatively mature. Although an additional 37 percent responded that big data management is “under discussion,” 60 percent are confident that big data management solutions will be in production within the next two years.1

Although one aspect of a big data strategy involves acquiring hardware and software to support big data initiatives, challenges exist in adoption, application migration, and staffing (the dearth of employees with the requisite big data design, development, and analytics skills). A variety of technologies can be categorized as big data, but much of the investigation (as well as speculation regarding the business benefits) focuses on the open source Hadoop ecosystem and its various enterprise-class variants.

The open source software stack for big data is evolving rapidly, and many business and technology leaders are considering Hadoop’s value proposition. Many are still experimenting to assess its usability and suitability as part of the enterprise environment. Adoption has been solid among those using the ecosystem for fundamental storage augmentation, such as capturing log data, extending a data warehouse, and possibly serving as a platform for queries and reporting.

However, a growing community is broadening its use to sophisticated analytics for monitoring, prediction, and prescription. These users seek to take the system beyond the prototypical adoption patterns, and the developer community is responding. The maturation of the componentry within Hadoop (such as improvements to the execution model integral to YARN, evolving tools such as Spark that enable in-memory cluster computing, and SQL engines with scalable performance such as Impala, Stinger, and Spark SQL) reflects a better understanding of true business application performance requirements.

This imminent migration among early adopters away from the concept of Hadoop as a platform solely for storage extension (“data lake”) and toward a more effective platform for real-time analytics implies the need for a mature big data environment that flexibly balances performance with oversight and governance. Ultimately, however, performance will become the most critical need motivating innovation in this space, leading serious adopters to seek systems that can simultaneously serve multiple batch and low-latency analytics workloads suited to in-memory, supercomputer-class computation.

By reflecting on the trajectory of big data adoption, this TDWI Checklist Report examines how selecting the right components will guide your transition and integration strategy to realize a mature big data platform that can be integrated within a production information technology environment.
