Executive Summary: Hadoop for the Enterprise
- By Philip Russom, Ph.D.
- March 31, 2015
Hadoop began its journey by proving its worth as a Spartan but highly scalable data platform for
reporting and analytics in Internet firms and other digital organizations. The journey is now taking
Hadoop into a wider range of industries, use cases, and types of organization. Hadoop is again
challenged to prove its worth, this time by satisfying the stringent requirements that traditional IT
departments and business units demand of their platforms for enterprise data and business
applications.
Hadoop for the enterprise is driven by several rising needs. On a technology level, many
organizations need data platforms to scale up to handle exploding data volumes. They also need a
scalable extension for existing IT systems in warehousing, archiving, and content management.
Others need to finally get BI value out of non-structured data. Hadoop fits the bill for all these needs.
On a business level, everyone wants to get business value and other organizational advantages out of
big data instead of merely managing it as a cost center. Analytics has arisen as the primary path to
business value from big data, and that’s why the two come together in the term “big data analytics.”
Hadoop is not just a storage platform for big data; it’s also a computational platform for business
analytics. This makes Hadoop ideal for firms that wish to compete on analytics, as well as retain
customers, grow accounts, and improve operational excellence via analytics.
For these and other reasons, Hadoop adoption is accelerating. TDWI survey results show that
Hadoop clusters in production are up 60% in two years. Almost half of respondents have new
Hadoop clusters in development, and these will come online within 12 months. At this rate, 60% of
users surveyed will have Hadoop in production by 2016, a giant step forward.
Adoption is accelerating because most users (89%) consider Hadoop an opportunity for innovation.
According to this report’s survey, Hadoop’s leading benefits include improvements to analytics, data
warehousing, data scalability, and the handling of exotic data types, in that order. Leading barriers
are inadequate technical skills, weak business support, security issues, and weak open source tools.
All these barriers (and others) are being addressed by user best practices and advancements from both
open source and vendor communities.
As Hadoop broadens across the enterprise, its ownership is shifting from departments and
application teams to central IT. This makes sense when IT provides Hadoop clusters as shared
enterprise infrastructure. The people working on these clusters are most often data scientists, data
architects, data analysts, and developers. These people are rare and expensive on the job market, so
most organizations train existing employees in Hadoop skills instead of hiring new staff who already have them.
Best practices for enterprise Hadoop are coalescing. Developers employ a mix of programming and
high-level tools, though they prefer the latter. Most clusters are on premises today, but many will
move to clouds soon. Developers complain of poor SQL and relational functions on and off Hadoop today,
but vendors and open source contributors are working aggressively on improvements.
The leading future use cases for enterprise Hadoop (according to survey respondents) are enterprise
data hubs, archives, and BI/DW. Half of respondents expect to improve existing Hadoop clusters by
integrating them with data quality and master data management tools.
This report accelerates users’ understanding of the many new products, technologies, and best
practices that have emerged recently around Hadoop. It will also help readers map newly available
options to real-world use cases, with a focus on mainstream enterprise uses, while respecting tried-and-true
IT practices and delivering maximum business value.
Actian Corporation, Cloudera, EXASOL, IBM, MapR Technologies, MarkLogic, Pentaho, SAS, Talend, and Trillium Software sponsored the research and writing of this report.
About the Author
Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 600 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, was a contributing editor with leading IT magazines, and a product manager at database vendors. His Ph.D. is from Yale. You can reach him by email, on Twitter (twitter.com/prussom), and on LinkedIn (linkedin.com/in/philiprussom).