Executive Summary: Hadoop for the Enterprise

Hadoop began its journey by proving its worth as a Spartan but highly scalable data platform for reporting and analytics in Internet firms and other digital organizations. The journey is now taking Hadoop into a wider range of industries, use cases, and types of organization. Hadoop is again challenged to prove its worth, this time by satisfying the stringent requirements that traditional IT departments and business units demand of their platforms for enterprise data and business applications.

Hadoop for the enterprise is driven by several rising needs. On a technology level, many organizations need data platforms to scale up to handle exploding data volumes. They also need a scalable extension for existing IT systems in warehousing, archiving, and content management. Others need to finally get BI value out of non-structured data. Hadoop fits the bill for all these needs.

On a business level, everyone wants to get business value and other organizational advantages out of big data instead of merely managing it as a cost center. Analytics has arisen as the primary path to business value from big data, and that’s why the two come together in the term “big data analytics.” Hadoop is not just a storage platform for big data; it’s also a computational platform for business analytics. This makes Hadoop ideal for firms that wish to compete on analytics, as well as retain customers, grow accounts, and improve operational excellence via analytics.

For these and other reasons, Hadoop adoption is accelerating. TDWI survey results show that Hadoop clusters in production are up 60% in two years. Almost half of respondents have new Hadoop clusters in development, and these will come online within 12 months. At this rate, 60% of users surveyed will have Hadoop in production by 2016, a giant step forward.

Adoption is accelerating because most users (89%) consider Hadoop an opportunity for innovation. According to this report’s survey, Hadoop’s leading benefits include improvements to analytics, data warehousing, data scalability, and the handling of exotic data types, in that order. Leading barriers are inadequate technical skills, weak business support, security issues, and weak open source tools. All these barriers (and others) are being corrected by user best practices and advancements from both open source and vendor communities.

As Hadoop broadens across the enterprise, its ownership is shifting from departments and application teams to central IT. This makes sense when IT provides Hadoop clusters as shared enterprise infrastructure. The people working on these clusters are most often data scientists, data architects, data analysts, and developers. These people are rare and expensive on the job market, so most organizations train existing employees in Hadoop skills instead of hiring them.

Best practices for enterprise Hadoop are coalescing. Developers employ a mix of programming and high-level tools, though they prefer the latter. Most clusters are on premises today but are headed to the cloud soon. Developers complain of poor SQL and relational functions on and off Hadoop today, but vendors and open source contributors are working aggressively on improvements.

The leading future use cases for enterprise Hadoop (according to survey respondents) are enterprise data hubs, archives, and BI/DW. Half of respondents expect to improve existing Hadoop clusters by integrating them with data quality and master data management tools.

This report accelerates users’ understanding of the many new products, technologies, and best practices that have emerged recently around Hadoop. It will also help readers map newly available options to real-world use cases, with a focus on mainstream enterprise uses, while respecting tried-and-true IT practices and delivering maximum business value.

Actian Corporation, Cloudera, EXASOL, IBM, MapR Technologies, MarkLogic, Pentaho, SAS, Talend, and Trillium Software sponsored the research and writing of this report.

About the Author

Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 600 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, was a contributing editor with leading IT magazines, and a product manager at database vendors. His Ph.D. is from Yale. You can reach him by email ([email protected]), on Twitter (twitter.com/prussom), and on LinkedIn (linkedin.com/in/philiprussom).
