
Executive Summary: Integrating Hadoop into Business Intelligence and Data Warehousing

Hadoop promises to assist with the toughest challenges in BI today, including big data, advanced analytics, and multi-structured data. Download this TDWI Best Practices Report to learn how to integrate Hadoop into your business intelligence, analytics, data integration, and data warehousing technology stacks.

Apache Hadoop is an open source software project administered by the Apache Software Foundation (ASF). The Hadoop family of products includes the Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, and so on. These products are available as open source from ASF, as well as from several software vendors. The number of vendor products that integrate with Hadoop products increases almost daily. In this report, the term “Hadoop” usually means the entire Hadoop family of products, regardless of their open source or vendor origins. Some discussions focus specifically on HDFS.

Business intelligence (BI) professionals’ interest in Hadoop has grown in recent years because Hadoop has proved useful with the toughest challenges in BI today, namely big data, advanced analytics, and multi-structured data. For that reason, TDWI anticipates that Hadoop technologies will soon become a common complement to (but not a replacement for) established products and practices for BI, data warehousing (DW), data integration (DI), and analytics. Therefore, a wide range of user organizations need to prepare for Hadoop usage. Although Hadoop can be valuable as an analytic silo, most organizations will prefer to get the most business value out of Hadoop by integrating it with, or into, their BI, DW, DI, and analytics technology stacks.

According to this report’s survey, users with hands-on Hadoop experience say it’s still immature and needs serious improvements in security, administrative tools, high availability, and real-time operation. These and other problems are being addressed by the open source community of technical users, which continues to infuse innovation into existing Hadoop products as well as introduce new ones via ASF’s incubation process. The pace of Hadoop innovation has accelerated because a number of software vendors now contribute to Hadoop’s open source code. The first wave of support for Hadoop technologies by vendor tools and platforms is already in place, with subsequent waves coming soon. The number of technical users conversant in Hadoop is increasing steadily.

According to this report’s survey, the Hadoop products most commonly used today are (in priority order) MapReduce, HDFS, Java, Hive, HBase, and Pig. Those poised for greatest future adoption are Mahout, ZooKeeper, and HCatalog. All of these have compelling use cases for BI, DW, DI, and analytics. In fact, survey respondents who have hands-on Hadoop experience say they’ve already integrated Hadoop with analytic tools, DWs, reporting tools, Web servers, analytic databases, and data visualization tools, which shows that Hadoop is already established as a component within BI/DW technology stacks. Of these respondents, 78% feel Hadoop is a complement to a DW, not a replacement. Enabling big data analytics is the leading benefit of Hadoop, whereas a lack of Hadoop skills is the leading barrier. BI/DW aside, some respondents also anticipate using Hadoop as a live archive (23%) or as a platform for content management (35%).

Only 10% of organizations surveyed have a Hadoop implementation in production today, but a whopping 51% say they’ll have one within three years. If this trend pans out, Hadoop will impact at least half of BI/DW environments soon. Hence, users need to prepare for Hadoop usage now.

The purpose of this report is to accelerate users’ understanding of the many new Hadoop-based products that have emerged in recent years. The report also maps newly available Hadoop options to real-world use cases. This information can help user organizations successfully integrate Hadoop technologies into their BI portfolios and practices with maximum business value.

Cloudera, EMC Greenplum, Hortonworks, ParAccel, Pentaho, SAP, SAS, Tableau Software, and Teradata sponsored the research for this report.

About the Author

Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 600 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, was a contributing editor with leading IT magazines, and was a product manager at database vendors. His Ph.D. is from Yale. You can reach him by email ([email protected]), on Twitter (twitter.com/prussom), and on LinkedIn (linkedin.com/in/philiprussom).

