December 13, 2011
Despite all the hubbub and hype around Hadoop, few business intelligence (BI) and data warehousing (DW) professionals know much about what Hadoop is, how it does what it does, or in which situations they should deploy it. Because Hadoop is new and complex, several points of confusion are holding back BI/DW professionals and others:
- Hadoop is multiple products. As you'll see in this Checklist Report, Hadoop is a family of open source products and technologies overseen by the Apache Software Foundation.
- Hadoop is an ecosystem. In addition to products from Apache, the extended Hadoop ecosystem includes a growing list of vendor products that integrate with or expand Hadoop technologies.
- Apache Hadoop is open source. Its open source software library is available through Apache. For users who want a more enterprise-ready package, a few vendors now offer Hadoop distributions that also include administrative tools and technical support.
- Hadoop manages big data. The Hadoop Distributed File System (HDFS) excels with file-based big data, including files that contain unstructured data.
- Hadoop enables advanced analytics. Hadoop is excellent for storing and searching multi-structured big data, but advanced analytics is possible only with certain combinations of Hadoop products, third-party products, or extensions of Hadoop technologies.
- Hadoop differs from traditional BI and DW. In particular, the Hadoop family has its own query and database technologies. These resemble standard SQL and relational databases closely enough that BI/DW professionals can learn them quickly.
Hadoop and related technologies have been with us for over five years now, but BI/DW professionals have only recently started exploring them, motivated by the rise of big data analytics. The business advantages of big data analytics are the leading reason BI/DW professionals need to know more about Hadoop now. Despite the short-term confusion, TDWI anticipates that Hadoop techniques will soon become a common complement to older BI/DW approaches.
To help BI/DW professionals and others prepare for the eventual widespread use of Hadoop and its extended ecosystem, this Checklist Report drills into these common points of confusion. It clarifies each point and reveals the true value of Hadoop for BI, DW, big data, and analytics.