October 29, 2013
User organizations are aggressively adopting the Hadoop Distributed File System (HDFS) and other Hadoop technologies for data warehousing (DW), data integration, and analytics. According to a recent TDWI survey about Hadoop, only 10% of respondents report having HDFS in production today, but a whopping 63% expect to deploy HDFS within three years.
A number of trends are driving Hadoop adoption. First, organizations want more business value from big data, in the form of insights gained from analyzing the data they manage on Hadoop. Second, Hadoop complements data warehouse platforms, data integration, and analytic tools, handling massive data volumes and diverse, multi-structured data in scalable and cost-effective ways that traditional platforms cannot. Finally, users are moving toward multi-platform environments for DW, data integration, and analytics, and Hadoop is a welcome addition because it excels at workloads involving massive data, extract, transform, and load (ETL) processing, and new analytic algorithms.
Most of the organizations adopting Hadoop are completely new to it, so they need to educate themselves quickly about emerging best practices. This TDWI Checklist Report supports that education, beginning with an overview of the rapidly evolving Hadoop ecosystem. The checklist of best practices presented here can help users make sustainable decisions as they plan their first Hadoop deployments.