TDWI Checklist Report | Seven Steps to Faster Analytics Processing with Open Source: Realizing Business Value from the Hadoop Ecosystem
December 9, 2015
Excellence in analytics is a competitive advantage in nearly all industries. Technology trends are moving in a positive direction for organizations seeking to expand the business impact of analytics. New technologies can help organizations democratize the analytics experience so that more managers, operational employees, and other users can engage in faster data-driven decision making.
On the front end, visual analytics and data discovery tools are enabling users to move beyond the limited, static data views typical of traditional business intelligence (BI) reporting and spreadsheet applications. Nontechnical users, as well as developers, data engineers, and data scientists working with advanced tools and techniques, are pushing to get past traditional BI and data warehousing barriers to access and interact with a broader range of data, including real-time data streams. They have little time to wait for lengthy extract, transform, and load (ETL) or other preparation processes to complete before they can touch the data. Their urgency is driving innovation toward faster, easier, and more flexible data integration, preparation, and processing.
Open source projects are spawning many key innovations. The Hadoop ecosystem, which developed out of a series of ongoing Apache Software Foundation projects, is maturing and expanding. Hadoop ecosystem technologies could supplement or supplant many traditional BI and data warehousing technologies and practices. Those practices may have worked for classic BI querying and reporting, but they struggle with highly iterative analytics, where questions often lead to follow-up questions as part of data discovery. They also struggle when business-critical analytics processes require ready access to big data and need better performance and flexibility to support a variety of analytic models.
Organizations should evaluate Hadoop ecosystem technologies for how they can give users easier, more interactive, and more integrated experiences with data. They should examine how open source technologies and frameworks can reduce delays in preparing and processing data for users, developers, and data scientists who are seeking to employ advanced analytics. This Checklist discusses seven key considerations to help organizations focus their evaluation and develop a strategy for gaining value from open source technologies to support faster, more powerful, and more flexible analytics.