Transition Point: Integrating Hadoop into the Enterprise
Although many are moving toward broader adoption of big data for enterprise use, barriers and concerns remain. Is Hadoop part of your enterprise’s plan for modernization?
- By David Loshin
- May 12, 2016
In the past year we have reached a transitional stage in the movement toward adoption of big data and Hadoop as part of the enterprise application framework. Many companies have now accepted big data technologies, and we are beginning to see these organizations develop more applications that leverage the Apache ecosystem components -- including Hadoop-based and Spark-based capabilities.
Even if many limit the use of this technology to relatively straightforward applications, such as data warehouse augmentation or the creation of a data lake for data set landing, we should recognize this as the first step toward a broader adoption of the Hadoop stack in ways that will be fully integrated into the enterprise.
It is not clear whether the Hadoop approach is destined to replace the existing data management infrastructure. This is especially true where there are significant investments in existing platform technologies, such as the venerated enterprise data warehouse.
The conservative approach to modernization involves incrementally selecting business processes that are due for reconsideration, reassessing their functional and data requirements, and determining whether they can be better supported using newer technologies. This slow and steady approach suggests that, at least for the short and medium term, the enterprise will embrace a hybrid structure that includes both conventional and emerging architectures as well as cloud-based or SaaS systems.
In some cases, there are barriers to adopting these emerging technologies, such as concerns about having the right training. Some enterprises are also wary of unconventional architectures, potential issues surrounding security, or challenges of integration and interoperability.
These are all legitimate issues, and they are all compounded by the concern that the rate of change in the Apache open source ecosystem is not just high but continuing to accelerate -- to a point where it is difficult to keep up with innovations.
Organizations that are implementing new platforms must have a plan to manage the systems necessary to run the company even as some of the business processes supported by those systems are being readied for modernization.
This means that even when using the conservative approach to selective renovation and modernization, there are key complexities (such as dual operations of both legacy and renovated systems, enforcing data use and protection policies, cross-platform data integration, and performance tuning) that must be simultaneously addressed to keep the store running while the renovations are taking place.
My research company, DecisionWorx, is looking for your input to help us understand where your organization is in its Hadoop journey and what your perceptions are regarding the challenges of modernization using Hadoop.
Please complete our new survey on Hadoop productionalization to provide feedback about your corporate experience in evaluating different software distributions and vendor offerings, the relative ease of design and development of applications, expectations about cost, and other facets of integrating this innovative technology into the enterprise. Results of the survey will be shared with Upside readers in a future column.
About the Author
David Loshin is a recognized thought leader in the areas of data quality and governance, master data management, and business intelligence. David is a prolific author on BI best practices, both through the expert channel at BeyeNETWORK and in numerous books on BI and data quality. His valuable MDM insights can be found in his book, Master Data Management, which has been endorsed by data management industry leaders.