A Road Map to Hadoop Success: Going from Zero to Enterprise Data Hub

For enterprises just beginning to explore Hadoop, we offer a five-point road map to unlocking the full potential of big data.

By Jorge A. Lopez, Director of Product Marketing, Syncsort

Hadoop, along with its related projects, is proving its merits, moving from hype to real value across a wide variety of enterprise applications. Recently, I met with organizations in different industries to discuss how they are capitalizing on the benefits of this framework. A financial services organization is using Hadoop to analyze data generated by core banking applications running on the mainframe to help address regulatory compliance requirements. A healthcare services organization has made Hadoop the cornerstone of its new enterprise architecture, allowing it to accelerate eligibility processing for medical and pharmacy claims amid fast-changing healthcare regulations and the growing volume of patient data collected every day.

These companies, among many others, are realizing the benefits of Hadoop and the competitive edge it provides. However, for enterprises that have not yet engaged with Hadoop, figuring out which use cases to tackle first can be a daunting task. Which ones will deliver faster, bigger results or have the highest rate of success? In a previous article I provided insights into one of these use cases -- offloading data and ELT workloads from the data warehouse into Hadoop -- but that is just the beginning. Organizations need a longer-term road map to help them advance their Hadoop journey and unlock the full potential of big data. Here is a high-level view of what that road map may look like.

1. Offload batch and ELT workloads from the data warehouse and mainframe systems into Hadoop. As mentioned earlier, beginning by solving the intractable problem of the overloaded data warehouse will get you off to a good start, providing the tools, sponsorship, and funding necessary to progress.
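
To make this concrete, here is a minimal sketch of what such an offload job might look like using PySpark, one of several tools in the Hadoop ecosystem that can do this work. The JDBC connection details, table names, columns, and HDFS paths below are hypothetical placeholders, not a prescription.

```python
# Minimal sketch: pull a warehouse table into Hadoop and run the transform there
# instead of in the warehouse. All connection details and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("elt-offload").getOrCreate()

# Extract: read the source table from the data warehouse over JDBC.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://warehouse:5432/sales")  # hypothetical source
          .option("dbtable", "public.orders")
          .option("user", "etl_user")
          .option("password", "***")
          .load())

# Transform on the Hadoop cluster: aggregate daily revenue per region.
daily_revenue = (orders
                 .groupBy("order_date", "region")
                 .sum("amount")
                 .withColumnRenamed("sum(amount)", "revenue"))

# Load: land the result in HDFS as partitioned Parquet for downstream consumers.
(daily_revenue.write
              .mode("overwrite")
              .partitionBy("order_date")
              .parquet("hdfs:///data/curated/daily_revenue"))
```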

2. Develop an active archive. Storing information in critical systems such as data warehouses and mainframes comes at a premium, yet regulations often prohibit you from simply discarding that information. As a result, many companies today spend millions of dollars just to lock up their data and back it up to tape. Hadoop provides a lower-cost alternative for storing all your corporate data while keeping it readily available to the enterprise -- shedding light on this dark data.
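
As an illustration, the sketch below archives cold warehouse records into compressed, partitioned Parquet files on HDFS so they remain queryable instead of sitting on tape. The connection string, table, cutoff date, and paths are assumptions for illustration only.

```python
# Minimal sketch of an active archive: move cold records out of the warehouse
# into compressed, queryable storage on HDFS. Connection details, table names,
# and the retention cutoff are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("active-archive").getOrCreate()

claims = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@warehouse:1521/dw")  # hypothetical source
          .option("dbtable", "claims_history")
          .load())

# Keep only records that have aged out of the warehouse retention window.
cold = claims.filter(F.col("claim_date") < "2012-01-01")

# Write as Snappy-compressed Parquet, partitioned by year, so the archive is
# cheap to store but still directly queryable with Hive or Spark SQL.
(cold.withColumn("claim_year", F.year("claim_date"))
     .write.mode("append")
     .partitionBy("claim_year")
     .option("compression", "snappy")
     .parquet("hdfs:///archive/claims"))
```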

3. Build your enterprise data hub. After offloading batch ELT workloads and implementing your active archive, you've already won a big data battle and are on the path to building a central repository for all corporate information. After all, you've built a set of new and important skills within your organization, beginning with the most familiar sources and workloads. This is a good point to start adding new data sources -- given Hadoop's tremendous flexibility in handling semi-structured and unstructured data -- such as machine-generated data, logs, clickstream data, and social media feeds.
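
For example, a semi-structured feed such as clickstream JSON can be landed and normalized into the hub with a few lines of PySpark. The field names and paths below are hypothetical.

```python
# Minimal sketch of adding a semi-structured source to the data hub: raw
# clickstream JSON lands in HDFS and is normalized into the same Parquet
# layout as the offloaded warehouse data. Fields and paths are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

# Spark infers the schema from the JSON records, so new fields in the feed
# do not break ingestion.
clicks = spark.read.json("hdfs:///landing/clickstream/2014-06-01/*.json")

cleaned = (clicks
           .select("user_id", "page", "referrer", "ts")
           .withColumn("event_date", F.to_date("ts")))

(cleaned.write
        .mode("append")
        .partitionBy("event_date")
        .parquet("hdfs:///data/hub/clickstream"))
```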

4. Extend business intelligence through data discovery and exploration capabilities in Hadoop. As your Hadoop cluster grows and you store more data than ever, the formerly "dark" or locked data becomes easily accessible, and mainframe transactional, social media, and clickstream information now sit in one location. This accessibility opens the door to new ways of exploring information and discovering insights that blend all these data sources, allowing a more agile, ad hoc process than the traditional data warehouse. This type of analysis will not replace your conventional reports and executive dashboards, but it provides endless opportunities, where experimentation is easier, mistakes are less costly, and insights are just as valuable.
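
Here is a sketch of what such an ad hoc exploration might look like, joining the offloaded warehouse data with the clickstream feed using Spark SQL. The table layouts and the question being asked are illustrative assumptions.

```python
# Minimal sketch of ad hoc discovery across sources that now live side by side
# in the hub: join offloaded warehouse aggregates with clickstream events.
# Table layouts and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("discovery").getOrCreate()

spark.read.parquet("hdfs:///data/curated/daily_revenue").createOrReplaceTempView("revenue")
spark.read.parquet("hdfs:///data/hub/clickstream").createOrReplaceTempView("clicks")

# An exploratory question that would be expensive to stage in the warehouse:
# does browsing activity line up with revenue on the same day?
spark.sql("""
    SELECT r.order_date, r.region, r.revenue, COUNT(c.user_id) AS page_views
    FROM revenue r
    LEFT JOIN clicks c ON c.event_date = r.order_date
    GROUP BY r.order_date, r.region, r.revenue
    ORDER BY r.order_date
""").show(20)
```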

5. Deliver next-generation analytics. At this point, you've made it to what's considered the "sexiest" part of Hadoop: the complex algorithms that decipher customer preferences, anticipate market demand, and identify and prevent crises before they happen -- something IT organizations have had their eye on from the very beginning. By the time you reach this level, you've learned multiple lessons and, when the earlier phases were done properly, banked savings along the way. That knowledge and those savings will let you experiment with a whole new class of analytics.
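
As a simple illustration of this class of workload, the sketch below segments customers by behavior with k-means clustering in Spark MLlib, running on the same cluster that holds the hub. The feature columns and paths are hypothetical placeholders.

```python
# Minimal sketch of a next-generation workload: clustering customers by
# behavior with Spark MLlib. Feature columns and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("customer-segments").getOrCreate()

customers = spark.read.parquet("hdfs:///data/hub/customer_features")

# Assemble numeric behavior metrics into a single feature vector.
assembler = VectorAssembler(
    inputCols=["order_count", "avg_basket", "days_since_last_visit"],
    outputCol="features")
features = assembler.transform(customers)

# Fit a simple k-means model to segment customers into five groups.
model = KMeans(k=5, seed=42, featuresCol="features").fit(features)
segments = model.transform(features)  # adds a "prediction" column per customer

segments.groupBy("prediction").count().show()
```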

Your enterprise has a multitude of ways to integrate Hadoop into its existing IT architecture. This blueprint is not one-size-fits-all; each company needs to choose a path that matches its skill sets and maturity stage. Although there is no universal strategy, one principle holds true for everyone: look beyond the buzz, learn from what your peers are doing with Hadoop, and build your own road map. Hadoop is certainly evolving at a rapid pace, but having a plan in place will help you navigate the difficulties of big data.

Jorge A. Lopez is the director of product marketing at Syncsort. You can contact the author at [email protected] or follow him on Google+.
