Taking the Big Data Leap: A Business Case for Hadoop

Don't be tempted to start your Hadoop journey by exploring fancy new analytics. Offloading existing ELT workloads could be your ticket to a bright Hadoop future.

By Jorge Lopez, Director of Product Marketing, Syncsort

Recently I attended a presentation by management expert Dr. Geoffrey Moore. Best known for his book Crossing the Chasm, Dr. Moore explained that pragmatists, for the most part, won't adopt disruptive technologies until they are in pain. The pain Dr. Moore describes includes inadequate solutions, falling behind the competition, and changing market conditions.

This is a perfect description of the current business environment. Today, every organization of meaningful size and scope has to become a big-data-driven organization -- that is, if it wants to remain relevant in the marketplace. IT departments in even the most traditional enterprises are feeling the pain. Business users ask for fresher data, but batch windows are getting longer. Users require longer data history, but costs are shortening data retention windows. Users demand faster query response times, but databases are getting slower as queries compete with batch workloads for CPU and I/O resources. The list goes on. This is the pain that is pushing organizations to take the leap into big data technologies and, more specifically, Hadoop.

When organizations are pushed to big data, they cannot view it in isolation. They've spent a tremendous amount of time and money building traditional architectures that do a good job ingesting, processing, and distributing data. At the same time, big data is imposing a new set of requirements that these architectures cannot manage. Therefore, any feasible proposition needs to provide a framework for integrating big data technologies into the existing architectures. Such a framework leverages the knowledge that IT teams have accumulated over the years while maximizing the benefits of big data technologies to deliver new insights and significantly lower IT costs.

How can IT groups build a solid business case for Hadoop? I recommend following Dr. Moore's advice: pick a niche market with an "intractable problem" and offer a complete solution. Translated into our world of data warehousing, many of the pains we've described appear to be caused by growing ELT workloads that drive increased database spending. This, in turn, forces organizations to make tradeoffs, such as supporting more data sources only at the expense of shorter retention windows, or delivering faster queries only for relatively small datasets.

With the right tools, Hadoop can deliver a complete solution to this "intractable problem." Organizations can offload ELT workloads from expensive data warehouses into Hadoop. This way, Hadoop can integrate into the existing architecture, becoming a massively scalable and cost-effective staging area for all corporate data. With many organizations spending millions of dollars a year on database capacity just to process ELT workloads, the savings alone can justify building a Hadoop cluster.

Just look at the cost of managing 1TB of data -- estimates for Hadoop range anywhere from $500 to $2,000; estimates for a high-end data warehouse can range from $20,000 to $200,000. One may wonder, why use all this premium capacity to land, sort, aggregate, and join data?
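To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. It uses only the per-terabyte ranges quoted above; the 50TB staging volume and the function names are hypothetical, chosen purely for illustration, and a real estimate would have to account for migration, tooling, and staffing costs.

```python
# Per-terabyte cost ranges quoted in the article (USD).
HADOOP_COST_PER_TB = (500, 2_000)
WAREHOUSE_COST_PER_TB = (20_000, 200_000)


def cost_range(terabytes, per_tb_range):
    """Return the (low, high) cost of managing `terabytes` of data."""
    low, high = per_tb_range
    return terabytes * low, terabytes * high


def savings_range(terabytes):
    """Return (most conservative, most optimistic) savings from offloading."""
    hadoop_low, hadoop_high = cost_range(terabytes, HADOOP_COST_PER_TB)
    dw_low, dw_high = cost_range(terabytes, WAREHOUSE_COST_PER_TB)
    # Conservative: cheapest warehouse estimate vs. priciest Hadoop estimate.
    # Optimistic: priciest warehouse estimate vs. cheapest Hadoop estimate.
    return dw_low - hadoop_high, dw_high - hadoop_low


if __name__ == "__main__":
    staged_tb = 50  # hypothetical volume of ELT staging data, not from the article
    low, high = savings_range(staged_tb)
    print(f"Offloading {staged_tb} TB: ${low:,} to ${high:,} saved")
```

Even the most conservative pairing (the cheapest warehouse estimate against the most expensive Hadoop estimate) leaves a tenfold gap per terabyte, which is why the size of the offloaded workload matters more than the precision of either estimate.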

As big data technologies go mainstream, traditional businesses that have a well-planned data architecture -- such as retail, telcos, insurance, and financials -- will require frameworks that recognize this reality. Dismissing existing architectures, especially with these types of organizations, can easily become the recipe for short-lived big data initiatives. Although IT departments might be tempted to start their Hadoop journey by exploring fancy new analytics, the easiest and fastest way to build your cluster and skills might be in the more mundane world of solving existing problems. Offloading ELT workloads could be your ticket to a bright Hadoop future.

Jorge A. Lopez is the director of product marketing at Syncsort. You can contact the author at jlopez@syncsort.com or follow him on Google+.