The End of Hadoop as We've Known It?
Gartner's Merv Adrian dropped a bombshell last month, predicting that Hadoop as we know it may soon cease to exist.
- By Steve Swoyer
- August 15, 2016
Last month, Gartner Inc.'s Merv Adrian dropped a bombshell on attendees at the 15th annual Pacific Northwest BI Summit: Hadoop as we know it may soon cease to exist.
Adrian wasn't predicting the end of the Hadoop ecosystem -- or of the technologies that collectively comprise the Hadoop stack. He was anticipating the obsolescence of Hadoop as a marketing differentiator -- and with it, the end of an era.
Cloudera, Hortonworks, and MapR used to trumpet their Hadoop platform underpinnings in their marketing efforts. That's changed. The Hadoop pure plays aren't running away from their identification with the Hadoop platform, Adrian conceded -- but they aren't playing it up anymore.
Hadoop Vendors Moving From Disruptors to Distributors
As Adrian sees it, the metamorphosis of the Hadoop platform vendors from bull-in-the-china-shop disruptors to would-be china shop proprietors is very nearly complete.
"If you went on the Web pages of the independent leading Hadoop distributions in the last couple of months, there's a good chance you didn't see the word 'Hadoop' on the first screen or even the second screen as you scrolled down," said Adrian, somewhat hyperbolically.
(As of July 27, Hortonworks' website does refer to Hadoop at least twice -- including on the first screen, albeit in reference to a Hadoop Summit event. MapR Technologies' website, by contrast, does not mention Hadoop on its first screen. Cloudera's website has a reference to Hadoop on its first screen.)
His point is that the erstwhile Hadoop platform vendors are "repositioning themselves as modern data management vendors." Adrian said that Cloudera, Hortonworks, and MapR now aspire to much more than cost-effective scalable storage or, for that matter, low-cost, general-purpose parallel compute capacity.
A cynic might use the term "overweening" to describe the ambition of a Hadoop platform vendor that aspires to data management. After all, Hadoop is still an impoverished data management platform, at least vis-à-vis traditional -- or conventional -- data management technologies.
Indeed, Adrian's observation produced a snigger or two from attendees. However, he argued that what's happening with the Hadoop vendors transcends the vagaries of marketing.
It's part of something much bigger. Call it the ongoing renaissance of open source software (OSS).
Open Source Is an Increasingly Practical Choice
"[The Hadoop vendors'] message is evolving rather dramatically. I think what this reflects is not only them, but [it reflects] the entire open source globally interconnected movement for the development of alternative software," Adrian argued. He suggested that OSS development provides an increasingly viable "alternative to the stuff we've been using for decades and are paying a lot of money for."
"The software industry as a whole hasn't learned that you can have a much, much bigger development organization than just the people who work for you and you can get enormous benefit out of shepherding and governing and integrating and testing code built by other people," he said.
"[Using OSS] also makes you more agile. You can bring products to market more quickly than using your own engineering. You can literally pick up a piece of open source software and get pretty far with it. You might want to do some additional engineering work, but that also helps you differentiate [your work] from other people who are using exactly the same software," Adrian continued.
Will Hadoop Move to a New Phase of the Hype Cycle?
Gartner sometimes gets maligned for its "Hype Cycle." This is unfair -- to a degree.
After all, the Hype Cycle merely formalizes the adoption curve to which most (if not all) technologies ultimately conform. First, there's a period of hype or irrational exuberance (more or less irrational depending on political, economic, and other conditions), followed by a correction (more or less severe depending on how greatly a technology was hyped), followed by a period of fairly predictable growth.
At last year's Pacific Northwest BI Summit, Adrian suggested that Hadoop was just then entering Gartner's "Trough of Disillusionment" -- the correction that inevitably follows hype and irrational exuberance. Twelve months later, Gartner says that Hadoop is still moving through the trough -- although Adrian's gut tells him it might be close to resuming something like normal growth.
"[Gartner's consensus is that] Hadoop is now squarely in the trough. I would argue that it's probably moved up the slope a little bit. We think we are basically on the upslope, [which means we're] moving out of the trough," Adrian said.
He went on to explain that this sense of upward movement is connected to "real use by real [companies] in sizable numbers that have moved beyond experimentation and pilot [projects and are] actually putting things in production. We're actually starting to get the ... negative feedback [about SQL interfaces for Hadoop] because now people are expecting these things to be usable."
Adrian also said a growing number of the Hadoop-related inquiries Gartner's data team receives have to do with how to get value out of Hadoop. This isn't necessarily a good thing, however.
In the past, Adrian explained, "the outcome that most [customers] were looking for was to demonstrate that the thing could work. Now we're getting people who are actually trying to get value and results out of [Hadoop]. Now they're saying 'This costs way more than I thought it would. It's not delivering nearly what I thought it could deliver, and I can't get all of the users [I thought I could] on it at once.' This is what the trough of disillusionment is about, and it reflects where we are in the market."
Inquiries About Hadoop Indicate Growth
One positive sign is that non-traditional adopters are now asking Gartner about Hadoop-related technologies, Adrian said. "We're hearing increasingly from mainstream buyers that haven't really built anything significant with Hadoop before. These are people who don't stick their neck out" before a technology has matured. "We're also hearing a lot from second-timers. People who built a successful project. Now they're coming back and saying, 'What else can we do with this?' -- literally."
Elsewhere, Adrian said that non-pure-play vendors such as IBM and Microsoft frequently come up in client inquiries to Gartner. About 12 percent of client inquiries to Gartner's data team had to do with Hadoop last year, Adrian said.
Cloudera and Hortonworks were the two most asked-about Hadoop vendors. IBM, which has its own Hadoop distribution, was number three. MapR -- another prominent Hadoop pure play -- was fourth, followed by Microsoft (which codeveloped a Hadoop distribution for Windows with Hortonworks and offers a Hadoop service, HDInsight, for Azure) at number five.
That said, Amazon is the single biggest resource for Hadoop in the market. Amazon sells more Hadoop capacity and hosts more Hadoop instances than every other player combined.
"There is a helluva lot more Amazon out there than most of us thought. We all kind of knew it was going on but [Amazon was] so opaque in their financial reporting," he said.
"Today, [Amazon] has more users of Hadoop than all of the other vendors in the market combined. Period. There are thousands of active users with [Amazon's] EMR [Elastic MapReduce]," Adrian pointed out. "The most any of the indies will tell you is 'we're getting close to 1,000 [commercial users] now.' We have some 800 to 900 [users] numbers from some of the pure-play guys, and IBM, Microsoft, none of them have given us specific numbers publicly about what they have."
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at email@example.com.