TDWI Upside - Where Data Means Business

Companies, Users, and Surveys Agree: Spark Is Hot, Hot, Hot

Spark is a hot topic right now. So hot, in fact, it's surpassed Hadoop as the open source technology with the most widespread appeal. Whether it can sustain this momentum is another matter.

Apache Spark is hot, hot, hot. So hot, in fact, it's surpassed Hadoop as the open source technology with the most widespread appeal.

Sure, Spark uptake still pales in comparison to Hadoop's, but indicators suggest Spark's popularity is still waxing -- even as Hadoop's wanes.

Gartner Inquiries Chart Growth

One indicator comes via Gartner researcher Merv Adrian, who notes that client inquiries about Spark to Gartner's data team have exploded in 2015 and 2016 -- even as inquiries about Hadoop are growing at a much slower rate, if not flattening.

Right now, Adrian notes, inquiries about Hadoop make up about 12 percent of all inquiries to Gartner's data team; Spark-related inquiries, by contrast, account for about 2 percent.

Last year at this time, Spark-related inquiries made up only a fraction of a percent of client inquiries, Adrian explained. "[That] percentage does not reflect the [market] momentum. Spark is rising faster than Hadoop did two to three years ago," Adrian told attendees at the 15th annual Pacific Northwest BI Summit.

"In the last six months, it's ... almost [like a] hockey stick, rising very, very rapidly. Vendors aren't calling us to ask us about integrating Spark into their stacks. The people who are calling us -- we are actually getting a ton of interest from ... clients [i.e., corporate customers] right now."

Big Investments in Spark, Independent of Hadoop

That level of interest raises a question: are companies replacing Hadoop with Spark or spinning up Spark clusters in the context of Hadoop? "The degree to which Spark is being deployed independent of an HDFS underpinning is still relatively small but it is nonzero and it is growing," Adrian told attendees. He cited IBM's entry into the Spark space as a catalyst for interest, if not adoption.

"There's massive interest [in Spark] ... and lots of investment. IBM's not the only one massively investing," he said, noting that even though Big Blue has only two active Spark committers -- active, acknowledged developers who are adding to the software -- IBM continues to contribute a significant amount of resources (and code) to Spark development.

"IBM has made huge investments, but ... the rest of the community is racing to keep up."

Surveys Reflect Spark's Popularity

In addition to Adrian's metric -- client requests to Gartner -- a growing collection of other data points suggests Spark is sizzling. Consider the results of a survey of 50,000 developers published earlier this year by Stack Overflow, a user-curated question-and-answer portal for IT professionals.

Spark was the number two "Top Trending Tech" in Stack Overflow's tally. Spark developers, meanwhile, tied with Scala developers for bragging rights in the "Top Paying Tech" category, with the average Spark or Scala coder earning $125,000.

(The Stack Overflow survey didn't distinguish between general-purpose Scala development and Scala development on Spark. Scala is one of several languages -- including Python, Java, and R -- used to program in the Spark environment.)

"Developers with mathematics backgrounds ... who know Scala, Spark, or Hadoop get paid more than their peers," the Stack Overflow survey concluded.

That isn't all. Earlier this year, high-performance ETL specialist Syncsort published the results of a survey of 250 data architects, IT managers, developers, data scientists, and business analysts.

Syncsort's survey found that interest in Spark far outpaced interest in other big data-related technologies. For example, in response to the question "Which of the following compute frameworks are of most interest?" 67 percent of respondents cited Spark. That outstripped interest in any other compute framework, including the venerable MapReduce engine, cited by 55 percent of respondents.

(Respondents were permitted to select from among multiple options -- including the Tez framework, which supersedes legacy Hadoop MapReduce. Tez was cited by 19 percent of respondents.)

"Although MapReduce has a larger installed base, Apache Spark is generating significant interest as a powerful compute platform for analytics. This is a scenario that is bound to repeat itself again and again as the number of tools in the Hadoop ecosystem continues to expand," write the authors of the Syncsort survey report.

Is Spark Heading for a Fall?

Gartner is well known for its "Hype Cycle" -- in particular, for the "trough of disillusionment" component of said Hype Cycle. From the thousand-foot-high perspective of the Hype Cycle, every (greater or lesser) peak is a prelude to a (greater or lesser) plunge.

The question, as Adrian sees it, is whether Spark is still peaking -- or ready to plunge. "Spark is up over the [Hype Cycle] peak now. It could hardly be more hyped than it is, but it's really beginning to move for the first time into actual user usage as opposed to vendor usage," he said. He noted that it's at this point (i.e., where the metaphorical rubber meets the road) that disillusionment often starts to set in.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.
