Interest in Spark Shows No Signs of Cooling Off
The Apache Spark cluster computing framework is widely used in the enterprise, and most adopters also expect to increase their use of Spark over the next year. Among companies that aren't using Spark, 40 percent expect to deploy it at some point in the near future.
Those are a few of the findings from a recent survey of big data professionals. IT consultancy The Taneja Group surveyed more than 6,900 qualified technical and managerial professionals who actively work with big data technologies to get a feel for what plans, if any, they have for Spark. According to the Taneja Group, 54 percent of respondents are already using Spark. Of these, nearly two-thirds (64 percent) plan to increase their use of Spark in the next year.
Top Use Cases for Spark
ETL data processing is the top use case for Spark today, endorsed by more than half (55 percent) of respondents. Other top use cases include real-time stream processing (cited by 44 percent of respondents), data science (33 percent), and machine learning (33 percent).
Spark's use in more traditional applications -- such as customer intelligence (cited by 31 percent of respondents) and business intelligence (BI) and data warehousing (29 percent) -- aligns with a recent survey from Databricks, Spark's commercial parent company, which found that a combined 68 percent of respondents use Spark to support BI or customer intelligence applications.
Why are companies adopting Spark? One reason is performance: 74 percent of respondents cited Spark's speed. Nearly half (48 percent) like it as a versatile platform for advanced analytics processing, and 42 percent prefer it for stream processing. Thirty-seven percent like Spark because -- with support for Scala, Python, Java, R, and other languages -- it's easy to code for.
Cloud Environments and Skills Shortage
Today, more than half of respondents use Spark in on-premises environments. By contrast, fewer than a quarter (23 percent) use Spark in a platform-as-a-service (PaaS) or infrastructure-as-a-service (IaaS) cloud. That will change, however: Taneja Group expects cloud deployment to increase to 36 percent. It expects software-as-a-service (SaaS) Spark to grow as well. Today, a meager 3 percent of Spark deployments are SaaS; this could approach 10 percent, according to Taneja Group.
Adopters seem to be having the same problem with Spark they had with Hadoop -- namely, a critical shortage of big data-related skills. To wit: 60 percent of respondents cited the lack of big data skills as a major Spark-related challenge. More than one-third cited complexity -- both in learning Spark and in integrating it into their operations -- as a significant challenge.
Challenges notwithstanding, Spark seems to be coming along just fine, the Taneja Group report concludes. "[I]t's clear that Spark has gained broad familiarity within the big data world and built significant momentum around adoption and deployment," the report says. "The data highlights widespread current user success with Spark, validation of its reliability and usefulness to those who are considering adoption, and a growing set of use cases to which Spark can be successfully applied."