Spark Comes to IBM's System z -- But Who's Buying?

If you have a System z mainframe, IBM Corp. says it has a deal for you.

Big Blue recently announced its new z/OS Platform for Apache Spark, a scheme for running the Spark open source cluster computing framework on IBM's mainframe operating system, z/OS. "The new technology we've just announced ... allows you to use the open source Apache Spark solution for doing federated analytics [on the mainframe]," says Kathryn Guarini, vice president for offering management with IBM's z/OS and LinuxONE units.

"There's so much value that our clients find with the data and the transactions already on the platform," Guarini continues. She recites IBM's talking points -- e.g., System z is still used by 92 of the 100 largest global banks -- and mentions the widespread usa of IBM's IMS, VSAM, and DB2 as data sources, along with the staying power of database systems from third-party vendors such as CA Technologies Inc. (IDMS and Datacom/DB) and Software AG (Adabas).

Spark has emerged as a powerhouse platform for conventional SQL analytics (via its Spark SQL interpreter), streaming analytics, and an enormous variety of NoSQL analytics. Consistent with this, the new z/OS Platform for Apache Spark gives mainframe shops a means to host data processing and analytics workloads in situ -- co-located with IMS, DB2 for z/OS, and other critical mainframe data sources.
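For readers who haven't worked with Spark, a small example helps. The following Scala sketch shows the Spark SQL pattern in its simplest form: load a dataset, register it as a view, and query it with ordinary SQL. The file path, column names, and query are invented for illustration; they aren't drawn from IBM's offering.

// Minimal Spark SQL sketch. The input path and schema are
// hypothetical, purely to show the pattern.
import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .getOrCreate()

    // Load transaction records (CSV here purely for illustration).
    val txns = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/transactions.csv")

    // Register the DataFrame as a temporary view, then query it
    // with conventional SQL via the Spark SQL interpreter.
    txns.createOrReplaceTempView("transactions")
    val totals = spark.sql(
      """SELECT account_id, SUM(amount) AS total
        |FROM transactions
        |GROUP BY account_id
        |ORDER BY total DESC""".stripMargin)

    totals.show(10)
    spark.stop()
  }
}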

For example, an organization could use Spark running in z/OS to support ad hoc analysis of data in IMS or IBM's VSAM storage. Spark on z/OS could likewise permit near-real-time analysis of the data generated by IBM's CICS Transaction Server. Finally, it could serve as a high-performance ETL engine, preparing extracts of mainframe data for consumption in other contexts -- as in the sketch below.
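What might that ETL scenario look like in practice? Here's a hedged Scala sketch that pulls a DB2 for z/OS table through Spark's generic JDBC reader and lands a trimmed Parquet extract. The host, port, location name, table, and credentials are all placeholders; the driver class is IBM's standard DB2 JDBC driver, but IBM's packaged z/OS offering supplies its own data access machinery, so treat this as a sketch of the pattern rather than of the product.

// Hedged ETL sketch: read DB2 for z/OS over Spark's generic JDBC
// reader and land an extract as Parquet. URL, table, and credential
// names are placeholders, not details from IBM's offering.
import org.apache.spark.sql.SparkSession

object MainframeExtractSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mainframe-extract-sketch")
      .getOrCreate()

    // The IBM DB2 JDBC driver jar must be on the classpath for this
    // hypothetical connection to work.
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:db2://zhost:446/DSNLOC") // placeholder host/location
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("dbtable", "PROD.ORDERS")             // placeholder table
      .option("user", sys.env("DB2_USER"))          // credentials from the
      .option("password", sys.env("DB2_PASS"))      // environment (placeholders)
      .load()

    // Prepare a trimmed extract for consumption in other contexts.
    orders.select("ORDER_ID", "CUST_ID", "ORDER_TOTAL")
      .write.mode("overwrite")
      .parquet("/extracts/orders")

    spark.stop()
  }
}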

"For clients who have their data sources in the z/OS environment, traditional z/OS data sources [such as] IMS, VSAM, DB2 for z/OS ... these data sources can be accessed most quickly and securely by running the Apache Spark solution in the z/OS environment," Guarini argues.

If you're familiar with the minutiae of IBM's mainframe capacity pricing structure, you're probably wondering: why? Why would anybody want to allocate costly mainframe MIPS to something like this, especially when it's already possible to run Spark in the context of Linux on System z?

The first reason, Guarini says, is that running Spark in z/OS -- in the same context as mainframe data sources -- permits "faster insights so that our clients can run their analytics faster." Data doesn't have to be moved or shunted from one context to another, which eliminates a major source of latency.

The other reason is that Spark on z/OS is priced to move.

First, a quick primer for readers who aren't up to speed on IBM's five-decade-old platform. In most cases, it costs significantly more to run workloads in z/OS itself than in other contexts.

For example, IBM offers several "specialty engines" -- low-cost, workload-specific "processors" that run in a separate context -- for System z. These include IBM's Integrated Facility for Linux (IFL) for hosting Linux workloads, its System z Application Assist Processor (zAAP) for Java workloads, and its System z Integrated Information Processor (zIIP) for data-processing workloads.

Specialty processors are a kind of sop to encourage customers to spin up new workloads on their mainframe systems. They are likewise intended to get traditional mainframe shops to sustain or to increase their existing investments in mainframe hardware.

Okay, so specialty engines are affordable. You understand that, but Big Blue isn't offering its new z/OS Platform for Spark as a dedicated specialty engine, is it?

No, or rather, sort of. Guarini says the net effect is that z/OS Platform for Spark workloads are "eligible" to run in the low-cost context of the zIIP.

"This is a solution that is eligible for the specialty engine," she says. "The offering itself is available for no fee, without restriction. IBM only charges for the service and support, which is optional, [as well as for] the [additional] hardware and memory solutions ... that may be required for clients that don't have that extra capacity available already."

The word "eligible" sticks out like a sore thumb in that response, but Guarini stresses that this isn't an example of IBM being evasive or cagey.

Rather, "eligible" means that z/OS Platform for Spark workloads can run completely, in their entirely, in a zIIP engine. ("zIIP," in this context, is actually shorthand for both the zIIP and the zAAP. Lest the acronyms confuse you, zIIPs are for data-processing workloads and zAAPs are for Java workloads. IBM mainframers, understandably, are conversant with both acronyms.)

"We say this solution is 'eligible,' but that's just how we word it. The answer's basically 'Yes' to this question. This is a workload that you can deploy in a zIIP specialty engine, which means you're priced by the [processor] core, rather than by the MIPS. We've actually combined both [the zIIP and the zAAP] into the 'zIIP' solution right now," Guarini says.

Mainframe shops will probably parse the finer points of Guarini's conditional "can." Sure, an organization can deploy Spark in a zIIP; it's possible to do so. In practice, though, is it feasible for all workloads, at all times, in all use cases? That's probably overthinking the issue.

Traditionally, IBM has been more inclusive than not in accommodating workloads in its specialty processor engines. Shortly after it announced the zIIP, for example, CA and other vendors retrofitted both traditional products (IDMS, Datacom/DB) and non-traditional ones (e.g., NetMaster Network Management for TCP/IP and Unicenter CA-SYSVIEW Realtime Performance Management) to run in the context of the zIIP.

IBM was, in so many words, cool with this. Other mainframe-centric ISVs (such as BMC Software Inc.) quickly followed suit. These products are zIIP-ified to this day. Big Blue's intent is to make it as cost-effective as possible to run new workloads on Big Iron. This isn't a philanthropic priority so much as a strategic one. Guarini makes this explicit: "IBM expects the majority of clients to deploy z/OS Spark on zIIP processors given the significant cost advantage in doing so."

The z/OS Platform for Spark's ultimate raison d'être is flexibility, Guarini argues.

"Spark has gained so much interest in the industry in just the last year. Clients can choose to deploy [Spark] in whatever platform makes sense in their environment. If clients have huge sources of data on the mainframe, [the z/OS Platform for Spark) makes a lot of sense," Guarini concludes. "What that does is prevent clients from having to move that data off the platform."

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at [email protected].

