WebAction Unleashes Big Striim
WebAction is now known as "Striim." Why the name change? The streaming space has become crowded, and Striim itself aims at Something More -- streaming analytics and streaming integration.
- By Stephen Swoyer
- November 10, 2015
Like virtually every other data-oriented player today, streaming analytics specialist WebAction Inc. was in attendance at last month's Strata + Hadoop World conference in New York, New York. Just one week prior, at the TDWI Conference in San Diego, WebAction officially unveiled a new product name -- "Striim" -- as well as new product marketing collateral.
Co-founder and CTO Steve Wilkes hastened to reassure attendees that Striim-the-product was more or less the same as WebAction-the-product. The same great taste -- i.e., all of the same bits -- the same pricing and licensing arrangements, the same streaming-analytics logic. The works.
Why the name change? "We wanted to give our platform a brand, something that was much more in keeping with what the platform does, so it is very obvious to people that we are a streaming platform. So we changed the name of the product from WebAction to Striim. We went with Striim instead of 'Stream' because we want the 'i's to actually mean 'integration and intelligence,'" said Wilkes, who stressed that the name of WebAction-the-company wasn't changing.
Why the need for a costly and momentum-sapping rebranding in the first place? It's because the streaming space has become extremely crowded, Wilkes asserted -- and because WebAction's new strategy with Striim departs from its traditional emphasis on streaming analytics.
"We've been talking about a streaming analytics platform for the last three years, but in talking with our customers, we really realized that the streaming integration part was very difficult to do. In fact, we really weren't doing it. To do the streaming analytics, you need both the streaming integration and the streaming intelligence aspects. A lot of companies just sort of punt on that," he explained.
"There's a lot of confusion [in the marketplace], too, about [streaming analytics and integration]. There's this idea that you can just put your data on Kafka. We have lots of different connectors that can obtain streaming data [in any context]. For Kafka, or for a message queue in general, this data is naturally streaming, but if you look at databases, a lot of the streaming analytics solutions don't even consider database change as a source of streaming data."
Wilkes is a veteran of the former GoldenGate Software, which Oracle Corp. acquired in 2009. GoldenGate's specialty was data replication -- in the sense of both data mirroring and change-data-capture (CDC). CDC is an efficient means of replicating or backing up database changes. Instead of mirroring the contents of a database in toto, CDC technologies replicate only the changes or "deltas." (CDC usually works by monitoring database transaction logs.) By feeding a continuous flow of bits (i.e., database deltas) in real time, a CDC technology such as GoldenGate functions as a streaming source. Striim, Wilkes says, can consume and analyze CDC feeds from an OLTP data store in real time. It's especially optimized for GoldenGate's CDC technology.
This isn't surprising. Several other members of WebAction's brain trust, including co-founders Ali Kutay and Sami Akbay, are also GoldenGate veterans. "Because we had that [GoldenGate] history, we understood if you really want to understand what's happening in your enterprise, you need to have that insight into what's going on in these [OLTP] databases. By incorporating change-data-capture as part of the [Striim] platform, we do just that. We can do change-data-capture from Oracle, SQL Server, DB2, and other databases. We can also read GoldenGate 'trails,' so if you already have [GoldenGate] in place, we can piggyback on that and stream that into our platform," he said.
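The delta idea Wilkes describes can be sketched in a few lines. This is not Striim's or GoldenGate's implementation -- just a minimal, hypothetical illustration of how replaying change events from a transaction log reconstructs a table without copying it wholesale. The event shapes and field names are invented for the example.

```python
# A minimal sketch of change-data-capture (CDC): instead of mirroring a
# table in toto, replay only the change events ("deltas") a log reader
# would emit. Event format and field names are hypothetical.

def apply_delta(replica, event):
    """Apply a single change event to an in-memory replica of a table."""
    op, key, row = event["op"], event["key"], event.get("row")
    if op in ("INSERT", "UPDATE"):
        replica[key] = row          # upsert only the changed row
    elif op == "DELETE":
        replica.pop(key, None)      # remove only the deleted row
    return replica

# Simulated transaction-log entries, as a CDC reader might surface them.
log = [
    {"op": "INSERT", "key": 1, "row": {"name": "Ada", "balance": 100}},
    {"op": "UPDATE", "key": 1, "row": {"name": "Ada", "balance": 250}},
    {"op": "INSERT", "key": 2, "row": {"name": "Bob", "balance": 50}},
    {"op": "DELETE", "key": 2},
]

replica = {}
for event in log:
    apply_delta(replica, event)

print(replica)  # {1: {'name': 'Ada', 'balance': 250}}
```

Four small events yield the same end state as re-copying the table -- which is why CDC is an efficient streaming source: each delta can be pushed downstream the moment it lands in the log.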
The case of CDC demonstrates another Inconvenient Truth, Wilkes argued: not all potential sources of real-time data are exposed as streams. Some, like database transactions, need a little help. Another good example is machine log files, which are typically processed in a near-real-time micro-batch paradigm. Striim can perform the equivalent of near-real-time, almost-live streaming and analysis, he claimed. "One way people do this is they'll scoop up log files, dump them into Hadoop, and do analysis on them there, but we have readers that will sit on the end of the log file, wait for the records to be written, then stream [those changes] directly in [to Striim] live. That's the basis of the integration -- the connectors to get to the streaming data," Wilkes said.
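A reader that "sits on the end of the log file" is essentially a tail -f loop. Here is a hedged sketch of that pattern -- not Striim's reader, just the general technique -- which assumes the writer flushes whole lines and which gives up after a few empty polls so the example terminates:

```python
import os
import tempfile
import time

def follow(f, max_idle_polls=2, poll_interval=0.05):
    """Yield newly written lines from an open log file, tail -f style.
    Assumes whole lines are flushed at once; stops after a few empty
    polls so this sketch terminates (a real reader would wait forever)."""
    idle = 0
    while idle < max_idle_polls:
        line = f.readline()
        if line:
            idle = 0
            yield line.rstrip("\n")
        else:
            idle += 1
            time.sleep(poll_interval)

# Demo: a writer appends records; the reader streams them as they land.
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False)
tmp.close()
with open(tmp.name, "w") as w, open(tmp.name) as r:
    r.seek(0, os.SEEK_END)               # start at the end of the log file
    w.write("GET /home 200\n"); w.flush()
    w.write("GET /cart 500\n"); w.flush()
    records = list(follow(r))
os.remove(tmp.name)
print(records)  # ['GET /home 200', 'GET /cart 500']
```

The point of the pattern: records reach the analysis pipeline as they are written, rather than after a batch copy into Hadoop.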
"While the data is moving, streaming through our platform, you can apply lots of operators to it. You can work with our platform through a SQL-like language ... [which] lets you do a lot of things you normally do with SQL, but [in the form of] continuous queries. As the data is flowing through your apps, you can filter, aggregate, and transform data, even enrich it," he continued.
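What distinguishes a continuous query from an ordinary one is that it emits a running result as records flow past instead of a one-shot answer over data at rest. The sketch below shows that concept in plain Python generators -- field names and the query itself are invented, and Striim's actual SQL-like language is not shown here:

```python
# Conceptual continuous query: roughly
#   SELECT city, SUM(amount) WHERE amount > 10 GROUP BY city
# applied per event, yielding an updated result after each record.

def continuous_query(stream):
    """Filter, transform, and aggregate a stream, emitting running totals."""
    totals = {}
    for event in stream:
        if event["amount"] <= 10:          # filter (WHERE amount > 10)
            continue
        city = event["city"].upper()       # transform
        totals[city] = totals.get(city, 0) + event["amount"]  # aggregate
        yield dict(totals)                 # emit a snapshot per event

events = [
    {"city": "nyc", "amount": 5},
    {"city": "nyc", "amount": 20},
    {"city": "sf",  "amount": 15},
    {"city": "nyc", "amount": 30},
]

for snapshot in continuous_query(events):
    print(snapshot)
# final snapshot: {'NYC': 50, 'SF': 15}
```

A batch query would return only the final dictionary; the continuous version exposes every intermediate state, which is what makes live dashboards and alerting possible.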
On the other hand, there's no shortage of streaming-analytics products. In the open source software (OSS) space alone, there are at least a dozen streaming-oriented projects. These include Apache Kafka, an event-messaging system used for streaming ingest; Apache Storm, an engine that can perform stream processing and analysis; Apache Spark Streaming, the streaming library of the Apache Spark distributed computing framework; Apache Apex, a new project that aims to provide a unified platform for batch and stream processing; and Apache Flink, a project that's similar in scope to Apache Apex. Most of these projects are focused on or built around the Hadoop platform, although (as with, for example, Spark Streaming) Hadoop isn't in any sense a prerequisite for their use.
They exploit Hadoop because it's a convenient and, most important, a relatively inexpensive platform in which to store and process data. One other thing: most of these projects are themselves "free." Their source code can be freely downloaded from the Web or checked out via Git and other repositories. Doesn't this depress the potential market for a commercial product such as Striim?
Not at all, says Wilkes, who claims that Hadoop is, contextually speaking, a poor platform for streaming analytics. "Dumping data into Hadoop makes it very difficult to query. People seem to have forgotten the lesson as to why data warehouses were built ... and why ETL was so big: you don't query off of your OLTP database. It's designed for transaction processing, not for ad hoc query. Data warehouses came about (1) so you could [query against OLTP data] in a reasonable time frame, and (2) so you'd have the history [against which] to compare this [OLTP] data," he says.
"With Hadoop, some [organizations are] just dumping raw data into Hadoop and then they're trying to run queries against it without history of any kind. Some of these [queries] can actually be incredibly difficult to do in Hadoop. Say you're querying against weblog data with Customer_ID in it. If you want to write a query against this data that says 'Show me the activity on my top 100 customers based on their history, what they've done in the past,' it's impossible to do that against just weblog data. You need to enrich that [weblog] data with data from the warehouse, or from other sources. In Striim, we can do this as the data is flowing through. This makes that a lot easier."
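The enrichment Wilkes describes is, conceptually, an in-flight join: each weblog event carries only a customer ID, so the pipeline looks that ID up in warehouse-derived reference data as the event streams through. The sketch below illustrates the idea with a dictionary standing in for the warehouse lookup; all names and data are illustrative, not Striim's API:

```python
# In-flight enrichment: join streaming weblog events (which carry only
# a customer ID) against warehouse-style reference data. The reference
# dict, field names, and events here are hypothetical.

top_customers = {
    "c1": {"name": "Acme",   "tier": "top-100"},
    "c3": {"name": "Globex", "tier": "top-100"},
}

weblog = [
    {"customer_id": "c1", "page": "/pricing"},
    {"customer_id": "c2", "page": "/home"},      # not a top customer
    {"customer_id": "c3", "page": "/checkout"},
]

def enrich(stream, reference):
    """Merge each event with its reference profile as it flows through,
    keeping only events that match (here, the top-100 customers)."""
    for event in stream:
        profile = reference.get(event["customer_id"])
        if profile:
            yield {**event, **profile}   # enriched event continues downstream

for hit in enrich(weblog, top_customers):
    print(hit)
```

Done in-flight, the "top 100 customers" question is answered the moment an event arrives, instead of requiring a post-hoc join over raw weblog files in Hadoop.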
This, Wilkes maintains, is the value-add of a commercial platform such as Striim.
"You have the ability to do things such as correlate multiple data streams in real time, correlate data in [overlay] windows in real time, and look for patterns of events. There are many other features -- such as predictive analytics and machine learning -- and they're all [expressible] with the same SQL-like language. If you can write a filter, you can write these more complex things, without having to write Java code, package up a JAR file, and deploy that into a cluster."