Interana Accelerates Event and Time-Series Analysis
Start-up Interana proposes to marry the best aspects of the self-service model with best-in-class event and time-series processing at scale.
- By Stephen Swoyer
- July 21, 2015
It seems as if a star -- or start-up -- is born just about every week in Silicon Valley.
Take Interana Inc., which launched late last year at Strata + Hadoop World. Interana is a checklist case of having the goods. Pedigree? Check. CTO Bobby Johnson logged time with Facebook, where he wrote Scribe, a log aggregation server. (Johnson was on the same internal Facebook team that helped create both Hive and Cassandra.) Co-founder Lior Abraham is another Facebook veteran. While there, he wrote "Scuba," the distributed in-memory data store that Facebook uses to manage stats data about its internal systems.
How about funding? It has that, too. This January, it nabbed $20 million in Series B venture capital funding, on top of the $8.2 million in Series A funding it received two years ago.
What else does a hot new Silicon Valley start-up need? Killer technology. That's always a plus, and killer tech is just what Interana claims it has, according to CEO Ann Stimmler Johnson, who studied electrical engineering at Caltech and worked for Intel.
"[Our] product itself allows you to visually explore how customers behave and how products are used. You can explore and analyze the behavior of anything over time. We built it to allow for ad hoc exploratory analysis. We took a design-first approach and were very thoughtful about how you have an interface [built for] looking specifically at how things move through time," Stimmler Johnson told BI This Week at Strata + Hadoop World.
That raises the question: Interana does what, exactly? How does it differentiate itself from other visual analytic discovery tools (such as Qlik Sense, TIBCO Spotfire, or Tableau, to name just three) or from streaming-analytics offerings (such as Splunk)?
Like Splunk, Interana claims to be built for processing event or time-series data at massive scale. Its back end is anchored by a columnar data store, which it exposes to its visual front-end tool via an analytics abstraction layer. Speaking of that front end, Interana, unlike Splunk, boasts what it describes as a self-service "visual interactive" front end -- an interface that is more like a Qlik Sense or Tableau than not.
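Interana has not published its storage format, so the following is only a generic illustration of why a columnar layout suits event data, not a description of Interana's engine. It assumes a hypothetical event table with timestamp, user, and action fields, stored as separate arrays sorted by time. A time-range query locates its slice with a binary search on the timestamp column and then reads only the column it actually needs.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class EventColumns:
    """Hypothetical column-oriented event table: one list per field, sorted by timestamp."""
    ts: list = field(default_factory=list)
    user: list = field(default_factory=list)
    action: list = field(default_factory=list)

    @classmethod
    def from_rows(cls, rows):
        # Sort (ts, user, action) row tuples by time, then split them into columns.
        table = cls()
        for ts, user, action in sorted(rows):
            table.ts.append(ts)
            table.user.append(user)
            table.action.append(action)
        return table

    def count_actions(self, start, end):
        """Count each action in [start, end), touching only the ts and action columns."""
        lo = bisect_left(self.ts, start)  # first event at or after start
        hi = bisect_left(self.ts, end)    # first event at or after end
        counts = {}
        for a in self.action[lo:hi]:
            counts[a] = counts.get(a, 0) + 1
        return counts


rows = [(5, "u1", "view"), (1, "u2", "signup"), (3, "u1", "signup"), (9, "u2", "buy")]
table = EventColumns.from_rows(rows)
print(table.count_actions(2, 9))  # {'signup': 1, 'view': 1}
```

The user column is never read by this query; in a row-oriented layout every field of every row in the range would be scanned.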
"It's really designed for people who are frustrated with having to use multiple layers of tools to answer questions," says Stimmler Johnson, who cites the efforts of Interana's head of product design, Alonzo Canada, to design an interface that encourages analytic investigation, chiefly by making it easier to pursue what might be called epi-analysis: follow-on analysis prompted by new questions.
"People who want to make decisions based on event or time-series data have to go through this whole stack. In most cases, the amount of time it takes from [asking] a question to [getting] the answer, by far the longest portion of that is communicating with the data person," she argues. "From writing the query to running the query, we've written the whole stack so that we can address this and other inefficiencies. [Interana is] a single stack that helps you ask and answer questions about events at any single point in time."
Isn't this why we have data warehouses -- to cut through the stack by constituting the stack?
No, says Stimmler Johnson: the data warehouse isn't cut out for time-series. It isn't just a tough fit; it's a bad fit. What's more, as the success of data warehouse-less self-service BI offerings from Qlik, Tableau, and others demonstrates, the DW is more often an impediment to (rather than an accelerator of) analysis. Increasingly, business analysts and power users are tapping Qlik or Tableau to access data themselves instead of waiting for IT to bring new data sources online. Interana proposes to marry the best aspects of the self-service model with (thanks to the expertise of Bobby Johnson, Lior Abraham, and others) best-in-class event and time-series processing at scale.
"The data warehouse became big at a time when mostly what companies wanted [to report on or analyze] was relational data. Event data, time-series data, isn't at all like relational data. Time series is actually points [i.e., samples] taken at a measured period of time. This just isn't analogous to single-state [i.e., relational] data," says Stimmler Johnson. "Event data is bigger than single-state data already and it is growing faster. As storage gets cheaper and cheaper, it's going to grow even faster."
She cites an emerging example. "There's this whole industry around 'growth' hackers. These [are people who] are looking at growth, looking at retention, looking at conversion, engagement, so that they can see what works and why. You can use SQL data [i.e., from relational systems] to help with this [kind of analysis], but you can't do it with SQL. All of the answers come from event data."
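Stimmler Johnson's description of a time series as points taken over measured periods can be made concrete with a small sketch. This is a generic illustration, not Interana's method: it assumes raw events carry integer timestamps and rolls them up into counts per fixed-width interval. The function name and parameters are hypothetical.

```python
from collections import Counter


def bucket_counts(timestamps, start, width, n_buckets):
    """Turn raw event timestamps into a fixed-interval time series of counts.

    Bucket i covers [start + i*width, start + (i+1)*width).
    Events outside the overall window are ignored; empty buckets report 0.
    """
    counts = Counter()
    for t in timestamps:
        i = (t - start) // width  # floor division maps a timestamp to its bucket
        if 0 <= i < n_buckets:
            counts[i] += 1
    return [counts[i] for i in range(n_buckets)]


series = bucket_counts([0, 1, 5, 9, 10, 25], start=0, width=5, n_buckets=3)
print(series)  # [2, 2, 1]
```

Each output value is a sample for one period; the individual events that produced it are no longer visible, which is why keeping the raw events matters for the kinds of questions discussed here.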
Clarifies Bobby Johnson: "SQL has no [built-in] notion of time series. If you try [to express time series in a SQL statement], you end up with a really complicated query, which the query optimizer can't do anything sensible with. At best, you're going to end up with a lot of intermediate steps."
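Bobby Johnson's point is easiest to see with a conversion funnel, one of the growth metrics mentioned above. The sketch below is an illustrative assumption rather than anything Interana has described: it takes hypothetical (timestamp, user, action) tuples and counts how many users completed each step of a funnel in order. Done procedurally, this is a single ordered pass over each user's history; stating the same "step B happened after step A" condition declaratively tends to require extra joins or intermediate steps for each funnel stage.

```python
from collections import defaultdict


def funnel(events, steps):
    """Count users who reach each step of `steps` in order.

    events: iterable of (timestamp, user, action) tuples, in any order.
    steps: ordered list of action names, e.g. ["visit", "signup", "purchase"].
    Returns a list whose entry i counts users who completed steps[0..i] in order.
    """
    by_user = defaultdict(list)
    for ts, user, action in events:
        by_user[user].append((ts, action))

    reached = [0] * len(steps)
    for history in by_user.values():
        history.sort()  # replay each user's events in time order
        step = 0
        for _, action in history:
            if step < len(steps) and action == steps[step]:
                step += 1
        # A user who got through `step` steps counts toward each of those steps.
        for i in range(step):
            reached[i] += 1
    return reached


events = [
    (1, "a", "visit"), (2, "a", "signup"), (3, "a", "purchase"),
    (1, "b", "visit"), (5, "b", "purchase"),
    (2, "c", "signup"), (4, "c", "visit"),
]
result = funnel(events, ["visit", "signup", "purchase"])
print(result)  # [3, 1, 1]
```

User "c" signed up before visiting, so only the visit counts; order, not mere presence, determines progress through the funnel.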
The issue is as much one of volume as of data structure, Bobby Johnson continues.
"What's going on [when you do time series in a data warehouse] is that you're taking data in a reasonable format but it's very big, it's at massive scale, so you're having to crunch that down in a very application-specific way, and that's why you have all of those [intermediate] steps. What we've done is we've built something that's fast and flexible enough that you can deal with this stuff raw."