Metanautix: One Analytics Engine to Rule Them All
Start-up Metanautix claims to automate data prep, consolidate query processing, and function as a single analytics engine for all query needs.
- By Stephen Swoyer
- February 17, 2015
A new crop of analytics query tools proposes to accelerate the hardest part of "doing" analytics -- namely, the work of selecting, preparing, and moving data sets to and from diverse systems -- by consolidating data preparation and analytics querying in a single, central engine.
Start-up Metanautix Inc. is one such example. It aims to automate data prep, consolidate query processing, and function as a single analytics engine for every query consumer, whatever tool or application initiates the request.
Metanautix has an impressive pedigree. Its founder is Google Inc. veteran Theo Vassilakis, who was part of the team that built "Dremel," the distributed query technology at the heart of Google's BigQuery service. (Dremel also inspired the open source Apache "Drill" project.) Vassilakis recruited fellow technology luminaries, including co-founder Toli Lerios, Metanautix's CTO, who was a senior software engineer with Facebook, and chief privacy officer Jim Adler, a respected expert on information security and privacy issues. (Adler is a member of the Department of Homeland Security's Data Privacy and Integrity Advisory Committee.) There's a lot of known-good brainpower, and it shows.
To paraphrase concert promoter Bill Graham -- who once said something similar about the Grateful Dead -- Metanautix isn't necessarily the best at what it does: it's the only company that does quite what it does. Alpine Data Labs Inc., for example, does something superficially similar, insofar as it focuses on analytics and automates key aspects of data preparation. Start-up Bright Vine Inc. likewise resembles Metanautix in that both companies enable the equivalent of a distributed query capability. However, Alpine Data Labs targets a different market -- Hadoop-based analytics, with an emphasis on collaboration -- and Bright Vine is tackling a distinctly different, more specialized problem.
In other words, no one does quite what Metanautix does, which is what, exactly?
From a business intelligence (BI) and data warehousing (DW) perspective, Metanautix Quest fulfills three core functions. First, like Google's Dremel, it's a kind of distributed query engine. Second, it's a data preparation and movement engine, much like a dedicated ETL tool. Third, it's a high-performance analytics processing platform.
In the first scenario, a hypothetical Tableau or MicroStrategy analyst might use Quest to query against multiple (distributed) data sources; in the second and third scenarios, that same analyst might want to analyze a blended or joined data set, in which case she'd use Tableau to query against Quest itself. As far as analysts are concerned, all they're doing is interactively exploring data sets in Tableau; they don't know that something called "Quest" is anywhere in the mix.
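To make the first scenario more concrete, a query pushed to Quest might join tables that live in two different back-end systems and return a single result set to the BI tool. The snippet below is only a sketch: the schema names (sales_dw, crm_mysql) and the convention of addressing remote sources as qualified schemas are assumptions made for illustration, not Metanautix's documented syntax.

```sql
-- Hypothetical federated query: join an orders table in a data warehouse
-- with a customers table in an operational MySQL system, as if both were
-- local. Schema and table names are invented for this sketch.
SELECT c.region,
       SUM(o.order_total) AS total_revenue
FROM   sales_dw.orders o
JOIN   crm_mysql.customers c
       ON o.customer_id = c.customer_id
WHERE  o.order_date >= DATE '2014-01-01'
GROUP BY c.region
ORDER BY total_revenue DESC;
```

In practice, the analyst never writes this SQL directly; the BI tool generates it, and Quest takes care of fetching or moving the relevant data behind the scenes.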
In the background, of course, Quest-as-data-compute-engine does its thing -- automating the extraction, movement, and loading of relevant data from the remote systems and into Quest. "Our goal is not to say 'Hey, you have to always leave the data where it is.' ... Sometimes you have to move the data, sometimes you can't move the data. You should be able to start querying right away and if later you want to move the data, we make it easy [to do that]," says Vassilakis.
"[With] a lot of systems, if you want to query the data, you have to move it into me. Because of the way we work, we're saying you should never move the data unless you want to. The point of a data compute engine is that -- honestly -- networks have become much better, CPU has gotten much better, and computation is going to matter much more than storage."
Metanautix stands out for yet another reason: it's SQL Loud and SQL Proud. Even as most upstart players opt to deprecate SQL -- in their messaging, at least -- in favor of procedural languages (for example, Java, Python, and Scala) or other, vendor-specific languages, Metanautix champions SQL as an extensible language for machine learning and text analytics, as well as for other non-traditional use cases. The upshot, says Adler, is that people can do a lot more with SQL than they might otherwise have realized.
"We've been doing a lot of stuff with SQL, most of which people hadn't even thought was possible. We've done k-means in SQL so that people can do clustering. We've done machine learning and text analytics in SQL. It's a very extensible language," Adler told BI This Week at O'Reilly Inc.'s Strata+ Hadoop World conference in New York.
"A lot of people think of those algorithms as being very complicated, but the k-means algorithm is like eight SQL queries. One of the reasons we love SQL is that most people don't think of it as a programming language; it has this kind of in-between status," Adler continued.
"Most people get intimidated by [procedural] code, but because analysts and even many business people are familiar with SQL, they don't think of it as code. They might not get it immediately, but because [SQL is] declarative, it's intelligible to them, so when we show them eight SQL queries, they're like, 'Sure, I think I can understand that.'"
Quest performs some of the same tasks as a DBMS, but Metanautix positions it as what Vassilakis calls a "data compute engine" -- and not a database engine.
The difference is anything but terminological, he argues: "A database engine wants to store your data [locally] and then compute it. A data compute engine says you're storing [data] somewhere, you should be able to access it wherever it is without moving it [across the network] and without consolidating all of it in one place [as with a data warehouse]."
Metanautix isn't a SQL-only play. It can query against -- or, more precisely, exchange data with and analyze data from -- non-ODBC sources (including REST applications, flat files, CSV files, and spreadsheets) and even multi-structured file types, such as documents, images, and videos. In most cases, you'd use procedural code -- R statistical or data mining code, or (increasingly) Python code and libraries -- to analyze non-traditional or multi-structured file types.
"If you use a UDF [user-defined function], you can basically use the SQL function name to invoke the other piece[s] of code -- Java or Python [code], then pull together R code or even a little piece of some other library," Vassilakis explains. "For example, you can use SQL to [express Kmeans, but if you already have KMeans and you're using it already in the organization, and it's implemented in Python or Java or whatever, you can embed that inside the system."
He says Metanautix can run in different contexts, e.g., on dedicated (on-premises) hardware, on virtualized (on-premises) hardware, in (virtualized, external) cloud contexts, and even on laptops. Metanautix also offers Docker container support. (Docker is an open source container technology that works much like a lightweight virtual machine: it packages the libraries and other resources an application needs into a single container without requiring a guest operating system for each instance.)
"Docker is popular with some [customers], yes. We've made it a priority to offer flexibility [with respect to] deployment," Vassilakis confirms. "A lot of companies look at their deployment model as being their business model. They say, 'If you want to do business with us, you must put the data in the cloud or put it in our [on-premises] system, or something else.' The technology investment in being able to run the things in lots of different places is about making the business work better for the customer so that they don't have to adapt to us, we adapt to them."