Start-up Bright Vine Touts Something (Mostly) New
A new MPP query engine called Bright Vine has what it claims is a can't-miss pitch.
- By Stephen Swoyer
- November 11, 2014
Bright Vine brings together three ParAccel veterans who are highly respected data warehousing and relational database leaders: Rick Glick, who played a key role in the development of that company's vaunted query optimizer; Rick Cole, whom Glick describes as the real brains behind that optimizer; and Dean Arnold, a distinguished technologist with expertise in very large database design and optimization. Both Glick and Arnold logged time with Teradata Corp.: Glick was Teradata's CTO of database engineering, and Arnold was a Teradata software engineer/consultant. Cole is a veteran of IBM Corp. and the former Informix.
Bright Vine has brain power to spare. How will the company use that experience to "disrupt" the status quo?
The answer, Glick says, has to do with the Babel of databases and data stores: certain databases are good for certain things, as are data stores and analytic database engines. Platform heterogeneity is the rule, not the exception, with the result that most large companies have multiple relational databases, multiple non-relational data stores of multiple types, and (in some cases) multiple analytic databases. In most cases, getting the right data to the right place at the right time isn't just hard; it's insufficient, because data lives on diverse systems, each of which has its own specific strength.
In the context of terabyte-scale data volumes, the paramount goal is to move as little data as possible; as a result, the work of data engineering (preparing, transforming, and conforming data) must be pushed down or out to the platforms on which data lives instead of moved to and consolidated in a middle tier. (This middle tier, commonly associated with a dedicated ETL engine, is a vestige of technological and architectural constraints that have since been obviated.) Bright Vine proposes to do just this. On the one hand, it's a query optimizer that decomposes and figures out how best to run a SQL query. On the other hand, it's a massively parallel distributed or federated query engine; it figures out where best to run a query.
"It's all about cooperative analytics, [which is] the notion that everybody on the planet has not one database, not two, but n number of databases, and n number of analytic engines, but nobody is pre-integrating them and bringing them together in one environment," says Glick. "The goal is to bring them all together and make them work as a single system: take a request, decompose it, and [send its decomposed parts to] different databases that have different functions and do different things."
This isn't exactly a new idea. In a sense, it's what data federation proposed to do, and it's similar to what data virtualization (DV) aims to do. Moreover, Bright Vine's approach smacks of what Teradata is undertaking with QueryGrid, inasmuch as it aims to optimize for:
- connectivity to diverse sources, i.e., using optimized, bi-directional adapters
- the query performance of a specific platform -- e.g., if you have an IBM PureData (Netezza) analytic database, Bright Vine will optimize for that engine
- for customers that have multiple relational, NoSQL, and analytic engines, the idiosyncratic strengths of individual platforms
With respect to DV, says Glick, the comparison is inappropriate. Bright Vine is a massively parallel processing (MPP) optimization technology; it isn't designed as a federated query engine; still less is it an abstraction layer. Instead, it's an MPP engine with federation-like capabilities -- somewhat similar to what the former ParAccel tried to do with its On Demand Integration (ODI) scheme. "Every [data engine] will have parallel connectors between them, all of those data engines will act as peers, [and the Bright Vine technology will] put the work at the right place at the sub-query level. The idea is to take any analytic query and decompose it [so it can be optimized for] the specific capabilities of the engine that you have and the location of the data," he explains.
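To make the sub-query placement idea concrete, here is a minimal sketch in Python. It is not Bright Vine's optimizer, whose internals the company has not described; the `Engine`, `SubQuery`, and `place` names, the engine list, and the simple "prefer the engine that already holds the data" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """A hypothetical data engine: what it can do and which tables it holds."""
    name: str
    capabilities: frozenset
    tables: frozenset


@dataclass(frozen=True)
class SubQuery:
    """One decomposed piece of a larger query (hypothetical shape)."""
    operation: str
    table: str


def place(subquery, engines):
    """Choose an engine for one sub-query.

    Assumed rule: prefer an engine that supports the operation AND already
    holds the table; otherwise fall back to any capable engine, in which
    case the data has to move. Returns (engine_name, data_must_move).
    """
    capable = [e for e in engines if subquery.operation in e.capabilities]
    if not capable:
        raise ValueError(f"no engine supports {subquery.operation!r}")
    local = [e for e in capable if subquery.table in e.tables]
    chosen = (local or capable)[0]
    return chosen.name, not local


# Illustrative environment; engine names and capabilities are made up.
engines = [
    Engine("cassandra", frozenset({"scan", "filter"}), frozenset({"events"})),
    Engine("warehouse", frozenset({"scan", "filter", "join", "aggregate"}),
           frozenset({"customers"})),
    Engine("graph", frozenset({"path_search"}), frozenset()),
]

plan = [
    SubQuery("filter", "events"),
    SubQuery("aggregate", "customers"),
    SubQuery("path_search", "events"),
]

placements = [place(sq, engines) for sq in plan]
for sq, (engine, moves) in zip(plan, placements):
    print(f"{sq.operation} on {sq.table}: run on {engine}, data moves: {moves}")
```

In this toy environment the filter and the aggregation run where their data already lives, while the path search has to pull `events` into the graph engine because no engine both holds that table and supports the operation. A real optimizer would weigh cost estimates rather than a fixed preference order.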
As for the comparative merits of Bright Vine versus QueryGrid, Glick contends that there's a stark difference. "The biggest architectural difference is that they will have a Teradata system in the middle that all data moves through. With our approach, if you'd need to send data from Cassandra to a graph engine, it would move directly from Cassandra to the graph engine," Glick argues.
"With Teradata there is a hub, so the data would move from Cassandra to Teradata to the graph engine and then back to Teradata to be returned to the user. The presumption is that most of the data join and aggregation [workloads] will be done in Teradata," he continues. "The other difference is that with Teradata, the query writer basically decides where the data resides and [where the] analytics will be done. Most of the optimization is done not by Teradata but by the query writer. [These aren't] horrible assumptions, but our goal is to do a greater degree of optimization."
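The difference Glick describes comes down to how many times data changes hands. The sketch below simply enumerates the hops for each topology. The hub path follows his description; the peer path's final hop (the graph engine returning results straight to the user) is an assumption, since he specifies only the direct Cassandra-to-graph transfer. The function names are hypothetical.

```python
def hub_route(source, target, hub, consumer="user"):
    """Hops when every transfer passes through a central hub system."""
    return [(source, hub), (hub, target), (target, hub), (hub, consumer)]


def peer_route(source, target, consumer="user"):
    """Hops when engines act as peers and exchange data directly.

    The final (target, consumer) hop is an assumption for illustration.
    """
    return [(source, target), (target, consumer)]


hub_hops = hub_route("cassandra", "graph_engine", hub="teradata")
peer_hops = peer_route("cassandra", "graph_engine")

print(f"hub:  {len(hub_hops)} hops: {hub_hops}")
print(f"peer: {len(peer_hops)} hops: {peer_hops}")
```

Hop count is only a rough proxy: it ignores data volume at each step and any work the hub does along the way, such as the joins and aggregations Glick says Teradata expects to handle.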
One rub is that Bright Vine is having trouble generating hype, Glick concedes: he and his partners are still courting venture capital (VC) backing. Bright Vine has most of the funding it needs, he stresses -- but not all. Another problem is that Bright Vine isn't necessarily the only game in town. Well-funded start-ups such as Metanautix Inc. purport to target an at least superficially similar market niche. Even though Metanautix's aim isn't as ambitious as Bright Vine's -- particularly with respect to query decomposition and optimization -- it could be just good enough.
Glick seems undeterred: there's a pressing need for exactly what Bright Vine aims to do, he says -- and nobody is doing exactly what Bright Vine aims to do.
"Between us, as founders, we probably have 60 to 70 years' worth of experience that we're bringing to bear, and all of us have done kind of well in our respective fields. What we're doing [with Bright Vine] is fill a need that's only going to become more acute," Glick says.