Teradata's QueryGrid Comes into Focus
Teradata debuts key deliverables in its QueryGrid initiative.
- By Stephen Swoyer
- December 16, 2014
At its annual Partners conference, Teradata Corp. announced a new network (or "influencer") analysis offering, Teradata Connection Analytics, touted a new data integration consulting services offering ("Data Integration Optimization"), and announced a new, feature-packed version of its Teradata Database platform. Among the most eagerly anticipated unveilings, however, was the official debut of key deliverables in Teradata's QueryGrid initiative.
With QueryGrid, Teradata aims to knit together its own ecosystem of SQL database platforms -- in addition to Teradata Database, Teradata markets its Aster Discovery Platform, which it acquired from the former Aster Data nearly three years ago -- and also to accommodate NoSQL (increasingly shorthand for "Not-only-SQL") platforms such as Hadoop and MongoDB. Teradata already markets a dedicated, Teradata-branded Hadoop appliance of its own, which can be configured with commercial Hadoop distributions from vendors such as Cloudera Inc., Hortonworks Inc., and MapR Technologies Inc. Teradata also announced a partnership with MongoDB earlier this year.
What Teradata's proposing to do with QueryGrid sounds straightforward enough. Under the covers, however, it's comparatively complicated. In time, QueryGrid will support native, bi-directional connectivity -- for both query and data movement/integration -- between and among Teradata Database, Teradata's Aster Discovery Platform, and sources such as DB2, Oracle, and SQL Server, among others. That's hard.
In addition, QueryGrid will be able to decompose queries in order to figure out how and where best to run them. That's much harder. If you embed a graph function as part of a SQL statement, for example, Teradata Database wouldn't know what to do with it. QueryGrid, however, will be smart enough to redirect that query over to Aster. That's not all. Ultimately, QueryGrid will also do something much, much harder. A graph function can't run in Teradata Database, but some workloads -- or some components of a SQL statement -- will perform better in Teradata Database than in Aster or Hadoop. QueryGrid will also be able to optimize for the idiosyncratic strengths of different (Teradata) platforms in order to maximize parallelism.
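To make the routing idea concrete, here is a minimal sketch in Python of what "decompose, then send each piece where it runs best" might look like. It is not QueryGrid's API or Teradata's actual optimizer: the engine names, the operation kinds, the cost numbers, and the `Fragment`, `route`, and `plan` names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical capability table (an assumption, not Teradata's real cost model):
# for each engine, the kinds of work it can run and a relative cost (lower is better).
ENGINE_COSTS = {
    "teradata": {"join": 1, "aggregate": 1, "scan": 2},
    "aster": {"graph": 1, "join": 3, "aggregate": 3, "scan": 3},
    "hadoop": {"scan": 1, "aggregate": 4},
}


@dataclass(frozen=True)
class Fragment:
    """One piece of a decomposed query (hypothetical structure)."""
    name: str
    kind: str  # e.g. "graph", "join", "aggregate", "scan"


def route(fragment):
    """Pick the cheapest engine that is able to run this fragment at all."""
    candidates = {
        engine: costs[fragment.kind]
        for engine, costs in ENGINE_COSTS.items()
        if fragment.kind in costs
    }
    if not candidates:
        raise ValueError(f"no engine can run a {fragment.kind!r} fragment")
    return min(candidates, key=candidates.get)


def plan(fragments):
    """Map every fragment name to the engine chosen for it."""
    return {fragment.name: route(fragment) for fragment in fragments}


if __name__ == "__main__":
    query = [
        Fragment("influencers", "graph"),
        Fragment("sales_rollup", "aggregate"),
        Fragment("raw_logs", "scan"),
    ]
    for name, engine in plan(query).items():
        print(f"{name} -> {engine}")
```

In this toy table only Aster lists a graph capability, so the graph fragment has exactly one possible home, while the aggregate lands on Teradata because it is cheapest there. A real optimizer would also have to weigh data location and transfer cost, which this sketch ignores.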
The key words in that paragraph are will and will be. At this point, to speak of QueryGrid is to speak in terms of possibilities. It almost has to be, given Teradata's ambitiousness. Last month, for example, Teradata announced QueryGrid support for Teradata Database-to-Teradata Database and Teradata Database-to-Aster connectivity. (Sources say that Aster-to-Teradata Database support is basically working, albeit not officially supported.) Earlier this year, however, Teradata also trumpeted upcoming QueryGrid connectors designed to replace and/or supplant a pair of existing, standalone connectors, namely, SQL-H (which it inherited from Aster) and its Unity Source Link connector for Oracle.
As designed, Teradata's legacy connectors focus on moving data out of Hadoop and Oracle, respectively, and into Teradata. To that end, they support unidirectional connectivity only. In other words, if a business analyst wants to analyze data sitting in an Oracle database, one primary option is to use Unity Source Link to move that data en bloc to Teradata Warehouse so that it can be prepared -- i.e., reduced and possibly conformed -- for analysis.
With bi-directional connectivity, it's now possible to push that ETL processing down to Oracle such that you're only moving a relevant subset of data. (Teradata delivered QueryGrid support for Oracle in early October, coinciding with Oracle OpenWorld.) Teradata officials promise that there are more goodies still to come. For example, Teradata's Imad Birouty concedes, bi-directional Aster-to-Hadoop connectivity -- i.e., the QueryGrid-optimized replacement for Teradata's SQL-H connector -- "is still a road map item."
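The difference between moving data en bloc and pushing the filter down to the source can be shown with a small Python sketch. It uses an in-memory SQLite database as a stand-in for the remote Oracle system; the `orders` table, its columns, and the function names are assumptions made up for this example, and nothing here reflects how Unity Source Link or QueryGrid is actually implemented.

```python
import sqlite3


def make_remote_source():
    """Build an in-memory SQLite table standing in for a remote Oracle source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EMEA", 100.0), (2, "APAC", 250.0), (3, "EMEA", 75.0), (4, "AMER", 300.0)],
    )
    return conn


def pull_everything(remote, region):
    """Unidirectional style: copy every row, then reduce it on the warehouse side."""
    rows = remote.execute(
        "SELECT id, region, amount FROM orders ORDER BY id"
    ).fetchall()
    kept = [row for row in rows if row[1] == region]
    return kept, len(rows)  # len(rows) = rows that crossed the wire


def push_down(remote, region):
    """Pushdown style: the source applies the filter, so only the subset moves."""
    rows = remote.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id",
        (region,),
    ).fetchall()
    return rows, len(rows)


if __name__ == "__main__":
    remote = make_remote_source()
    bulk_rows, bulk_moved = pull_everything(remote, "EMEA")
    subset_rows, subset_moved = push_down(remote, "EMEA")
    print("same answer:", bulk_rows == subset_rows)
    print("rows moved without pushdown:", bulk_moved)
    print("rows moved with pushdown:", subset_moved)
```

Both functions return the same rows for the analyst; what changes is how many rows leave the source system, which is the point of pushing the reduction step down.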
"In the future, we're looking at [QueryGrid support for] DB2 and SQL Server, and we're also working on a connector for MongoDB, which should be available in the first quarter [of 2015]," says Imad Birouty, director of technical product marketing with Teradata. Birouty points to MongoDB's prominence -- it's arguably almost as well known as Hadoop, and [in 2013] it famously received $150 million in fresh venture-capital funding -- as the reason Teradata is fast-tracking it.
"When it comes to the NoSQL databases, they're kind of the top of that [class], and there isn't a lot of overlap [between Teradata and] Mongo. They're leaders in their space for NoSQL, for operational NoSQL; we're leaders in our space for analytics. We said, 'We make a good team, let's work together.' The [QueryGrid] connector will let our systems work together, so you'll be able to push and pull data between Teradata and MongoDB. For them, it's great because [their customers] can do some great deep analytics in Teradata and then push that back into MongoDB."
A Patchwork Quilt
There's reason to believe QueryGrid will mature quickly. Thus far this year, Teradata has made several data integration-themed acquisitions, picking up start-ups Revelytix Inc. and Hadapt Inc., along with ThinkBig, a big data-focused services firm. While none of these technologies (or services components) is likely to be productized in the form of new QueryGrid-themed deliverables, they'll likely contribute to the maturation and feature set of QueryGrid itself.
Take Revelytix, which markets "Loom," a product name that Teradata has decided to bring forward. Loom provides lineage tracking and metadata management capabilities for data stored in the Hadoop Distributed File System (HDFS).
"[Revelytix's] goal was ... to demonstrate that the data lake does have value. If you put your data in[to a Hadoop data lake] from your operational systems, you can access it [and prepare it] for multiple uses. That's good, but if you can't keep track ... [of] lineage, if you can't keep [track of] metadata, if you can't keep statistics about the data, then really you're just building something that's going to be a support nightmare later on.
"That's the challenge," Birouty explains. "So Revelytix is going to capture the technical and non-technical metadata, they're going to use the Hive metastore and extend that, but they're also going to collect statistics, interrogate the data, profile the data, keep statistics about the data, and so on. Today, there isn't really any way to do this in Hadoop."
Hadapt made a splash at last year's Strata + Hadoop World conference, touting an ANSI-SQL-on-Hadoop query facility. Its momentum seemed to stall in 2014 -- until Teradata snapped it up this summer, that is. "They had expertise and they built some product, but they never really commercialized it for SQL-on-Hadoop. We thought, 'We're not really sure what the future of SQL-on-Hadoop is, but maybe there's something there [in Hadapt].' They have a lot of smart people, they have expertise in SQL and expertise in Hadoop, which we want, so we brought them in-house," says Birouty, who declined to discuss Teradata's product plans for the Hadapt technology.
"The work that we're doing with them is still under [non-disclosure agreement]. We look at them and say, 'Do we want to do something with SQL-on-Hadoop or do we want to [use the Hadapt technology to] build some kind of virtualization layer?'"
When pressed, Birouty declined to expand on this statement. Teradata could use Hadapt's ANSI-SQL-on-Hadoop technology as an all-purpose connector into Hive, Spark SQL, and other SQL-on-Hadoop implementations. Whether or not -- or how -- this shows up in QueryGrid is anybody's guess.
"We're making a patchwork quilt [with QueryGrid], and we're adding patches to the quilt. It's not easy, and it's going to take time, but look at what we've accomplished already," he concludes.