Analysis: A Closer Look at Teradata Database 15 and QueryGrid
The centerpiece of Teradata's recent announcements, QueryGrid, will likely take considerable time to flesh out and realize in practice.
- By Stephen Swoyer
- June 3, 2014
Teradata also announced QueryGrid -- a kind of federated query capability that will ultimately knit together its Teradata Database, Aster Discovery systems, and Hadoop appliances -- and a new, brawnier Teradata Active EDW 6750 data warehouse appliance.
QueryGrid won't be available until Q3 of this year, according to the company.
In-Database JSON, and QueryGrid: Federated Query, Big Data-style
Teradata first touted in-database support for JSON files late last year. At TDWI's World Conference in February, Teradata veteran Dan Graham talked about how Teradata 15 will be able to ingest JSON objects en bloc -- i.e., as native JSON objects, or shredded, via a name-value-pair function.
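The two ingestion modes Graham described can be illustrated with a minimal Python sketch. This is not Teradata's implementation or its name-value-pair function, just a toy model of the difference between keeping a document whole ("en bloc") and shredding it into relational-friendly name-value pairs:

```python
import json

doc = '{"sensor": "t-101", "reading": 22.4, "status": "ok"}'

# Mode 1: ingest en bloc -- store the document as a single native JSON object.
native = json.loads(doc)

# Mode 2: shred into name-value pairs, the form a relational engine can
# index and query column-style.
def shred(obj, prefix=""):
    """Flatten a (possibly nested) JSON object into (name, value) pairs."""
    pairs = []
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            pairs.extend(shred(value, prefix=name + "."))
        else:
            pairs.append((name, value))
    return pairs

print(shred(native))
# → [('sensor', 't-101'), ('reading', 22.4), ('status', 'ok')]
```

The trade-off mirrors the one in the article: the native object preserves structure for later, schema-flexible queries, while shredding pays the flattening cost up front so the data behaves like ordinary columns.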
The QueryGrid technology Teradata outlined is ambitious in scope. It is also largely incomplete -- and probably will be for the foreseeable future. The concept, as outlined by Teradata's Imad Birouty, sounds like your standard-issue pipedream: a user will be able to use QueryGrid to kick off a multi-stage analysis (involving, for example, a graph function in Teradata Aster and MapReduce processing in Hadoop) with just a few points and clicks (or swipes and taps).
To that end, Birouty positioned QueryGrid as a "direct replacement" for two of Teradata's existing standalone connectors: SQL-H (technology that it inherited from Aster) and its Unity Source Link connector for Oracle.
"[QueryGrid] takes them to the next step with more functionality, more intelligence, and then a path to develop several more, a long list of additional connectors," he told industry analysts in a briefing in April. "Some of our guiding principles here [are that] we want to make sure that for the user, all of the processing is integrated ... certainly within the Unified Data Architecture [UDA], but also going out [of] the UDA, reaching out to different systems," he continued, adding: "We want to be able to run the right analytic on the right platform."
It's a neat idea, but -- as with any neat idea -- it's going to take some time (and a lot of work) to actually implement. At launch, QueryGrid will officially support bi-directional connectivity between Hadoop and Teradata Warehouse and between Oracle platforms and the Teradata Database. Teradata sources told BI This Week that QueryGrid will unofficially support bi-directional connectivity between Hadoop and Aster. (It's possible and it works, but Teradata isn't officially supporting it, one of these sources told us.)
Previously, SQL-H and Unity Source Link had supported unidirectional connectivity only. In the case of SQL-H, this meant that an Aster user couldn't actually push query processing down to Hive, the SQL-like interpreter for Hadoop. Instead, SQL-H was a means to extract data from Hadoop and process it -- using in-database MapReduce and other tools -- on Aster. That's still an option, at least for smaller, more portable data sets. In fact, Teradata and other vendors -- such as Actian -- like to claim that their in-database implementations of MapReduce are significantly faster than Hadoop MapReduce.
Teradata's Unity Source Link connector for Oracle likewise couldn't push processing down into the Oracle database; with QueryGrid, it now can. In this regard, what Teradata is doing with QueryGrid is consistent with the practical physics of big data: because of the size of big data sets, it's important to minimize data movement. This means pushing processing out to data, not vice-versa.
"Another principle ... is the idea of being efficient with push-down processing ... [that is,] processing the data where it resides. Minimize data movement. If I have a terabyte of data sitting in Hadoop, don't go pulling that entire terabyte and bring it over to Teradata and process it there," Birouty commented. "Process [the data] where it resides and minimize the duplication associated with it, [and] do it in a bidirectional fashion."
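The economics Birouty describes can be sketched in a few lines of Python. This is a toy illustration, not QueryGrid's API: the point is simply that shipping a predicate and an aggregation to where the data lives moves one value over the wire instead of every row:

```python
# Data that "lives in Hadoop" -- a thousand rows on the remote side.
remote_rows = [{"region": "east", "sales": s} for s in range(1000)]

# Naive federation: pull every row across the network, then process locally.
pulled = list(remote_rows)                       # moves 1,000 rows
total_naive = sum(r["sales"] for r in pulled if r["region"] == "east")

# Push-down: the remote system runs the filter and aggregation itself,
# so only a single aggregate value crosses the wire.
def remote_query(rows, region):
    """Stand-in for work executed where the data resides."""
    return sum(r["sales"] for r in rows if r["region"] == region)

total_pushdown = remote_query(remote_rows, "east")   # moves 1 value

assert total_naive == total_pushdown
```

Same answer either way; the difference is whether a terabyte or a few bytes travels between systems.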
QueryGrid connectivity isn't quite as automated as an analyst might want it to be, however.
Consider that pipedream scenario in which QueryGrid kicks off a graph function in Aster, simultaneously schedules a MapReduce job in Hadoop, waits for both jobs to complete, schedules another MapReduce job (in Aster or Hadoop) to consolidate the results of both analyses, and pushes the results of that job back to Teradata Warehouse, where it becomes grist for additional analysis. Under the covers, someone still has to code the rules and the logic for all of that.
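The "rules and logic" that someone must still hand-code look roughly like the orchestration below. All of the job stubs here are hypothetical stand-ins (an Aster graph function, a Hadoop MapReduce job, a warehouse load), not real Teradata interfaces; the sketch only shows the shape of the coordination work:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical job stubs -- placeholders for an Aster graph function,
# a Hadoop MapReduce job, and a load back into the warehouse.
def aster_graph_job():
    return {"communities": 3}

def hadoop_mapreduce_job():
    return {"click_counts": 1200}

def consolidate(a, b):
    """Second-stage job that merges both result sets."""
    return {**a, **b}

def push_to_warehouse(result):
    return "loaded %d result columns" % len(result)

# The coordination logic that, under the covers, someone still has to write:
# launch both analyses concurrently, wait for both, consolidate, load.
with ThreadPoolExecutor() as pool:
    graph = pool.submit(aster_graph_job)
    mapred = pool.submit(hadoop_mapreduce_job)
    merged = consolidate(graph.result(), mapred.result())

print(push_to_warehouse(merged))
# → loaded 2 result columns
```

Every step -- scheduling, waiting, error handling, moving results between systems -- is exactly the glue code QueryGrid aims to eventually generate from "a few points and clicks."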
The idea is that once someone (a business analyst, for example) figures out a useful function, BI developers can use Teradata's QueryGrid technology to easily put it into production.
Birouty himself outlined an example in which an analyst wants to run a graph function on a table in Teradata; today, IT -- BI developers, ETL developers, DBAs -- must intervene to make this happen. At some point, Birouty said, QueryGrid will be able to automatically push that processing down into Aster, which has built-in graph functions. (Given what Birouty has said about data movement and the size of data sets, the table in question had better be a small one, however.)
This isn't to pick on Teradata. The problems it's trying to solve and the kinds of scenarios it's trying to address are hard. Teradata has a road map for beefing up QueryGrid's connectivity story -- although it hasn't yet made that road map public. Birouty suggested that Teradata would focus "asynchronously" -- as it had with SQL-H and Teradata Unity Source Link -- on delivering QueryGrid connectivity between Teradata and other, non-Teradata platforms.
Brawnier, More Memory-able Active EDW Systems, Too
Elsewhere, Teradata's Active EDW 6750 MPP appliances ship with substantial capacity increases. Teradata benefits from Intel Corp.'s own innovation -- each Active EDW 6750 system now ships with dual 12-core variants of Intel's 64-bit Xeons, for a total of 24 cores per node -- as well as from improving economies of scale in both the solid state drive (SSD) and memory markets.
The new EDW 6750 packs in 2.5 times more SSD capacity -- 40 SSDs per node, as distinct from 16 -- and eight times as much memory capacity. In fact, each Active EDW 6750 node can be stuffed with 512 GB of memory. (A single cabinet can house up to three active nodes and one standby node. That's 1.5 TB of memory across the active nodes, with another 512 GB on standby.)
Betsy Huntingdon, product marketing manager for Teradata's data warehouse appliance systems, spins this as a vindication of Teradata's "Intelligent Memory" strategy, which -- instead of loading all database data and indexes into physical RAM -- tries to make intelligent decisions about which data to load into memory, how (or at what levels) it caches data, and how it performs certain kinds of operations.
Citing Teradata's own customer research, she argued that Intelligent Memory isn't constrained by physical RAM limitations and can boost performance for most workloads: "Most of our customers' I/Os are coming from data that's already stored in memory, [but with Intelligent Memory], you're not paying for a full in-memory device like a HANA, [and] you're still getting the performance of in-memory. You're still throwing your colder stuff that you're not using frequently on[to] more cost-effective hard drives, but all of your work is being done in memory."
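The placement strategy Huntingdon describes can be modeled in miniature. The class below is a toy model, not Teradata's Intelligent Memory: it simply tracks how often each table is read and promotes proven-hot data into a small, scarce RAM tier while everything else stays on cheap disk:

```python
from collections import Counter

class TieredStore:
    """Toy model of temperature-based placement: frequently read keys
    get promoted into a small RAM tier; cold data stays on disk."""

    def __init__(self, ram_slots=2, promote_after=3):
        self.ram = {}                 # the scarce, fast tier
        self.disk = {}                # the cheap, large tier
        self.hits = Counter()         # access-frequency ("temperature") tracking
        self.ram_slots = ram_slots
        self.promote_after = promote_after

    def put(self, key, value):
        self.disk[key] = value        # everything lands on disk first

    def get(self, key):
        self.hits[key] += 1
        if key in self.ram:
            return self.ram[key]      # fast path: data is already hot
        value = self.disk[key]
        # Promote data that has proven itself hot, evicting the
        # coldest current resident if the RAM tier is full.
        if self.hits[key] >= self.promote_after:
            if len(self.ram) >= self.ram_slots:
                coldest = min(self.ram, key=lambda k: self.hits[k])
                del self.ram[coldest]
            self.ram[key] = value
        return value

store = TieredStore()
store.put("orders_2014", "hot table")
store.put("orders_2009", "cold table")
for _ in range(3):
    store.get("orders_2014")          # repeated reads promote this table
assert "orders_2014" in store.ram and "orders_2009" not in store.ram
```

The economics match Huntingdon's pitch: most reads hit the small fast tier, without paying for enough RAM to hold everything.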