Teradata Gets Unorthodox
At TDWI's recent Executive Summit in Las Vegas, Nevada, Teradata veteran Dan Graham got downright heretical, at least from the perspective of data management orthodoxy. The problem, Graham argued, is that DBAs tend to be a little bit too conservative, too orthodox. That must and will change, he says.
- By Stephen Swoyer
- April 8, 2014
At TDWI's recent Executive Summit in Las Vegas, Nevada, Teradata's Dan Graham got downright heretical, at least from the perspective of data management (DM) orthodoxy.
Teradata is ingesting JSON objects -- en bloc, as native JSON objects, or shredded, via a name-value-pair function -- and storing them in the Teradata Warehouse. "I'm talking about JSON objects going straight into the data warehouse," Graham reiterated in a follow-up interview.
Teradata will support JSONs in two ways, Graham explained: (1) shredding a JSON object -- via a name-value-pair function -- into predefined rows or columns or (2) landing it in a BLOB column and storing it as variable character text (varchar). In the latter case, this means loading unstructured data into the DW, Graham said: "Drop it into the column and you have unstructured data in the data warehouse again, [be it] XML, JSON, or Weblog [data]."
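In database-neutral terms, the two approaches can be sketched in a few lines of Python: shredding flattens a JSON object into name-value pairs that map onto predefined columns, while the alternative keeps the serialized object whole in a single column. (This is an illustrative sketch, not Teradata's implementation; the function names are hypothetical.)

```python
import json

def shred(obj, prefix=""):
    """Flatten a (possibly nested) JSON object into name-value pairs,
    the way a shredding function maps JSON fields onto columns."""
    pairs = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            pairs.update(shred(value, prefix=name + "."))
        else:
            pairs[name] = value
    return pairs

def store_whole(obj):
    """Keep the object intact as variable-length character text,
    analogous to landing it unshredded in a single column."""
    return json.dumps(obj)

record = {"id": 42, "sensor": {"temp": 21.5, "unit": "C"}}

print(shred(record))        # name-value pairs ready for columns
print(store_whole(record))  # one opaque text value, schema untouched
```

The trade-off is visible even in the sketch: shredded fields are individually queryable, while the whole-object form survives any change to the incoming JSON without a schema change.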
In some quarters, what Graham's describing -- i.e., putting multi-structured information (because JSON objects have semantic structure) in the data warehouse -- is tantamount to heresy. At the very least, it's unorthodox. In a sense, however, "orthodoxy" is something of a double-edged sword. The same DM orthodoxy that prescribes a very specific role for the data warehouse -- viz., as a controlled, managed repository for structured, consistent data -- also prescribes a very specific role for NoSQL: namely, as a landing zone or staging area for multi-structured data.
NoSQL advocates have taken up arms and successfully militated against this role, however: Hadoop is a textbook example of this -- over the last 36 months, its DM feature set has grown by leaps and bounds -- but proprietary NoSQL platforms, such as MarkLogic, are no less militant. (MarkLogic offers support for what it calls "SQL views." In its most ambitious marketing efforts, MarkLogic even pitches its NoSQL platform for traditional decision support-like workloads.)
"With the three kinds of unstructured data [that Teradata supports] in the data warehouse, this notion that structured data goes over here and unstructured goes over there -- that makes no sense anymore," Graham maintained. "And this isn't [to say] that we're supporting just these three kinds [of multi-structured data-types]. If there's real demand from our customers, we'll put in another kind."
Teradata Built It, But Will Customers Use It?
So will Teradata customers actually opt to store JSONs -- in whole or in part -- in their Teradata Warehouse systems? Doing so means shrugging off DM orthodoxy -- and years of Teradata's own messaging.
Graham believes so. The problem, he concedes, is that DBAs -- in the Teradata world and elsewhere -- tend to be a little bit too conservative. You know: orthodox.
That must and will change, he argues.
"This old notion of if you want to put in a new field and it's going to take you three months to get it through the governance committee, and it's going to take a month before you can get it operating in the system, and then it's going to take another month before you can actually use it [in production] -- that's the traditional 'data priest' controlling schema changes, and that doesn't work," he said.
"The question here is when do you want agility and when do you want more governance? Depending on what you need, you can choose to shred [JSON objects] or [to store them as] unshred[ded]. We can shred the JSON when it comes in, into rows or columns, or we can do a partial shred, which is what we recommend because you can hang onto this agility."
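The partial shred Graham recommends can be sketched the same way: promote a fixed set of known fields to columns, and keep everything else as intact JSON so that new fields arriving later require no schema change. (Again, an illustrative sketch with hypothetical names, not Teradata's actual mechanism.)

```python
import json

def partial_shred(obj, known_fields):
    """Promote a fixed set of fields to columns; keep the rest
    as intact JSON so new fields need no schema change."""
    columns = {f: obj.get(f) for f in known_fields}
    remainder = {k: v for k, v in obj.items() if k not in known_fields}
    columns["payload"] = json.dumps(remainder)  # the unshredded leftover
    return columns

event = {"user": "alice", "ts": "2014-04-08", "clicks": 3, "referrer": "x.com"}
row = partial_shred(event, known_fields=["user", "ts"])
# row["user"] and row["ts"] are queryable columns; row["payload"]
# still carries "clicks", "referrer", and anything added tomorrow.
```

This is the agility argument in miniature: the governed part of the schema stays stable while the JSON payload absorbs change.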
From a data management perspective, this is a low- or no-impact change, Graham argues: "It's going to flow straight through the ETL process without any changes to the [ETL] scripts or the schema. It can be done so easily that the DBA doesn't even know it's happening. In 24 hours, the new data is in the system and all that the sysadmin has to do is make sure that the view will allow the business intelligence tool or the Tableau tool to access it. This is almost an overnight schema change."
At What Cost Agility?
Ask any of Teradata's competitors or would-be competitors, and they'll claim that Teradata is priced at a premium relative to their own offerings. On a price-per-TB-of-capacity basis, they seem to have a point: notwithstanding Teradata's efforts to push down its per-TB pricing for low-end systems such as its Teradata Data Mart Appliances -- which aren't massively parallel processing (MPP) engines -- Teradata almost always compares poorly (again, on a per-TB basis) with competitive offerings. (Teradata's recent Extreme Data Appliance 1700 claims to hit a price point of $2,000 per TB, but it's positioned as an analytic discovery or deep analytics platform, not as an enterprise data warehouse.)
What's more, competitors will tell you that out of all of Teradata's offerings -- i.e., Teradata Warehouse, the Teradata Aster Discovery platform, and the Teradata appliance for Hadoop -- Teradata Warehouse is by far the priciest option, at least on a price-per-TB basis.
This raises a question, then: why would a Teradata customer want to store JSON data in Teradata Warehouse, particularly if Teradata itself offers more affordable alternatives -- such as its Teradata-branded Hadoop appliance? Graham, for his part, rejects this as a straw man. For one thing, he maintains, comparing Teradata with other platforms on a price-per-TB basis is misleading. On a price-performance basis, Graham argues, Teradata Warehouse is much cheaper than competitive platforms for decision support or data warehousing workloads.
Industry veteran Mark Madsen, a research analyst with IT strategy consultancy Third Nature Inc., agrees. "[You've got to] watch the price-per-TB rhetoric. The key is not the price-per-TB, which [Teradata's competitors] love; the key metric is price-performance, which is very hard to measure in a comparative sense [because it] requires a [proof of concept] or a prototype," Madsen says.
"When I look at price-performance comparisons, Teradata is almost always cheaper."
He cites research he did for (full disclosure) a Teradata-commissioned whitepaper on Oracle database-to-Teradata migrations. In most cases, Madsen says, organizations that moved from Oracle to Teradata achieved an 8-10x increase in performance at about a 10 percent increase in cost.
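Madsen's figures make the price-performance point concrete: an 8x performance gain at a 10 percent cost increase means each unit of work costs roughly one-seventh of what it did. A back-of-the-envelope check:

```python
def cost_per_unit_of_work(relative_cost, relative_performance):
    """Price-performance: spend divided by throughput, relative to baseline."""
    return relative_cost / relative_performance

baseline = cost_per_unit_of_work(1.0, 1.0)    # pre-migration system
migrated = cost_per_unit_of_work(1.10, 8.0)   # +10% cost, 8x performance
print(migrated / baseline)  # ~0.14: each unit of work costs about 14% of before
```

Which is why a platform that loses badly on price per TB can still win on price per query.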
Price-performance aside, Graham contends, Teradata didn't cook up in-database support for JSON objects on its own: it isn't a checklist or marketing item. He points to Teradata's Product Advisory Council, which consists of representatives from some of its largest customers. Support for native JSON objects was requested by these and other customers, he argues.
"This is about agility. We have to teach DBAs about agility. They have to learn that sometimes the business is more important than [data management] purity. Their job really is to serve the business. Doing [DM] to some form of perfection -- some of them have honed that system a little too well. It's like what [race car driver] Mario Andretti said: 'If you feel like things are under control, you're not going fast enough.'"