Analysis: Is OLTP and DW on the Same Platform Such a Good Idea?
Is the notion of a combined platform for OLTP and data warehousing an idea whose time has come?
- By Stephen Swoyer
- December 9, 2014
According to database giants IBM Corp. and Oracle Corp., the information management platform of the future must be able to support both OLTP and analytic workloads in the same system.
Conveniently enough, both IBM and Oracle market combined (a) OLTP databases and (b) data warehouse systems -- IBM's new DB2 "Cancun" release and Oracle's Exadata/Exalytics -- that address the OLTP/analytic use case in a single platform. Nancy Kopp-Hensley, director of strategy and marketing for IBM's database systems, says IBM's new DB2 version 10.5.0.4 release includes a feature called Shadow Table that exploits DB2's "BLU" columnar in-memory feature to enable simultaneous querying against OLTP data. "You now can do real-time reporting against your transactional data without impacting your transactional importance," Kopp-Hensley asserts.
It's all about getting "closer to the actuality" of what's happening in a business or (non-profit) organization, she says.
"Back in the day, we used to build ODSs [operational data stores], which were usually very targeted to business needs. We did this when we needed real-time analytics around what was happening now in the business -- [that is] with logistics, supplies, sales, and so on," she says. "The point [with an ODS] is that we got this [information] much faster than we did with the data warehouse, but [even with an ODS] we would still have to extract the data from transactional systems and move it over to a system where we could run the analytics."
IBM and Oracle aren't alone. SAP AG -- which markets both discrete OLTP and analytic database platforms (in Sybase Adaptive Server Enterprise and Sybase IQ) and an all-in-one OLTP and analytic platform (in the case of HANA) – likewise touts the advantages of a combined platform for OLTP and data warehouse (DW) workloads. What's more, SAP, like Oracle, has an additional interest in doing so: namely, a huge investment in the operational systems -- what used to be called "enterprise resource planning" (ERP) applications -- that function as upstream OLTP sources for the bulk of BI and DW reporting and analytic activities.
SAP doesn't see it quite this way. According to Tom Traubitz, senior director of product marketing and analytics strategy with SAP, IBM and Oracle actually offer the equivalent of "weak tea:" both have effectively retrofitted "existing row-based, transactional" DBMS platforms by adding columnar indexing and "partial" in-memory technologies to address the combined OLTP/data warehouse (DW) use case.
In fact, SAP markets a built-from-scratch DBMS -- its ubiquitous HANA platform -- that it says addresses the same use case. "Performance is improved by adding these technologies, but not so much that [these vendors] are not primarily recommending separate OLTP and Analytics," Traubitz argues. "Moreover, the improvement they are gaining is not nearly as much as a system like HANA, which is built from the ground up to leverage core technologies like in-memory, columnar, and others."
The Ayes Have It?
There's by no means a consensus in this regard, however. A vendor such as Teradata Corp., which focuses primarily on data warehousing and analytics, rejects this claim. The centerpiece of Teradata's Unified Data Architecture (UDA) looks much like the centerpiece of a traditional Teradata environment: viz., the data warehouse, be it a "logical data warehouse" (in which a logical representation of an enterprise data warehouse is stitched together from disparate data sources), an extended data warehouse (eDW) as TDWI's Philip Russom puts it, or some other DW-like system-of-record.
"The enterprise data warehouse provides [this] central repository for all of your data. If that isn't your goal, you're going to end up with tons and tons of data marts," Teradata's Imad Birouty told BI This Week last year. "[A]t some point, you're going to have the desire to go toward an EDW. Everything that really belongs in a relational data warehouse should go into the EDW."
The EDW, in Teradata's terms, should be physically instantiated as a separate system from the upstream OLTP operational systems that feed it. Even if the technological basis for this separation is no longer (as much of) a critical limiting factor, other considerations -- such as best practices for good governance -- support a clear separation, Teradata officials have said.
They're not alone. When BI This Week asked Actian CTO Mike Hoskins about the promise or viability of an all-in-one platform for OLTP and DW, he was skeptical. Actian markets both OLTP (Ingres) and analytic database engines (the former Vectorwise and ParAccel databases, now rechristened Vector and Matrix respectively), but Hoskins ruled out a combined pitch.
"We're very happy with our core businesses. They are tremendous producers of customer satisfaction. The sun has set on some technologies. I mean OLTP," Hoskins told BI This Week last year, after Actian's acquisition of ParAccel.
Oracle's Exa- platform aims at a more explicit combination of OLTP and data warehouse workloads: Oracle really is shopping its Exa- systems, all of which run its Oracle 12c database, as a combined one-stop shop for most (if not all) information use cases. Kopp-Hensley insists that IBM's approach with DB2 Shadow Tables is more pragmatic. Shadow Tables isn't applicable for all applications or use cases. Instead, it's an application- or use case-specific alternative. For a few applications or use cases, it makes sense. Customers can take advantage of it if or when they determine it addresses their needs.
"Nobody ever really built the data warehouse to be 100-percent real time. You can get to 100-percent real time, but [as a function of cost] this is just out of reach for most customers. Most of them are doing basic reporting and analytics, and it's still not real time. That's fine for the majority of what they want to do. What you do want to do is identify those specific business needs [that can benefit from real time] and leverage the [Shadow Tables] capabilities for those."
DB2's Shadow Tables feature effectively moves fresh transactional data into BLU, DB2's columnar in-memory facility. SAP takes a somewhat different approach with HANA. Because of HANA's in-memory design and its native/optimized connectivity to SAP's Business Suite, it's possible to move transactional data directly from SAP operational applications and into memory, Traubitz argues. This eliminates the need for the extensive preparation that introduces so much latency in the traditional DW model, he claims.
For example, Traubitz points out, ETL processes can consist of multiple jobs -- each of which entails its own sequencing and dependency issues -- between systems, which adds latency. ETL also entails computationally intensive preparation (via joins and other transformations) so that data can be loaded, staged, and physically instantiated in a data warehouse system.)
"We can dispense with aggregates, star schemas, lots of ETL processes between transaction and analytics, etc. because we have that two-to-three orders of magnitude improvement in performance to buy us the changes in the way people work the database system," he explains.
Industry expert Mark Madsen, veteran data warehouse architect, recognized data warehouse scaling specialist, and author of an early work on streaming data warehouses (Clickstream Data Warehousing), isn't buying it.
As a special-purpose or fit-for-purpose alternative -- such as that which IBM outlines with its DB2 Shadow Tables -- a combined OLTP and DW platform could be advantageous; as a general-purpose prescription for a Way Forward, Madsen grouses, it's bound to reprise the mistakes of the past.
"Simultaneous OLTP and analytics in the same system makes me think [the companies espousing it] are ignorant of the realities of why one does a data warehouse in the first place," he says. Besides, he asks, didn't we go down this road once before with ERP?
As with ERP, the new model proposes to consolidate multiple OLTP databases -- e.g., order-entry, shipping, logistics, finance, etc. -- on a single system, which can also double as an integrated source of record for analytics. "Remember, ERP 'solved' this very problem of integration [by using] one database instead of ten [databases] for ten [different] functions. Now it's one big ERP database with 6,000 tables [i.e., in the case of SAP] -- and we still need a data warehouse, not just because of performance for query, which is bad enough, but for other reasons."
If ERP aimed to be a silver bullet -- i.e., a fix for the multiple, disintegrated, dys-interoperable systems that used to (and which still to some extent do) plague large organizations -- Madsen sees in-memory as a New Fix. "If you say 'now in-memory solves the ERP problem!' you are taking the problem of why ERP fails to inform as a problem of performance, [but] that has little to do with it. The problem is complexity, abstraction, design, etc. ERP in-memory will not solve this problem."