Logical Data [Warehouse] Architecture
What will tomorrow's information enterprise look like? One proposed architecture is the logical data warehouse, or LDW. The "W" in LDW might be something of a misnomer, however.
- By Steve Swoyer
- March 21, 2016
It isn't that the concept of a logical data warehouse doesn't make sense. It's that the system it describes is a lot more than just a data warehouse. To be sure, the data warehouse -- be it a physically instantiated system (or systems) or its logical equivalent -- is still in the mix. It's one of several connected data sources, all of them knit together via a synthetic fabric of some kind.
In a case study presentation at TDWI's latest conference in Las Vegas, that synthetic fabric was provided by Denodo Inc.'s data virtualization (DV) software. Think of DV as a service-aware version of data federation, a technology that's been with us for close to 20 years. DV works by creating the equivalent of a virtual abstraction layer for two or more distributed data sources. The idea is that data from these sources -- e.g., the values from this column in that table in an OLTP system and the values from that column in that table in an ODS -- can be joined together and presented in a virtual "view."
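The virtual "view" described above can be sketched in miniature. The example below is only an illustration, not Denodo's implementation: it uses Python's built-in SQLite to stand in for two distributed sources (an "OLTP" database and an attached "ODS"), and all table, column, and view names are invented.

```python
import sqlite3

# One connection plays the role of the virtualization layer.
conn = sqlite3.connect(":memory:")          # stands in for the OLTP system
conn.execute("ATTACH ':memory:' AS ods")    # stands in for the ODS

# Hypothetical source tables, one per "system."
conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE ods.customers (cust_id INTEGER, name TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 100, 25.0), (2, 101, 40.0)")
conn.execute("INSERT INTO ods.customers VALUES (100, 'Acme'), (101, 'Globex')")

# The virtual view: a column from one source joined to a column from the
# other. Consumers query the view name, never the underlying systems.
conn.execute("""
    CREATE TEMP VIEW v_customer_orders AS
    SELECT c.name, o.order_id, o.amount
    FROM orders o JOIN ods.customers c ON o.cust_id = c.cust_id
""")

rows = conn.execute(
    "SELECT name, amount FROM v_customer_orders ORDER BY order_id"
).fetchall()
print(rows)
```

From the consumer's side there is one queryable name; the join across sources happens behind it, which is the essence of the federated-view idea.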
Got a data warehouse and/or a bunch of data marts? Don't have a data warehouse? No problem!
In the former case, you'd use DV to build synthetic "business views" -- basically, the equivalent of a presentation layer for a BI tool -- of the data in those systems. From the information consumer's perspective, it's as if they're querying against a single data source. In the latter case, you'd use DV to build the same business views -- except, in this case, you'd point them at OLTP and other sources. You're eliminating the interstitial data warehouse and effectively virtualizing its functionality.
"We didn't get rid of our data warehouses [or] data marts; we explicitly embraced them," Mark Eaton, an enterprise architect with Autodesk Inc., a Denodo reference customer, told attendees at his presentation. "By having these published views from the logical data warehouse, we're using these abstractions. If today I'm actually fulfilling those views that I'm getting from the data warehouse, tomorrow it might be better served from using SparkSQL [a SQL interpreter for the Spark cluster computing framework] on the data lake. I can get much better performance, scalability, and so on."
DV can be used to knit operational systems, relational databases, data warehouses, and, increasingly, RESTful cloud services together with other sources -- and vice versa. As Eaton observed during a Q&A following his presentation, DV not only gets at strictly structured data sources, but at poly-structured sources such as NoSQL repositories and RESTful applications, too. It also consolidates data quality (profiling, cleansing, matching, de-duping), master data management (MDM), and a number of other critical quality or governance features -- including data masking. In the DV model, it's possible to transform, cleanse, or mask data in-flight, i.e., on access.
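In-flight masking means the transformation is applied as rows pass through the virtual layer, not by rewriting data at rest. A minimal sketch of the idea, with an invented `mask_email` rule and made-up sample rows:

```python
def mask_email(addr: str) -> str:
    """Hypothetical masking rule: keep the first character and the domain."""
    local, _, domain = addr.partition("@")
    return local[0] + "***@" + domain

def masked_view(rows, mask_cols):
    """Yield rows with the named columns masked on access.

    The source data is never modified; masking happens per-read,
    which is what "in-flight, i.e., on access" describes.
    """
    for row in rows:
        yield {k: (mask_email(v) if k in mask_cols else v)
               for k, v in row.items()}

source = [{"name": "Ada", "email": "ada@example.com"}]
result = list(masked_view(source, {"email"}))
print(result)
```

The same pattern generalizes to cleansing or type conversion: the rule lives in the virtual layer, so different consumers can see differently governed versions of one underlying source.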
A DV approach gets data architecture close to a kind of architectural ideal, Eaton argued. "We didn't just start out building a logical data warehouse: we actually had to come up with a philosophy [first]. If you look at the way you built data pipelines in the past with an operational data store or a data warehouse, [you used] ETL pipelines, [which] are painfully expensive ... to maintain and very brittle. Things that you change upstream [break them]," he told attendees.
"In a perfect world, you would never move data at all. If you had infinite bandwidth, infinite capacity in the [upstream or source] systems of record, you'd do all of your reports and run all of your analytics right where the data is."
DV also permits Autodesk to design its information architecture for performance, scalability, and availability, Eaton said. "By providing the abstraction layer ... it allows you to actually pick best-of-breed systems to fulfill the [service-level] contracts of the published views. Today, you might be using your ODS to deliver a view ... [but] down the road, you [might] realize 'We actually have this data in our big data ecosystem -- Hadoop [or] Spark -- and we get better service and better scalability on those platforms. [We can] actually change the [way that's implemented in DV] to point to a better best-of-breed system to deliver the contract on that service,'" he said.
Is there a data warehouse -- be it a conventional, standalone data warehouse or a data warehouse-like query engine -- at the heart of Autodesk's LDW architecture?
Yes, says Eaton -- for the present. The first version of Autodesk's LDW knits together its data warehouse and data mart assets, along with its upstream systems and its Hadoop-based data lake. The next version of its Denodo-powered DV abstraction layer will center on the Spark cluster computing framework -- and on SparkSQL, a SQL-compliant interpreter/query engine for Spark.
Autodesk won't miss its existing SQL Server- and Oracle-based data warehouses, Eaton told Upside.com. "We were using both Microsoft SQL Server and Oracle [for data warehouse services]. We had a very, very traditional [warehouse architecture], with fact tables and dimension tables. When you move that to SparkSQL, the fact that you're doing the vast majority of your processing in memory alone means that you were getting at least a 10x performance increase," he said.
"SparkSQL 1.6 is pretty darn close to language maturity. It isn't fully ANSI-standard [SQL], but it supports enough [of the ANSI standard] to do some really complicated stuff."
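The "very traditional" warehouse workload Eaton describes moving to SparkSQL is the classic star-schema query: a fact table joined to dimension tables and aggregated. The sketch below shows the shape of that query using Python's built-in SQLite purely as a stand-in engine; the tables, columns, and figures are invented, and on Spark the same SQL would run against DataFrames instead.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical star schema: one dimension table, one fact table.
db.execute("CREATE TABLE dim_product (product_id INTEGER, category TEXT)")
db.execute("CREATE TABLE fact_sales (product_id INTEGER, revenue REAL)")
db.executemany("INSERT INTO dim_product VALUES (?, ?)",
               [(1, "CAD"), (2, "Media")])
db.executemany("INSERT INTO fact_sales VALUES (?, ?)",
               [(1, 100.0), (1, 50.0), (2, 75.0)])

# Typical warehouse-style aggregation: join fact to dimension,
# group by a dimension attribute, and sum a measure.
rows = db.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product d USING (product_id)
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(rows)
```

Because the SQL stays largely ANSI-shaped, repointing a published view from a relational warehouse to SparkSQL mostly changes where the join and aggregation execute (in memory, across a cluster), not how the query is written.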
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at [email protected].