Bringing Architecture Back
TDWI Research's Philip Russom describes a logical architecture that knits together the multi-platform present (with its panoply of databases) into a diverse data warehouse environment, or DWE.
- By Stephen Swoyer
- May 13, 2014
A new Best Practices report from The Data Warehousing Institute (TDWI) examines how data warehouse architectures are changing.
The enterprise data warehouse (EDW) is not dead, but for many organizations it's evolving into the DWE.
According to the report's author, TDWI research director for data management Philip Russom, few organizations maintain a strictly "pure" EDW -- i.e., a central data warehouse that functions as a repository for all analytic information and that isn't supplemented or complemented by another platform or platforms.
All the same, an EDW of some kind exists in almost half (46 percent) of all organizations. These systems are being extended by supplementary or complementary platforms, such as operational data stores (ODS), data marts, SQL analytic appliances, or workgroup-level data sources. Just 15 percent of respondents to a November 2013 TDWI survey had what they described as a completely "pure" EDW environment, although nearly one-third (31 percent) had a core EDW that they'd supplemented with "edge systems" -- such as analytic database systems, NoSQL repositories, and streaming databases.
This is why Russom wants to turn the familiar "EDW" acronym on its head, proposing, instead, a data warehouse environment, or DWE.
"[M]any [edge systems are] designed by vendors or users for workloads for which the average DW is not optimized, especially workloads for new forms of big data, processing for advanced analytics, managing tens of terabytes of source data, and data for real-time operations," writes Russom in the report, Evolving Data Warehouse Architectures in the Age of Big Data.
"The modern [DWE] ... still includes an enterprise data warehouse ... but the EDW is complemented by several other types of data platforms," he continues. "A DWE includes the usual marts, ODSs, and staging areas, as well as newer standalone platform types for DW appliances, columnar databases, NoSQL databases, Hadoop, real-time technologies, and various analytic tools."
Data warehouse (DW) groups implement platforms such as Hadoop in order to extend their existing data warehouse systems. Hadoop enables new kinds of practices, workloads, and techniques that can't (cost-effectively) be instantiated in a traditional DW. From a data management (DM) perspective, however, the usage pattern is conceptually unchanged: the data warehouse and its constellation of resources are still the focal point of decision support and data management.
"A small but prominent community of DW professionals is integrating the Hadoop Distributed File System (HDFS) and other Hadoop tools into their DW environments. There are many areas within a DW architecture where HDFS shows promise, namely data staging areas, data archives, and repositories of big data for analytics," Russom writes. "Yet, relational technologies are still required for most reporting, OLAP, and performance management functions supported by the average DW, so relational databases, SQL, and other relational technologies are not imperiled by Hadoop."
Architecture by Any Other Name
The centerpiece of Russom's report is a lengthy consideration of data warehouse architecture. It begins with a seemingly obvious question: What do we mean by "data warehouse architecture"? Most respondents -- 71 percent -- view a data warehouse architecture as "standards for data models, data quality, metadata, interfaces, development methods, and so on," and almost as many (66 percent) view architecture as "a logical design for data structures and the patterns or relationships among them." (A majority -- 56 percent -- view architecture in terms of a physical model; just 12 percent are still fighting the Inmon/Kimball wars of the 1990s.)
Most respondents (79 percent) say their primary data warehouse system has a coherent architectural design; of these, more than half (54 percent) say this architecture is evolving at least moderately. A sizeable minority (22 percent) say their DW architecture is undergoing dramatic change.
Russom also explores the business and technology drivers pushing organizations to make (incremental or, occasionally, dramatic) changes to their DW architectures, as well as the perceived role that architecture plays in making or breaking the success of a DWE. Respondents cited a wide range of technical and business drivers. On the technical side, these include demand for advanced analytics (cited by 57 percent of respondents) and real-time access to data (41 percent), along with increasing data volumes (56 percent). From a business perspective, respondents cited a familiar litany of challenges, including competitiveness (45 percent), "fast-paced" business processes (43 percent), centralizing control (30 percent), compliance requirements (29 percent), funding (29 percent), merger-and-acquisition activity (18 percent), and so on.
The list of technical challenges and drivers is a peculiar mix of old and new: nearly one in three respondents (30 percent) cited good old OLAP as a key engine for DW architectural change, which outpaced new-fangled drivers such as non-relational data (25 percent), data virtualization (23 percent), cloud adoption (21 percent), and streaming data (15 percent).
A Multi-Platform Multi-verse
The DWE on its own isn't rationalized: it's a heterogeneous collection of platforms, tools, and services -- an orbiting constellation of systems -- that extends the core data warehouse architecture.
In other words, the DWE as such isn't always deliberately conceived and implemented. In many cases, a DWE accrues: it grows over time as data and additional standalone platforms are added -- as, for example, when a BI group or business unit identifies a need that its existing data warehouse system and/or edge systems can't address.
This growth may or may not have been managed intelligently. The result can be a loose configuration of systems that, to some extent, just works. Russom's vision of a "multi-platform DWE" -- like Gartner Inc.'s "logical data warehouse" and Enterprise Management Associates Inc.'s "hybrid data ecosystem" -- is an attempt to knit together and rationalize this configuration.
"[D]ata and processing for SQL-based analytics are regularly offloaded to DW appliances and columnar DBMSs. A few teams offload workloads for big data and advanced analytics to HDFS, MapReduce, and other NoSQL platforms. It's not just DWs; some users are offloading ETL jobs from an expensive data integration server to less expensive Hadoop," he writes. "The result is a strong trend toward distributed DW architectures, where many areas of the logical DW architecture are physically deployed on standalone platforms instead of the core DW platform."
That list of highly diverse workloads helps explain why users deploy multiple data platforms. In a multi-platform physical topology, users with diverse workloads can choose the platform that is best designed or optimized for a particular workload. Other, less desirable reasons for multiple data platforms include "system accrual" (as described earlier), mergers and acquisitions, uncontrolled evolution, the proliferation of analytics silos, and the vagaries of sponsorship and funding.
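To make the idea of matching workloads to platforms concrete, here is a minimal sketch in Python. It is not taken from Russom's report: the workload labels, the `Platform` names, the `choose_platform` helper, and the choice to fall back to the core EDW for unlisted workloads are all assumptions made for illustration, loosely following the offloading patterns quoted above.

```python
from enum import Enum


class Platform(Enum):
    # Hypothetical platform categories, named after those mentioned in the article.
    CORE_EDW = "core relational EDW"
    APPLIANCE_OR_COLUMNAR = "DW appliance or columnar DBMS"
    HADOOP = "Hadoop (HDFS, MapReduce)"


# Illustrative routing table (an assumption, not a TDWI recommendation).
WORKLOAD_ROUTES = {
    "reporting": Platform.CORE_EDW,
    "olap": Platform.CORE_EDW,
    "performance_management": Platform.CORE_EDW,
    "sql_analytics": Platform.APPLIANCE_OR_COLUMNAR,
    "advanced_analytics": Platform.HADOOP,
    "etl_offload": Platform.HADOOP,
    "data_staging": Platform.HADOOP,
    "data_archive": Platform.HADOOP,
}


def choose_platform(workload: str) -> Platform:
    """Return the platform for a workload label.

    Unlisted workloads fall back to the core EDW (an assumed default).
    """
    key = workload.strip().lower().replace(" ", "_")
    return WORKLOAD_ROUTES.get(key, Platform.CORE_EDW)


if __name__ == "__main__":
    for name in ("OLAP", "SQL analytics", "ETL offload"):
        print(f"{name} -> {choose_platform(name).value}")
```

In practice the choice would depend on cost, data volume, and latency requirements rather than a static lookup; the table simply shows how a logical architecture can name one home for each workload even when the physical platforms differ.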
There's no silver bullet for knitting together a multi-platform DWE -- nor (at this point, and given the complexity of what's involved) could or should there be. "Integration within the DWE can take many forms, including shared dimensions, data sync, federation, ETL jobs, data flows across DWE platforms, real-time interfaces, and so on," Russom points out. "Unless the platforms of a DWE are integrated at appropriate levels, the DWE is just a bucket of silos. A unifying architectural design is more efficient technically and more effective for business users."
Russom's 39-page report is available for download at no charge.