Executive Summary: Evolving Data Warehouse Architectures

In the early days of data warehousing, most data warehouses (DWs) were centered around a single-instance database, plus a few “edge systems” for data marts, operational data stores (ODSs), and data staging. Over the years, TDWI has seen a strong trend toward a growing number of edge systems, many of them designed by vendors or users for workloads for which the average DW is not optimized. These include workloads for new forms of big data, processing for advanced analytics, managing tens of terabytes of source data, and data for real-time operations.

The modern data warehouse environment (DWE) still includes an enterprise data warehouse (EDW), but the EDW is complemented by several other types of data platforms. A DWE includes the usual marts, ODSs, and staging areas, as well as newer standalone platform types for DW appliances, columnar databases, NoSQL databases, Hadoop, real-time technologies, and various analytic tools. Of course, some organizations have multiple warehouses, as well. Given the rising complexity, data warehouse architecture is more critical than ever in order to make sense of, govern, and optimize the complicated multi-platform DWEs that many user organizations are building.

When asked what components a DW architecture should include, users answering this report’s survey identified data standards (71%), a logical design (66%), and a physical plan (56%). The user consensus is that DW architecture includes multiple layers and components, and success usually entails balancing a logical design (mostly data models and standards) with a physical plan (mostly a portfolio of servers and how they integrate).

The leading drivers for change in DW architectures are advanced analytics (57%), big data management and leverage (56%), and real-time operations (41%). Eighty-four percent of respondents see DW architecture as an opportunity when it serves these leading drivers. The top barrier to success is the lack of big data and advanced analytics skills from which most DW teams currently suffer.

Completely pure DW architectures are rare. For example, only 15% of respondents have a single EDW with no additional data platforms. Even when a DW has a well-defined architecture, users will ignore the plan for some data domains, platforms, or workloads. That’s why 31% have a core EDW, but with additional platforms in the DW environment, typically for analytic, big data, and real-time workloads. An increasing number of DW architectures are hybrids; for example, Inmon and Kimball approaches are commonly applied within the same DW environment.

A small but prominent community of DW professionals is integrating the Hadoop Distributed File System (HDFS) and other Hadoop tools into their DW environments. There are many areas within a DW architecture where HDFS shows promise, namely data staging areas, data archives, and repositories of big data for analytics. Hadoop is promising because it handles data types that the average DW cannot, namely unstructured data (text, audio, video) and semi-structured documents (XML, JSON). Yet, relational technologies are still required for most reporting, OLAP, and performance management functions supported by the average DW, so relational databases, SQL, and other relational technologies are not imperiled by Hadoop. Instead, TDWI expects them to be commonly integrated with Hadoop technologies in DW environments as early as 2016.

To help users prepare for new DW architectures, this report quantifies trends in data warehouse architectures and catalogs newly available, relevant technologies. The report also documents how successful organizations are evolving their architectures to leverage new business opportunities for big data. The goal is to provide data warehouse professionals and their business counterparts with the information they need before planning the next generation of their logical data warehouse architecture and its physical deployment.

Actian, Cloudera, Datawatch Corporation, Dell Software, HP Vertica, and MapR Technologies sponsored the research and writing of this report.

  • Download the full, 39-page report
  • Register for the Webinar

About the Author

Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 550 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, was a contributing editor with leading IT magazines, and was a product manager at database vendors. His Ph.D. is from Yale. You can reach him by email (prussom@tdwi.org), on Twitter (twitter.com/prussom), and on LinkedIn (linkedin.com/in/philiprussom).

