Executive Summary: Evolving Data Warehouse Architectures
- By Philip Russom, Ph.D.
- April 1, 2014
In the early days of data warehousing, most data warehouses (DWs) were centered around a single-instance
database, plus a few “edge systems” for data marts, operational data stores (ODSs), and data
staging. Over the years, TDWI has seen a strong trend toward more and more edge systems, many of
them designed by vendors or users for workloads for which the average DW is not optimized,
especially workloads for new forms of big data, processing for advanced analytics, managing tens of
terabytes of source data, and data for real-time operations.
The modern data warehouse environment (DWE) still includes an enterprise data warehouse (EDW),
but the EDW is complemented by several other types of data platforms. A DWE includes the usual
marts, ODSs, and staging areas, as well as newer standalone platform types for DW appliances,
columnar databases, NoSQL databases, Hadoop, real-time technologies, and various analytic tools.
Of course, some organizations have multiple warehouses, as well. Given the rising complexity, data
warehouse architecture is more critical than ever in order to make sense of, govern, and optimize the
complicated multi-platform DWEs that many user organizations are building.
When asked what components a DW architecture should include, users answering this report’s
survey identified data standards (71%), a logical design (66%), and a physical plan (56%). The user
consensus is that DW architecture includes multiple layers and components, and success usually
entails balancing a logical design (mostly data models and standards) with a physical plan (mostly a
portfolio of servers and how they integrate).
The leading drivers for change in DW architectures are advanced analytics (57%), big data
management and leverage (56%), and real-time operations (41%). Eighty-four percent of respondents feel
that DW architecture is an opportunity when it serves the leading drivers. The top barrier to success is the
lack of skills with big data and advanced analytics from which most DW teams currently suffer.
Completely pure DW architectures are rare. For example, only 15% of respondents have a single
EDW with no additional data platforms. Even when a DW has a well-defined architecture, users will
ignore the plan for some data domains, platforms, or workloads. That’s why 31% have a core EDW,
but with additional platforms in the DW environment, typically for analytic, big data, and real-time
workloads. An increasing number of DW architectures are hybrids; for example, it’s common that
Inmon and Kimball approaches are applied within the same DW environment.
A small but prominent community of DW professionals is integrating the Hadoop Distributed File
System (HDFS) and other Hadoop tools into their DW environments. There are many areas within a
DW architecture where HDFS shows promise, namely data staging areas, data archives, and
repositories of big data for analytics. Hadoop is promising because it handles data types that the
average DW cannot, namely unstructured data (text, audio, video) and semi-structured documents
(XML, JSON). Yet, relational technologies are still required for most reporting, OLAP, and
performance management functions supported by the average DW, so relational databases, SQL, and
other relational technologies are not imperiled by Hadoop. Instead, TDWI expects them to be
commonly integrated with Hadoop technologies in DW environments as early as 2016.
To help users prepare for new DW architectures, this report quantifies trends in data warehouse
architectures and catalogs newly available, relevant technologies. The report also documents how
successful organizations are evolving their architectures to leverage new business opportunities for
big data. The goal is to provide data warehouse professionals and their business counterparts with
the information they need before planning the next generation of their logical data warehouse
architecture and its physical deployment.
Actian, Cloudera, Datawatch Corporation, Dell Software, HP Vertica, and MapR Technologies
sponsored the research and writing of this report.
- Download the full, 39-page report
Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 600 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, served as a contributing editor with leading IT magazines, and worked as a product manager at database vendors. His Ph.D. is from Yale. You can reach him by email (email@example.com), on Twitter (twitter.com/prussom), and on LinkedIn (linkedin.com/in/philiprussom).