Defining the Logical Data Warehouse
It's a logical or virtual layer of the DW architecture that integrates the physical layers of architecture under it.
- By Philip Russom, Ph.D.
- October 20, 2015
In recent years, the concept of the logical data warehouse (LDW) has been mentioned frequently by all kinds of people and organizations. Unfortunately, few discussions attempt a definition, on the assumption that we all know what it is and why it's valuable. Allow me to correct that omission by providing some baseline definitions and benefits.
A well-designed data warehouse architecture has multiple layers, and each layer may have its own architecture. For example, there is a systems architecture involving hardware servers and networks; most people conceptualize this at the bottom of the technology stack because it serves as a foundation. Atop that is layered a software architecture, where database management systems (DBMSs) and other data management tools or platforms run. The next layer up is a storage layer where data is persisted; the trend is to persist data on many platforms, both inside and outside the warehouse environment, which makes this layer rather complex. The final (and topmost) layer is a data architecture that uses metadata, semantics, indexing, and view technologies to create organized views into data in the persistence layer as well as data elsewhere in the enterprise or beyond.
The logical data warehouse (LDW) is the topmost layer of the DW architecture. This layer is a data architecture, in that it is mostly a collection of data models and databases (that's "database" in the sense of a collection of data, not a DBMS), plus the keys, interfaces, and dependencies that link the models together into a connected logical design. Keep in mind that this is primarily a "view" layer consisting of logical definitions with little or no actual data. Even so, it complements and integrates the "storage and persistence" architecture just under it, where actual data lives.
Data views aside, the LDW should also have rich interfaces and operate in real time. The LDW must be more than a view layer; it should also support many interface types so that a wide range of tools and users can access data through its views. Furthermore, data viewed through the LDW architectural layer can be physically located just about anywhere. The assumption is that a fully developed LDW will interface with many data sources and instantiate many data structures, on the fly, as needed by users and applications, using federation, virtualization, and view technologies. In use cases involving time-sensitive data, the LDW also serves as a real-time and near-time architectural layer.
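To make the idea of a view layer over separate physical stores concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular LDW product: two in-memory SQLite databases stand in for independent storage platforms, and the names `lake`, `orders`, `customers`, `sales_by_region`, and `query_view` are hypothetical. The view stores only a definition, and its result is instantiated on the fly each time it is queried.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two independent physical
# stores (an assumption for illustration; real platforms would differ).
conn = sqlite3.connect(":memory:")                    # first store, schema "main"
conn.execute("ATTACH DATABASE ':memory:' AS lake")    # second store, schema "lake"

# Physical layer: the actual data lives in the two stores.
conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE lake.customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 100.0), (2, 50.0), (1, 25.0)],
)
conn.executemany(
    "INSERT INTO lake.customers VALUES (?, ?)",
    [(1, "EMEA"), (2, "APAC")],
)

# Logical layer: the view holds a definition only, no copied data.
# In SQLite, a TEMP view is allowed to reference tables in several
# attached databases, which is what lets it span both stores here.
conn.execute(
    """
    CREATE TEMP VIEW sales_by_region AS
    SELECT c.region AS region, SUM(o.amount) AS total
    FROM main.orders AS o
    JOIN lake.customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    """
)


def query_view(connection):
    """Evaluate the logical view at query time and return its rows."""
    sql = "SELECT region, total FROM sales_by_region ORDER BY region"
    return connection.execute(sql).fetchall()


before = query_view(conn)

# A change in one physical store is visible through the view immediately,
# because nothing was materialized in the logical layer.
conn.execute("INSERT INTO main.orders VALUES (2, 10.0)")
after = query_view(conn)

print(before)
print(after)
```

The second query reflects the newly inserted order without any reload or copy step, which is the behavior the view layer is meant to provide. A production LDW would rely on federation or data virtualization tooling across heterogeneous platforms rather than a single embedded engine.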
A fully developed logical data warehouse serves many beneficial purposes:
A big picture of data. The primary point of the LDW layer is to provide a fairly comprehensive big picture of data managed in the extended DW and other data environments. This is a single layer through which data can be seen, discovered, accessed, and analyzed, thereby reducing data redundancy, movement, and processing.
Specialized views of data. The virtual views enabled by most LDWs can enable new ways of exploring and analyzing data, data inventory and audit methods for data governance and stewardship, friendly views of data that give business people access to data, and the agile construction of new datasets for analytics and reporting.
Integration and interoperability. The logical layer of a DW usually supports a diverse, high-performance collection of interfaces that enable cross-platform integration and interoperability for broad queries, data exploration, and analytics with data within the extended data warehouse environment, as well as other enterprise data ecosystems.
Speed and scale for new on-the-fly practices. When the LDW includes high-performance, real-time, and near-time functionality, it can provision data that is fresher (as required by time-sensitive business processes), and it can create data structures at run time (as required by discovery-oriented analytics) without limiting data to the pre-built structures of the DW's persisted store. Achieving these advantages has been a challenge in the past because software, hardware, and networks simply lacked the speed, scale, and reliability required for large, ad hoc instantiations of complex data. Today, multiple advances have made the logical data warehouse fully practical, such that it's time for more organizations to embrace it.
To learn more about the logical data warehouse and similar concepts, replay two recent TDWI Webinars on the subject:
-- The Logical Data Warehouse: What It Is and Why You Need It. Originally broadcast June 24, 2015
-- Drawing the Big Picture: Multi-Platform Data Architectures, Queries, and Analytics. Originally broadcast August 26, 2015