Modern Requirements for the Operational Data Warehouse
New requirements for data, software, and business practices are driving a new wave of modernization for the operational data warehouse.
- By Philip Russom
- June 7, 2019
Modern enterprises looking for growth in revenue and profitability know that data is critically important to gaining competitive advantage. A high return on investment comes from digitally transforming business operations by capturing and analyzing a greater variety and volume of data to inform better business insights. When data sets with extremely diverse structures and characteristics are integrated for multiple use cases in operations and analytics, we call the resulting data set hybrid data.
Valuable hybrid data comes from an increasing number of different sources -- both old and new, internal and external. It is inevitable that hybrid data will arrive in many structures, schemas, and formats with variable characteristics for volume, latency (from batch to streams), concurrency, requirements for storage and in situ processing, and the emerging characteristics of machine data (IoT standards, geocoding, events, images, audio, training data for machine learning, etc.).
From a technology viewpoint, it is challenging to integrate data of such diverse characteristics. From a business viewpoint, however, integration is well worth the effort because it provides deeper visibility into business processes and richer analytics insights than were possible before hybrid data's greater variety emerged.
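As a rough illustration of what integrating hybrid data involves, the sketch below maps a batch extract (CSV) and a stream of machine events (JSON) onto one common record shape. The `Reading` schema, the field names, and the sample values are hypothetical and chosen only for this example; a real pipeline would also handle schema drift, bad records, and far larger volumes.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Common record shape for all sources (hypothetical schema)."""
    source: str
    device_id: str
    value: float


def from_batch_csv(text: str) -> list:
    """Parse a batch extract delivered as CSV with a header row."""
    rows = csv.DictReader(io.StringIO(text))
    return [Reading("batch", r["device_id"], float(r["value"])) for r in rows]


def from_event_stream(lines: list) -> list:
    """Parse streamed JSON events, one JSON object per line."""
    readings = []
    for line in lines:
        event = json.loads(line)
        readings.append(
            Reading("stream", event["id"], float(event["payload"]["value"]))
        )
    return readings


# Two sources with different structures and latencies (made-up sample data).
batch = "device_id,value\nA1,20.5\nB2,18.0\n"
stream = ['{"id": "A1", "payload": {"value": 21.0}}']

# Integrate both into one data set; stream records are appended after the
# batch records, so they overwrite older batch values per device.
hybrid = from_batch_csv(batch) + from_event_stream(stream)
latest = {}
for reading in hybrid:
    latest[reading.device_id] = reading.value
```

Once both sources share one shape, the same query or analytic can run across them, which is the practical payoff of the integration effort described above.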
Hybrid Data Architectures
Hybrid data usually drives users to deploy many types of database management systems and other data platforms (such as Hadoop and cloud) to capture, store, process, and analyze it. After all, it's difficult or impossible to optimize a single instance of a single data platform type to satisfy the eclectic requirements of hybrid data's multiple structures, latencies, storage paradigms, and analytics processing methods.
TDWI sees the diversification of data and the quickening adoption of advanced analytics as the strongest drivers toward hybrid data architectures, so called because hybrid data is increasingly distributed across multiple platforms, both on premises and on one or more clouds. For some use cases, the right tool for a particular data type might be sufficient. However, there is more value in integrating access and analysis in a hybrid data architecture that can deliver the scale and performance needed to produce actionable insights.
The Modern Operational Data Warehouse (ODW)
Hybrid data and hybrid data architectures are already here. To get full business value from them, you need an appropriate data management platform, and that's where the modern operational data warehouse comes in. The modern ODW delivers insights from a hybrid data architecture quickly enough to impact operational business decisions.
The operational data warehouse continues to focus on speed. Note that the operational data warehouse has been with us for decades, sometimes under synonyms such as the real-time, active, or dynamic data warehouse. No matter what you call it, the operational data warehouse has always involved high-performance data ingestion and query so that data travels as fast as possible into and out of the warehouse.
Through analysis, an ODW provides timely insights for time-sensitive decisions such as real-time offers in e-commerce, network optimization, fraud detection, and investment decisions in trading environments. An ODW also supports time-sensitive operational processes such as just-in-time inventory, business monitoring, and operational reporting.
Performance and real-time requirements continue to apply to the ODW; however, a modern ODW must also handle a broader range of data types and sources at unprecedented scale as well as new forms of analytics. The modern ODW satisfies requirements old and new largely by leveraging the speed and scale of new data platforms and analytics tools.
The modern ODW is a hybrid data management solution and is hybrid in multiple ways. It integrates hybrid data from multiple operational systems and other sources. The modern ODW is built to handle modern data, which trends toward hybrid combinations. Furthermore, an implementation of a modern ODW may itself be hybrid when it spans both on-premises and cloud systems. In addition, a modern ODW tends to have substantial data integration capabilities that integrate data among the source and target systems of a hybrid data architecture.
The best ODWs operate with very low latency. A modern ODW is built for today's hybrid data and business use cases that demand real-time or near-real-time performance. Low-latency use cases supported by modern ODWs include real-time analytics, operational reporting, management dashboards, business activity monitoring, catching fraud before cash leaves the ATM, and making an offer before the potential customer leaves the store or website.
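To make the low-latency requirement concrete, here is a minimal sketch of one kind of in-line decision such use cases depend on: a sliding-window velocity check that approves or declines each withdrawal as the event arrives. The `VelocityCheck` class, the 60-second window, and the limit of three withdrawals are assumptions made for illustration; they are not taken from any particular ODW product, and real fraud detection uses far richer signals.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # assumed look-back window
MAX_WITHDRAWALS = 3   # assumed limit per card within the window


class VelocityCheck:
    """Decline a card that exceeds a withdrawal limit inside a time window."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_WITHDRAWALS):
        self.window = window
        self.limit = limit
        self._recent = defaultdict(deque)  # card_id -> approved timestamps

    def allow(self, card_id, timestamp):
        """Return True to approve the withdrawal, False to decline it."""
        recent = self._recent[card_id]
        # Evict approvals that have aged out of the window.
        while recent and timestamp - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(timestamp)
        return True


check = VelocityCheck()
# Five withdrawal attempts on one card, timestamps in seconds.
decisions = [check.allow("card-1", t) for t in (0, 10, 20, 30, 75)]
# decisions == [True, True, True, False, True]
```

The decision is made per event from state held in memory, which is why the fourth attempt can be declined before cash leaves the ATM rather than flagged in a later batch report.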
A modern ODW is strong where other approaches are weak. For example, the traditional enterprise data warehouse is great as a corporate "single source of truth" but inflexible and expensive. Operational data stores are fast in a limited domain but not extensible to larger enterprise needs. Data lakes are great for storing big and varied data economically but poor at data governance and predictable performance.
By comparison, a modern ODW is built on the latest technology (see below) for superior speed, scale, maintenance, functionality, and cost containment. In addition, a modern ODW assumes that leveraging hybrid data is its raison d'être, so it is built to handle an extremely broad range of data types at massive scale with extremely high performance.
Given this daunting list of system requirements, it is unlikely that a user organization can satisfy even half of them with a homegrown system that was built by IT groups or consultants. Therefore, users should seek vendor-built systems designed and optimized for modern operational data warehousing.
A successful ODW leverages recent advancements in data platforms and tools. These include parallel execution, columnar databases, in-memory execution, high-speed storage, distributed file systems, scalable clusters, elastic clouds, cloud-based databases, and managed services for cloud data solutions. Because of the extreme diversity of hybrid data, a successful ODW will interoperate via many access methods (such as R, Scala, SQL, or GUI), accommodate a wide variety of user skills (from data scientist to business user), and flexibly support new deployment models (data center, public cloud, private cloud, managed service, multicloud, and hybrid cloud, alone or in any combination).
A Final Word
For more information, read "TDWI Checklist Report: Building a Modern Operational Data Warehouse," online at https://tdwi.org/checklists. This article is drawn from that report.
Philip Russom is director of TDWI Research for data management and oversees many of TDWI’s research-oriented publications, services, and events. He is a well-known figure in data warehousing and business intelligence, having published over 600 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at @prussom on Twitter and on LinkedIn at linkedin.com/in/philiprussom.