Modernization Projects Will Dominate Data Management Through 2020
Data management solutions are modernizing furiously to keep pace with new analytics, data usage, emerging technologies, and business innovations.
- By Philip Russom
- December 19, 2019
Drivers for Data Management Modernization
If you're not sure whether you need to modernize your data management solutions, there are several good reasons to consider it.
Analytics is the leading driver for change in data management. Business people are demanding more analytics, whether that entails upgrading existing analytics (OLAP, customer intelligence, fraud detection) or creating new analytics (based on mining, predictions, real-time reactions, etc.). Each analytics use case has unique data requirements that force changes in how data is captured, aggregated, and prepared for analytics. At the moment, the hottest growth area is data preparation for machine learning because of new advances (such as AutoML) that promise to take predictive analytics into a new generation.
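For example, preparing data for machine learning typically means imputing missing values, encoding categorical fields, and scaling numerics -- requirements that reshape how data is captured and prepared. The sketch below is a minimal, hypothetical illustration in plain Python (the field names and data are invented, not drawn from any real system):

```python
# Hypothetical sketch of ML data preparation: models need numeric,
# complete, consistently scaled features.

raw_records = [
    {"region": "east", "visits": 12, "spend": 340.0},
    {"region": "west", "visits": None, "spend": 125.5},  # missing value
    {"region": "east", "visits": 7, "spend": 980.0},
]

def prepare(records):
    # 1. Impute missing numerics with the column mean
    visits = [r["visits"] for r in records if r["visits"] is not None]
    mean_visits = sum(visits) / len(visits)
    # 2. One-hot encode the categorical column
    regions = sorted({r["region"] for r in records})
    # 3. Min-max scale spend into [0, 1]
    spends = [r["spend"] for r in records]
    lo, hi = min(spends), max(spends)
    rows = []
    for r in records:
        v = r["visits"] if r["visits"] is not None else mean_visits
        onehot = [1.0 if r["region"] == g else 0.0 for g in regions]
        rows.append(onehot + [v, (r["spend"] - lo) / (hi - lo)])
    return rows

features = prepare(raw_records)
print(features[0])  # one-hot region + imputed visits + scaled spend
```

Real pipelines delegate these steps to libraries or AutoML tooling, but every step still depends on upstream data management supplying complete, well-described input.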
For further reading: 2018 TDWI Upside article, "Data Requirements for Machine Learning"
Data warehouses, similar data sets, and their platforms or tools need technical upgrades. It's a life cycle thing. Solutions built for the requirements of decades past should be modernized to meet today's business needs (analytics, operational monitoring, management dashboards) and data characteristics (volume, structure, modeling, latency, storage, etc.).
Data management solutions need to leverage new technology. There's a lot of new technology users can take advantage of today, the leading options being new database types and cloud-based data platforms, plus Hadoop, Cassandra, and other open source software. In addition, users should consider new tooling for data integration, reporting, and advanced analytics, as well as modern development methodologies such as lean and agile.
Among these shiny new options, the leading enabler for modern data management is cloud. In fact, cloud features prominently in many users' plans for data management modernization because the cloud's elasticity and object storage are good for the scalability and flexible data handling they need.
For further reading: 2019 TDWI Best Practices Report: Cloud Data Management
Data management solutions need better business alignment. Modernization -- even when driven by analytics, solution life cycles, and new tech -- is an opportunity to realign data strategies with business strategies. In other words, we should all be forever diligent in ensuring that data management work and its data products support modern business, which is increasingly digital and data driven.
Projects for Data Management Modernization
Data management modernization is usually organized as development projects, each focused on a specific solution type (warehouse, lake, hub) or a layer within a technology stack (data platform, virtualization, metadata). Here is my list of modernization projects that will continue to dominate work in data management through 2020.
1. Data Warehouse Modernization
The fact that data warehouse modernization has been ongoing for years shows that the data warehouse continues to be relevant today -- but only when modernized appropriately. Warehouse modernization is often driven by the business need for a broader range of analytics (as mentioned) and to give the warehouse better quality data, modern data models, enriched metadata, new subject areas, in-place analytics processing, and greater speed and scale.
For further reading: 2019 TDWI Pulse Report: The Modernization of the Data Warehouse
To enact these improvements, many organizations choose to migrate the data warehouse -- wholly or in part -- to new data platforms, which are increasingly in the cloud. The modern data warehouse is multiplatform and hybrid, typically with reporting data managed on premises and data for advanced analytics managed in the cloud.
For further reading: 2019 TDWI Checklist Report: Cloud at Scale for Modern Data Warehousing
2. Hadoop Modernization
A complaint that all Hadoop users share (regardless of the use cases they implemented on Hadoop) is that the cluster required for the Hadoop Distributed File System (HDFS) is far more complex to design, set up, and maintain than they thought. Even worse, a successful data lake will increasingly demand more nodes for the cluster, which gets very expensive in terms of administrative payroll and on-premises server hardware.
HDFS aside, Hadoop suffers from other problems. Retrofitting relational and metadata capabilities onto Hadoop turned out to be harder than expected, resulting in weaker functionality. In turn, these weaknesses limit important data lake use cases, namely self-service data exploration and operational reporting at scale.
Even so, Hadoop excels with discovery-oriented analytics, which is important to data scientists and analysts. As users modernize Hadoop, they need to address the cost and admin challenges of HDFS, and they need to provide Hadoop users with better support for relational and semantic functions without diminishing Hadoop's killer apps in data science and big data analytics.
For further reading: 2019 TDWI Upside article, "The Death of Hadoop?"
3. Data Lake Platform Modernization
Most data lake users are committed to the data lake's method of managing diverse big data, but far less committed to Hadoop as the lake's platform. These users want to continue with the data lake approach and even expand it into more use cases, but they know they cannot modernize and mature their data lake successfully on the current state of Hadoop.
In a related trend, a number of users have prototyped their data lake on relational databases or some other on-premises system and they must select a more affordable or more easily scaled platform before expanding the lake.
Although data lakes are relatively new, user organizations are already modernizing them, largely by migrating or redistributing lake data to different platforms. Some are keeping Hadoop as the data lake platform but migrating to cloud-based Hadoop. Others are abandoning Hadoop altogether in favor of newer databases and warehouse platforms built specifically for the cloud. Still others are redistributing data across multiple platforms, often in hybrid architectures that involve a mix of on-premises and cloud systems.
For further reading: 2019 TDWI Upside article, "Data Lake Platform Modernization: 4 New Directions"
4. Logical Data Warehouse Modernization
TDWI has always defined the data warehouse as a data architecture of multiple layers. Some layers are inherently physical, as in the systems architecture where hardware servers and software servers combine to form data platforms where data is stored. Other layers are intrinsically virtual, as in the logical data warehouse, where data types, structures, names, and relationships are documented in data semantics ranging from modern data catalogs to tried-and-true metadata (whether technical, business, or operational).
As a special layer of the data warehouse (similar to the operational data warehouse layer discussed below), the logical data warehouse also needs to be modernized to keep pace with new data sources and targets as well as new business use cases in both operations and analytics.
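To make the physical/virtual distinction concrete, here is a toy sketch (hypothetical names and data, not a real virtualization tool): the logical layer presents one schema while the rows physically live on separate platforms.

```python
# Hypothetical sketch: a logical view over two physical stores.
# The warehouse's virtual layer maps one logical schema onto data
# that physically lives on different platforms.

on_prem_reporting = [  # physical store 1 (e.g., on premises)
    {"customer": "Acme", "revenue": 1200},
]
cloud_analytics = [    # physical store 2 (e.g., in the cloud)
    {"customer": "Globex", "revenue": 850},
]

def logical_customer_view():
    """One logical table; callers never see where rows physically live."""
    for store in (on_prem_reporting, cloud_analytics):
        for row in store:
            yield {"customer": row["customer"], "revenue": row["revenue"]}

rows = list(logical_customer_view())
print([r["customer"] for r in rows])  # ['Acme', 'Globex']
```

In a real logical data warehouse this mapping is driven by the data semantics layer (catalogs and metadata), not hand-coded per table.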
For further reading: 2019 TDWI Upside article, "Modernizing the Logical Data Warehouse"
5. Operational Data Warehouse Modernization
The operational data warehouse has been around for years, bringing reporting and analysis functions closer to business operations at performance levels approaching real time. To improve on the concept, the modern operational data warehouse (ODW) is built on the latest technology for superior speed, scale, low maintenance, new functionality, and cost containment.
In addition, a modern ODW is built to handle an extremely broad range of data types at massive scale with performance so fast that it approaches real time. Many organizations need to modernize their ODWs to deliver insights from hybrid data architectures quickly enough to impact operational business decisions as well as to enable time-sensitive, data-driven business practices such as business monitoring, e-commerce recommendations, and fraud prevention.
For further reading: 2019 TDWI Checklist Report: Building a Modern Operational Data Warehouse
6. Data Hub Modernization
Older hubs -- especially homegrown ones -- were little more than a single, siloed database with a simple design similar to an operational data store or a row store. By contrast, a modern hub is a connected architecture of many source and target databases.
Old hubs are typically limited to a single data domain or use case, such as a customer master or a staging area for incoming transactions. A modern hub is typically multitenant, serving multiple business units, and handles all data domains and use cases. To enable these complex use cases, a modern data hub has tools for data pipelining, orchestration, virtualization, curation, and publish and subscribe.
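As a rough illustration of the publish-and-subscribe capability such a hub provides, here is a minimal in-memory sketch (a hypothetical example, not any vendor's API):

```python
from collections import defaultdict

class DataHub:
    """Minimal in-memory sketch of a hub's publish/subscribe core."""
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> callbacks

    def subscribe(self, topic, callback):
        # A target system registers interest in a data domain (topic).
        self._subscribers[topic].append(callback)

    def publish(self, topic, record):
        # A source system pushes a record; the hub fans it out
        # to every subscriber of that topic.
        for callback in self._subscribers[topic]:
            callback(record)

hub = DataHub()
received = []
hub.subscribe("customer", received.append)
hub.publish("customer", {"id": 42, "name": "Acme"})
print(received)  # [{'id': 42, 'name': 'Acme'}]
```

The pattern is what lets one hub serve many tenants: each business unit subscribes only to the domains it needs, while sources publish without knowing who consumes.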
Many organizations need to decommission their feature-poor and narrowly designed homegrown hubs and upgrade to vendor-built ones that address modern requirements for multiple use cases.
For further reading: 2019 TDWI Upside article, "Exploring the Benefits of a Modern Data Hub"
7. Metadata Management and Other Data Semantics Modernizations
Technical metadata continues to be required for data access and cross-system interoperability, whereas new use cases in self-service, data exploration, data prep, and dashboards require business metadata. Operational metadata can be useful for system performance monitoring, security analytics, and capacity planning.
Organizations need to move beyond technical metadata to get value from business and operational metadata, as well as other approaches to data semantics, such as business glossaries and especially the new discipline of data cataloging. Without modern data semantics, many use cases for the modern data warehouse will be hamstrung.
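To make the three metadata types concrete, a single catalog entry might carry all of them; the structure below is a hypothetical illustration, not any specific catalog product's schema:

```python
# Hypothetical catalog entry illustrating the three metadata types.
catalog_entry = {
    "technical": {          # enables data access and interoperability
        "table": "sales.orders",
        "columns": {"order_id": "BIGINT", "amount": "DECIMAL(12,2)"},
        "source_system": "ERP",
    },
    "business": {           # supports self-service and exploration
        "glossary_term": "Customer Order",
        "owner": "Sales Operations",
        "description": "One row per confirmed customer order.",
    },
    "operational": {        # supports monitoring and capacity planning
        "last_refreshed": "2019-12-18T23:15:00Z",
        "row_count": 1_240_567,
        "avg_query_ms": 185,
    },
}

def find_by_term(entries, term):
    """Business-metadata lookup: the search self-service users run."""
    return [e for e in entries if e["business"]["glossary_term"] == term]

print(len(find_by_term([catalog_entry], "Customer Order")))  # 1
```

Technical metadata alone would answer only the first section; the business and operational sections are what make the entry useful for self-service and monitoring.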
For further reading: 2019 TDWI Upside article, "Modern Metadata Management"