Exploring the Benefits of a Modern Data Hub
As data's sources, structures, latencies, and business use cases evolve, we need to modernize how we design, deploy, use, and govern data hubs.
- By Philip Russom
- September 9, 2019
A modern data hub is about far more than mere data persistence. Older generations of data hubs focused narrowly on consolidating data into one location and persisting it for a short list of business use cases. Today's data hubs must do more than consolidate data, and they must support growing lists of use cases in operations and analytics.
Here are a few of the other characteristics of a modern data hub.
A modern data hub is not a persistence platform. Instead, the modern data hub is a gateway through which data moves, virtually or physically. In fact, in most use cases, a modern hub collects and merges data on the fly, then passes the newly instantiated data set to a target user, app, or database with zero persistence or temporary persistence (for staging) at the hub. Think of the hub as a lens through which a broad range of users can see, access, and extract data regardless of its physical location, whether in the cloud or on premises, whether structured or unstructured.
A modern data hub represents data without physically persisting it. Rich semantics enable broad visibility into the data of the enterprise and possibly beyond. Data virtualization techniques make it possible for the modern data hub to acquire data and instantiate data sets at runtime.
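To make the idea of instantiating a data set at runtime concrete, here is a minimal Python sketch. It is an illustration only, not how any particular hub product works: the `VirtualView` class, the two source functions, and the `customer_id` key are all hypothetical stand-ins. The view pulls rows from each source when asked, merges them by key, hands the result to the caller, and keeps nothing afterward.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Source = Callable[[], Iterable[Row]]  # hypothetical: fetches rows on demand


class VirtualView:
    """Hypothetical view that merges sources at read time and stores no rows."""

    def __init__(self, key: str, sources: List[Source]) -> None:
        self.key = key
        self.sources = sources

    def rows(self) -> Iterator[Row]:
        # Temporary staging that lives only for the duration of this call.
        merged: Dict[object, Row] = {}
        for fetch in self.sources:
            for row in fetch():
                merged.setdefault(row[self.key], {}).update(row)
        yield from merged.values()


# Stand-ins for a cloud CRM and an on-premises billing database (assumed data).
def crm_source() -> List[Row]:
    return [{"customer_id": 1, "name": "Acme"}, {"customer_id": 2, "name": "Globex"}]


def billing_source() -> List[Row]:
    return [{"customer_id": 1, "balance": 250.0}]


view = VirtualView("customer_id", [crm_source, billing_source])
result = list(view.rows())
```

Because the sources are read on every call, a change in either system shows up the next time the view is queried, with no copy at the hub to refresh.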
A modern data hub has enterprise scope, even with today's complex, multiplatform, and hybrid data landscapes. The query and data integration tooling of a modern hub can reach beyond the hub to all data, old and new, traditional and modern, on premises and in the cloud -- for insights and operations based on correlations of diverse data from distributed sources. The hub's integrated tooling makes this happen through a massive library of interfaces and deep support for new technologies, data types, and platforms.
The modern data hub differs sharply from old-fashioned ones. Older hubs -- especially homegrown ones -- were little more than a single database with a simple design, similar to an operational data store or a row store. By contrast, a modern hub is a connected architecture of many source and target databases. Old hubs are typically limited to a single data domain or use case, such as a customer master or a staging area for incoming transactions. A modern hub is typically multitenant, serving multiple business units, and handles all data domains and use cases.
Furthermore, many homegrown data hubs are "roach motels," where data comes in for reporting or study by a short list of users but rarely comes out to be shared and reused elsewhere. A modern data hub is the opposite: there is little or no persistence at the hub, and in most use cases data collected by the hub is immediately shared with many users and applications.
A modern data hub is not a silo. A hub cannot be a silo if it integrates data broadly, provides physical and virtual views, represents all data regardless of physical location, and is governed appropriately. In fact, a modern data hub with these characteristics is a cure for silos.
Benefits of a Modern Data Hub
A modern data hub does many compelling things. For example, it:
Creates visibility into all data. As just discussed, the hub does not consolidate silos as a way of centralizing and standardizing data. Instead, it provides views that make data look simpler and more unified than it actually is in today's complex, multiplatform data environments. This way, unique views -- for diverse business functions, from marketing to analytics to customer service -- can be created in a quick and agile fashion without migration projects that are time-consuming and disruptive for business processes and users. In addition, users can access, analyze, and share data through views that represent data with names and structures that are appropriate to their specialties and technical competencies.
Centralizes control for data usage, ownership, and sharing. Once most of your data is visible from a single console, a number of positive things become possible. Business and technical people can finally get "the big picture" by seeing all or most of a data landscape. The big picture then becomes an inventory for data that should be governed, improved, leveraged for business advantage, managed in compliant ways, and so on. Depending on how the picture is drawn, it can make users aware of relevant data assets, which leads to data sharing and data-driven collaboration.
Demands advanced capabilities that you cannot build yourself. The IT world is full of old-fashioned data hubs that are homegrown or consultant-built. TDWI sees these as feature-poor and limited in business value compared to vendor-built hubs, which support advanced forms of orchestration, pipelining, governance, and semantics, all integrated in a unified toolset.
Relies on modern semantics for data visibility, access, and cataloging. This includes multiple forms of metadata (technical, business, and operational metadata) as well as search indices, domain glossaries, and browseable data catalogs. After all, it takes diverse semantics to create diverse views for multiple business and technical purposes.
Moves data at the right latency via high-performance data pipelining. Given the virtual nature of the modern hub, it regularly instantiates data sets quickly on the fly. It may also handle terabyte-scale bulk data movement. Either way, a modern data hub requires modern pipelining for speed, scale, and on-demand processing.
Provides rules and processes for fine control over data operations. This is largely enabled by modern data orchestration as well as traditional techniques such as business rules and machine learning for automating some data management tasks.
Constructs a connected architecture for what would otherwise be a bucket of silos. Again, this is accomplished without consolidating silos. Think of the data views, semantic layers, orchestration, and data pipelines just discussed. All these create threads that weave together into a data fabric, which is a logical data architecture for all enterprise data that can impose functional structure over hybrid chaos.
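The semantic layer described in the list above can be pictured with a short Python sketch. This is a simplified assumption, not a vendor's catalog format: `CatalogEntry`, `search`, `present`, and the sample columns are hypothetical names invented for illustration. Each entry pairs technical metadata (the source column name) with business metadata (a friendly label, a description, and tags), so the same row can be searched and relabeled for a given audience.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record pairing technical and business metadata."""
    technical_name: str          # column name in the source system
    business_name: str           # label shown to business users
    description: str
    tags: List[str] = field(default_factory=list)


# Assumed sample entries; a real catalog would be far larger.
CATALOG = [
    CatalogEntry("cust_id", "Customer ID", "Unique customer identifier", ["customer"]),
    CatalogEntry("ltv_usd", "Lifetime Value (USD)", "Projected revenue per customer",
                 ["customer", "marketing"]),
]


def search(catalog: List[CatalogEntry], term: str) -> List[CatalogEntry]:
    """Return entries whose business name, description, or tags mention the term."""
    needle = term.lower()
    return [
        e for e in catalog
        if needle in e.business_name.lower()
        or needle in e.description.lower()
        or any(needle in t.lower() for t in e.tags)
    ]


def present(row: Dict[str, object], catalog: List[CatalogEntry]) -> Dict[str, object]:
    """Relabel a source row with business names; unknown columns pass through."""
    names = {e.technical_name: e.business_name for e in catalog}
    return {names.get(col, col): value for col, value in row.items()}


marketing_hits = search(CATALOG, "marketing")
labeled = present({"cust_id": 42, "ltv_usd": 1800.0, "region": "EMEA"}, CATALOG)
```

Here a marketing user searching the catalog finds the lifetime-value column by its tag, and the relabeled row shows business names in place of source column names while the underlying data stays where it is.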
For Further Reading
For more information about directions in data hub modernization, read the 2018 TDWI Checklist Report The Modern Data Hub: Where Big Data and Enterprise Data Converge for Insight and Simplicity. Some of the ideas in this article were borrowed from this report.
Philip Russom is director of TDWI Research for data management and oversees many of TDWI’s research-oriented publications, services, and events. He is a well-known figure in data warehousing and business intelligence, having published over 600 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at [email protected], @prussom on Twitter, and on LinkedIn at linkedin.com/in/philiprussom.