What You Need to Know about Multi-Database Data Architectures
It's about the design of individual databases, but even more about how all databases in an enterprise relate and interact.
By Philip Russom, TDWI Research Director
Allow me to start with a couple of quotes from Wikipedia's article on data architecture:
"Data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations."
"A data architecture should set data standards for all data systems as a vision or a model of the eventual interactions between those data systems."
I like the Wikipedia article because it pulls together two different scopes of data architecture: one for individual databases, another for integrating multiple databases and their applications on an enterprise scale.
On the one hand, each individual database needs a proven and appropriate architecture that gives it high performance, fidelity to the business entities it models, and readability for colleagues who will inherit the database later.
On the other hand, few databases exist in a vacuum anymore. Data is routinely integrated and synchronized across databases to build 360-degree views of key business entities and to give one department visibility into another. Therefore, multiple databases need to share a common architecture (with common models, policies, rules, and standards), which in turn makes it easier to integrate and sync data across systems.
To clarify these points, let's dive into some of the consequences of multi-database data architectures, sometimes called enterprise data architectures.
The line between data model and data architecture can be hard to see. For example, in data warehousing, we regularly refer to star schemas and multi-dimensional cubes as architectures, whereas they're really data models. The confusion is natural, since a good data architecture will specify preferred data models for certain situations.
A true data architecture will specify preferences for many "standards," not just data models. It also covers standards for hand coding in certain programming languages, which interfaces to use in which situations (ODBC versus proprietary APIs versus data services, etc.), and the usual data standards about data types and their acceptable formats and value ranges.
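To make "data standards about data types and their acceptable formats and value ranges" more concrete, here is a minimal sketch, assuming a standard is published as a list of per-field rules that every system holding customer data can check against. The field names, regular expressions, limits, and the names FieldStandard, CUSTOMER_STANDARD, and violations are all hypothetical illustrations, not a TDWI or vendor specification.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldStandard:
    """One enterprise-wide rule: expected type, optional format, optional range."""
    name: str
    type_: type
    pattern: Optional[str] = None      # acceptable format, as a full-match regex
    min_value: Optional[float] = None  # acceptable value range, inclusive
    max_value: Optional[float] = None


# Hypothetical standard for a customer record shared by many systems.
CUSTOMER_STANDARD = [
    FieldStandard("customer_id", str, pattern=r"C\d{6}"),
    FieldStandard("country_code", str, pattern=r"[A-Z]{2}"),
    FieldStandard("credit_limit", float, min_value=0.0, max_value=1_000_000.0),
]


def violations(record: dict[str, Any], standard: list[FieldStandard]) -> list[str]:
    """Return a human-readable list of every way `record` breaks `standard`."""
    problems = []
    for rule in standard:
        if rule.name not in record:
            problems.append(f"{rule.name}: missing")
            continue
        value = record[rule.name]
        if not isinstance(value, rule.type_):
            problems.append(f"{rule.name}: expected {rule.type_.__name__}")
            continue  # format and range checks assume the right type
        if rule.pattern is not None and not re.fullmatch(rule.pattern, value):
            problems.append(f"{rule.name}: bad format")
        if rule.min_value is not None and value < rule.min_value:
            problems.append(f"{rule.name}: below {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            problems.append(f"{rule.name}: above {rule.max_value}")
    return problems


good = {"customer_id": "C000123", "country_code": "US", "credit_limit": 5000.0}
bad = {"customer_id": "123", "country_code": "usa", "credit_limit": -1.0}
print(violations(good, CUSTOMER_STANDARD))  # []
print(violations(bad, CUSTOMER_STANDARD))
# ['customer_id: bad format', 'country_code: bad format', 'credit_limit: below 0.0']
```

The design choice is to keep the standard as data rather than burying it in each application's code, so the same rules can be reviewed by a governance board and reused by every system that stores or receives the record.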
Architect a database so it's conducive to data integration. After all, data synchronization is common and to be expected. For example, you might design a customer record so that most customer-facing apps can comply with a single design. Likewise, you might alter table designs so it's easier to find recently created or modified rows, which speeds up change data capture and data synchronization. Obviously, data flows across multiple systems in these use cases, so enterprise data architecture may also encompass or overlap with data integration architecture.
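One common way to make recently created or modified rows easy to find is a last-modified timestamp column with an index on it, so a downstream sync job can pull only the rows changed since its previous run (its "high-water mark"). The sketch below uses Python's built-in sqlite3 module; the table, column, and function names (customer, last_modified, make_db, changed_since) are hypothetical. It assumes the application sets last_modified on every insert and update, and it shows timestamp-based extraction only, which is one option among several (log-based change data capture is another).

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: last_modified is the design change that aids sync.
    # Values are ISO-8601 UTC strings, which sort correctly as text.
    conn.execute(
        "CREATE TABLE customer ("
        " customer_id TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " last_modified TEXT NOT NULL)"
    )
    # Index so "what changed since X?" is a range scan rather than a full scan.
    conn.execute("CREATE INDEX idx_customer_modified ON customer(last_modified)")
    return conn


def changed_since(conn: sqlite3.Connection,
                  high_water_mark: str) -> tuple[list[tuple[str, str]], str]:
    """Return rows modified after the mark, plus the new mark to store."""
    rows = conn.execute(
        "SELECT customer_id, name, last_modified FROM customer "
        "WHERE last_modified > ? ORDER BY last_modified",
        (high_water_mark,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else high_water_mark
    return [(cid, name) for cid, name, _ in rows], new_mark


conn = make_db()
conn.execute("INSERT INTO customer VALUES ('C1', 'Ann', '2014-01-01T09:00:00Z')")
conn.execute("INSERT INTO customer VALUES ('C2', 'Bo', '2014-01-01T10:00:00Z')")
first, mark = changed_since(conn, "")  # initial load: every row
conn.execute("UPDATE customer SET name = 'Ann Lee', "
             "last_modified = '2014-01-02T08:00:00Z' WHERE customer_id = 'C1'")
second, mark = changed_since(conn, mark)  # incremental: only the updated row
print(first)   # [('C1', 'Ann'), ('C2', 'Bo')]
print(second)  # [('C1', 'Ann Lee')]
```

The cost of this design is one column and one index in the source table; the payoff is that every consumer can use the same cheap incremental query. As written, the sketch does not capture deletes (a soft-delete flag is one workaround), and a strict ">" comparison can miss a row written later with the same timestamp as the stored mark, so a production job would need to address both.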
Enterprise data architecture assumes multiple databases on multiple types of data platforms. After all, different types of data and data from diverse sources may be so different as to require multiple, diverse platforms just to capture, store, and process all the data properly. This grows ever truer as big data drives up the diversity of data types and sources. Hence, any system architecture that complements a data architecture should assume a broad portfolio of data platforms, including relational databases (whether the established, mature brands or the newer columnar and appliance-based ones), non-relational databases (whether new NoSQL databases or legacy ISAM, VSAM, etc.), and the Hadoop Distributed File System (HDFS). Data diversity aside, enterprises tend to have many data platform types (due to acquisitions, departmental funding, lingering legacies, etc.), which affect the scope and details of an enterprise data architecture.
Data governance (DG) can, among other things, enable data architecture. Although it's true that many DG programs originated to provide policies for the compliant business use of enterprise data, most also serve as a collaborative board for establishing and policing technical standards for data and for the development of data management solutions. Hence, a DG board can be a useful organizational unit for determining the standards of an enterprise-scope data architecture. Data architects have already discovered this; TDWI surveys show that the job title "enterprise data architect" is quite common among people serving on a data governance board.
A data architect is more of an archeologist than an architect. Data architects rarely architect (or "design" or "model") new databases. Instead, the average data architect spends much of his/her time digging into existing applications and databases to determine what adjustments would help these systems fit into larger, enterprise-scope, multi-database data architectures for greater data performance, compliance, integration, virtualization, and richness of content.
For more information on data architectures, consider attending the next TDWI World Conference (February 23-28, 2014) or TDWI BI Executive Summit (February 24-26, 2014). Both include substantial content about data architectures and their evolutions, plus how successful organizations are adapting to them.