Critical Components for Data Fabric Success
What is a data fabric and how can you implement one successfully?
- By Alberto Pan
- January 10, 2024
Today’s data management ecosystem is constantly evolving, as companies struggle to democratize and unify data assets with minimal friction. By now, most of us have heard of data fabrics. Defined by Gartner as an emerging data management design for attaining flexible, reusable, and augmented data integration pipelines, services, and semantics, a data fabric supports both operational and analytics use cases across multiple deployment and orchestration platforms and processes.
Recently, GigaOM observed that data fabric implementations fall into two categories: physical and logical. For a physical data fabric, data must be physically replicated to a common repository before it can be accessed; a logical data fabric, in contrast, establishes logical connections to distributed data sources, enabling access without replication.
Physical Versus Logical: Key Differences
Like GigaOM, Gartner and Forrester have also outlined the basic components of data fabric design. In a physical data fabric, data is drawn from one or more data sources (which can be on premises or in the cloud) at the ingestion layer and stored in the persistence layer, which is usually a data lake or data warehouse. It is then transformed and/or cleansed as needed in the orchestration layer, and modeled, prepared, and curated in the discovery layer. Finally, at the consumption layer, users access the data, run analytics on it, or use APIs to deliver the data wherever it is needed.
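To make the layering concrete, the flow through a physical data fabric can be sketched as a simple batch pipeline. The layer names follow the description above; the functions, sample records, and cleansing logic are illustrative assumptions, not any vendor's actual implementation:

```python
# Illustrative sketch of a physical data fabric's batch flow.
# Layer names mirror the article; data and logic are hypothetical.

def ingest():
    # Ingestion layer: pull raw rows from on-prem and cloud sources.
    return [
        {"id": 1, "region": "EU ", "amount": "100"},
        {"id": 2, "region": "US", "amount": "250"},
    ]

def persist(rows):
    # Persistence layer: land raw data in a lake or warehouse (stand-in).
    return list(rows)

def orchestrate(rows):
    # Orchestration layer: transform and cleanse as needed.
    return [
        {**r, "region": r["region"].strip(), "amount": int(r["amount"])}
        for r in rows
    ]

def discover(rows):
    # Discovery layer: model, prepare, and curate into a keyed view.
    return {r["id"]: r for r in rows}

def consume(curated):
    # Consumption layer: analytics, BI, or API delivery.
    return sum(r["amount"] for r in curated.values())

total = consume(discover(orchestrate(persist(ingest()))))
print(total)  # 350
```

Each stage hands its output to the next, which is why a physical fabric's latency is bounded by the slowest scheduled batch.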
Logical data fabrics integrate data using data virtualization to establish a single, trusted source of data regardless of where the data is physically stored. This enables organizations to integrate, manage, and deliver distributed data to any user in real time regardless of the location, format, and latency of the source data.
Unlike a logical data fabric, a physical data fabric must physically centralize all the required data from multiple sources before it can deliver that data to consumers. The data must also be transformed and replicated anew for each use case, normally through extract, transform, and load (ETL) processes that perform the required steps on large volumes of data in scheduled batches. As a consequence, in a physical data fabric, the time required to meet new business data needs can be quite long. The proliferation of data replicas can also create significant governance problems.
In a logical data fabric, data is persisted only as needed, which allows faster turnaround on new business data needs and minimizes governance problems. A well-built logical data fabric supports many data integration styles, such as data replication, ETL, and streaming data. It can also take advantage of advanced caching mechanisms and AI/ML to accelerate data processing.
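As a rough sketch of the difference, a logical data fabric answers queries by combining live results from the underlying sources at query time rather than from a pre-built replica. The source names, schemas, and caching rule below are illustrative assumptions:

```python
# Minimal sketch of logical (virtualized) data access: the "fabric"
# holds only connections and metadata, and joins data at query time.
# Sources, schemas, and the caching policy are hypothetical.

CRM = {"c1": {"name": "Acme"}, "c2": {"name": "Globex"}}          # e.g. a cloud app
ORDERS = [{"cust": "c1", "total": 120}, {"cust": "c1", "total": 80},
          {"cust": "c2", "total": 200}]                           # e.g. an on-prem DB

_cache = {}

def revenue_by_customer():
    """Federated query: join both sources in real time, no replication."""
    if "rev" in _cache:                      # optional caching layer
        return _cache["rev"]
    result = {}
    for order in ORDERS:
        name = CRM[order["cust"]]["name"]    # live lookup in the source
        result[name] = result.get(name, 0) + order["total"]
    _cache["rev"] = result
    return result

print(revenue_by_customer())  # {'Acme': 200, 'Globex': 200}
```

Because nothing is copied up front, a new business question only requires a new query definition, not a new replication pipeline.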
Leveraging Artificial Intelligence/Machine Learning (AI/ML)
A logical data fabric is like a layer of intelligence above the relevant data sources; it stores the data sources' metadata and other intelligence that support its ability to act on the underlying data. AI/ML capabilities can greatly extend a logical data fabric's real-time data-access functions. Armed with real-time metadata, AI/ML can observe usage and make timely, intelligent recommendations to data consumers.
AI/ML can also support the automation and refinement of many data management and DataOps tasks, such as orchestrating multiple cloud systems. Query optimization, too, can be strengthened by AI/ML.
In addition to accelerating query response time through optimal strategies, AI/ML capabilities can determine what data the user is likely to require and begin providing access to it -- even before an individual requests it.
Finally, the logical data fabric can also be integrated with large language models to answer business queries expressed in natural language and to assist in creating data products.
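One way such an integration can work is for the language model to translate a natural-language question into a query against the fabric's semantic layer. The sketch below substitutes a trivial keyword matcher for the model, and all metric names and data are hypothetical:

```python
# Hedged sketch: natural-language question -> query over a semantic layer.
# A real system would call an LLM; a toy keyword matcher stands in here.

ORDERS_DATA = [{"total": 120}, {"total": 80}, {"total": 200}]  # hypothetical source

SEMANTIC_LAYER = {
    "total_sales": lambda: sum(o["total"] for o in ORDERS_DATA),
    "order_count": lambda: len(ORDERS_DATA),
}

def translate(question):
    # Stand-in for the LLM translation step.
    if "how many" in question.lower():
        return "order_count"
    return "total_sales"

def answer(question):
    metric = translate(question)
    return metric, SEMANTIC_LAYER[metric]()

print(answer("What were total sales?"))    # ('total_sales', 400)
print(answer("How many orders came in?"))  # ('order_count', 3)
```

The key point is that the model only selects and parameterizes governed metrics; the fabric still executes the query against the sources.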
Supporting Data Mesh and Other Modern Data Architectures
One of the most powerful features of a logical data fabric -- and also one of the simplest -- is that it uncouples data access from the data sources themselves. By facilitating this decoupling, logical data fabric enables organizations to make any number of changes to the ways data is used without affecting the underlying data.
Landsbankinn, the largest financial institution in Iceland, established a logical data fabric to reduce its duplicated development efforts (which were occurring due to the many departmentally isolated data silos) and to secure the vast number of access points. Only after successfully implementing the logical data fabric did Landsbankinn uncover an organizational flaw lurking beneath the bank's new, smoothly operating data infrastructure.
During the pandemic, Landsbankinn needed to quickly deliver financial products to customers who were suffering from the COVID-19 crisis. However, delivering each product required many rounds of revisions: IT created the initial version, sent it to the relevant business domains for review, and then applied the requested changes. It was a cumbersome process that left Landsbankinn wondering why the people who knew the data best couldn't simply create the financial products themselves, with support from IT.
Thanks in part to the logical data fabric, Landsbankinn implemented a data mesh configuration, at a time when the concept was still very new, so the bank could bypass its central IT team and rely instead on its business experts. In a data mesh, data is managed not by one central team but by multiple teams, each with specific knowledge of a different data domain. The logical data fabric enabled the bank's data-domain stakeholders to develop financial data products in a semantic layer above the various data sources without affecting the sources themselves.
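The mechanism can be sketched as domain teams registering views in a shared semantic layer while the underlying sources stay untouched. The domains, sources, and view logic below are illustrative assumptions, not Landsbankinn's actual implementation:

```python
# Sketch of domain-owned data products in a semantic layer.
# Each domain registers views over the sources it knows best;
# the sources themselves are never modified. All names are hypothetical.

LOANS = [{"id": "L1", "balance": 5000, "rate": 0.04},
         {"id": "L2", "balance": 12000, "rate": 0.05}]  # underlying source

semantic_layer = {}  # shared catalog of data products

def register(domain, name, view):
    """A domain team publishes a data product without IT round-trips."""
    semantic_layer[f"{domain}.{name}"] = view

# The lending domain defines its own product on top of LOANS:
register("lending", "annual_interest",
         lambda: {l["id"]: l["balance"] * l["rate"] for l in LOANS})

product = semantic_layer["lending.annual_interest"]()
print(product)  # {'L1': 200.0, 'L2': 600.0}
```

Because the view is computed over the live source, the lending team can iterate on its product definition without replicating data or filing a ticket with a central team.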
This greatly streamlined Landsbankinn’s development efforts, enabling financial products to be created and delivered much more quickly. The data mesh structure also provided the bank with a better overview of the entire data pipeline. Finally, because the bank could reduce the need to replicate data between domains, it saved considerable time and effort.
Weaving the Right Fabric
If your enterprise needs to simplify and accelerate data access across disparate data sources, a data fabric may be the right option. At a minimum, an effective data fabric comprises a data integration component, a data catalog, metadata activation features, and semantic, DataOps, and data preparation layers. However, each of these elements can have many nuances, given that a data fabric is, by definition, a composable architecture.
The data fabric design can be thought of as a journey in which each individual element can change or mature over time. However, organizations should prioritize the components best suited to supporting their current and future data and analytics needs.
Alberto Pan is EVP and chief technical officer at Denodo, a provider of data virtualization software. He has led product development tasks for all versions of the Denodo Platform and has authored more than 25 scientific papers in areas such as data virtualization, data integration, and Web automation.