Q&A: Data Fabric Use Cases Emerging for Modernization
The data fabric concept is predicted to evolve from a buzzword to a concrete implementation stage very quickly because it promises and provides great business value, argues Nick Golovin, Ph.D., CEO and the technical founder of Data Virtuality.
- By James E. Powell
- June 4, 2021
With the promise to transform the data landscape and play a major role in data management modernization, data fabrics are a hot topic this year. Is it a technological concept whose time has come? If so, what exactly is it?
Upside: How would you define data fabric?
Nick Golovin: Data fabric describes a concept for the overall management of a company's data landscape, with key components such as data integration, data preparation, metadata management, data governance, and data orchestration. It can take over the role of traditional, monolithic approaches such as the data warehouse or data lake as the single source of truth, enabling modern use cases for data management and analysis.
This does not mean that there is no room for data warehouses or data lakes -- as a technology for storing data of certain types and for certain use cases they will continue to exist and evolve, but no longer as the holistic concept for data management in companies.
Why do you think that this concept is gaining so much traction lately?
The concept takes three important facts into account:
- The data landscape is becoming increasingly complex
- Agility is becoming increasingly important
- Use cases are becoming more diverse
In many use cases, data is not only retrieved from a single data warehouse or data lake but -- as an example -- from two data warehouses, a data lake, an operational data store (ODS), and five APIs. These data sources must be connected and populated in different ways, for example via ETL, ELT, CDC, or streaming. This is the new "digital" reality.
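As an illustrative sketch only (not from the interview), unifying a batch source and a real-time source into one combined view might look like the following in Python. The source names, fields, and functions here are hypothetical stand-ins for a warehouse extract and an API call.

```python
# Hypothetical sketch: unifying records from a warehouse extract (batch/ETL)
# and a live API (real-time) into one combined view, keyed by customer id.
# All names and fields are illustrative, not from any specific product.

def load_from_warehouse():
    # Stand-in for a batch ETL extract from a data warehouse
    return [
        {"customer_id": 1, "name": "Acme Corp", "lifetime_value": 120_000},
        {"customer_id": 2, "name": "Globex", "lifetime_value": 45_000},
    ]

def fetch_from_api():
    # Stand-in for a real-time API call (e.g., current support tickets)
    return [
        {"customer_id": 1, "open_tickets": 3},
        {"customer_id": 2, "open_tickets": 0},
    ]

def build_customer_view():
    # Join both sources on customer_id to form a single unified record
    api_data = {row["customer_id"]: row for row in fetch_from_api()}
    view = []
    for row in load_from_warehouse():
        merged = {**row, **api_data.get(row["customer_id"], {})}
        view.append(merged)
    return view

for record in build_customer_view():
    print(record)
```

In a real data fabric, each source would be connected by the appropriate mechanism (ETL, ELT, CDC, or streaming) rather than hard-coded functions; the point is only that consumers see one merged view.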
Furthermore, other important issues such as data quality, data governance, metadata management, and data lineage must be considered. A data fabric provides a framework for all of the above.
Which technologies are needed to build a data fabric?
A data fabric combines different steps of data management. There is a data-delivery/data abstraction layer. In addition, metadata plays an increasingly important role -- it is enriched and made accessible to business users. Data pipelining and business logic are also part of data fabric. All of these individual technologies are not new in themselves, but combining them into a single architecture brings new advantages. For example, end-to-end metadata management enables significantly better data governance and data quality.
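To make the metadata point concrete, here is a minimal, hypothetical sketch of a lineage registry: a catalog that records which upstream datasets each dataset was derived from, so governance questions ("where does this report's data come from?") can be answered by traversal. The dataset names are invented for illustration.

```python
# Hypothetical sketch: a minimal metadata catalog recording data lineage,
# i.e., which upstream datasets each dataset was derived from.
# Dataset names are illustrative only.

lineage = {}  # dataset name -> list of upstream dataset names

def register(dataset, upstream=()):
    # Record a dataset and the datasets it was built from
    lineage[dataset] = list(upstream)

def trace(dataset):
    """Return every dataset (transitively) feeding `dataset`."""
    sources = set()
    for up in lineage.get(dataset, []):
        sources.add(up)
        sources |= trace(up)
    return sources

register("crm.customers")
register("erp.orders")
register("staging.customer_orders", upstream=["crm.customers", "erp.orders"])
register("reports.customer_360", upstream=["staging.customer_orders"])

print(sorted(trace("reports.customer_360")))
```

End-to-end metadata management in a data fabric extends this idea across every layer, which is what enables the governance and quality improvements mentioned above.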
What is the role of data virtualization in this concept?
Data virtualization plays a central role as a data delivery layer (alternative terms are data abstraction layer and data access layer). It provides the agility and real-time capabilities that are crucial for a data fabric. Many companies leverage data virtualization initially and then transition toward a data fabric architecture over time.
Are there other similar concepts? What about data mesh, data lakehouse, and unified data and analytics platforms, for example?
There are certainly other approaches such as lambda architecture, data lakehouse, unified data and analytics architecture, or also data mesh. The names and concepts are different, but they all have one premise in common: the understanding that the business use cases cannot be served with a single technology such as a data warehouse or data lake. Instead, you have to combine several technologies to achieve good results. I am observing a change from a technology-centric to a use case-centric approach.
How will the role of the citizen data user evolve with this concept?
In the data fabric concept, data analysts are taking on a greater role in data management. Today, many data analysts can work well with, for example, SQL, the standard language for querying databases. However, they are not data engineers. Sometimes they are in a central unit of an organization and sometimes they are embedded in respective divisions.
In the past, data analysts usually executed only the last phase of data pipelining: analytics, business intelligence, and data visualization. A data fabric with an SQL-based data delivery layer -- as in Data Virtuality's Logical Data Warehouse -- enables data analysts to get closer to the data and to work with a range of different tools.
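The idea of an SQL-based delivery layer can be sketched in miniature: below, two separate "sources" are simulated with two SQLite databases attached to one connection, so the analyst writes a single SQL query that joins across both. This is a conceptual illustration only, not how Data Virtuality's product works internally; table names and data are invented.

```python
import sqlite3

# Hypothetical sketch: one SQL interface over two separate "sources",
# simulated here with two SQLite databases on a single connection.
# Conceptually, a data delivery layer lets the analyst write one query
# spanning systems that are physically separate.

conn = sqlite3.connect(":memory:")                 # plays the "warehouse"
conn.execute("ATTACH DATABASE ':memory:' AS crm")  # plays a second source

conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 250.0), (1, 100.0), (2, 75.0)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme Corp"), (2, "Globex")])

# One query joins data across both sources through the shared SQL layer
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM crm.customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()

for name, total in rows:
    print(name, total)
```

Because the interface is plain SQL, an analyst who is not a data engineer can work this way with familiar tools.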
For what kind of companies is this concept well-suited?
This concept works well for every company that wants to meet the increasing demands of digital transformation and regulatory requirements with a new approach. It also suits companies that plan to transform their monolithic core system architecture (such as traditional core banking systems) into a distributed, agile application architecture. For such complex data challenges, the concept of data fabric could be the answer, I think.
What are specific use cases that can be realized with this concept? Can you name a few?
Many of the current use cases revolve around digital transformation in general and digital communication channels with customers in particular. Nowadays there is a large amount of customer information spread across many different sources that needs to be integrated for Customer 360, know your customer (KYC), and other customer-related use cases. This data is needed to interact with the customer in the best possible way.
Another popular use case is the regulatory reporting requirements of banks and financial institutions. There are also some use cases on topics such as fraud detection and IoT.
Other challenges that data fabric can help with include cloud migration or changes to the core system architecture. Generally speaking, wherever the number of data sources is very large or there is a high demand for agility and real-time data, the concept of data fabric should be considered.
Where do you see the data fabric concept heading in, say, the next 2-3 years?
I think that the data fabric concept will move from a buzzword to a concrete implementation stage very quickly because it promises and provides great business value. The family of technologies that enable a data fabric will grow together more closely in order to provide a more cohesive vision.
In the long run (beyond, say, three years), we will see a continued move away from the monolithic approach toward distributed management of physically decentralized data, as reflected in the data fabric. There will be no going back to the monolithic structures of the past.
What role does your company play in the data fabric market?
Data Virtuality enables a data fabric with our hybrid solution called Logical Data Warehouse that already addresses many aspects of a data fabric. Looking at our name Data Virtuality, people think of us as a data virtualization provider, but we have never seen ourselves as a pure provider of data virtualization. Instead, we provide the benefits of data virtualization and simultaneously overcome the limitations by combining data virtualization with ETL/ELT/data pipelines. Our Logical Data Warehouse conceptually covers 60 to 70 percent of the data fabric. Solutions from other providers help meet the needs of the remaining 30 to 40 percent of a data fabric, but bringing them together and orchestrating them for the overall coverage of a data fabric concept is still a vision in the making.
Editor's note: Nick Golovin, Ph.D., is CEO and the technical founder of Data Virtuality. Before founding the company he worked for more than 10 years on many large-scale data integration projects in an international environment. He received his Ph.D. from the University of Leipzig in data integration, data warehousing, and machine learning.