Q&A: Data Mesh/Data Fabric Implementation Tips for Success
Two emerging architectures are designed to make data management easier. OvalEdge CEO Sharad Varshney explains what you need to know.
- By Upside Staff
- May 19, 2022
Today’s enterprises are struggling to manage large amounts of data spread across multiple internal and external systems. New data architecture models such as the data fabric and the data mesh are attractive to those seeking to manage distributed data. Should you consider one of these approaches? We asked OvalEdge CEO Sharad Varshney to explain the potential benefits and obstacles.
Upside: What is driving the adoption of data mesh and data fabric for data management?
Sharad Varshney: Both the volume and importance of data are increasing at a startling rate. As a result, there is a growing demand for agility. However, legacy systems are incapable of supporting this demand.
Today, on-premises solutions are becoming increasingly obsolete, and at the very least, companies must incorporate cloud architecture. Yet, even a modern, centralized, cloud-based architecture is insufficient in this rapidly evolving industry. To achieve the agility and scalability required, businesses must embrace decentralization.
A data fabric provides the essential connections, supported by trusted procedures, that businesses require for highly scalable, easily accessible data management tools. Paired with a data mesh architecture, organizations can use a data fabric to address latency issues because the mesh supports a decentralized approach.
What is the difference between a data fabric and a data mesh?
The easiest way to differentiate between a data fabric and a data mesh is this: a data fabric architecture centers on integrating and connecting the technologies that support data management, whereas a data mesh architecture focuses on the people and procedures behind data management.
Both approaches streamline data management by connecting various systems and technologies in a distributed landscape. However, you could say that data fabric architecture effectively underpins a data mesh by providing the flexibility, agility, and connectivity required if domain owners are to support seamless, decentralized data access.
Why is a data lake not a data mesh by default?
A data mesh is a relatively new concept; the first data lakes were conceived over a decade ago. The two concepts are fundamentally different. A data lake can be described as a centralized storage facility where data from various sources is organized and secured. Conversely, a data mesh connects data lakes and other data sources through a distributed architecture.
The key word here is distributed. Although a data lake is a centralized solution, a data mesh architecture relies on decentralization. By enabling various domain owners to take charge of the data in their care, you ensure that the experts most familiar with the data govern it most effectively.
How can data virtualization enable organizations to implement a simple, functional data mesh architecture?
Ultimately, a data mesh architecture can be created using data virtualization. By enabling users to create unified, simple virtual data models, data virtualization provides a quick way for domains to onboard various data products. It allows you to set aside specific technical requirements, such as classifying the underlying structure, and instead work from a simplified, combined view.
Data virtualization makes data easier to understand and unify. Because responsibility in a decentralized data system rests with individual domain owners, this simplified surface layer is incredibly helpful during the implementation stage.
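To make that concrete, here is a minimal sketch of the idea in Python, using DuckDB as one illustrative virtualization engine; the file, column, and view names are hypothetical:

```python
# A minimal sketch of data virtualization: DuckDB serves as the query
# engine, and the source files and column names are hypothetical. Real
# deployments would federate databases, APIs, and lake storage.
import duckdb

con = duckdb.connect()  # in-memory engine; no data is copied or moved

# Expose two differently shaped sources behind one virtual view.
con.execute("""
    CREATE VIEW customers AS
    SELECT cust_id AS customer_id, full_name AS name, 'crm' AS source
    FROM read_csv_auto('crm_export.csv')
    UNION ALL
    SELECT id AS customer_id, name, 'billing' AS source
    FROM read_parquet('billing/customers.parquet')
""")

# Consumers query the simplified, combined view without knowing where
# or how the underlying data is stored.
print(con.execute(
    "SELECT source, COUNT(*) FROM customers GROUP BY source"
).fetchall())
```

Because the view is virtual, each domain keeps its data where it lives while consumers see one simple model.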
How do data mesh and data fabric architectures enable agility and self-service data access?
I explained why data fabric and data mesh architectures are trending, but it’s important to explain how they support the requirements of modern data management.
Data access and self-service are quicker and simpler because data sources are intelligently connected. Beyond this, because data assets are managed independently by informed teams and access permissions are in place (along with other security and quality features), users can count on receiving safe, governed data quickly through self-service provisions.
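As a rough illustration of permission-gated self-service, here is a minimal Python sketch; the roles, dataset names, and grant table are hypothetical, and a real system would delegate these checks to each domain's own controls:

```python
# A minimal sketch of self-service access gated by a grant table.
# Role and dataset names are hypothetical placeholders.
GRANTS = {
    "analyst": {"sales.orders", "marketing.campaigns"},
    "finance": {"sales.orders", "finance.ledger"},
}

def fetch_dataset(role: str, dataset: str) -> str:
    """Serve data immediately if the role holds a grant -- no ticket queue."""
    if dataset not in GRANTS.get(role, set()):
        raise PermissionError(f"{role} has no grant for {dataset}")
    return f"handle://{dataset}"  # stand-in for a real data handle

print(fetch_dataset("analyst", "sales.orders"))  # handle://sales.orders
```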
When it comes to business agility, the core benefit of these approaches is the speed of data analytics. Trusted results are delivered in real time, enabling business users to act on insights and deliver results that help their organization beat the competition.
What could be the biggest traps in implementing these technologies?
Budget is one of the most significant stumbling blocks for organizations adopting data mesh and data fabric architectures. Applying radical changes to your data systems requires healthy spending, so you need to be prepared for this.
Then, of course, there is the distribution of that budget. When funding isn't allocated correctly, domains lose out, and there may not be enough money to cover critical infrastructure and application technology.
Another major hurdle is leadership. Moving from a centralized to a decentralized model requires a complete culture change, which means C-suite executives must be fully informed and capable of instigating change. Although these new technologies operate at a distributed level, they still require solid centralized governance, which is often difficult to establish.
What best practices can you suggest for a successful implementation?
When it comes to distributed data architecture, the critical thing you must implement is solid governance. Even though data is dispersed and managed by different teams with different approaches, a unified governance structure must ensure everybody follows the same fundamental rules and policies.
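One way to picture that unified layer is a shared policy check that every domain runs against its data products. The sketch below is a minimal Python illustration under that assumption; the metadata fields and rules are hypothetical:

```python
# A minimal sketch of a shared governance check. The metadata fields
# and policy rules are hypothetical; a real policy engine covers far more.
from dataclasses import dataclass

ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}

@dataclass
class DataProduct:
    name: str
    owner: str            # accountable domain owner
    classification: str   # sensitivity label
    retention_days: int   # how long the data may be kept

def check_policy(product: DataProduct) -> list[str]:
    """Return policy violations; every domain runs the same checks."""
    violations = []
    if not product.owner:
        violations.append("missing owner")
    if product.classification not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"unknown classification: {product.classification!r}")
    if product.retention_days <= 0:
        violations.append("retention period must be positive")
    return violations

print(check_policy(DataProduct("orders", "", "secret-ish", 0)))
```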
Another key factor is discoverability. You must put provisions in place that enable domain owners to publicize the location of data assets, and any updates or changes, to every relevant user. Beyond this, nodes should be registered in a centralized document.
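A minimal sketch of such a centralized registry, assuming domains publish locations and change notices to one shared index (all names here are illustrative):

```python
# A minimal sketch of a centralized discovery registry. The node names,
# owners, and storage locations are hypothetical.
from datetime import datetime, timezone

registry: dict[str, dict] = {}  # the shared, centralized "document"

def register_node(name: str, owner: str, location: str) -> None:
    """Domain owners register each data node so others can find it."""
    registry[name] = {"owner": owner, "location": location, "changes": []}

def announce_change(name: str, note: str) -> None:
    """Record updates so every relevant user can see what changed."""
    registry[name]["changes"].append(
        {"at": datetime.now(timezone.utc).isoformat(), "note": note}
    )

register_node("sales.orders", "sales-domain", "s3://lake/sales/orders/")
announce_change("sales.orders", "added currency column")
print(registry["sales.orders"]["location"])  # s3://lake/sales/orders/
```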
Finally, use a self-service platform that doesn't require a high-level understanding of data architecture and design. It must be universally accessible and easy to install, scale, and implement. Self-service is a core concept of distributed data systems, so getting this element right will support your efforts.
What skills do business teams and data teams need to adopt these trends?
First and foremost, data teams and business users must be open to change. Data technologies evolve at lightning speeds, and it is critical that users are not only willing but motivated to use new methods to support a data-driven culture.
Beyond this, it’s important that executives implement data governance early, including a dedicated data literacy program that gives users new to data management the tools they need to access and analyze data.
At the same time, trained data specialists must be literate in new technologies. Data literacy in its most basic form enables users to understand how to access data and why it matters; advanced data literacy programs for data professionals could include further training, research tasks, or even demonstrations.
[Editor's note: Sharad Varshney is the co-founder and chief executive officer at OvalEdge, creators of a data catalog and data governance tool. He founded OvalEdge to blend his unique experience in big data technology and process management into creating a much-needed data management product. He has a nuclear engineering degree from IIT, the premier institute of technology in India. You can reach the author via email or LinkedIn.]