Q&A: Why Data Virtualization is More Relevant than Ever
Composite Software executive VP Robert Eve discusses the evolution of data virtualization and exactly how the technology can be used to address today's data management challenges.
- By Linda L. Briggs
- March 2, 2011
Data virtualization has advanced well beyond its roots to become an essential tool in the data integration toolbox. In this interview, Robert Eve, executive vice president of marketing for Composite Software, Inc., an independent provider of data virtualization software, talks about increasing interest throughout the industry in data virtualization, including its evolution, benefits, and challenges -- and just where companies are most likely to see the return on investment (ROI).
BI This Week: What do we mean by the term "data virtualization"?
Robert Eve: Data virtualization integrates data from multiple, disparate sources in an abstracted, logically consolidated manner for consumption by nearly any front-end business solution, including business intelligence and analytics. By accessing the data from original or already consolidated data warehouse sources, data virtualization avoids the need for additional physical consolidation and replicated storage, making it faster to build and less costly to operate than data-warehouse-only integration approaches.
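To make that idea concrete, here is a minimal, illustrative sketch (not Composite's product; the sources and schemas are hypothetical) of what a non-materialized, virtual view does: two independent systems are queried on demand and joined in memory, and no consolidated copy is ever written to disk.

```python
# A minimal sketch of a virtualized (non-materialized) view: data from two
# separate sources is joined at query time and returned to the consumer
# without being copied into a new physical store. In-memory SQLite databases
# stand in for real, disparate systems.
import sqlite3

# Source 1: a CRM-style system holding customers (hypothetical schema).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2: an order-management system holding orders (hypothetical schema).
oms = sqlite3.connect(":memory:")
oms.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
oms.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 250.0), (1, 75.0), (2, 130.0)])

def customer_order_view():
    """A 'virtual view': fetch from each source on demand and join in memory."""
    customers = dict(crm.execute("SELECT id, name FROM customers"))
    totals = {}
    for customer_id, amount in oms.execute("SELECT customer_id, amount FROM orders"):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return [(customers[cid], total) for cid, total in totals.items()]

print(customer_order_view())   # [('Acme', 325.0), ('Globex', 130.0)]
```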
As a middleware technology, data virtualization (or data federation) has advanced well beyond its roots in enterprise information integration (EII) to become an essential tool in the data integration toolbox. In fact, enterprise adoption is now so far along that Claudia Imhoff of Intelligent Solutions and Colin White of BI Research recently co-authored a TDWI white paper on the subject (Ten Mistakes to Avoid When Using Data Federation Technology).
What is happening that makes data virtualization more relevant today and in the near future?
Information management is getting harder every year. Business users are far more demanding as competition drives the need for faster time-to-solution results, and as information-savvy business staff members deploy more "do-it-yourself" capabilities. Information overload, due to exponential data volume growth and omnipresent delivery on our desktops and phones, is something we all feel at a personal level. Finally, the IT environment keeps getting more complex as we layer new fit-for-purpose sources such as NoSQL, and new uses such as predictive analytics, on top of byzantine IT infrastructures. These are the drivers making data virtualization more relevant and necessary than ever before.
Don't existing approaches to data integration support these new challenges?
While it's true that traditional approaches such as enterprise data warehousing (EDW) and data quality continue to evolve as requirements change, there is also a growing recognition among today's best DW thinkers that the EDW is insufficient as the sole focal point for all data integration and data quality. More IT groups are deploying data virtualization to complement their EDW investments. That's because data virtualization delivers the flexibility and agility that traditional approaches were never designed to provide.
What are some of the ways that data virtualization addresses these new challenges?
One of data virtualization's best attributes is agility, which helps with the time-to-solution challenge. With that agility, activities such as adding new data sources, extending existing views, and enabling new data visualizations are measured in hours or days, not weeks or months.
On the complexity front, data virtualization is not disruptive. By leveraging standards such as ODBC, JDBC, REST, and SOAP on the connection side, and SQL and XQuery on the programming side, it fits easily into complex environments. These standards also reduce the learning curve for new developers, so organizations get a jumpstart at the project level and more easily grow into enterprise-wide deployment over time.
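As a hedged illustration of how a consuming application uses those standards, the sketch below issues plain SQL against a published virtual view over ODBC. The pyodbc library is just one possible ODBC client; the "DV_SERVER" data source name, the view name, the columns, and the credentials are all hypothetical and assume an ODBC DSN has already been configured to point at the data virtualization layer.

```python
# A sketch of how a BI tool or script might consume a published virtual view
# through a standard interface. All names below are hypothetical stand-ins.
import pyodbc

# Assumes an ODBC data source named "DV_SERVER" is already configured to
# point at the data virtualization server.
conn = pyodbc.connect("DSN=DV_SERVER;UID=report_user;PWD=example")
cursor = conn.cursor()

# Plain SQL against a logical view; the physical sources behind it are hidden.
cursor.execute(
    "SELECT region, SUM(order_total) AS revenue "
    "FROM customer_orders_view "
    "GROUP BY region"
)
for region, revenue in cursor.fetchall():
    print(region, revenue)

conn.close()
```

Because the consumer sees only standard SQL and a standard driver, the same query keeps working even if the physical sources behind the view change.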
Who is using virtualization now? Is it mostly large enterprises so far?
Larger enterprises have been the early adopters for a number of reasons. First, they have bigger data silos, and more of them, to integrate. Second, large enterprises tend to adopt new technologies early: they were the first to deploy data warehouses, data warehouse appliances, NoSQL data stores, and more, and in many instances they adopted data virtualization to extend and gain more value from those earlier investments. That said, data virtualization works equally well for smaller enterprises or at the project level in a large enterprise.
Where is the ROI usually seen most immediately with virtualization?
Only the most important projects get funded these days. Our experience shows that data virtualization's biggest ROI tends to be the acceleration of business benefits. For example, suppose an enterprise justifies a new customer experience management portal because it will help generate an additional one million dollars in new revenue every month. Using data virtualization to integrate the data, instead of a physical warehouse, could allow the project to be completed two months sooner, netting a two-million-dollar time-to-solution benefit over the warehouse alternative.
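The arithmetic behind that hypothetical example is simple enough to spell out; the figures are the illustrative ones above, not benchmarks.

```python
# Time-to-solution benefit from the hypothetical portal example above.
monthly_benefit = 1_000_000   # new revenue per month the portal is expected to generate
months_saved = 2              # project delivered two months sooner with data virtualization

time_to_solution_benefit = monthly_benefit * months_saved
print(time_to_solution_benefit)   # 2000000
```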
Additional immediate benefits include the staff hours saved during the build cycle plus the hardware costs saved from not having to stand up a new warehouse. Over the long run, data virtualization produces even more benefits by reducing complexity and thus significantly reducing or eliminating certain ongoing maintenance costs.
What are the challenges inherent in "virtualizing" a data warehouse solution or a DW appliance?
There are three essential challenges that must be addressed when using data virtualization to provide data from one or more data warehouses and/or appliances. The first is to provide access to these data sources. This is relatively easy, especially if the sources are relational: simply connect to the source over a standard protocol such as ODBC or JDBC. If the source is MDX- or NoSQL-based, the data virtualization middleware needs to accommodate these data shapes as well.
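The sketch below illustrates what accommodating different data shapes can mean in practice: records from a flat relational source and a nested, document-style (NoSQL-like) source are normalized into one tabular form before being exposed to consumers. The source formats and field names are hypothetical, and this is a simplification of what real middleware does.

```python
# A sketch of the "access" challenge for non-relational shapes: the middleware
# flattens differently shaped records into one common tabular form.
from typing import Iterator

# A relational source already yields flat rows.
def rows_from_relational() -> Iterator[dict]:
    yield {"customer_id": 1, "city": "Austin", "order_total": 250.0}

# A document (NoSQL-style) source yields nested records that need flattening.
def rows_from_documents() -> Iterator[dict]:
    doc = {"_id": 2, "address": {"city": "Berlin"}, "orders": [{"total": 130.0}]}
    yield {
        "customer_id": doc["_id"],
        "city": doc["address"]["city"],
        "order_total": sum(o["total"] for o in doc["orders"]),
    }

# The virtualization layer exposes both through one relational-looking shape.
def unified_rows() -> Iterator[dict]:
    yield from rows_from_relational()
    yield from rows_from_documents()

for row in unified_rows():
    print(row)
```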
The second challenge is optimizing queries into and across these sources. This is especially complex, but its solution is the core of advanced data virtualization. Composite has spent nearly a decade developing and bringing to market technology for optimizing these queries, and its expertise in this area is now being sought by other vendors looking to incorporate data virtualization into their data products -- a recent example is Netezza, with which Composite now has a partnership.
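One widely used optimization idea that helps convey what such an optimizer does is predicate pushdown: rather than pulling every row from a source and filtering in the middleware, the filter is pushed into the SQL sent to the source so far less data crosses the wire. The toy sketch below, with an in-memory SQLite database standing in for a remote source, contrasts the two plans; it is illustrative only and is not Composite's optimizer.

```python
# Predicate pushdown, sketched: the same answer is produced either way, but the
# pushed-down plan lets the source do the filtering and ship far fewer rows.
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 20.0), ("east", 30.0), ("north", 5.0)],
)

def naive_plan(region):
    # Fetch everything from the source, then filter in the virtualization layer.
    rows = source.execute("SELECT region, amount FROM sales").fetchall()
    return [r for r in rows if r[0] == region]

def pushdown_plan(region):
    # Rewrite the query so the source applies the filter itself.
    return source.execute(
        "SELECT region, amount FROM sales WHERE region = ?", (region,)
    ).fetchall()

assert naive_plan("east") == pushdown_plan("east")   # same answer, far less data moved
```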
The third challenge is designing and agreeing on a common data model, or canonicals, across multiple sources and domains. This actually has nothing to do with technology; rather, it's an issue of people and data governance. The more robust data virtualization platforms support source data introspection, relationship discovery, metadata integration, rapid prototyping, and more -- but at some point the organization has to make the decisions.
How are data virtualization solutions evolving to keep pace with evolving dynamics?
The pace of data virtualization evolution is exciting as its capabilities expand along a number of dimensions. Source data access and delivery continue to grow wider to accommodate new data sources of varying shapes, sizes, and locations, as well as new visualization approaches. Query optimizations are getting deeper, enabling data virtualization to tackle bigger and harder data integration problems. Deployments are broadening as enterprises build global data fabrics with complete location transparency.
More powerful and intuitive tools, along with additional automation, continue to make data virtualization easier to use. Governance, something everyone is seeking, is leading to data virtualization platforms with an array of new discovery, modeling, lineage, and monitoring capabilities, as well as best practices for better, more effective data governance.
What are some key considerations or critical success factors for companies looking at a data virtualization solution?
The most critical success factor is selecting the right implementation use case. Our collaboration with Claudia Imhoff, Mike Ferguson of Intelligent Business Strategies Limited, and several other noted BI and data integration analysts resulted in a Data Integration Strategy Decision Tool that is available at http://www.compositesw.com/index.php/resources/brochures-datasheets/. It systematizes the decision-making process to help enterprises select the use case with the highest potential for success.
Learning from early adopters' experiences is another sound way to select use cases. TDWI readers can find resources featuring specific enterprise case studies at http://www.compositesw.com/index.php/resources/customer-case-studies/ as well as proven, effective data virtualization usage patterns at http://www.compositesw.com/index.php/resources/white-papers-reports/.
Selecting a use case with strong potential for success enables companies to gain momentum and ultimately self-fund expansion. Many early adopters have progressed to the point where they are deploying at enterprise scale and are reusing objects to gain an order of magnitude in additional benefit.
How is Composite Software positioned to meet these market needs?
For nearly a decade, Composite Software has focused on delivering high-end data virtualization solutions to the world's most demanding and complex organizations. We have over 300 staff-years of data virtualization product development, tens of thousands of hours of implementation experience, and millions of operating hours -- no one else comes close to this depth and breadth. These assets have made us the gold standard in the data virtualization market, and therefore the safest choice for organizations that view data virtualization as a key data strategy for addressing their information agility, volume, and complexity challenges.