Experts Reconsider the Data Warehouse
At a recent industry event, Claudia Imhoff made a startling confession: she's okay with data virtualization -- she even recommends its use!
- By Stephen Swoyer
- October 24, 2012
At a recent event in New York, NY, industry luminary Claudia Imhoff made a startling confession.
"One of the things I've talked about [in recent meetings with clients] is the virtual nature of business intelligence architectures," said Imhoff, who then took a deep, exaggerated breath: "I've had therapy for several years to be able to say that. One of the things ... I stress [in these meetings] is that if I'm building a business intelligence environment ... I would not create a physical one as my first option."
Imhoff, CEO of consultancy Intelligent Solutions Inc., made her disclosure during a panel discussion at Composite Software's Data Virtualization Day (DVD) that included two high-profile colleagues: Wayne Eckerson, a principal with BI Leader Consulting, and Rick van der Lans, a principal with R20/Consultancy. All three industry thought leaders discussed data virtualization.
Imhoff's "confession" got a good laugh, but it wasn't just a throwaway line.
It likewise wasn't a no-holds-barred testimonial to data virtualization (DV). It was, instead, a common-sense endorsement of DV for particular purposes, in specific contexts. It affirmed the utility of DV as an infrastructure tool that can be combined with other tools -- including other data integration (DI) technologies. That such an endorsement came from Claudia Imhoff, creator of the Corporate Information Factory (CIF) and one of the big thinkers in BI, was a Very Big Deal.
A decade ago, data management (DM) effectively anathematized the virtual data warehouse (VDW), and with good reason. The VDW, enabled by federation technology, promised to deliver the flexibility, convenience, and timeliness that had proven so frustratingly elusive in most enterprise data warehouse (EDW) projects. It was an extremely tempting proposition, less so for DM pros than for exasperated business users, who -- then as now -- were unable to consume information when and how they wanted it.
The virtual data warehouse promised them deliverance at a cost -- chiefly, sub-par performance. It was slow: its query performance was poor -- although caching could (and for common queries did) help ameliorate this -- and its responsiveness was typically much worse than that of a conventional DW.
Just as important, the VDW by itself didn't address the Achilles heel of the oft-maligned EDW -- namely, access to clean, consistent data. If DM teams couldn't give users access to the data they wanted when they wanted it, it wasn't for lack of trying: integration is hard, especially when it involves custom or proprietary data sources.
The VDW ultimately foundered, but not before the damage had been done. For a long time, DM pros treated the concept of the virtual data warehouse -- and, to some extent, the term "data federation" itself -- like Lord Voldemort of the Harry Potter books: as that which must not be named.
What's different now? For starters, explained Eckerson, data virtualization isn't data federation. True, federation technology enables -- underpins -- DV, but DV-as-a-vision also encompasses data profiling and data cleansing.
This is the kind of holistic approach that Composite, Denodo Technologies, IBM Corp., and Informatica Corp. prescribe as part of their own DV visions. This being said, tools from Composite and Denodo -- which (unlike IBM and Informatica) don't have established DI practices -- tend to integrate more of the capabilities traditionally associated with ETL and data quality tools. (These include transformation and normalization, along with the ability to profile and/or apply cleansing routines.)
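To make the distinction concrete, here is a minimal sketch of the idea -- federated access to multiple sources with cleansing applied in flight, rather than a persisted copy. It is purely illustrative (the sources, table, and `virtual_customers` helper are hypothetical), not the API of any vendor's product:

```python
import sqlite3

# Two hypothetical "source systems," simulated with in-memory SQLite.
# A data virtualization layer would expose one virtual view over both.
def make_source(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn

crm = make_source([(1, "  Acme Corp"), (2, "Globex")])
erp = make_source([(3, "initech "), (4, None)])

def virtual_customers(*sources):
    """Federate rows from every source, applying light cleansing
    (trim, title-case, drop NULL names) on the way out.
    Nothing is materialized into a warehouse."""
    for src in sources:
        for cid, name in src.execute("SELECT id, name FROM customers"):
            if name is None:
                continue  # a simple in-flight data-quality rule
            yield cid, name.strip().title()

print(list(virtual_customers(crm, erp)))
# -> [(1, 'Acme Corp'), (2, 'Globex'), (3, 'Initech')]
```

A real DV platform does this declaratively and at scale, of course, but the shape is the same: queries are pushed to the sources on demand, and profiling or cleansing rules sit in the virtual layer rather than in a batch ETL job.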
The point, said Eckerson, is that DV is a viable option for many scenarios. It might not be the answer, but it's an answer -- in this case, to any of several long-standing problems.
"Virtualization allows you not only to federate data but to support data quality and data profiling," he said, referring to testimonials from several Composite customer representatives who'd spoken at the event. "It sounds like [DV is] also really useful for highlighting bad data, [or for] getting data into the hands of analysts more quickly so they can do analytics with it, [or for] helping users sort out what data definitions are" in cases where there's no master data practice.
Imhoff, too, described a common scenario for which DV is an ideal candidate: prototyping.
"My first option is to build [a BI layer] virtually because it's so much easier to prototype it, it's so much easier to tear it down and deploy it, [and] it's so much easier to make changes to it," she said. "It's been a pretty dramatic change to business intelligence toolboxes that we now have virtualization as one of the primary tools. I'm not saying it's going to replace ETL. Of course it won't, but I think it's a pretty good partner to ETL ... and it should be in everybody's toolbox."
Imhoff wasn't finished with her confessions, either. She explicitly broke faith with one of the most contentious claims of the EDW faithful. "It also took me a while to admit that not everything belongs [or] is in a data warehouse. There is a need to have multiple sources for some of our analytics. How do we do that? We've heard story after story today of it takes too long to get it into the data warehouse," she said, pointing to the popularity of self-service or discovery BI tools for such use cases.
"We may not be able to get [all of this data] into the data warehouse fast enough today, but that doesn't mean that you don't ultimately want to move it into the data warehouse," she continued, adding that data virtualization -- in the form of what Gartner Inc. analyst Mark Beyer calls the "logical data warehouse" -- could be helpful in this case, too.