New BI DW Ecosystem
Forget tools, forget systems, forget architectures. Forget data itself. What's needed is a radically new information ecosystem.
- By Stephen Swoyer
- September 25, 2012
The data warehousing Holy Wars are over, and nobody won.
In fact, to the extent that data management (DM) practitioners are still waging war over Inmon, Kimball, and n-normal forms, they risk losing the peace.
"[C]hanges that are happening in the data [management] landscape in general ... [are] putting a lot more stress and importance on how data integration is handled in the enterprise," said industry veteran Shawn Rogers, vice president of research for business intelligence (BI) with Enterprise Management Associates.
Rogers made this point as part of a co-presentation with Bob Eve, vice president of marketing with Composite Software Inc., at last month's Pacific Northwest BI Summit.
He described the data warehousing Holy Wars of the past as a distraction, noting that most DM professionals have long since left them behind.
"The Holy Wars are over," said Rogers. "When I got into this industry, most of the conversations that took place were whether or not you were on the Inmon or Kimball side," he continued, adding that most warring factions have "kind of moved along."
DM professionals might've left the Holy Wars behind, but they -- and we -- are still stuck with the tools and the architectures produced by more than two decades of warfare. This data management status quo isn't cutting it, said Rogers, who argued that new and more powerful analytic technologies require new and more powerful ways of thinking about how information is managed and consumed.
That doesn't mean new and more powerful data integration (DI) tools or new-and-improved data warehouse architectures. Forget tools; forget systems; forget architectures: what's needed, Rogers insists, is a new and more flexible information ecosystem.
At TDWI's World Conference in Chicago earlier this year, Rogers outlined his concept of a "hybrid data ecosystem" in which information "lives" in distributed sites across an organization and workloads run in a "fit-for-profile" context -- i.e., on the platforms for which they're best suited. He returned to this idea at last month's BI Summit.
In the hybrid data ecosystem, the emphasis isn't on bringing data into a single repository -- i.e., the EDW -- but on moving it to fit-for-profile platforms, or -- in many cases -- on leaving it where it is. The hybrid data ecosystem is an information meshwork, abstracting information from operational systems, data marts, big data distributed frameworks, and cloud topologies, among other sources.
It also addresses the problem of dynamic or kinetic information, which is where the classic EDW vision unquestionably runs out of gas. It's been a long time since information was served up in nice, neat batch intervals. In the age of Web applications, information is always moving: it streams at sub-second ("real-time") rates; pulses in intervals from seconds to minutes ("near real-time"); arrives (at sometimes staggering volumes) every few minutes or (even) every few hours ("intraday"); or conforms to more traditional batch periods of several days or longer.
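The latency classes described above can be sketched as a small classifier. This is only an illustration: the article gives no exact boundaries, so the cutoffs below (one second, five minutes, 24 hours) and the function name `classify_latency` are assumptions chosen to make the four categories concrete.

```python
def classify_latency(interval_seconds: float) -> str:
    """Map the interval between data arrivals to one of the article's classes.

    The cutoffs are assumed for illustration; the article describes the
    classes only loosely and does not define exact boundaries.
    """
    if interval_seconds < 0:
        raise ValueError("interval must be non-negative")
    if interval_seconds < 1:            # sub-second streams
        return "real-time"
    if interval_seconds < 5 * 60:       # assumed: seconds up to a few minutes
        return "near real-time"
    if interval_seconds < 24 * 3600:    # assumed: minutes up to hours
        return "intraday"
    return "batch"                      # a day or longer


for seconds in (0.2, 30, 2 * 3600, 3 * 24 * 3600):
    print(seconds, "->", classify_latency(seconds))
```

In a hybrid ecosystem, a label like this could be one input when deciding which platform is the best fit for a given feed.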
The point, Rogers maintained, is that the EDW vision which once sustained the industry -- e.g., one system, one source of record, one version of the truth -- is no longer as compelling as it once was. It hasn't necessarily been obviated, he was careful to say; it's simply been supplemented. In the process, however, it's also been displaced: the Kimball/Inmon Holy Wars posited an information management solar system, of which the data warehouse (in either its Kimball or Inmon variants) was the center.
In the hybrid data ecosystem, however, the EDW becomes simply one provider among many providers. Its importance might still be oversized relative to some providers -- the hybrid data ecosystem is as much an information meritocracy as anything else -- but it's no longer the center of mass around which everything else revolves. That's because an ecosystem doesn't have a center of mass.
It's an altogether different paradigm. "We understand that the [enterprise data warehouse] isn't going to go away. ... It will continue to play a critical role," Rogers said. "[A]t the same time operational [data] plays a role, data marts play a role, big data ... analytic platforms [are] playing a more interesting role."
Although Rogers stopped short of suggesting that the EDW might become extinct, that's one implication of his vision. "Oftentimes, the data that's in these [distributed] systems is there because that's the best place for that information to be," he said. "There's this sort of new ... idea around ... [putting] the right data and the right workload on the right platform and [letting] those platforms provide [BI and analytic services]."
Abstraction's the Thing
If you displace the EDW from the center of things, you have to replace it with something, right? Yes and no. Rogers uses the word "ecosystem" intentionally. It implies an interconnectedness -- a meshwork -- of things, the interactions of which comprise a systemic whole. Nevertheless, an information ecosystem still has to have a terra firma -- something in the context of which information can take root and flourish.
Part of Rogers' co-presentation with Eve involved making the case for a "unified access layer" as a kind of terra firma that can provide abstracted access to all of the resources in an information ecosystem. Eve's company, Composite, markets data virtualization technology that claims to do just that.
Eve discussed the case of one customer (a prominent pharmaceutical company) that uses his company's Composite Information Server to create "virtual data marts."
Composite works by building or creating "views" -- abstractions -- of underlying providers. A "virtual data mart," for example, comprises a "view" of select information residing on a meshwork of different providers. "There's no single data warehouse there in the middle, [but instead] a lot of single sources," said Eve, adding that -- in this case -- the pharmaceutical company modifies existing views to adapt them to new use cases, in effect rapidly rolling out new fit-for-profile virtual data sources or applications.
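The general idea of a view over multiple providers can be sketched in a few lines of Python. This is not Composite's API or implementation. The provider functions, the sample rows, and the helpers `join_view` and `filter_view` are all hypothetical, and the sketch assumes each provider can simply be called to return its current rows.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Provider = Callable[[], Iterable[Row]]


# Hypothetical providers standing in for two separate source systems.
def trials_provider() -> List[Row]:
    return [
        {"compound_id": 1, "trial": "T-100", "phase": 2},
        {"compound_id": 2, "trial": "T-200", "phase": 3},
    ]


def compounds_provider() -> List[Row]:
    return [
        {"compound_id": 1, "name": "Compound A"},
        {"compound_id": 2, "name": "Compound B"},
    ]


def join_view(left: Provider, right: Provider, key: str) -> Provider:
    """Return a view that joins two providers on `key` each time it is called.

    No data is copied into a central store; the sources are read on demand.
    """
    def view() -> List[Row]:
        lookup = {row[key]: row for row in right()}
        return [{**row, **lookup[row[key]]} for row in left() if row[key] in lookup]
    return view


def filter_view(source: Provider, predicate: Callable[[Row], bool]) -> Provider:
    """Derive a new view from an existing one by keeping only matching rows."""
    def view() -> List[Row]:
        return [row for row in source() if predicate(row)]
    return view


# A "virtual data mart": a view over both providers.
trial_mart = join_view(trials_provider, compounds_provider, "compound_id")

# Adapting the existing view to a new use case without touching the sources.
late_stage_mart = filter_view(trial_mart, lambda row: row["phase"] >= 3)

print(late_stage_mart())
```

Because each view is itself a provider, new views can be layered on existing ones, which loosely mirrors the pharmaceutical company's practice of modifying existing views for new use cases.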
During the Q&A portion of the BI Summit, Eve and Rogers fielded a provocative question from one attendee, who asked if the use of technologies such as data virtualization doesn't simply amount to putting Band-Aids over "bad architecture."
Eve, to his credit, offered a candid response. "Every customer has bad architecture, [so] every time we implement, [data virtualization] becomes a Band-Aid," he said, adding: "It's really more that concept of the abstraction layer, or unified information access, that I think is a Band-Aid that everyone can use."
Industry luminary Claudia Imhoff, president of consultancy Intelligent Solutions Inc., sought to clarify Eve's point. "I'm not sure it's a Band-Aid over bad architecture," she said, "it's rather that we're trying to do things with that architecture that that architecture was never intended to do. I don't look at it as the architecture has failed, I think it's that the architecture has gone as far as it can go."
Future Perfect?
It might be forward-thinking, but Rogers' concept of the hybrid data ecosystem isn't an outlier. Composite's Eve pointed to the work of Gartner analyst Mark Beyer, who champions a "logical data warehouse" that shares important similarities with Rogers' idea.
"[T]hink of [it as] that abstraction layer, [that] semantic layer over the top of everything that's consistent and shared," he said. "Below that, [it's] fit for profile, optimized [for workload]," he concluded. "You're going to have consolidated data, distributed data, [a] Hadoop distributed store ... [you're going to need] some kind of planning and control layer on top of that ... and a semantic layer over the top of that."
Industry veteran Mark Madsen, a principal with consultancy Third Nature Inc., says he favors the idea of "a platform that separates the front-end from the back-end," a scheme that -- in a sense -- is in harmony with the visions of both Rogers and Beyer.
"I think we need a label for the data environment as it shifts from being a system on the side to a core piece of infrastructure. In the non-BI world, we like to talk about a 'data platform.' The data warehouse presumes that you have sources, load data, and have a SQL-based interface on top. It's a layer cake of products," he explained, in a discussion following the BI Summit. "A data warehouse is a system-oriented view that tried to encompass everything in one unified model, not a platform-oriented view."
A platform- or layer-oriented concept, like that of the hybrid data ecosystem, recasts this view, says Madsen. "The idea of a data platform is that it provides a single veneer for multiple services to meet different needs, batch to real time and SQL to SOA," he concludes. "It separates what you want to do as an application from what the infrastructure needs to do to provision data."