The Real-Time BI Dilemma
Real-time is viral: once one business unit has it, others will clamor for real-time access, too. The question is -- does your organization really need it?
- By Stephen Swoyer
- April 9, 2008
For many organizations, the decision to move to real-time BI is fraught with challenges. The much-ballyhooed promises of real-time -- instantaneous access to timely and potentially game-changing business information -- come at a price. The operative question for many would-be real-timers, experts say, is how much do you want it -- or how much do you want to spend to get it? As with many high-priced options, if you have to ask, you may not need it.
There's a further wrinkle, observers caution: real-time information isn't vetted. There are scenarios -- ranging from the merely conceivable to the frighteningly far-fetched -- in which the real-time data you're feeding to your users could be imperfect, incomplete, or simply anomalous.
Another way of putting it is that real-time data integration -- or, more precisely, the real-time consumption of, or action upon, data -- implies a much narrower time slice. It implies seeing a bevy of trees (in the real-time trickle of information) but not necessarily being able to see a forest. For this reason, some experts argue, organizations might not even want to go real-time.
"There is a value in having real-time data, there are some things you want to know, but for a lot of the most common [scenarios], it just isn't necessary," says Ken Hausman, product marketing manager for data integration with SAS Institute Inc.
"If you're doing a strategy based on inventory levels -- [for example], if you're making sneakers in 12 different colors and 15 different sizes and you need to order them from Taiwan and you have to worry about shipments over the ocean because it goes by boat, and you're making this decision for next year's line -- you're not really looking at inventory levels every five seconds. You're looking at the long-term buying habits of your customers."
It's All About Context
Don Tirsell, senior director of product marketing with data integration specialist Informatica Corp., agrees -- to a degree.
"Not everyone needs [real-time]. Not all [would-be adopters], when they sit down and they actually look at their requirements, actually need it," he concedes. Tirsell disagrees, however, about real-time's cost. "It isn't just a question of [real-time's] costing too much, because real-time [data integration] isn't nearly as expensive as many customers think. We can make [real-time] surprisingly affordable [for the customer]," he continues. "The real issue is, do they need that [real-time] access to the information?"
In other words, experts ask, do would-be real-timers really need to tinker with a data warehousing paradigm -- timely access to a wealth of operational information, most of it conveniently presented in an historical context -- that's worked so well for them?
Hausman shares that caution. "I think of data warehousing as being the source of information for making long-term strategic decisions, and I would feel very uncomfortable making strategic decisions based on information that's changing every second," he says.
"Fundamentally, most of the data integration vendors out there still focus on batch because that's what drives data warehouses. Everybody is moving more toward real-time capabilities. Some are further ahead than others, [but] the question is, what are customers using them for, and what's the percentage of what you're doing out there [for which they're applicable]? Is that 10 percent, is that 1 percent, is that a fraction of 1 percent?"
SAS and other players disagree about the cost of real-time: Hausman, for his part, says that in many cases, it's difficult to justify the investment. Tirsell disagrees, citing his company's lengthy experience in the real-time arena, dating back as far as 2001, when Informatica was touting real-time ETL -- then, admittedly, in its gestational phase -- as an alternative to batch-oriented data movement.
Hidden Real-Time Pratfalls: Availability and Insatiability
Increasingly, industry players seem to be warming to the idea that real-time doesn't have to be prohibitively expensive.
Consider Sami Akbay, vice-president of marketing and product management with data replication specialist GoldenGate Software Inc., who -- like Informatica's Tirsell -- downplays the putative cost of real-time done right.
"The cost associated with real-time has just been blown out of proportion," he argues. "People have a perception that it's prohibitively expensive, but if you do a proper study of it, you actually get to comparable cost points, [especially] when you take into account the return on your [real-time] investment relative to what you were doing before [with batch-driven integration]."
The real issues, Akbay argues, are real-time availability and, to a degree, real-time insatiability: that is, once one group or business unit has access to real-time information, other user constituencies will inevitably demand the same. The upshot, he indicates, is that if the accelerated service-level requirements of real-time don't overwhelm an IT organization, user insatiability just might.
"You have to keep in mind that once you go to real-time, your service level requirements change significantly. Significantly. Think about it: when you're doing it the traditional way [e.g., batch-driven ETL or data replication], when your data's a day old or a couple of days old and it's not available for a period of a few hours, it isn't really that big of a deal," he points out, "but when you're doing real-time, when your users are accustomed to real-time, you have a huge problem [if data isn't available for a few hours]. Real-time drastically changes your service levels -- and you have to make sure that you can respond to that."
That's the service level tip. Another problem -- that of user insatiability -- has a viral quality to it: once one business unit gets bitten by the real-time bug, the pathogen soon spreads. "People don't just do real-time for one specific purpose. They might start out that way, maybe because they think it'll be too expensive to use for other [applications or services]. Basically, [real-time feeds] end up being consumed by these different users who aren't part of the original intended audience," he comments.
"Just because you might be building [real-time] for a specific set [of users], other users are going to clamor for it, too."
Back to Square One
Which brings us back to the beginning of our argument: namely, that it doesn't always make sense to roll real-time out to such users -- even if it is cost-effective to do so, and even if (infected with either enthusiasm or real-time envy) they're clamoring for it.
"It's not a simple call" when to opt for real-time, says Bob Eve, vice-president of marketing with enterprise information integration (EII) specialist Composite Software Inc. At the beginning of the real-time wave, Eve acknowledges, Composite was approached by a bevy of customers that hoped to use its EII technology to do real-time. That's a perfectly valid use case, Eve indicates, but what many of these customers found out -- and what Composite learned in the process -- is that they didn't always need real-time. For all of its advantages, real-time carried -- in many scenarios -- a number of attendant disadvantages, too. As a result, he says, Composite has developed a real-time assessment tool that helps customers determine when or if real-time is right for them.
"What used to be black and white now is sort of shades of grey. We've built a little tool to help [customers] make that decision. We've kept it high-level and sort of opinion-based, [but] it identifies about 13 factors [they] should consider. Some of them are business factors, some are factors focused on data sources, some are focused on the nature of the data consumer," Eve explains. "It [assesses those factors and] tries to help you figure out across all of those factors: is this [a case where] I want to do physical data consolidation and therefore require a [batch] approach, or is this [a case where] I want real-time and can use sort of a data federation approach?"
Prospective customers might be surprised by what they discover, Eve maintains. "You can get three or four of the factors [which] say, do one thing, but you can also get three or four others which say something else. It really helps clarify the thinking around this: it just isn't as easy a decision as many [customers] think: it isn't something like if real-time, then EII; if batch, then ETL."
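Eve's point about conflicting factors can be illustrated with a minimal sketch. This is not Composite's tool: the article does not name its 13 factors or say how it weighs them, so every factor label, the simple vote count, and the `margin` threshold below are hypothetical, chosen only to show how factors from the three categories Eve mentions can split evenly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Factor:
    name: str      # hypothetical factor label, not taken from Composite's tool
    category: str  # "business", "data source", or "data consumer"
    leans: str     # "federation" (real-time) or "consolidation" (batch)

def tally(factors):
    """Count how many factors lean toward each approach."""
    counts = {"federation": 0, "consolidation": 0}
    for factor in factors:
        if factor.leans not in counts:
            raise ValueError(f"unknown approach: {factor.leans}")
        counts[factor.leans] += 1
    return counts

def recommend(factors, margin=2):
    """Return an approach only if it leads by at least `margin` votes.

    The margin is an assumed rule of thumb; anything closer is reported
    as "mixed", meaning the decision needs a closer look.
    """
    counts = tally(factors)
    diff = counts["federation"] - counts["consolidation"]
    if diff >= margin:
        return "federation"
    if -diff >= margin:
        return "consolidation"
    return "mixed"

# Hypothetical answers for one project: three factors point each way.
example_factors = [
    Factor("decisions depend on up-to-the-minute data", "business", "federation"),
    Factor("long-term trend analysis is required", "business", "consolidation"),
    Factor("source systems tolerate live queries", "data source", "federation"),
    Factor("source data needs heavy cleansing", "data source", "consolidation"),
    Factor("consumers are operational users", "data consumer", "federation"),
    Factor("consumers run large historical reports", "data consumer", "consolidation"),
]

print(tally(example_factors))
print(recommend(example_factors))
```

With three factors leaning each way, the sketch reports "mixed" rather than forcing a choice, which is the situation Eve describes: the answer is not as simple as "if real-time, then EII; if batch, then ETL."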
The Data Quality Tip
One big issue is data quality (DQ), experts say. If you're feeding customers real-time data, you want to make sure that it's as accurate, timely, and clean as possible.
Eve notes, "Certainly data quality is an issue [in real-time integration efforts], and if you look at our partnership [i.e., embedding and reselling relationship] with Informatica, it can go both ways: they've got some really nice tools for data quality that complement our technology. Some really nice tools."
You'd think data quality would be a no-brainer, with half a decade or more of DQ advocates beating their respective information-quality drums, but that just isn't the case, according to many observers.
"We're still doing a lot of fire-fighting. We're called in to put out a lot of fires," says Daniel Teachey, director of media relations with SAS subsidiary DataFlux.
Things have improved slightly, Teachey allows: customers -- spurred by governance requirements -- are getting proactive (if five years or more of temporizing on enterprise data quality initiatives can be called "proactive") and bringing in best-of-breed DQ technology.
Even so, he argues, DataFlux is still crunching through a lot of mucky data. "The biggest trend for us is that some of the bigger deals we've had are not in reaction to a customer service problem or a supply problem," he explains. "Now we're being pulled in on these data governance-types of initiatives where there's a corporate mandate to fix data problems."
At any rate, Teachey stresses, the data is still problematic. That's one reason why SAS frequently kicks real-time access problems over to DataFlux, both Hausman and Teachey say: real-time presupposes clean, neatly profiled data. That's what DataFlux and other best-of-breed vendors specialize in.
"The reason SAS kind of throws that over our way is it's just a different technology set. It's not for decision-support. It's to clean up things in the operational world," Teachey explains. As organizations get more serious about governance -- e.g., as banks become more disciplined about running the numbers on the loans which they approve -- real-time feeds become more important.
"Real-time plays a bigger hand for the governance stuff. If I'm creating policy information on an insurance thing, we can actually suck in some actuarial calculations within our data monitoring technology and use those," he says.
A Moot Issue?
To a degree, SAS' Hausman concludes, to-real-time-or-not-to-real-time is kind of a moot question: organizations are moving inevitably toward right-time -- i.e., the degree of latency that's acceptable to them -- and, as technology matures, right-time windows will continue to shrink.
Right now, he stresses, it's still important to debate the merits -- i.e., should we or shouldn't we? -- but, over time, that calculus will shift drastically in favor of real-time, or right-time, or whatever degree of sub-second latency an organization decides it can live with.
Then there's the operational BI wave, which is wholly transforming BI from its decision-support roots. As operational BI becomes more pervasive, real-time becomes more pervasive, too.
"What we're sort of evolving into is more of an operational BI [model] … where you're looking at the operational systems and doing reporting off of them," Hausman comments. "So from a SAS perspective, we are certainly able to do … operational BI. Historically, we stayed away from the operational side, but using more of the DataFlux technologies, we can provide more real-time, federated views. For example, we're able to offer real-time data quality input screens in SAP. That's where the market is heading."