SAND Touts Information Lifecycle Management for Analytics
Company’s new offering incorporates ILM practices to bring ballooning data warehouse volumes under control
- By Stephen Swoyer
- March 30, 2005
Pity the very modern data warehouse administrator. Not only must she contend with rapidly expanding data volumes, all of which must be assimilated into her data warehouse environment without any resulting loss in performance, but she’s also got to deal with compliance measures which require that more data than ever be kept online and available to regulators.
What’s a data warehouse administrator to do?
Kenilworth, N.J.-based SAND Technology Corp. thinks it has just the ticket. The upstart vendor offers the SAND Searchable Archive, an analytics information lifecycle management (ILM) offering that brings the ILM practices perfected in the data storage space to data warehousing and analytics.
“What we really want to talk about is our move to bring the practice of information lifecycle management to data warehousing and the broader analytics space,” says Robert Thompson, vice-president of marketing with SAND. “ILM is something that’s been around for years, but oddly enough, the folks on the data warehousing side haven’t really gravitated to it at all.”
ILM describes the process of moving aged or infrequently accessed information out of a data warehouse and into near-line or offline storage. Because these tiers are built on commodity Serial ATA disk arrays, content management appliances, bulk tape storage, and the like, they are typically less expensive than front-line data warehouse storage hardware. Logical links remain, so as far as the data warehouse is concerned, nothing has changed.
In almost all cases, the size of a data warehouse can be dramatically compacted in this fashion. ILM does increase query time in some respects—if users are trying to run a query on data that’s been moved to near-line or offline storage, they obviously aren't going to enjoy the full speed of the online data warehouse. But, by the same token, such a data arrangement can increase performance on a front-line data warehouse.
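The tiering idea described above can be illustrated with a small sketch. This is not SAND's implementation: the tier names, the one-year and three-year thresholds, and the `Partition` and `apply_policy` names are all hypothetical, chosen only to show how data might be re-tiered by age while its catalog entry (the "logical link") stays in place.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical thresholds; the article does not say when data should move.
ONLINE_MAX_AGE = timedelta(days=365)
NEARLINE_MAX_AGE = timedelta(days=3 * 365)


@dataclass
class Partition:
    """One chunk of warehouse data tracked in a catalog (hypothetical model)."""
    name: str
    last_accessed: date
    tier: str = "online"


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from how long ago the data was last accessed."""
    age = today - last_accessed
    if age <= ONLINE_MAX_AGE:
        return "online"
    if age <= NEARLINE_MAX_AGE:
        return "nearline"
    return "offline"


def apply_policy(catalog: dict[str, Partition], today: date) -> list[str]:
    """Re-tier partitions in place and return the names of those that moved.

    Every partition keeps its catalog entry, so a lookup by name still
    succeeds after the underlying data has been moved to a cheaper tier.
    """
    moved = []
    for part in catalog.values():
        new_tier = choose_tier(part.last_accessed, today)
        if new_tier != part.tier:
            part.tier = new_tier
            moved.append(part.name)
    return moved


if __name__ == "__main__":
    today = date(2005, 3, 30)
    catalog = {
        "sales_2005q1": Partition("sales_2005q1", date(2005, 3, 1)),
        "sales_2003": Partition("sales_2003", date(2003, 6, 1)),
        "sales_2000": Partition("sales_2000", date(2000, 1, 1)),
    }
    print(apply_policy(catalog, today))
    print({name: part.tier for name, part in catalog.items()})
```

In this toy model a query would consult the catalog first and pay the slower access cost only when the partition it needs has left the online tier, which mirrors the trade-off the article describes.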
That sounds intriguing, but how does SAND propose to differentiate its strategy from that of a competitive vendor—say, Princeton Softech, which markets a similar product based on similar principles?
Thompson says there’s very little overlap between his company’s products and those of Princeton Softech. If anything, he argues, SAND concentrates solely (or almost entirely) on ILM for analytics, whereas Princeton Softech typically focuses on ILM for data sourced from ERP systems.
“They are specialists in ERP applications, and the thing they do smartest is they understand how the data is moving in an ERP state,” he says. “We don’t know that stuff, and we won’t do that stuff. Occasionally people will come to me and say, ‘I want to talk with you about archiving my PeopleSoft application,’ but unless they’re talking about archiving what is [non-production] information, we don’t do that. We are a warehouse player, and that’s where we concentrate.”
Are there any situations in which customers might deploy SAND’s Searchable Archive products alongside Princeton Softech’s own data management ILM offerings? Certainly, Thompson says.
“We can work with [Princeton Softech], we can work with files from Teradata, from DB2, from SAS, from Oracle, but we will do the kind of integration we’re doing with SAP, which really means we’re interfacing with all of their management utilities,” he comments.
SAP? Hasn’t Thompson claimed that SAND doesn’t compete with the likes of a Princeton Softech because the latter is typically more focused on the ERP side of the fence? That’s correct, he reiterates—but SAND, in this case, is concentrating on SAP’s Business Information Warehouse (BW), which—for many SAP customers—has by default become an enterprise data warehouse.
Not surprisingly, Thompson sees a booming business in the SAP BW market alone. “[SAP BW has] cubes, they have all sorts of stuff that’s really tough to manage. They’ve got 9,000 BW sites worldwide, and if you think about that, if only 10 percent of them have a problem—if we can penetrate ten percent of them, that’s a big business,” he says.
In its marketing presentation, SAND highlights several successful implementations of its technology in the UK, almost all of which are multi-terabyte implementations. Does this mean that SAND’s Analytics Server and Searchable Archive are high-end-only solutions? Not necessarily, says Thompson: “Obviously, this makes sense where the data’s bigger. But it’s interesting. If you ask any organization, they’ll tell you their data is big and out of control, but some people think half a terabyte is big. So while we certainly are targeting our activities toward the bigger organizations, we are finding that people of a more modest size [are] paying attention.”
SAND and other vendors typically highlight several trends they say contribute to, or otherwise portend, the inevitability of ILM for data warehousing. For example, consultancy MEGA Group projects that structured data will continue to grow at a 125 percent compound annual growth rate, which means that enterprise data warehouses, likewise, will more than double every year. Needless to say, Thompson argues, data warehouse performance improvements simply can’t keep up with growth of this nature.
“As times have changed and the warehouse is being accessed by a lot more users than people ever anticipated, the standard ways of solving this [increased growth and usage], such as reducing storage costs 15 percent per quarter, increasing processing power, really don’t get people ahead of the curve,” he says. “Organizations spend a lot of time on their data retention policies, and it’s always interesting that the data retention policies aren’t driven by business needs but by what are perceived to be the technical limitations.”
As an example, he cites the case of one SAND prospect that retains its data for up to two years—not because this requirement addresses any business rule (in fact, company business leaders prefer a retention policy of five years), but because they can’t afford to retain, much less manage, data beyond that.
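A rough back-of-the-envelope calculation shows why falling storage prices alone may not close the gap. The sketch below takes the two figures quoted above (125 percent compound annual data growth and a 15 percent quarterly drop in storage cost) and combines them under assumptions that are not in the article: that the growth figure is read literally, that the cost decline compounds each quarter, and that storage spend is simply volume times unit cost.

```python
# Figures quoted in the article; how they combine is an assumption of this sketch.
DATA_CAGR = 1.25              # 125 percent compound annual growth in data volume
COST_DROP_PER_QUARTER = 0.15  # unit storage cost falls 15 percent each quarter


def annual_volume_factor(cagr: float = DATA_CAGR) -> float:
    """Multiplier on data volume after one year of growth at the given rate."""
    return 1.0 + cagr


def annual_unit_cost_factor(drop: float = COST_DROP_PER_QUARTER) -> float:
    """Multiplier on unit storage cost after four compounding quarterly drops."""
    return (1.0 - drop) ** 4


def annual_spend_factor() -> float:
    """Multiplier on total storage spend, assuming spend = volume * unit cost."""
    return annual_volume_factor() * annual_unit_cost_factor()


def spend_after(years: int, start: float = 1.0) -> float:
    """Projected storage spend after a number of years, relative to `start`."""
    return start * annual_spend_factor() ** years


if __name__ == "__main__":
    print(f"volume x{annual_volume_factor():.2f} per year")
    print(f"unit cost x{annual_unit_cost_factor():.3f} per year")
    print(f"spend x{annual_spend_factor():.3f} per year")
    print(f"spend after 5 years: x{spend_after(5):.2f}")
```

Under these assumptions volume multiplies by 2.25 a year while unit cost falls to about 0.52 of its starting value, so total spend still rises by roughly 17 percent a year. That is consistent with Thompson's point that cheaper disks do not get organizations ahead of the curve.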
Also contributing to the inevitability of ILM for data warehouses, proponents say, are compliance requirements such as the Sarbanes-Oxley Act of 2002, which impose specific data retention requirements for all publicly traded companies. Instead of keeping this data online in relational databases or data warehouses, organizations could opt to move it to near-line storage, where it can be accessed by online users, or to offline storage resources, such as write once, read many (WORM) drives, which satisfy stringent data authenticity requirements. If this data is requested by regulators, subpoenaed as part of a lawsuit, or required in the event of a disaster, it can be quickly restored.