Aster Touts Deep Analytics, Recoverability
Aster Data must distinguish itself to compete in a teeming DW segment
- By Stephen Swoyer
- February 4, 2009
It's been a busy new year for data warehousing (DW) specialist Aster Data Systems. In January alone, the company secured an additional round of venture capital funding, notched a partnership with MicroStrategy Inc. (which both vendors touted at MicroStrategy World), and unveiled "Data Warehousing Goes Green," a new program that rewards customers with "credits" when they use existing hardware assets in their Aster nCluster deployments.
Aster has been on something of a roll since it announced -- simultaneously with DW competitor Greenplum Inc. -- an in-database MapReduce feature for its DW platform. The move brings a Google-like analytic capability to Aster's nCluster DW systems; it also snagged some enviable publicity for the company, even if rival Greenplum announced basically the same capability at about the same time.
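The MapReduce model itself -- the programming pattern Aster exposes inside its database -- can be illustrated in miniature. The sketch below is a generic, self-contained illustration of map, shuffle, and reduce phases; it is not Aster's SQL/MapReduce interface, and all function names are illustrative.

```python
# A minimal illustration of the MapReduce pattern (not Aster's actual
# in-database interface): map emits key/value pairs, a shuffle groups
# them by key, and reduce aggregates each group.

from collections import defaultdict

def map_phase(records, mapper):
    """Run the user's mapper over every record, collecting emitted pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the user's reducer to each key's group of values."""
    return {key: reducer(values) for key, values in groups.items()}

# Classic word count over a few rows of text.
rows = ["aster data", "data warehouse", "data"]
pairs = map_phase(rows, lambda row: [(word, 1) for word in row.split()])
counts = reduce_phase(shuffle(pairs), sum)
# counts == {"aster": 1, "data": 3, "warehouse": 1}
```

The appeal for analytics is that the mapper and reducer are arbitrary user code, so queries can express computations that are awkward or impossible in plain SQL.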
The circumstances that attended Aster's would-be MapReduce coup highlight an important -- and inescapable -- trend: new vendors are popping up just about every month. Players ranging from established giants such as Hewlett-Packard Co., IBM Corp., Microsoft Corp., Oracle Corp., and Teradata Corp. to DW specialty vendors such as appliance pioneer Netezza, Dataupia Inc., Kognitio, ParAccel Corp., Vertica, and others are all competing for customers in an increasingly fractious space.
How, then, does Aster propose to differentiate itself when it must occasionally share its laurels with competitors? Senior director of product marketing Sean Kung says in-database MapReduce is just one of his company's selling points. He touts Aster's "recovery-oriented" approach to massively parallel processing (MPP), which he says differs significantly from the fault recovery features of its competitors.
"What we mean by 'recovery-oriented' computing is that instead of focusing [largely] on preventing things from failing, you assume that at some point they're going to fail," Kung explains. "These are commodity components. Things are bound to fail, because the prices are inexpensive, and when you have a large number of components, the law of statistical averages tells you that things are bound to fail."
Aster handles recovery in several different ways. First, Kung says, in the event of a node failure, nCluster initiates an automatic failover. This usually occurs in a minimally disruptive manner. "The failover is seamless, so the queries in other workloads … that are actually hitting our nCluster system are not interrupted," he explains. "What's unique is that when we actually do the failover, not only is it seamless, but it's load-balanced. We have a concept called partitions -- think of them as buckets of data. When a server fails, the partitions actually get divvied up across the n-1 cluster of servers."
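The load-balanced failover Kung describes -- divvying a failed server's partitions across the surviving n-1 nodes rather than dumping them on a single standby -- can be sketched roughly as follows. This is an illustrative model, not Aster's implementation; the function and data-structure names are invented.

```python
# Illustrative sketch of load-balanced failover: a failed node's data
# partitions ("buckets of data") are divided evenly among the n-1
# surviving nodes instead of landing on one standby server.

def fail_over(placement, failed_node):
    """Redistribute the failed node's partitions across the survivors."""
    survivors = [n for n in placement if n != failed_node]
    orphaned = placement[failed_node]
    new_placement = {n: list(parts)
                     for n, parts in placement.items() if n != failed_node}
    # Round-robin assignment keeps the extra load spread evenly.
    for i, partition in enumerate(orphaned):
        new_placement[survivors[i % len(survivors)]].append(partition)
    return new_placement

placement = {
    "node1": ["p1", "p2"],
    "node2": ["p3", "p4"],
    "node3": ["p5", "p6"],
}
after = fail_over(placement, "node3")
# node3's partitions p5 and p6 land on different surviving nodes,
# so each survivor absorbs only part of the failed node's load.
```

The design point here is that each surviving node takes on only 1/(n-1) of the failed node's work, so throughput degrades gradually rather than halving on one unlucky server.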
Second, Kung continues, node failures don't always stem from faulty hardware, so nCluster automatically runs diagnostics and heuristics on a faulty node once (if) it boots back up. The goal is both to determine whether the problem is likely to recur and, ideally, to identify configuration changes that could prevent it from occurring again.
"We're also different in the way that we do recovery, which is essentially using 'delta-based' replication," Kung says. "If a node fails -- let's say that a node reboots; it takes a minute to reboot, it comes back up, and it's fine. During that one- to two-minute period, there could have been some changes -- either the DBA could have made some changes … or you could have loaded some additional data into the system. But when the node comes back up, we only load those changes."
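The delta-based recovery Kung describes resembles log-replay catch-up as used in many replicated databases: each change carries a sequence number, and a rejoining node replays only the entries newer than the last one it applied. The sketch below is a loose model under that assumption; the class and field names are invented, not Aster's internals.

```python
# Hedged sketch of delta-based recovery: while a node is down, the
# cluster logs changes; on rejoin, the node replays only the entries
# it missed, instead of recopying its entire data set.

class Node:
    def __init__(self):
        self.data = {}
        self.applied_lsn = 0  # highest log sequence number applied so far

def apply_entry(node, entry):
    """Apply one logged change (lsn, key, value) to the node's data."""
    lsn, key, value = entry
    node.data[key] = value
    node.applied_lsn = lsn

def catch_up(node, change_log):
    """Replay only the changes the node missed while it was down."""
    for entry in change_log:
        if entry[0] > node.applied_lsn:
            apply_entry(node, entry)

# The node applied entries 1-2, then rebooted while entries 3-4 arrived.
log = [(1, "a", 10), (2, "b", 20), (3, "b", 25), (4, "c", 30)]
node = Node()
for entry in log[:2]:
    apply_entry(node, entry)
catch_up(node, log)  # replays only entries 3 and 4
```

For a brief outage, replaying a minute or two of deltas is far cheaper than re-transferring the node's full partition set.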
Finally, Kung indicates, Aster's nCluster systems can maintain availability even in the event of a full node recovery. "If there really is a true hard failure … in that case, you do need to put in (provision) a new node. When you pop it into your nCluster, in that scenario it will take time to recover, because you are migrating and recovering many, many terabytes of data," he says. "Our approach is that we do require the transfer [of all of that data] -- because there's no way to avoid that -- but in our scenario, we maintain availability for the system. As these many terabytes of data are actually copying, we do it in the background, and we allow queries and changes and other types of workloads to continue uninterrupted."
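The availability-preserving full recovery Kung outlines -- streaming many terabytes to a replacement node in the background while queries continue -- might be modeled, very loosely, as below. This is a cooperative single-threaded sketch under invented names; a real system would copy and serve queries concurrently.

```python
# Sketch of background recovery: data is streamed to the replacement
# node in chunks, and queries keep being answered between chunks, so
# the system stays available during a long copy.

def background_recover(source, target, chunk_size, serve_query):
    """Copy source to target chunk by chunk, serving queries in between."""
    keys = list(source)
    for i in range(0, len(keys), chunk_size):
        for key in keys[i:i + chunk_size]:
            target[key] = source[key]   # copy one chunk of rows
        serve_query()                   # workloads continue uninterrupted

served = []  # record how much data the replacement held at each query
source = {f"row{i}": i for i in range(10)}
target = {}
background_recover(source, target, 3, lambda: served.append(len(target)))
# target ends up as a full copy, and queries were served throughout.
```

The trade-off is recovery time: interleaving the copy with live workloads stretches out the transfer, but the cluster never has to go offline to admit the new node.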
In most cases, Kung says, Aster isn't displacing symmetrical competitors -- i.e., rival DW software or appliance specialists. The ugly truth, he says, is that many customers have tried to tap off-the-shelf RDBMSes for analytic workloads. That hasn't worked out so well, he argues. "Databases like SQL Server, DB2, Oracle, MySQL -- those are being used for OLTP as well as for decision support. All of the top-tier database companies have both a transactional as well as an analytic data warehouse offering, except that they offer it in a single database and they just play around with packaging. They want to be able to allow a single database technology to serve multiple masters," he explains. "In those cases, we're doing very well … [by] primarily displacing existing systems that don't scale well."
Over the coming months, Kung believes that customers -- shopping for value and insight in a brutal economic environment -- will place an even bigger premium on analytic horsepower. Thanks to in-database MapReduce and other features, he argues, Aster is well positioned to reap the benefit of this interest.
"That's one reason … we introduced SQL/MapReduce, or in-database MapReduce -- so we can bring the value of deeper insights for ad hoc analytics," he comments. "It's no longer a case where large-scale just means a particular type of workload. Nowadays, it means being able to handle both reporting as well as things like advanced analytics, ad hoc types of queries, and other types of workloads," Kung concludes. "This type of consolidation of multiple types of workloads into a single system is definitely a trend that we're seeing and really critical to front-line and any sort of business-critical type of warehouse."
About the Author
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at
[email protected].