Redefining the Data Warehouse Appliance
By Philip Russom, Senior Manager, TDWI Research
The data warehouse appliance (DWA) has undergone considerable evolution since first appearing early this decade. The evolution is driven by both vendor products and user best practices, and the resulting change has affected the capabilities of DWAs, the type of vendor that provides them, how to use DWAs, and how to define them. Before we list the diverse product types that a new definition of data warehouse appliance should encompass, let’s examine some of the trends and developments that are driving a redefinition.
Whole-Technology Stack Appliances
Netezza was the first vendor to offer a data warehouse appliance (introduced around 2002), so early DWA definitions were based upon Netezza products, which provide a whole-technology stack for data warehousing. That is, the Netezza Performance Server combines database and operating system software with server and storage hardware in a complete data warehouse platform. Before Netezza, Teradata, Sequent, and White Cross (now Kognitio) had for years offered similar single-vendor combinations of hardware and software purpose-built for data warehousing, though not necessarily in an appliance package or described as an appliance.
DATAllegro launched in mid-2005 with a whole-technology stack solution involving proprietary hardware, similar to Netezza. Although DATAllegro soon abandoned its proprietary hardware in favor of commodity hardware from other vendors, DATAllegro’s DWAs still require specific certified hardware configurations that constitute a whole-technology stack. Microsoft acquired DATAllegro in 2008 and announced that it will fold DATAllegro’s MPP architecture into SQL Server, which runs on commodity hardware.
Kognitio also jettisoned its proprietary hardware in a process similar to that of DATAllegro, and Teradata just announced a new family of data warehouse packages that includes DWAs, some running on commodity hardware. Examples of commodity hardware include general-purpose servers built around Intel or AMD CPUs, and popular network and storage hardware from Cisco and EMC.
The movement from proprietary to commodity hardware has good reasons behind it. Commodity hardware is relatively inexpensive and thus helps keep down the price of DWAs. This is important since DWAs compete largely on their low cost. Also, the vendors providing commodity hardware have proved to be good partners for software-oriented appliance vendors.
Partial-Technology Stack Appliances
Starting in 2006, a new wave of vendors emerged with database management systems (DBMSs) purpose-built for data warehousing. These include DBMSs based on the relational model (Greenplum and Kognitio) and the columnar model (ParAccel and Vertica). Most of these DBMSs are licensed in multiple ways. For instance, ParAccel offers three types of licenses:
- Sold and licensed stand-alone, like any DBMS software
- Added onto another database product as a query accelerator
- Embedded in an appliance, usually with a certified or recommended hardware configuration
In the context of the embedded license model, most of the new DBMS vendors call their product a software appliance. This somewhat oxymoronic term refers to a software component (namely a DBMS) that may be embedded in a full data warehouse appliance. Hence, each of these vendors offers a partial stack appliance, often called a software appliance.
The software appliance has proved to be a good starting point for the new DBMS vendors. It allows them to focus on database software (not designing and building hardware), which is their point of greatest innovation and therefore their value proposition. The software appliance product enables the new, small DBMS vendors to partner with commodity hardware vendors and to benefit from these larger firms’ resources.
Miscellaneous Appliances for Data Warehousing or Business Intelligence
A couple of large BI vendors have rolled out specialized appliances in the last few years. For example, Cognos Now! (acquired from Celequest) is a server blade that includes a 64-bit in-memory database for real-time operational BI and performance management with IBM’s Cognos BI platform. It also includes a tool for designing dashboards and similar applications. SAP NetWeaver Business Warehouse Accelerator is a similar blade product that accelerates the query performance of the data warehouse component (often called BW) that’s part of SAP NetWeaver.
The Satori Server from Dataupia is hard to categorize. It’s an appliance in the sense that it folds hardware and software components into a rack-mount server blade. It’s an assemblage of commodity hardware parts, such as CPUs, hard drives, and RAID controllers, plus Dataupia’s embedded DBMS. However, it doesn’t stand alone like most DWAs; it’s usually added on as a query accelerator or capacity extender for other traditional technology stacks.
Hardware/Software Bundles that Resemble Appliances
As DWAs entered the marketplace, relational database vendors (IBM, Microsoft, Oracle, Sybase, and HP) stepped up their offerings of hardware and software bundles that assemble a whole technology stack for data warehousing. Examples of bundles from leading database vendors include HP Neoview and IBM InfoSphere Balanced Warehouse (most hardware and software components in these bundles are HP and IBM products, respectively). Launched in late 2008, the HP Oracle Database Machine and the HP Oracle Exadata Storage Server are both based on hardware from HP and software from Oracle. Sybase’s Analytic Appliance (announced in May 2008) combines pSeries hardware from IBM with Sybase IQ (the “mother” of all columnar databases, purpose-built for data warehousing).
Most of these bundles are not DWAs per se because their components are rarely purpose-built for data warehousing, yet they offer many of the benefits of a DWA. In particular, a preconfigured technology stack reduces system integration work, reduces time to use, and comes from a single vendor, which supports the whole stack. Furthermore, vendor size matters in that some user organizations avoid start-up vendors. For these users, the hardware/software bundles are significant, because they come from large, stable vendors and include familiar, mature DBMSs.
User Practices for Data Warehouse Appliances
Based on user interviews and vendor briefings, TDWI Research knows there are prominent “sweet spots”—application situations in which users turn to data warehouse appliances:
- Most appliances support a multi-terabyte data mart. The mart enables an analytic application typically focused on high-volume analysis of customers, transactions, call-level details, utility grid capacity, etc.
- Users begin with multiple terabytes, instead of building up to them. In the early days of DWAs, most users rolled out 1–3 TB in the first phase of system deployment and grew toward 10 TB. Today, 10 TB or more in the first phase is common, and DWAs can handle this.
- DWAs are used for highly dynamic data analysis, and seldom for reporting. Typically, a handful of business analysts and similar users are on a discovery mission in which they think up ad hoc queries and alter them iteratively. DWAs are known for perky responses to such complex analytic queries against large data sets.
- Users take full advantage of DWAs’ MPP shared-nothing architecture. After all, this is what enables DWAs to manage multi-terabyte databases and very complex queries against them.
- DWAs complement and augment enterprise data warehouses (EDWs). Some appliances do host an EDW, but the vast majority of DWAs are deployed as an SOS, a “system on the side” that off-loads data management tasks and analytic workloads that are best kept out of the core EDW. Hence, the DWA has joined other SOS platforms in the extended EDW environment, such as operational data stores, data staging areas, data marts, cubes, and so on.
- Some DWA users follow a load, analyze, and delete (LAD) method. For example, when a business problem or opportunity arises, business analysts extract terabytes of relevant operational data and load it into the DWA. They then analyze the information until they learn what they need to know. Before moving on to the next analytic project, they delete (or archive) the terabytes they’ve been working with and start over with a new multi-terabyte data set.
- DWA-based analyses work with operational data in poor condition. Note that the LAD method doesn’t allow time for much data modeling, transformation, or cleansing. Luckily, DWAs compensate for less-than-ideal data structures and quality by supporting highly complex SQL. Instead of users transforming data into clean, multidimensional structures at the database level, complex SQL provides equivalent functionality (to a certain degree) at the query level. As a useful byproduct, this gives DWA-based analytic applications data-model independence, which is lost when analysts depend on remodeling data.
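To make the last two practices concrete, here is a minimal sketch of a load, analyze, and delete cycle in which cleansing happens inside the query rather than in the schema. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for a DWA, and the table name raw_calls, the function load_analyze_delete, and the sample rows are all hypothetical, not taken from any vendor product.

```python
import sqlite3

# Hypothetical raw operational rows: inconsistent case, stray spaces,
# and missing values, loaded as-is with no remodeling or cleansing.
SAMPLE_ROWS = [
    ("Alice ", "east", "12.5"),
    ("alice", " East", "7.5"),
    ("Bob", "WEST", ""),
    ("Carol", "west ", "30"),
    ("Dave", "West", None),
]


def load_analyze_delete(rows):
    """Load raw rows, analyze them with query-level cleansing, then drop them."""
    # Assumption: SQLite stands in for the appliance's DBMS.
    conn = sqlite3.connect(":memory:")
    try:
        # Load: every column is plain text, mirroring the source system.
        conn.execute(
            "CREATE TABLE raw_calls (customer TEXT, region TEXT, minutes TEXT)"
        )
        conn.executemany("INSERT INTO raw_calls VALUES (?, ?, ?)", rows)

        # Analyze: trimming, case folding, type casting, and filtering of
        # empty values all happen in the query, not in the data model.
        result = conn.execute(
            """
            SELECT UPPER(TRIM(region))                   AS region_clean,
                   COUNT(DISTINCT LOWER(TRIM(customer))) AS customers,
                   SUM(CAST(minutes AS REAL))            AS total_minutes
            FROM raw_calls
            WHERE TRIM(COALESCE(minutes, '')) <> ''
            GROUP BY UPPER(TRIM(region))
            ORDER BY 1
            """
        ).fetchall()

        # Delete: discard the working set before the next analytic project.
        conn.execute("DROP TABLE raw_calls")
        return result
    finally:
        conn.close()


if __name__ == "__main__":
    for row in load_analyze_delete(SAMPLE_ROWS):
        print(row)
```

Because the cleansing logic lives in the SELECT statement, the raw table never has to be remodeled, which is the data-model independence described above. The trade-off is that every query must repeat that logic, so this approach suits short-lived analytic projects more than long-running reporting.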
The Data Warehouse Appliance Redefined
Due to the trend among vendor offerings from whole-technology stacks to partial ones, a newly revised definition must encompass DWAs that comply with the original definition (from Netezza and DATAllegro), as well as the newer software appliances (from Greenplum, Kognitio, ParAccel, Vertica, and so on). A truly comprehensive definition will also include variations on the DWA theme (such as the appliances mentioned from Dataupia, Cognos, and SAP). Furthermore, hardware/software bundles assembled for data warehousing (though using mostly general purpose components) share many characteristics and benefits with DWAs, so these should be mentioned anytime DWAs come up. (See Table 1 for a summary of vendors and products.)
Vendor products aside, the user community continues to redefine how it uses data warehouse appliances. As discovery-driven analytics become more common and more mission critical, users have a growing need for a data warehouse platform that can respond quickly (with little or no tuning) to ad hoc and complex queries against multi-terabyte data sets of less-than-ideal structure and quality. To get the analytic databases they need, users will probably continue the SOS trend: systems on the side that augment the analytic capabilities of an enterprise data warehouse environment.
- Note that the next generation of data warehouse appliances is about diversity. Adjust your definition of DWAs to include both whole technology stack and partial technology stack approaches. Don’t forget that many hardware/software bundles have characteristics and benefits similar to DWAs, as do the BI accelerators that are packaged as appliances.
- Expect the definition of the data warehouse appliance to evolve—again. DWAs are still new, and their possibilities are still being explored.
- Be open to alternative DBMSs for a DW platform, including open source and columnar databases. Otherwise, you exclude most data warehouse appliances and some bundles. Likewise, be open to Linux, which is the most common operating system for DWAs and similar bundles.
- Know your requirements, and select a DW platform that matches them. Don’t acquire a DWA based solely on its compelling low cost or perky query performance. Know the DWA sweet spots and watch for these while gathering data warehouse platform requirements.
- Don’t replace or overtax your EDW. It’s still the single version of the truth for most reporting and analysis. Help it play that role by off-loading taxing analytic applications to DWAs and other SOS platforms.
- Your evaluation list should include DWAs, software appliances, and similar bundles. After all, these are now established data warehouse platforms, along with more traditional platform components, such as relational databases and the usual hardware servers.
Philip Russom is the senior manager of TDWI Research at The Data Warehousing Institute (TDWI), where he oversees many of TDWI’s research-oriented publications, services, and events. He’s been an industry analyst researching BI issues at Forrester Research, Giga Information Group, and Hurwitz Group. You can reach him at firstname.lastname@example.org.
An earlier version of this article appeared in TDWI Flashpoint on June 5, 2008.