Welcome to TDWI FlashPoint. In this issue, Philip Russom discusses data warehouse appliances and how they are defined.
- FlashPoint Snapshot
  BI Search and Text Analytics: New Additions to the BI Technology Stack
- Redefining the Data Warehouse Appliance
- FlashPoint Rx
  Don't Overlook CDI and MDM Development Complexity
FlashPoint Snapshots highlight key findings from TDWI's wide variety of research.
How would BI users benefit from data extracted from text sources? (Select five or fewer.)
Based on 1,241 responses from 370 respondents (3.3 responses per respondent).
Which facts and entities are the most important for extraction from text sources? (Select five or fewer.)
Based on 1,369 responses from 370 respondents (3.7 responses per respondent).
Source: BI Search and Text Analytics: New Additions to the BI Technology Stack (TDWI Best Practices Report, Q2, 2007).
Based on 370 Internet-based respondents. Rounding and multiple-choice questions are responsible for percent totals that do not equal exactly 100 percent.
Redefining the Data Warehouse Appliance
Philip Russom, TDWI Research
The data warehouse appliance (DWA) has undergone considerable evolution since first appearing early this decade. The evolution is driven by both vendor products and user best practices, and the resulting change has affected the capabilities of DWAs, the type of vendor that provides them, how to use them, and how to define them. Before we list the diverse product types that a new definition of data warehouse appliance should encompass, let's examine some of the trends and developments that are driving a redefinition.
Whole-technology stack appliances
Netezza was the first vendor to offer a data warehouse appliance (introduced around 2002), so early DWA definitions were based on Netezza products, which provide a whole technology stack for data warehousing. That is, the Netezza Performance Server combines database and operating system software with server and storage hardware in a complete data warehouse platform. Long before Netezza appeared, Teradata, Sequent, and White Cross (now called Kognitio) had been offering similar single-vendor combinations of hardware and software purpose-built for data warehousing, though not necessarily packaged or described as appliances.
DATAllegro launched in mid-2005 with a whole-technology-stack solution built on proprietary hardware, similar to Netezza's. Although DATAllegro soon abandoned its proprietary hardware in favor of commodity hardware from other vendors, its DWAs still require specific certified hardware configurations, which together constitute a whole technology stack. Kognitio likewise jettisoned its proprietary hardware, in a process similar to DATAllegro's. Examples of commodity hardware include general-purpose servers built around Intel or AMD CPUs, and popular network and storage hardware from Cisco and EMC.
There are good reasons for the move from proprietary to commodity hardware. Commodity hardware is relatively inexpensive and thus helps keep down the price of DWAs, which matters because DWAs compete largely on low cost. Also, the vendors providing commodity hardware have proved to be good partners for software-oriented appliance vendors.
Partial-technology stack appliances
Starting in 2006, a new wave of vendors emerged with database management systems (DBMSs) purpose-built for data warehousing. These include DBMSs based on the relational model (Greenplum and Kognitio) and the columnar model (Calpont, ParAccel, Vertica). Most of these DBMSs are licensed in multiple ways. For instance, ParAccel offers three types of licenses:
- Sold and licensed standalone, like any DBMS software
- Added onto another database product as a query accelerator
- Embedded in an appliance, usually with a certified or recommended hardware configuration
In the context of the embedded license model, most of the new DBMS vendors call their product a software appliance. This somewhat oxymoronic term refers to a software component (namely a DBMS) that may or may not be embedded in a full data warehouse appliance. Hence, each of these vendors offers a partial-technology-stack appliance.
The software appliance has proved to be a good starting point for the new DBMS vendors. It allows them to focus on database software (not designing and building hardware), which is their point of greatest innovation and therefore their value proposition. The software appliance product enables the new, small DBMS vendors to partner with commodity hardware vendors and to benefit from these larger firms' resources.
Miscellaneous appliances for data warehousing or business intelligence
A couple of large BI vendors have rolled out specialized appliances in the last few years. For example, Cognos Now! (acquired from Celequest) is a server blade that includes a 64-bit in-memory database for real-time operational BI and performance management with Cognos' BI platform. It also includes a tool for designing dashboards and similar applications. SAP Business Intelligence Appliance is a similar blade product that accelerates the query performance of the data warehouse component (often called BW) that's part of SAP NetWeaver BI.
The Sartori Server from Dataupia is hard to categorize. It's an appliance in the sense that it folds hardware and software components into a rack-mount server blade. It's an assemblage of commodity hardware parts, such as CPUs, hard drives, and RAID controllers, plus Dataupia's embedded DBMS. However, it's not self-standing like most DWAs, and it's usually added on as a query accelerator or capacity extender for other, traditional technology stacks.
Hardware/software bundles that resemble appliances
As DWAs entered the marketplace, relational database vendors (IBM, Microsoft, Oracle, and Sybase, plus HP) stepped up their offerings of hardware and software bundles that assemble a whole-technology stack for data warehousing. Examples of bundles from leading database vendors include IBM InfoSphere Balanced Warehouse (most hardware and software components in the bundle are IBM products) and Oracle Optimized Warehouse Reference Configurations (each bundle combines Oracle Database with server, storage, and network hardware from vendors such as Dell, EMC, HP, IBM, and Sun). Sybase's new Analytic Appliance (announced in May 2008) combines pSeries hardware from IBM with Sybase IQ (the "mother" of all columnar databases, purpose-built for data warehousing).
Most of these bundles are not DWAs per se, because their components are rarely purpose-built for data warehousing, yet they offer many of the benefits of a DWA. In particular, a preconfigured technology stack reduces system integration work, shortens time to deployment, and comes from a single vendor, which supports the whole stack. Furthermore, vendor size matters, in that some user organizations avoid start-up vendors. For these users, the hardware/software bundles are significant, because they come from large, stable vendors and include familiar, mature DBMSs.
The data warehouse appliance redefined
Because vendor offerings are shifting from whole technology stacks toward partial ones, a revised definition must encompass DWAs that comply with the original definition (from Netezza and DATAllegro), as well as the newer software appliances (from Kognitio, ParAccel, Vertica, and so on). A truly comprehensive definition will also include variations on the DWA theme (such as the appliances mentioned from Dataupia, Cognos, and SAP). Furthermore, hardware/software bundles assembled for data warehousing (though using mostly general-purpose components) share many characteristics and benefits with DWAs, so they belong in any discussion of DWAs.
Note that the next generation of data warehouse appliances is about diversity. Adjust your definition of DWAs to include both whole-technology-stack and partial-technology-stack approaches. Don't forget that many hardware/software bundles have characteristics and benefits similar to DWAs, as do the BI accelerators that are packaged as appliances.
Expect the definition of the data warehouse appliance to evolve—again. DWAs are still new, and their possibilities are still being explored.
Be open to alternative DBMSs for a DW platform, including open source and columnar databases. Otherwise, you exclude most data warehouse appliances and some bundles. Likewise, be open to Linux, which is the most common operating system for DWAs and similar bundles.
Know your requirements, and select a DW platform that matches them. Don't decide to acquire and use a DWA based solely on its compelling low cost or perky query performance. Educate yourself about DWA sweet spots and watch for these while gathering data warehouse platform requirements.
Your evaluation list should include DWAs, software appliances, and similar bundles. After all, these are now established data warehouse platforms, along with more traditional platform components, such as relational databases and the usual hardware servers.
Register to replay Philip Russom's recent TDWI Webinar "Data Warehouse Appliances: An Update on the State of the Art."
Read Philip Russom's oft-quoted article "Defining the Data Warehouse Appliance" (based on a survey conducted in November 2005).
See Wikipedia's broad definition of data warehouse appliances.
Philip Russom is the senior manager of TDWI Research at The Data Warehousing Institute (TDWI), where he oversees many of TDWI's research-oriented publications, services, and events. He has been an industry analyst researching BI issues at Forrester Research, Giga Information Group, and Hurwitz Group.
FlashPoint Rx prescribes a "Mistake to Avoid" for business intelligence and data warehousing professionals from TDWI's Ten Mistakes to Avoid series.
Ten Mistakes to Avoid When Planning Your CDI/MDM Project
Mistake 3. Overlooking CDI and MDM Development Complexity
To enable an MDM or CDI solution to serve other operational applications, you need to do some programming. To connect a CDI hub to other applications, you must modify the interface code: the code that submits customer data to, and retrieves it from, the de facto data store.
There are two basic approaches to modifying interface code to support new CDI or MDM technology. One approach is to modify the operational application itself to send data to and retrieve data from the hub.
The second approach is specific to environments already leveraging a messaging server—for instance, enterprise application integration (EAI), enterprise service bus (ESB), or other application messaging technology. This approach involves modifying the interface code (once again, relating to an application’s submission and retrieval logic) to communicate with the CDI hub. This alternative is transparent to the operational application’s software.
Either approach is hard work, because it requires the IT organization to be intimate with the transaction processing logic of individual applications and the accompanying interface or messaging code. The organization must also retain the technical expertise needed to support the API set and service-oriented architecture (SOA) environment that the CDI or MDM product relies on. Implementing a master data hub is more complex than simply loading a file; it requires the skills to modify transaction processing logic to interface to the hub.
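The two approaches described above can be sketched in a few lines of Python. This is a hypothetical illustration under stated assumptions, not any CDI/MDM vendor's API: `CdiHub`, `OrderEntryApp`, `BillingApp`, `MessageBus`, `connect_hub_to_bus`, and the `customer.submit`/`customer.retrieve` topics are invented names, and in-memory dictionaries stand in for a real hub and a real EAI or ESB product.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Optional


@dataclass
class CustomerRecord:
    customer_id: str
    name: str
    email: str


class CdiHub:
    """Hypothetical in-memory stand-in for a CDI/MDM hub (the master customer store)."""

    def __init__(self) -> None:
        self._records: Dict[str, CustomerRecord] = {}

    def submit(self, record: CustomerRecord) -> None:
        self._records[record.customer_id] = record

    def retrieve(self, customer_id: str) -> Optional[CustomerRecord]:
        return self._records.get(customer_id)


# Approach 1 (hypothetical): modify the operational application itself so
# that its submission and retrieval logic calls the hub directly.
class OrderEntryApp:
    def __init__(self, hub: CdiHub) -> None:
        self.hub = hub

    def save_customer(self, record: CustomerRecord) -> None:
        self.hub.submit(record)  # modified code path: writes go to the hub

    def load_customer(self, customer_id: str) -> Optional[CustomerRecord]:
        return self.hub.retrieve(customer_id)  # reads come from the hub


# Approach 2 (hypothetical): leave the application untouched and modify only
# the interface code in the messaging layer (a stand-in for EAI or ESB).
Handler = Callable[[dict], Optional[dict]]


class MessageBus:
    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, topic: str, handler: Handler) -> None:
        self._handlers[topic] = handler

    def send(self, topic: str, payload: dict) -> Optional[dict]:
        return self._handlers[topic](payload)


def connect_hub_to_bus(bus: MessageBus, hub: CdiHub) -> None:
    """The modified interface code: route existing customer topics to the hub."""

    def on_submit(payload: dict) -> Optional[dict]:
        hub.submit(CustomerRecord(**payload))
        return None

    def on_retrieve(payload: dict) -> Optional[dict]:
        record = hub.retrieve(payload["customer_id"])
        return None if record is None else asdict(record)

    bus.register("customer.submit", on_submit)
    bus.register("customer.retrieve", on_retrieve)


class BillingApp:
    """Unchanged application: it knows only message topics, not the hub."""

    def __init__(self, bus: MessageBus) -> None:
        self.bus = bus

    def save_customer(self, customer_id: str, name: str, email: str) -> None:
        self.bus.send("customer.submit",
                      {"customer_id": customer_id, "name": name, "email": email})

    def load_customer(self, customer_id: str) -> Optional[dict]:
        return self.bus.send("customer.retrieve", {"customer_id": customer_id})


if __name__ == "__main__":
    hub = CdiHub()
    OrderEntryApp(hub).save_customer(CustomerRecord("C1", "Ada", "ada@example.com"))
    bus = MessageBus()
    connect_hub_to_bus(bus, hub)
    # Both applications now see the same master customer record.
    print(BillingApp(bus).load_customer("C1"))
    # prints {'customer_id': 'C1', 'name': 'Ada', 'email': 'ada@example.com'}
```

In the first approach the change lives inside `OrderEntryApp`; in the second, only `connect_hub_to_bus` changes and `BillingApp` is untouched, which is what makes that approach transparent to the application. Neither toy version captures the hard part: understanding each real application's transaction processing logic well enough to rewire it.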
This excerpt was pulled from the Q3 2006 installment of TDWI's Ten Mistakes to Avoid series, Ten Mistakes to Avoid When Planning Your CDI/MDM Project, by Jill Dyché and Evan Levy.