
June 5, 2008: TDWI FlashPoint - Redefining the Data Warehouse Appliance

Trends and developments are driving a redefinition of the data warehouse appliance.

Welcome to TDWI FlashPoint. In this issue, Philip Russom discusses data warehouse appliances and how they are defined.

CONTENTS

  • FlashPoint Snapshot
    BI Search and Text Analytics: New Additions to the BI Technology Stack
  • Column
    Redefining the Data Warehouse Appliance
  • FlashPoint Rx
    Don't Overlook CDI and MDM Development Complexity

FlashPoint Snapshot

FlashPoint Snapshots highlight key findings from TDWI's wide variety of research.

 


How would BI users benefit from data extracted from text sources? (Select five or fewer.)

Based on 1,241 responses from 370 respondents (3.3 responses per respondent).


Which facts and entities are the most important for extraction from text sources? (Select five or fewer.)

Based on 1,369 responses from 370 respondents (3.7 responses per respondent).



Source: BI Search and Text Analytics: New Additions to the BI Technology Stack (TDWI Best Practices Report, Q2 2007).

Based on 370 Internet-based respondents. Rounding and multiple-choice questions are responsible for percent totals that do not equal exactly 100 percent.



Redefining the Data Warehouse Appliance

Philip Russom, TDWI Research

The data warehouse appliance (DWA) has undergone considerable evolution since first appearing early this decade. The evolution is driven by both vendor products and user best practices, and the resulting change has affected the capabilities of DWAs, the type of vendor that provides them, how to use them, and how to define them. Before we list the diverse product types that a new definition of data warehouse appliance should encompass, let's examine some of the trends and developments that are driving a redefinition.

Whole-technology stack appliances

Netezza was the first vendor to offer a data warehouse appliance (introduced around 2002), so early DWA definitions were based upon Netezza products, which provide a whole-technology stack for data warehousing. That is, the Netezza Performance Server combines database and operating system software with server and storage hardware in a complete data warehouse platform. Before Netezza, Teradata, Sequent, and White Cross (now called Kognitio) had for years offered similar single-vendor combinations of hardware and software purpose-built for data warehousing, though not necessarily in an appliance package or described as an appliance.

DATAllegro launched in mid-2005 with a whole-technology stack solution involving proprietary hardware, similar to Netezza. Although DATAllegro soon abandoned its own proprietary hardware in favor of commodity hardware from other vendors, DATAllegro's DWAs still require specific certified hardware configurations, which constitute a whole-technology stack. Kognitio also jettisoned its proprietary hardware in a process similar to that of DATAllegro. Examples of commodity hardware include general-purpose servers built around Intel or AMD CPUs, and popular network and storage hardware from Cisco and EMC.

The movement from proprietary to commodity hardware has good reasons behind it. Commodity hardware is relatively inexpensive and thus helps keep down the price of DWAs. This is important since DWAs compete largely on their low cost. Also, the vendors providing commodity hardware have proved to be good partners for software-oriented appliance vendors.

Partial-technology stack appliances

Starting in 2006, a new wave of vendors emerged with database management systems (DBMSs) purpose-built for data warehousing. These include DBMSs based on the relational model (Greenplum and Kognitio) and the columnar model (Calpont, ParAccel, Vertica). Most of these DBMSs are licensed in multiple ways. For instance, ParAccel offers three types of licenses:

  • Sold and licensed standalone, like any DBMS software
  • Added onto another database product as a query accelerator
  • Embedded in an appliance, usually with a certified or recommended hardware configuration

In the context of the embedded license model, most of the new DBMS vendors call their product a software appliance. This somewhat oxymoronic term refers to a software component (namely a DBMS) that may (or may not) be embedded in a full data warehouse appliance. Hence, each of these vendors offers a partial stack appliance, often called a software appliance.

The software appliance has proved to be a good starting point for the new DBMS vendors. It allows them to focus on database software (not designing and building hardware), which is their point of greatest innovation and therefore their value proposition. The software appliance product enables the new, small DBMS vendors to partner with commodity hardware vendors and to benefit from these larger firms' resources.

Miscellaneous appliances for data warehousing or business intelligence

A couple of large BI vendors have rolled out specialized appliances in the last few years. For example, Cognos Now! (acquired from Celequest) is a server blade that includes a 64-bit in-memory database for real-time operational BI and performance management with Cognos' BI platform. It also includes a tool for designing dashboards and similar applications. SAP Business Intelligence Appliance is a similar blade product that accelerates the query performance of the data warehouse component (often called BW) that's part of SAP NetWeaver BI.

The Satori Server from Dataupia is hard to categorize. It's an appliance in the sense that it folds hardware and software components into a rack-mount server blade. It's an assemblage of commodity hardware parts, such as CPUs, hard drives, and RAID controllers, plus Dataupia's embedded DBMS. However, it's not self-standing like most DWAs, and it's usually added on as a query accelerator or capacity extender for other, traditional technology stacks.

Hardware/software bundles that resemble appliances

As DWAs entered the marketplace, relational database vendors (IBM, Microsoft, Oracle, and Sybase, plus HP) stepped up their offerings of hardware and software bundles that assemble a whole-technology stack for data warehousing. Examples of bundles from leading database vendors include IBM InfoSphere Balanced Warehouse (most hardware and software components in the bundle are IBM products) and Oracle Optimized Warehouse Reference Configurations (each bundle combines Oracle Database with server, storage, and network hardware from vendors such as Dell, EMC, HP, IBM, and Sun). Sybase's new Analytic Appliance (announced in May 2008) combines pSeries hardware from IBM with Sybase IQ (the "mother" of all columnar databases, purpose-built for data warehousing).

Most of these bundles are not DWAs per se, because their components are rarely purpose-built for data warehousing, yet they offer many of the benefits of a DWA. In particular, a pre-configured technology stack reduces system integration work, reduces time to use, and comes from a single vendor, which supports the whole stack. Furthermore, vendor size matters, in that some user organizations avoid start-up vendors. For these users, the hardware/software bundles are significant, because they come from large, stable vendors and include familiar, mature DBMSs.

The data warehouse appliance redefined

Due to the trend among vendor offerings from whole-technology stacks to partial ones, a newly revised definition must encompass DWAs that comply with the original definition (from Netezza and DATAllegro), as well as the newer software appliances (from Kognitio, ParAccel, Vertica, and so on). A truly comprehensive definition will also include variations on the DWA theme (such as the appliances mentioned from Dataupia, Cognos, and SAP). Furthermore, hardware/software bundles assembled for data warehousing (though using mostly general purpose components) share many characteristics and benefits with DWAs, so these should be mentioned anytime DWAs come up.

Recommendations

Note that the next generation of data warehouse appliances is about diversity. Adjust your definition of DWAs to include both whole-technology-stack and partial-technology-stack approaches. Don't forget that many hardware/software bundles have characteristics and benefits similar to DWAs, as do the BI accelerators that are packaged as appliances.

Expect the definition of the data warehouse appliance to evolve—again. DWAs are still new, and their possibilities are still being explored.

Be open to alternative DBMSs for a DW platform, including open source and columnar databases. Otherwise, you exclude most data warehouse appliances and some bundles. Likewise, be open to Linux, which is the most common operating system for DWAs and similar bundles.

Know your requirements, and select a DW platform that matches them. Don't decide to acquire and use a DWA based solely on its compelling low cost or perky query performance. Educate yourself about DWA sweet spots and watch for these while gathering data warehouse platform requirements.

Your evaluation list should include DWAs, software appliances, and similar bundles. After all, these are now established data warehouse platforms, along with more traditional platform components, such as relational databases and the usual hardware servers.

Bibliography

Register to replay Philip Russom's recent TDWI Webinar "Data Warehouse Appliances: An Update on the State of the Art."

Read Philip Russom's oft-quoted article "Defining the Data Warehouse Appliance" (based on a survey conducted in November 2005).

See Wikipedia's broad definition of data warehouse appliances.



Philip Russom is the senior manager of TDWI Research at The Data Warehousing Institute (TDWI), where he oversees many of TDWI's research-oriented publications, services, and events. He's been an industry analyst researching BI issues at Forrester Research, Giga Information Group, and Hurwitz Group.





FlashPoint Rx

FlashPoint Rx prescribes a "Mistake to Avoid" for business intelligence and data warehousing professionals from TDWI's Ten Mistakes to Avoid series.


Ten Mistakes to Avoid When Planning Your CDI/MDM Project

Mistake 3. Overlooking CDI and MDM Development Complexity

In order to enable an MDM or CDI solution to serve other operational applications, you need to do some programming. To connect a CDI hub to other applications, the interface code—the code that submits and retrieves customer data from the de facto data store—must be modified.

There are two basic approaches to modifying interface code to support new CDI or MDM technology. One approach is to modify the operational application to send and retrieve data to and from the hub.

The second approach is specific to environments already leveraging a messaging server—for instance, enterprise application integration (EAI), enterprise service bus (ESB), or other application messaging technology. This approach involves modifying the interface code (once again, relating to an application’s submission and retrieval logic) to communicate with the CDI hub. This alternative is transparent to the operational application’s software.

It’s hard work, since it requires the IT organization to be intimate with the transaction processing logic of individual applications and the accompanying interface or messaging code. And the organization should be savvy enough to retain technical expertise to support the API set and service-oriented architecture (SOA) environment that the CDI or MDM product relies on. Implementing a master data hub is more complex than simply loading a file; it requires the skills to modify transaction processing logic to interface to the hub.
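The two approaches described above can be contrasted in a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `CdiHub` class is an in-memory stand-in rather than any vendor's API, and `BillingApp`, `OrderApp`, `MessageBus`, `route_to_hub`, and the `customer.updated` topic are invented names. The point is only to show where the modified interface code lives in each case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CdiHub:
    """Hypothetical in-memory stand-in for a CDI/MDM hub."""
    records: Dict[str, dict] = field(default_factory=dict)

    def upsert_customer(self, customer_id: str, attrs: dict) -> None:
        # Merge the new attributes into the master record.
        self.records.setdefault(customer_id, {}).update(attrs)

    def get_customer(self, customer_id: str) -> dict:
        return dict(self.records.get(customer_id, {}))


# Approach 1: the operational application's own interface code is
# modified so that it sends data to the hub directly.
class BillingApp:
    def __init__(self, hub: CdiHub) -> None:
        self.hub = hub

    def change_address(self, customer_id: str, address: str) -> None:
        # Assumption: this call replaces a write to a local customer store.
        self.hub.upsert_customer(customer_id, {"address": address})


# Approach 2: the application keeps publishing messages as before;
# only the messaging-layer handler is changed to talk to the hub.
class MessageBus:
    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self.handlers.get(topic, []):
            handler(message)


class OrderApp:
    """Unchanged application: it knows about the bus, not the hub."""

    def __init__(self, bus: MessageBus) -> None:
        self.bus = bus

    def change_address(self, customer_id: str, address: str) -> None:
        self.bus.publish("customer.updated",
                         {"id": customer_id, "address": address})


def route_to_hub(hub: CdiHub) -> Callable[[dict], None]:
    """Build the modified interface code that forwards messages to the hub."""
    def handler(message: dict) -> None:
        attrs = {k: v for k, v in message.items() if k != "id"}
        hub.upsert_customer(message["id"], attrs)
    return handler


hub = CdiHub()
BillingApp(hub).change_address("C1", "1 Main St")    # approach 1

bus = MessageBus()
bus.subscribe("customer.updated", route_to_hub(hub))
OrderApp(bus).change_address("C2", "9 Elm Ave")      # approach 2
```

In this sketch, `BillingApp` had to be edited to know about the hub, while `OrderApp` was left alone and only the subscriber registered on the bus changed. Real integrations would also have to handle retrieval, matching, and error cases, which are omitted here.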


This excerpt was pulled from the Q3 2006 TDWI Ten Mistakes to Avoid series, Ten Mistakes to Avoid When Planning Your CDI/MDM Project, by Jill Dyché and Evan Levy.


About the Authors

Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 550 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, was a contributing editor with leading IT magazines, and a product manager at database vendors. His Ph.D. is from Yale. You can reach him by email (prussom@tdwi.org), on Twitter (twitter.com/prussom), and on LinkedIn (linkedin.com/in/philiprussom).


Jill Dyché is an acknowledged speaker, author, and blogger on the topic of aligning IT with business solutions. As the vice president of SAS Best Practices, she speaks, writes, and blogs about the business value of analytics and information.

Before SAS acquired Baseline Consulting in 2011, Jill was a partner and cofounder of the firm, where she combined the roles of best practices expert, industry gadfly, key client adviser, and all-around thought leader. At both firms, she has led client strategies and market analysis in the areas of data governance, business intelligence, master data management (MDM), CRM, and big data.

Jill’s first book, e-Data (Addison Wesley), has been published in eight languages. Her book The CRM Handbook (Addison Wesley) is the bestseller on the topic. Customer Data Integration (John Wiley and Sons), written with Evan Levy, was the first book on the topic of MDM and discussed managing data as a strategic asset. Jill has contributed to a range of other books and her work has been featured in leading publications including Computerworld, CIO Magazine, the Wall Street Journal, the Chicago Tribune, the Harvard Business Review blog, Forbes.com, and Newsweek.com. Her latest book, The New IT: How Technology Leaders Enable Business Strategy in the Digital Age, profiles executives from companies including Comerica, Brooks Brothers, Mylan Pharmaceuticals, Canadian Tire, Union Bank, Mandalay Resort Group, Men’s Wearhouse, and Toyota Financial Services, highlighting the roles they played in transforming IT and driving strategy.


Evan Levy, CBIP, is an acknowledged speaker, writer, and consultant in the areas of enterprise data strategy, data management, and systems integration. Business is experiencing exponential growth in data volumes, sources, and systems; in his current role, Evan advises clients on strategies to address business challenges using their existing data and technology assets coupled with new and creative methods and practices. With more than 25 years of experience consulting with clients, Evan leads classes and workshops offering practical, real-world experience to address challenges in a manner that utilizes existing skills coupled with new methods to ensure IT and business success.
