LESSON - Appliances—Data Mart or Enterprise Data Warehouse?
By Stuart Frost, CEO, DATAllegro, Inc.
Appliances are becoming established in the data warehousing market, but some companies and analysts have positioned appliances as “just” suitable for data marts (DM). Is this true, or can they also be used for large-scale enterprise data warehouse (EDW) projects?
The answer is yes, they can, under certain circumstances. While few would claim that appliances are currently ready to handle a complex EDW on their own, they are finding an interesting niche as an integral part of many EDW infrastructures.
DM and EDW Differences
Definitions of DMs and EDWs vary, but the most common differences lie in the number of business processes supported by a given system. A DM typically supports only one business process or subject area, whereas an EDW supports several, and in some cases is a true enterprisewide system. In addition, DMs are often fed summarized information from the EDW in a hub-and-spoke architecture, although this varies across the industry.
A significant majority of Global 2000 companies have deployed data warehouses in the last 10 years, establishing the overall business value of analytics. However, many companies are now struggling to keep up with new demands on their data warehouse systems. Such challenges include:
- Significant data growth due to:
  - New legislation (the Sarbanes-Oxley Act, EU data retention laws, etc.)
  - Mergers and acquisitions
  - The need to analyze growing volumes of point-of-sale or telecommunications transactions to remain competitive
- Business demands for reduced latency, which translates into faster query times
- Larger user bases
- Demand for ever more complex ad hoc queries to support fraud detection and anti-money-laundering efforts
As a result, many previously successful EDW installations on platforms such as Teradata, DB2, and Oracle are becoming overwhelmed by the need to support hundreds of users with a broad mix of query types against tens of terabytes of data. Upgrade quotes for these platforms can easily be tens of millions of dollars—and even then they may not meet business needs!
Using Appliances to Divide and Conquer the Problem
Since high-performance data warehouse appliances are now available at prices as low as $20,000 per terabyte, a number of EDW users are turning to this new technology as a potential solution. However, they are not relegating appliances to the role of mere data marts. Instead, they are using appliances as a low-cost front end to the EDW itself.
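To put that price point in perspective, here is a back-of-the-envelope sketch in Python. It assumes a flat per-terabyte price with no volume discounts or support costs, and the capacities shown are hypothetical values chosen to span "tens of terabytes"; only the $20,000 figure comes from the text above.

```python
# Lowest appliance price quoted in the article, in US dollars per terabyte.
PRICE_PER_TB_USD = 20_000


def appliance_cost_usd(terabytes: int, price_per_tb: int = PRICE_PER_TB_USD) -> int:
    """Return the cost of the given capacity at a flat per-terabyte price (an assumption)."""
    return terabytes * price_per_tb


# Hypothetical capacities, not figures from any real deployment.
costs = {tb: appliance_cost_usd(tb) for tb in (10, 50, 90)}

for tb, cost in costs.items():
    print(f"{tb} TB: ${cost:,}")
```

Under these assumptions, even the largest of the three capacities comes in under $2 million, which is the comparison to keep in mind against upgrade quotes in the tens of millions.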
In a typical scenario, large-volume, fine-granularity transaction records are stored directly on the appliance. The appliance then handles tasks such as:
- Data cleansing
- Long-term storage of transaction details for compliance
- Ad hoc queries
- Applications such as fraud detection that require access to data at very fine granularity
- Exports to external analytics systems such as SAS
- Building large-scale aggregation or summary tables and exporting them to the EDW
By offloading these tasks from the EDW to the appliance, companies are greatly reducing the need for expensive EDW upgrades. In addition, the specialized nature and advanced technology of the appliance enable these processes to run significantly faster, often by two orders of magnitude.
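The summary-table task in the list above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for the appliance; the table name, column names, and sample rows are hypothetical, and a real appliance would use its own SQL dialect and bulk-export tooling.

```python
import sqlite3

# In-memory SQLite stands in for the appliance holding fine-grained transactions.
# Table name, columns, and rows are hypothetical illustration data.
appliance = sqlite3.connect(":memory:")
appliance.execute(
    "CREATE TABLE transactions (store_id INTEGER, sale_date TEXT, amount_cents INTEGER)"
)
appliance.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, "2007-01-01", 1999),
        (1, "2007-01-01", 500),
        (1, "2007-01-02", 1250),
        (2, "2007-01-01", 4200),
    ],
)


def build_daily_summary(conn: sqlite3.Connection) -> list:
    """Aggregate transaction detail into one row per store per day."""
    cursor = conn.execute(
        """
        SELECT store_id, sale_date, COUNT(*) AS txn_count, SUM(amount_cents) AS total_cents
        FROM transactions
        GROUP BY store_id, sale_date
        ORDER BY store_id, sale_date
        """
    )
    return cursor.fetchall()


# These summary rows are what would be exported to the EDW; the detail stays behind.
summary_rows = build_daily_summary(appliance)
for row in summary_rows:
    print(row)
```

The point of the design is that the EDW only ever receives the compact summary rows, while the bulky transaction detail stays on the cheaper platform for ad hoc queries and compliance.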
Since appliances are relatively easy and cheap to maintain, any additional complexity introduced by this divide-and-conquer approach is limited in nature and overwhelmed by the huge benefits.
New data warehouse appliance technologies have the potential to transform the data warehousing market. By acting as a high-performance, high-capacity, and low-cost front end to an established EDW, they can add significant value to an already successful installation—while avoiding expensive upgrades.
If this all sounds too good to be true, many vendors offer free proofs of concept so you can check out their claims at minimal cost. What do you have to lose, apart from poor performance and high costs?