LESSON - Six Ways to Transform the Economics of Data Warehousing
By David Menninger, Vice President of Product Management and Marketing, Vertica Systems, Inc.
For more than a decade, IT organizations have been plagued by high data warehousing costs, with millions of dollars spent annually on specialized, high-end hardware and DBA personnel overhead for performance tuning. The root cause: data warehouse database management software that was designed 20 or 30 years ago to handle write-intensive OLTP workloads, not query-intensive analytic workloads.
Although state of the art for many years, those OLTP DBMSs were always the wrong tool for the job of data warehousing. This has become more apparent—and more costly—as the amount of data that companies need to analyze and the number of people who need to analyze it have skyrocketed. Over time, these costs and missed opportunities to serve the business upset the economics of data warehousing and greatly diminished its return on investment (ROI).
Verizon, Mozilla, Comcast, JP Morgan Chase, and dozens of other companies have implemented a new generation of data warehouses that are more economical, make data centers “greener,” and most important, enable the business to make data-driven decisions at more levels and on more fronts so they can out-innovate and out-execute competitors.
How to Achieve Speed, Simplicity, and Dramatic Savings, Too
Use a DBMS built from the ground up to handle large-scale data analysis workloads for many concurrent users. The following innovations have enabled customers to manage terabytes of data faster, more reliably, and more economically:
- Blazing speed on commodity hardware. You don’t need costly “big iron” or specialized data warehouse hardware to get great performance. Columnar data storage provides answers to queries 50 to 200 times faster than traditional databases while running on “green” grids of inexpensive, off-the-shelf servers.
- Compression lowers storage costs. Most data warehouses are 5 to 20 times larger than the amount of data loaded into them, due to indexes and other auxiliary structures designed to improve performance. With columnar data structures, storage is often 4 to 15 times smaller than the amount of data loaded, due to aggressive compression. This dramatically lowers storage costs and further improves performance.
- No DBMS “tax.” Licensing should be based on the amount of data stored, not on the type or amount of hardware on which the database runs. Deploy on 1 server or 100, and unless the data volume rises above the license limit, you can change your data warehouse hardware without having to change your license.
- Free replication and high availability. When you are managing terabytes of data, licensing should allow you to replicate data without limit (or added fees) to ensure high performance and high availability. In addition, there should be no extra costs for development, testing, or staging copies of the database.
- Eliminate “rip and replace” upgrades. As data volumes and BI users rise, data warehouses often outgrow the hardware on which they were deployed, resulting in expensive “rip and replace” upgrades. A shared-nothing MPP architecture enables a data warehouse to scale “out” by adding inexpensive servers to the cluster to handle the additional load.
- Lower maintenance costs. You should not need a legion of DBAs to tune performance. Configuration, database design, optimization, failover, and recovery should be automated, lowering DBA costs and speeding up delivery of solutions to the business.
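To make the columnar-storage point above concrete, here is a minimal sketch (illustrative data only, not Vertica's actual storage format) contrasting a row-oriented layout with a column-oriented one. The key idea: an analytic aggregate over one column reads only that column's data in a column store, while a row store must touch every field of every row.

```python
# Sketch: why column-oriented storage speeds up analytic scans.
# Illustrative only; real columnar engines add compression,
# vectorized execution, and other optimizations.

# Row store: each record is kept together, so summing one column
# still drags every field of every row through memory.
row_store = [
    ("2024-01-01", "east", 100.0),
    ("2024-01-02", "west", 250.0),
    ("2024-01-03", "east", 175.0),
]

# Column store: each column is kept contiguously, so an aggregate
# over "amount" reads only the values it actually needs.
column_store = {
    "date":   ["2024-01-01", "2024-01-02", "2024-01-03"],
    "region": ["east", "west", "east"],
    "amount": [100.0, 250.0, 175.0],
}

total_from_rows = sum(r[2] for r in row_store)        # scans whole rows
total_from_cols = sum(column_store["amount"])         # scans one column

print(total_from_rows, total_from_cols)  # both print 525.0
```

Both layouts return the same answer; the difference is how much data must be scanned to get it, which is what drives the query-speed gap on large tables.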
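The compression bullet can also be sketched. Run-length encoding is one classic scheme that works especially well on sorted, low-cardinality columns of the kind a column store keeps together; the toy encoder below (an assumption for illustration, not Vertica's encoding) shows how thousands of repeated values collapse to a handful of pairs.

```python
# Sketch: run-length encoding (RLE), one compression scheme that
# makes columnar storage compact. Sorted, low-cardinality columns
# compress especially well.

def rle_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [v for v, n in pairs for _ in range(n)]

# A sorted "region" column: 10,000 values, only three distinct runs.
region = ["east"] * 5000 + ["west"] * 3000 + ["north"] * 2000
packed = rle_encode(region)

print(len(region), "values ->", len(packed), "pairs")  # 10000 values -> 3 pairs
assert rle_decode(packed) == region  # lossless: the column round-trips
```

Because the encoded form is smaller than the raw data (rather than 5 to 20 times larger, as index-heavy row stores tend to be), storage costs drop and scans get faster at the same time.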
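Finally, the shared-nothing scale-out bullet can be illustrated with hash partitioning, a common way MPP systems spread rows across nodes (a hypothetical layout for illustration; production systems also replicate partitions for availability). Each row is routed to exactly one node by hashing its key, so adding servers spreads both data and query work.

```python
# Sketch: hash partitioning in a shared-nothing MPP cluster.
# Each row lands on exactly one node, determined by hashing its
# key column, so scaling "out" means re-spreading rows over more
# inexpensive servers rather than buying a bigger single machine.

def partition(rows, key_index, n_nodes):
    """Assign each row to a node by hashing its key column."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[hash(row[key_index]) % n_nodes].append(row)
    return nodes

# Toy fact table: (customer_id, amount).
rows = [(cust_id, cust_id * 10.0) for cust_id in range(1000)]

for n in (4, 8):  # scale "out" from 4 nodes to 8
    nodes = partition(rows, 0, n)
    assert sum(len(p) for p in nodes) == len(rows)  # no row lost or duplicated
    print(n, "nodes:", [len(p) for p in nodes])
```

Each node can then scan and aggregate its own partition in parallel, which is why adding servers to the cluster, rather than replacing them, handles growing data volumes.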
For a free white paper on this topic, click here and choose the title “Optimizing DBMS Architectures for Next-Generation Data Warehousing.”