LESSON - Investing in Data Warehouse Query Performance
By Eric Rogge, Vice President of Marketing, HyperRoll
There are many challenges with data warehouses, not the least of which is estimating the investment in software, hardware, and effort to optimize the performance of a data warehouse. It is important that data warehouse architects, managers, and other stakeholders enter into the data warehouse investment cycle with honest and realistic assessments of performance need. Of course, the best plans can go awry. Plus, there is always a political aspect to the investment, where commitments are made and expectations are set by vendors, by external consultants, by internal architects, and by IT managers. Any unmet expectation by the data warehouse or by these stakeholders can damage business relationships and careers.
With all of these factors in play, defining the level of investment needed for adequate data warehouse performance can be a slippery fish. To capture the fish, or in other words, to identify adequate investment, key stakeholders should itemize and plan for the different data warehouse capabilities that are impacted by its performance. These include:
- Static to dynamic report transition
Report development backlogs are often addressed by converting static report sets to smaller, more versatile prompted or ad hoc reporting capabilities to users. The engineering trade-off is that user query rates and complexity becomes less predictable and manageable.
- Query complexity
To compete in the market, businesses look to increasingly complex analytics to manage the subtle and complex aspects of competing for customer attention and engagement. These analytics drive increasingly complex and performance-demanding queries to the data warehouse.
Most queries to a data warehouse are aggregate queries, requiring extra computational work by the relational database management system (RDBMS). Supporting a few aggregates is easy, but adding dimensional permutations can quickly scale the number of aggregates to calculate to thousands or more.
- Schema complexity
As with query complexity, schemas are now more complex because they must describe more complex business operations. This is especially true as users demand new and different views of the data from the data warehouse.
- User scalability
Often business expects a nonlinear scaling of user communities. Early user counts may be in the low teens or hundreds, but full-scale deployments may be in the thousands.
Clearly, data warehouse managers need to forecast the rate and nature of change for these and other similar factors that impact data warehouse performance. The most common approach is to size the data warehouse system for near-term needs and then hope that traditional data warehouse tuning solutions will cover any performance gaps in the foreseeable future. The unfortunate truth is that this approach quite often falls short of meeting the need for data warehouse query performance. Combining the impact of a few of the above performance factors can require a tenfold or even a hundredfold improvement in query performance. With this level of query performance risk, data warehouse stakeholders have to take a progressive approach to assuring adequate performance.
That approach requires a fundamental understanding of all possible data warehouse hardware and software architecture approaches for assuring performance. Much as successful fishermen often have secret and unique ways of catching fish based upon deep knowledge of fish behavior and environment, data warehouse stakeholders should cultivate broad understanding of performance enhancement options and use innovative ways to meet their data warehouse performance demands. This means moving beyond the mainstream methods and seeking alternatives that are not in use by business competitors.
An evaluation framework for assuring data warehouse query performance should start with a broad search for query performance enhancement alternatives. This should be followed by a comparative assessment of the various alternatives based upon an organization’s performance impacting factors (like those described here) along with the other usual technology and vendor selection criteria. Organizations should evaluate their need to differentiate from their competitors via speed and ease of data access. As the need for differentiation grows in priority, so should the need for differentiating technology and methodologies. Most importantly, a structured business case should be created for any initiative to increase data warehouse query performance.
This article originally appeared in the issue of .
Eric Rogge is an experienced technology professional with 30+ years with enterprise, business intelligence and data acquisition software and hardware. His unique combination of R&D, marketing and consulting experience provides both vendor and customer clients with balanced, practical experience and insight.
Mr. Rogge’s experience includes the design and deployment of analytic applications and predictive models for energy, manufacturing, technology, banking, brokerage and retail industry segments. He has also designed data acquisition hardware and software for government and commercial use. He has more than 2 decades of experience managing the positioning, design, production and deployment of software and hardware. Eric holds both a Bachelor’s of Science degree in Electrical Engineering from U.C. Davis and an MBA in Finance from Santa Clara University.