LESSON - What’s Keeping You From Data Warehouse Nirvana?
By DATAllegro, Inc.
With so many years of data warehouse (DW) history and accumulated expertise, you’d think that DW success would be easier. You’d think that DW implementers would be the honored heroes of the business by now, squarely stationed atop broad pedestals. However, according to our survey of DW practitioners and users, people still struggle with issues that hamper their DW implementations. These problems are surmountable, and it is likely that those surveyed have achieved pain-free status at one time or another, but the reward is temporary as the bar rises with each success. Let’s look at two of the most prevalent DW issues as we consider how new technology may assist.
In our recent survey, 560 people reported one or more types of DW pain. Not surprisingly, ad hoc query limitations are the most frequent type of DW pain (52 percent). It seems that ad hoc query is still the most sought-after, and elusive, feature of data warehousing. While many folks admit to having it in some form, ad hoc query is still problematic. Why is achieving ad hoc query so hard? Do you really need it? What can you do to make it easier?
Ad hoc query capability is often the functionality behind exceptional DW success or ROI (return on investment). We’ve all heard reports of extreme revenue gains or cost reductions inspired by a zealot with newly acquired access to detail data. It’s truly a worthy goal. Some practitioners have tried to add this capability to their existing DW amid all the other processing, while others delivered a separate “query sandbox” to power users. The former approach is akin to tuning your car while driving it—definitely risky. Achieving consistent performance for queries you’ve never seen in the context of limitless combinations of known queries is daunting. Plus, there’s the converse effect of the ad hoc query on the existing workload. Unfortunately, a “tuned” ad hoc query ceases to be ad hoc.
The latter approach to ad hoc query is simpler, but hard to justify when high-performance DW systems (necessary for ad hoc) cost millions of dollars. For many years, the effort and cost of ad hoc query made it a luxury and therefore a frequent concession by users. Fortunately, that concession is no longer necessary, as DW price/performance drops dramatically with data warehouse appliances that leverage commodity hardware and open source software. Happily, the appliance model also dictates that ongoing tuning and administration are minimal, so adding systems to the architecture is not as cumbersome as with traditional DW technologies.
Query performance is the next most frequent DW pain (41 percent). Reliable query performance garners user acceptance and often drives physical database design efforts. Query performance success is achievable but fleeting, especially if tuning is done at the query level, because changing conditions require ongoing scrutiny.
Query performance is inextricably linked to scalability (concurrency, complexity, capacity), but just scaling the technology may not achieve the desired performance gains. This is why the concept of linear scalability, where performance grows in proportion to the resources added, continues to be important. DW teams who don’t test for linear scalability in their benchmarks may find lackluster performance as they get nearer to production. Another contributor to poor performance is the complexity of integrating all of the various components, many of which were not built specifically with DW in mind. Fortunately, DW appliances help here too. Being built and tuned for the job, they make optimal DW integration the vendor’s responsibility.
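One way to make a linear scalability test concrete is to compare measured benchmark throughput against the throughput an ideally linear system would deliver at each configuration size. The sketch below is illustrative only: the BenchmarkRun type, the queries-per-hour metric, the 90 percent threshold, and the sample figures are all assumptions, not survey data or any vendor's method.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One benchmark result (hypothetical structure for illustration)."""
    nodes: int               # size of the configuration tested
    queries_per_hour: float  # measured throughput at that size


def scaling_efficiency(baseline: BenchmarkRun, run: BenchmarkRun) -> float:
    """Measured throughput divided by ideal linear throughput.

    1.0 means perfectly linear scaling relative to the baseline run.
    """
    ideal = baseline.queries_per_hour * (run.nodes / baseline.nodes)
    return run.queries_per_hour / ideal


def is_roughly_linear(runs: list[BenchmarkRun], threshold: float = 0.9) -> bool:
    """True if every larger run keeps at least `threshold` efficiency.

    The smallest configuration is used as the baseline; the threshold
    is an arbitrary assumption to be set by the DW team.
    """
    ordered = sorted(runs, key=lambda r: r.nodes)
    baseline = ordered[0]
    return all(scaling_efficiency(baseline, r) >= threshold for r in ordered[1:])


# Made-up sample numbers, not real measurements.
sample_runs = [
    BenchmarkRun(nodes=2, queries_per_hour=100.0),
    BenchmarkRun(nodes=4, queries_per_hour=190.0),
    BenchmarkRun(nodes=8, queries_per_hour=300.0),
]

for run in sample_runs[1:]:
    eff = scaling_efficiency(sample_runs[0], run)
    print(f"{run.nodes} nodes: {eff:.0%} of linear")

print("roughly linear:", is_roughly_linear(sample_runs))
```

In this made-up data set, the falloff at the largest configuration is the kind of result worth catching in a benchmark rather than in production.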
There is not space to discuss the remaining “pains,” but by now you should sense a trend. While DW isn’t yet painless, new approaches show tremendous promise.