High-Performance Data Warehousing: Executive Summary
New research analyzes the emerging needs of high-performance data warehousing in an age of "big data." The report presents barriers, benefits, strategies, tips, and vendor tools for achieving the speed, scale, complexity, and concurrency that characterize high-performance data warehousing.
- By Philip Russom, Ph.D.
- October 1, 2012
Data used to be just data. Now there’s “big data,” real-time data, multi-structured data, analytic data, and machine data. Likewise, user communities have swollen into thousands of concurrent users, reports, dashboards, scorecards, and analyses. The rising popularity of advanced analytics has driven up the number of power users with their titanic ad hoc queries and analytic workloads. And there are still brave new worlds to explore, such as social media and sensor data.
The aggressive growth of data and attendant disciplines has piled additional stresses on the performance of systems for business intelligence (BI), data warehousing (DW), data integration (DI), and analytics. That stress, in turn, threatens new business practices that need these systems to handle bigger and faster workloads. Just think about modern analytic practices that depend on real-time data, namely operational BI, streaming analytics, just-in-time inventory, facility monitoring, price optimization, fraud detection, and mobile asset management. Many of the latest practices apply business analytics to big data, which is a performance double whammy of heavy analytic workloads and extreme scalability.
The good news for BI, DW, DI, and analytic practices is that solutions for high performance are available today. These solutions involve a mix of vendor tools or platforms and user designs or optimizations. For example, the vendor community has recently delivered new types of database management systems, analytic tools, platforms, and tool features that greatly assist performance. Users continue to develop their skills for high-performance architectures and designs, plus tactical tweaking and tuning. This report refers to this eclectic mix of vendor products and user practices as high-performance data warehousing (HiPer DW).
According to this report’s survey, two-thirds of respondents feel HiPer DW is an opportunity (not a problem) because of the business practices it supports. Yet only a quarter have made major changes for the sake of performance; the majority say they can achieve most HiPer DW goals via minor tweaks and tunings. Respondents report that any business process or technology that is analytic, real-time, or data-driven benefits from HiPer DW, including advanced analytics, big data analytics, operational BI, and business performance management. The leading challenges to HiPer DW are cost, tool deficiencies, inadequate technical skills, and real-time operation.
Survey respondents with hands-on HiPer DW experience say that HiPer DW tasks take up a quarter of their development time, on average. It’s important for BI and DW developers to keep this percentage down, or else it detracts from their highest priority: developing new BI and analytic applications. To that end, one-third of respondents are contemplating replacing their current DW platform with one that performs better straight out of the box.
According to survey responses, the HiPer DW function poised for the greatest growth is in-database analytics, followed by Hadoop, MapReduce, in-memory databases, and private clouds. The many real-time functions poised for growth include trickle loads for DWs, streaming data, microbatches, and complex event processing.
This TDWI Best Practices Report helps users understand new business and technology requirements for high-performance data warehousing (HiPer DW), as well as the many options and solutions available, whether vendor-built or user-built. Of course, performance doesn’t result solely from the data warehouse platform, so this report also discusses analytics, BI, visualization, data integration, clouds, grids, appliances, data services, Hadoop, and other types of tools and platforms. The tips and strategies presented here can help user organizations prioritize their acquisition of vendor tools and their adoption of design best practices.
Cloudera, IBM, Oracle, ParAccel, SAP, SAS, Teradata, and Vertica sponsored the research for this report.
Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 600 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, served as a contributing editor with leading IT magazines, and worked as a product manager at database vendors. His Ph.D. is from Yale. You can reach him by email (email@example.com), on Twitter (twitter.com/prussom), and on LinkedIn (linkedin.com/in/philiprussom).