LESSON - Analytics and Reporting Without Limits
By Lisa Dreyer, Director of Product Marketing, Sybase, Inc.
Keeping up with the demands of today’s enterprise reporting, business intelligence, advanced analytics, and predictive modeling requirements—including Web-based applications—is pushing relational database technology and its users to their breaking points.
The sheer volume of data required to generate accurate trending information, facilitate insightful decision making, and comply with increasingly stringent government and industry regulations is choking the performance of current analytical systems. IT organizations find themselves with no choice but to restrict usage to prevent system failure. These restrictions typically dictate:
- Limits to the amount of data available for analytics
- Limits to the amount of concurrent users and queries
- Limits to the launch of new applications
- Limits to the complexity of analysis
- Limits to access windows
Some performance improvement can be gained through tuning and optimization techniques, technology upgrades, additional hardware, and hardware upgrades. But these methods are costly and come with their own limits. Queries can still take hours, days, or longer to run and return actionable information.
Column-Based Processing Eliminates Limits
A fundamentally different approach is required to deliver the speed, scalability, and cost effectiveness organizations need in today’s real-time marketplace. As data volumes increase and response time requirements decrease, relational database, row-by-row processing won’t get the job done. Instead, what’s called for is a column-based approach to analytical and reporting processes.
Unlike ordinary relational databases, which store data in tables by row, column-based engines—like Sybase IQ—store and access data in tables by column. While this would be inappropriate for a transactional environment, in which a transaction is effectively equivalent to a row, it is sensible and necessary in a query-processing environment, since queries are generally created by defining columns.
A major advantage of this column-based approach is that, in effect, the entire database is automatically indexed because selection criteria in a query are defined by column.
Performance Benefits and More
One of the main advantages of the column-based approach is the reduction in the amount of data that needs to be read. This dramatically speeds up input/output (I/O) cycles. By reading data on a columnar basis, only the data related to a specific query must be read. In contrast, working with a conventional database, each row must be read in its entirety, regardless of the relevance of the fields to the query. Multiply this I/O reduction by a few million rows per table, and the performance difference is considerable.
While a columnar approach is different from row-based processing, there are few differences from a management point of view. For example, with Sybase IQ, database access is via standard SQL (ANSI 99). Because of this, the learning curve is minimized in organizations familiar with conventional databases.
The limits inherent in using conventional databases to support enterprise reporting and analysis are eliminated with column-based architectures. The result is a system that supports lightning-fast query speeds with:
- No limits to amount of data available for analytics
- No limits to amount of concurrent users and queries
- No limits on launching new applications
- No limits to the complexity of queries
- No limits to access windows
Best Environment for a Column-Based Approach?
The benefits of the column-based approach are most significant in environments in which the query load is unpredictable and ad hoc queries are common. This is because a conventional database cannot be pre-tuned for unexpected queries. The column-based approach used by Sybase IQ, however, effectively provides self-tuning capabilities.
In addition, complex queries that involve multiple selection criteria across a variety of tables, and those involving large table scans, can be processed much more efficiently within a column-based environment.
Finally, for large data stores containing detailed transactions and sub-transactions, like clickstream data, a column-based engine like Sybase IQ does not require data to be pre-aggregated for analysis, making it easy for users to analyze transaction-level data.
This article originally appeared in the issue of .