LESSON - Best Practices for Architecting Your Data Warehouse
By Ray Roccaforte,Vice President, Oracle Corporation
Data warehouses have to manage more and more information each day, while business users continue to expect consistently fast query response times. Getting the right balance of server, storage, networking, and intelligent software to deliver on these expectations can be a challenge. Of equal importance is finding a solution that does not compromise the scalability, reliability, security, and support for mixed workloads, analytics, and data mining that are crucial to the success of any data warehouse.
What to Consider
Ignoring the basics of properly architecting a data warehouse can result in an expensive failure. Here are some basics to consider when architecting your data warehouse.
System performance and scalability. Start by defining your user population’s requirements in terms of complexity and frequency of queries, both today and in the future. Don’t forget to factor in how your data volume is expected to grow, the frequency of data loads, and the mix of workloads, or query performance and system scalability issues will arise. To avoid these issues, you need a data warehouse that can easily scale out data storage, network bandwidth, and processing capacity.
Required analysis capabilities. Next, establish your analytic requirements. Most businesses need more than regular reports and ad hoc query analysis. For example, data mining engines analyze data to deliver forward-looking insights using sophisticated predictive techniques. Likewise, OLAP engines can deliver advanced analytics such as forecasting and what-if analysis very quickly. Architectural complexity is greatly reduced if these engines are hosted within the data warehouse database itself, eliminating the need for multiple repositories and engines.
User service-level agreements. In addition to acceptable performance criteria, users are increasingly dependent on data warehouses to make timely decisions in a secure manner; downtime or loss of access to information is unacceptable. It’s important to design a system based on proven high-availability technologies that can protect your data warehouse from planned and unplanned downtime. Plus, you need an infrastructure that fully protects sensitive information throughout the enterprise. Using high availability and data protection features in your data warehouse, rather than manually integrating third-party tools, can dramatically reduce implementation costs and the cost of meeting users’ service-level objectives.
Time to market. Finally, you need to map the skills and resources required to implement and manage your data warehouse today and in the future to what you already have. Where in-house resources may not be available, it’s important to build your data warehouse using industry-standard components that can be supported by resources and skills that are readily available. This helps reduce your time-to-market and ongoing management costs and protects your long-term data warehousing investment.
Once you’ve identified a list of solution providers that meet your requirements, you should talk to reference customers. Also, consider running benchmark tests, or proof of concepts, but be sure to set aside reasonable timelines in your project plans for these activities. And always use real source data from operational systems; using data subsets will not reflect your go-live deployment, and will put project success at risk.
What Oracle Offers
Oracle leads the database and data warehouse markets with innovative products, including Oracle Database 11g and the Oracle Database Machine. In addition to industry-leading performance and scalability, Oracle Database 11g offers integrated analytics on reliable, low-cost grids. The Oracle Database Machine is a complete data warehousing grid—built using Oracle Database 11g and industry-standard servers, storage, and networking components—that delivers extreme performance for large-scale data warehouses. It’s a fast, reliable, and secure system that can easily scale to meet the complex reporting and analytics needs of the most demanding organizations.