Is Your Business Intelligence Environment Protected?
By Wayne W. Eckerson, Director of Research and Services, TDWI
Disaster on the Loose
Given the sorry state of our environment and geopolitical relations, I am forced to ask a simple question: When was the last time you reviewed and tested the disaster recovery plan for your data warehouse? Do you even have a disaster recovery plan for your data warehouse?
Global warming, by raising ocean temperatures, is spawning more intense hurricanes that pack the power of several nuclear bombs, as witnessed by the devastation wrought by Hurricane Katrina. Global warming also increases the volatility of weather, contributing to more tornadoes and violent thunderstorms that can damage infrastructure and knock out power. And rising temperatures are creating longer and more intense heat waves that strain power supplies, cause rolling blackouts and outages, and spark virtually uncontrollable forest fires that threaten residential and commercial property.
Moreover, scientists say it’s only a matter of time before a pandemic breaks loose with the potential to kill millions of people and force the rest of us to work in isolation or quarantine. And the rapidly expanding industrial economies of China and India make the supply and price of oil vulnerable to persistent geopolitical tensions in the Middle East, unstable governments in Africa, and strained relations elsewhere. A dramatic rise in the price of oil caused by a geopolitical meltdown in any oil-producing nation would quickly send our economy into a tailspin and potentially cause energy slowdowns or blackouts.
Had enough? While natural and geopolitical disasters are on the rise, they aren’t the biggest threats to your BI environment. According to research by Information Age, a leading UK magazine for executives, most IT executives believe the greatest threats to the continuity of their IT operations are internal system failure (65 percent) and viruses (45 percent). Natural disasters registered at 32 percent and power and communications outages at 33 percent.
Ten years ago, there was little need to create a disaster recovery plan for data warehouses and the reports and applications they support. At that time, the vast majority of data warehouses were loaded in batch on a monthly basis from a half-dozen or so source systems. Most loads were fairly small, and even the biggest data warehouses were less than a couple of hundred gigabytes in size. Not surprisingly, most data warehousing teams didn’t have a disaster recovery plan, let alone a backup strategy. The common sentiment back then was that if the data warehouse crashed, you could simply refresh the data warehouse in its entirety from source systems once everything came back online.
Today, most data warehouses have become mission-critical systems. Many now capture and update transactions on a real-time basis and support dozens of run-the-business applications. Business users have become so dependent on data warehousing information to make daily business decisions that they practically crucify data warehousing managers if the system goes offline even for a few hours. Moreover, as a decision-making engine, a data warehouse is critical to helping organizations respond in an optimal fashion when disaster strikes. Data warehouse reports can help executives figure out how to prioritize activities, allocate resources, and reassign staff to deal with emergencies. A decade ago, the data warehouse may have been the last system restored after a disaster; today, it should be the first system to come online in an emergency.
How Protected Are You?
Research shows that a majority of organizations are confident in the resiliency of their IT systems. Most have disaster recovery plans that safeguard the business from short- and long-term disruptions. Maybe the disaster recovery plan even includes the data warehouse, the servers it runs on, and the reports and applications it supports. Since many data warehouses now run within corporate data centers governed by corporate IT policies that include business continuity and disaster recovery planning, it is a good bet your organization has insured its data warehousing assets to some degree.
Unfortunately, most disaster recovery plans don’t go far enough to protect an organization against costly disruptions. Disaster recovery planning is insurance, and most companies insure only what they can afford—not what they need.
Has your organization prioritized the business processes and applications that are critical to its operations? If the data warehouse is a top priority, what about the ETL engines that populate the data warehouse and the BI servers that generate and distribute critical reports? A chain is only as strong as its weakest link, and a data warehouse is a complex environment comprising multiple systems, applications, and interdependencies with internal and external systems. The data warehousing environment can’t be fully restored until every one of its components is brought back online.
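To make the weakest-link point concrete, here is a minimal sketch of how a team might derive a restore order from component dependencies. The component names and the dependency map are hypothetical, chosen only for illustration; a real environment would take them from its own metadata.

```python
from graphlib import TopologicalSorter

# Hypothetical components; each key maps to the components it depends on.
DEPENDENCIES = {
    "storage": set(),
    "network": set(),
    "warehouse_db": {"storage", "network"},
    "etl_engine": {"warehouse_db"},
    "bi_server": {"warehouse_db"},
    "reports": {"bi_server", "etl_engine"},
}


def restore_order(dependencies):
    """Return components in an order where every dependency comes first."""
    return list(TopologicalSorter(dependencies).static_order())


def is_fully_restored(dependencies, online):
    """The environment counts as restored only when every component is online."""
    return set(dependencies) <= set(online)
```

With a map like this, a recovery drill can check that reports are never declared available before the ETL engine and BI server that feed them.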
When was the last time you really tested the disaster recovery plan for your data warehouse? If you practiced recovering from a database failure, you only completed part of the test. You need to restore clients, servers, networks, storage, applications, and databases to fully simulate a recovery situation. And if you conducted your tests a year ago, there’s a good chance that your plans are out of date. Since a data warehouse is an adaptable system, it is constantly changing to answer new questions that business people ask. So, the queries, reports, metadata, ETL workflows, aggregates, and so on have probably changed since your previous test. Moreover, the questions that business people ask during an emergency may be very different from what they normally ask.
The key to resiliency is not just flexible, redundant systems; it’s people. During a disaster, there is a lot of chaos and confusion. Many key personnel may be absent or unable to work or access systems. Thus, you need redundancy not only in your systems, but also in your staff assignments. Your team members should all be schooled in what to do in a variety of emergency situations—and be ready to play multiple roles as needed.
Disaster recovery puts a premium on good-quality, up-to-date, end-to-end metadata, something that few organizations have successfully implemented. Metadata is critical for performing impact assessments: when something in a source system changes, you need to know how it will impact every other component in the system, down to metrics within end-user reports. In an emergency situation, data warehousing teams can be seriously hamstrung in their ability to meet recovery time objectives (i.e., time to recover business functions), critical data points (the point in time from which data must be recovered), and recovery point objectives (the maximum amount of recent data, measured in time, that the business can afford to lose) without access to a dynamic, comprehensive metadata management system.
Of course, the data is the heart and soul of a data warehousing environment, and organizations must devise a good strategy for safeguarding data against power failures, network outages, floods, storms, or other disasters. Most organizations perform backups to low-cost tape, which is shipped and stored offsite. While it takes a long time to recover a data warehouse from tape, most of this data is historical and doesn’t have high value during an emergency. To protect more recent information, organizations should replicate or snapshot data as it moves through the ETL process and store it on disk in a disaster recovery system, which archives or deletes the data after an appropriate period of time, usually a few days or weeks.
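The retention policy described above can be sketched in a few lines. This is only an illustration: the function name, the 14-day default, and the idea of tracking snapshots by timestamp are assumptions, not details of any particular product.

```python
from datetime import datetime, timedelta


def split_snapshots(snapshots, now, retention=timedelta(days=14)):
    """Partition snapshot timestamps into those to keep on disk and those to expire.

    `retention` is an assumed policy value; the text says only
    "a few days or weeks".
    """
    cutoff = now - retention
    keep = [ts for ts in snapshots if ts >= cutoff]
    expire = [ts for ts in snapshots if ts < cutoff]
    return keep, expire
```

Expired snapshots would then be archived to tape or deleted, depending on the organization's policy.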
Most data warehousing teams understand the need to manage the lifecycle of data warehousing information. Unfortunately, these teams often don’t anticipate disaster striking twice. Ideally, the online backup system should be maintained offsite so a data center problem doesn’t disrupt both the primary and backup systems. (This obviously is more costly and requires high-speed network connections.) Yet many teams have no backup for the backup if the offsite system goes down. Most also don’t envision a disaster lasting more than a few days. Given that many businesses are still not fully functional in the wake of Hurricane Katrina, we need to plan for disasters that last far longer. Finally, many offsite backup systems don’t protect companies from viruses, which propagate internally. An offsite system should have an internal gate that delays real-time propagation by several hours to safeguard against software attacks.
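One way to picture the delay gate is as a holding queue in front of the offsite copy. The sketch below is hypothetical: the class name, the six-hour default, and the discard step are assumptions made for illustration, and it assumes changes are submitted in arrival order.

```python
from collections import deque
from datetime import datetime, timedelta


class DelayGate:
    """Hold changes for a fixed delay before releasing them to the offsite copy."""

    def __init__(self, delay=timedelta(hours=6)):
        self.delay = delay  # assumed value; the text says "several hours"
        self._pending = deque()  # (arrival_time, change), oldest first

    def submit(self, change, arrived_at):
        """Queue a change; callers are assumed to submit in arrival order."""
        self._pending.append((arrived_at, change))

    def release(self, now):
        """Return, in order, the changes that have waited at least `delay`."""
        ready = []
        while self._pending and now - self._pending[0][0] >= self.delay:
            ready.append(self._pending.popleft()[1])
        return ready

    def discard_since(self, suspect_from):
        """Drop held changes that arrived at or after `suspect_from`."""
        kept = [(t, c) for t, c in self._pending if t < suspect_from]
        dropped = len(self._pending) - len(kept)
        self._pending = deque(kept)
        return dropped
```

If an infection is detected within the delay window, the suspect changes can be discarded before they ever reach the offsite system.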
It’s not fun being the voice of gloom and doom, and no one wants to spend money to avert something that may never happen. But it seems to me that we are witnessing an inflection point in the number of crises, disasters, and geopolitical tensions caused by environmental degradation and political polarization. Of course, I may just be hypersensitive to world events (or just tuning in for the first time in my life). Nonetheless, there is nothing like word of a possible impending crisis to impel us to dust off our disaster recovery plans. It’s better than waiting for a real-life disaster to test the effectiveness of our plans.
Wayne W. Eckerson is the director of research and services for TDWI. He has 17 years of industry experience and has covered data warehousing and business intelligence since 1995. Eckerson is the author of many in-depth reports, a columnist for several business and technology magazines, and a noted speaker and consultant. His book, "Performance Dashboards: Measuring, Monitoring, and Managing Your Business," was published by John Wiley & Sons in 2005. He can be reached at firstname.lastname@example.org.