LESSON - High-Speed Backup and Recovery of Large-Scale Data Warehouses
By Jesse Fountain, Director of Product Management, DATAllegro, Inc.
Backing up a large data warehouse is a tedious process that can disrupt data warehouse productivity for hours, if not days. Usually the data warehouse must be taken offline for the backup or, worse, the backup is run in the background, which makes it take even longer. When data volumes are large, backup can also be extremely costly, making it infeasible for many companies. Moreover, new legal requirements, such as regulatory compliance mandates, as well as growing business requirements, are forcing companies to find better options for quickly, efficiently, and safely backing up their data warehouse data.
DATAllegro B25 is a cost-effective solution for backing up large data warehouses. Each B25 has a 25-terabyte capacity, and multiple B25 units can be coupled together to accommodate even the largest data volumes. The B25 can back up data from a DATAllegro data warehouse appliance at up to one terabyte every 12 minutes (based on a “P” Series appliance) and restore it just as fast.
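To put the quoted rate in perspective, the following back-of-envelope sketch converts it into a backup window. It assumes the rate of one terabyte every 12 minutes stays constant for the whole job; the function names are hypothetical and not part of any DATAllegro tool.

```python
# Back-of-envelope backup window estimate.
# Assumption: the quoted rate (1 TB every 12 minutes on a "P" Series
# appliance) holds constant for the entire job.
MINUTES_PER_TB = 12.0


def backup_minutes(data_tb: float, minutes_per_tb: float = MINUTES_PER_TB) -> float:
    """Return the estimated wall-clock minutes to back up data_tb terabytes."""
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")
    return data_tb * minutes_per_tb


def throughput_gb_per_s(minutes_per_tb: float = MINUTES_PER_TB) -> float:
    """Convert minutes-per-terabyte into decimal gigabytes per second."""
    return 1000.0 / (minutes_per_tb * 60.0)


if __name__ == "__main__":
    full_unit = backup_minutes(25)  # one fully used 25 TB B25
    print(f"25 TB: {full_unit:.0f} min ({full_unit / 60:.1f} h)")
    print(f"Sustained rate: {throughput_gb_per_s():.2f} GB/s")
```

At that rate, backing up 25 terabytes would take about 300 minutes (5 hours), which corresponds to a sustained rate of roughly 1.39 GB/s in decimal units.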
A Better Way of Backing Up Data
Many companies are forced to back up their data warehouse table by table because of the difficulty of finding storage space for backup files and the time required to run the backups. The DATAllegro B25 takes a “snapshot” of the entire data warehouse image and compresses it quickly and efficiently into a single location without consuming space needed by the data warehouse. Better yet, full backups can be completed within relatively small batch windows.
Adding to the speed and efficiency of the DATAllegro B25 is the “differential engine,” which comes into play for subsequent backups. Instead of backing up the entire appliance each time, the differential engine looks only for those tables that have been added or changed and adds them to the backup set.
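The article does not say how the differential engine detects changed tables. A minimal sketch, assuming each table carries a hypothetical `fingerprint` (for example, a checksum or last-modified marker) that was recorded in the previous backup's manifest, might select tables like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableState:
    name: str
    fingerprint: str  # hypothetical per-table checksum or last-modified marker


def tables_to_back_up(current, previous_manifest):
    """Return the sorted names of tables that are new or have changed.

    current: iterable of TableState describing the live warehouse.
    previous_manifest: dict mapping table name -> fingerprint from the last backup.
    """
    changed = []
    for table in current:
        old = previous_manifest.get(table.name)
        # A missing entry means the table was added since the last backup.
        if old is None or old != table.fingerprint:
            changed.append(table.name)
    return sorted(changed)


def next_manifest(current):
    """Build the manifest to store alongside this backup set."""
    return {t.name: t.fingerprint for t in current}
```

For example, if the last manifest was `{"orders": "a1", "customers": "b7"}` and the live warehouse now holds `orders` with fingerprint `"a2"`, unchanged `customers`, and a new `returns` table, the sketch selects `["orders", "returns"]`. It ignores dropped tables; a real implementation would likely also need to record deletions so that a restore does not bring them back.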
The DATAllegro B25 can safely store backup files for as long as needed in a RAID 5- or RAID 6-protected environment. In addition, backup files can be copied from the B25 to tape without affecting data warehouse performance. Data can be restored at the table, node, or database level.
A Staging Area for Exporting and Importing Data
In addition to backing up data, the DATAllegro B25 can serve as a staging area for exporting and importing data to and from the data warehouse. Data from ETL tools such as Informatica can be “landed” on the B25 and then, over the high-speed InfiniBand network, loaded or “upserted” rapidly into the data warehouse on a DATAllegro P or C Series appliance. Data can also be exported from the DATAllegro appliance to external environments, such as SAS, and to other large-scale data marts.
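An “upsert” inserts rows whose keys are new and updates rows whose keys already exist. The sketch below illustrates the idea on in-memory data, assuming staged rows are dictionaries that share a primary-key field; it is not DATAllegro's load mechanism, and the function name is hypothetical.

```python
def upsert(target: dict, staged_rows: list, key: str) -> tuple[int, int]:
    """Merge staged rows into target, keyed by row[key].

    Existing rows are updated field by field; unseen keys are inserted.
    Returns (inserted_count, updated_count).
    """
    inserted = updated = 0
    for row in staged_rows:
        k = row[key]
        if k in target:
            # Key already present: overlay the staged values on the old row.
            target[k] = {**target[k], **row}
            updated += 1
        else:
            # New key: store a copy so later changes to row do not leak in.
            target[k] = dict(row)
            inserted += 1
    return inserted, updated
```

Landing the data first and then applying it in one pass keeps the merge logic separate from the extraction step, which is the same division of labor the staging area provides.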
Repartitioning and Expansion
Growing your DATAllegro appliance environment is fast and easy with a DATAllegro B25. When growing data volumes require adding another DATAllegro data warehouse appliance or additional nodes, the B25 can serve as the destination for a complete export of the data warehouse. Once the export is complete, the data is deleted from the existing appliance. Next, additional capacity is added to the appliance. Finally, the exported data is repartitioned and loaded across the newly expanded appliance.
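The article does not describe DATAllegro's partitioning scheme. Assuming simple hash partitioning on a distribution key, repartitioning the exported data across the expanded appliance amounts to recomputing each row's target node with the new node count, as in this hypothetical sketch:

```python
import zlib
from collections import defaultdict


def node_for(key: str, node_count: int) -> int:
    """Pick a node using a stable hash (CRC-32), unlike Python's salted hash()."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def repartition(rows, key_field: str, node_count: int) -> dict[int, list]:
    """Distribute exported rows across node_count nodes by hashing key_field."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    placement = defaultdict(list)
    for row in rows:
        placement[node_for(str(row[key_field]), node_count)].append(row)
    return dict(placement)


if __name__ == "__main__":
    rows = [{"order_id": i} for i in range(1000)]
    before = repartition(rows, "order_id", 4)  # original appliance
    after = repartition(rows, "order_id", 6)   # after adding two nodes
    for node in sorted(after):
        print(node, len(after[node]))
```

With modulo placement, changing the node count (here from 4 to 6) reassigns most keys to different nodes, which illustrates why reloading the full export across the expanded appliance is a straightforward way to rebalance the data.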
Using the DATAllegro B25 to expand an appliance or repartition data provides an efficient and complete solution for maintaining and growing a large-scale data warehouse environment.