Data Archiving: More than Just Making a Copy
Protecting your data and ensuring that no data is lost isn't as simple as just making an extra copy, even if storage costs have dropped significantly. We examine key considerations for any data archival/retrieval plan.
- By Mike Schiff
- September 17, 2013
One of the basic goals of data warehousing is to create a platform for comparing current and historical values. Although early data warehouses summarized historic values (e.g., monthly revenue by customer), many organizations now wish to include highly detailed historic data (e.g., each individual sales transaction) as well. Even as the cost of online storage continues to decline, most organizations can't afford to store all their data, especially detailed historical values, online all the time. They need to archive critical subsets for future reference, for corporate governance and compliance purposes, and for potential use in future data mining exercises that might not be technically feasible today.
However, data archiving is not just about data retention; perhaps more important, it is really about data retrieval. There are several technical issues that must be addressed for an organization to be able to retrieve today's archived data in the future.
Hardware: Will the devices that can read the archive still be available? Over the years many types of media have come and (in many cases) gone. For example, paper tape and punch card readers are not devices normally found in today's data centers. Although magnetic tape is still the workhorse for archiving data, what if the tapes were created using a low density, 7-track format common in early mainframe environments? If you look around your office, you may very well find a few 3 1/2-inch diskettes and maybe even 8-inch or 5 1/4-inch floppies. Diskette drives have all but disappeared from today's PCs, although (at least for the immediate future) they can still be separately purchased with USB interfaces. Good luck to those of you who need to retrieve data from a Zip drive!
Software: Even if your hardware can physically read the archive media, do you have software that can open and process the files? For example, many early word processing and presentation software programs are no longer available. If the software is no longer available, or the copy you saved is not supported by your current operating system, how will you retrieve your data? Furthermore, even current versions of industry-leading office software such as Microsoft Office may not be fully backward-compatible or may not be able to open files created by much older versions of the software (e.g., prior to Office 97).
Metadata: Even if you can open the archive file, do you know what the content represents? Consider even the simple case of a flat sequential file. If the field layouts, formats, code sets, and value lists are not also preserved, how will you know what the data represents?
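To make the flat-file case concrete, here is a minimal Python sketch of keeping a layout description alongside the archived data. Everything in it is an assumption for illustration: the field names, positions, the region code set, and the sample record are hypothetical and do not come from any particular system.

```python
import json

# Hypothetical layout for a 20-character fixed-width sales record.
# Field names, offsets, and code values are illustrative assumptions.
LAYOUT = {
    "record_length": 20,
    "fields": [
        {"name": "customer_id", "start": 0, "length": 6, "type": "str"},
        {"name": "sale_date", "start": 6, "length": 8, "type": "date:YYYYMMDD"},
        {"name": "region", "start": 14, "length": 1, "type": "code"},
        {"name": "amount_cents", "start": 15, "length": 5, "type": "int"},
    ],
    "code_sets": {"region": {"N": "North", "S": "South"}},
}


def parse_record(line, layout):
    """Decode one fixed-width record using only the preserved layout."""
    record = {}
    for field in layout["fields"]:
        raw = line[field["start"]:field["start"] + field["length"]]
        if field["type"] == "int":
            value = int(raw)
        elif field["type"] == "code":
            # Translate the stored code into its documented meaning.
            value = layout["code_sets"][field["name"]].get(raw, raw)
        else:
            value = raw.strip()
        record[field["name"]] = value
    return record


# The layout would be written next to the data file as a text sidecar;
# here it is round-tripped through JSON to show it survives on its own.
sidecar_text = json.dumps(LAYOUT, indent=2)
restored_layout = json.loads(sidecar_text)

row = parse_record("C0004220130917N01999", restored_layout)
```

Without the sidecar, the sample record is just twenty characters; with it, a future reader can recover the customer, date, region, and amount. A plain-text format such as JSON is used here on the assumption that it is more likely to remain readable than a proprietary one.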
Physical deterioration: Will the storage media be able to retain the data for appropriate time periods? Magnetic fields can decay over time and make data stored on tapes and magnetic disks unreadable; even solid state and optical storage have limited life spans. Heat, humidity, and exposure to light can accelerate deterioration. One possible solution might be to archive data with a third-party cloud storage provider or cloud-based BI vendor, thus making the vendor responsible for proper climate control and even data refresh procedures to help prevent physical deterioration.
The Bottom Line
Unfortunately, all of these issues become more complicated over time and need to be addressed in any archival plan. For example, metadata can be stored with the archived data and the archived data can periodically be copied to fresh media (or perhaps a determination made to finally scrap it) long before there is serious risk of physical deterioration. Furthermore, if the data is stored in a third-party cloud, user communities should work with IT and legal departments to ensure that the proper protections are in place so that the data is still available to the organization in the event a new vendor is chosen or the current cloud vendor goes out of business.
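One way to approach the periodic copy to fresh media is to record a checksum when the archive is created and check it at every refresh. The following Python sketch illustrates the idea under stated assumptions: the function names `sha256_of` and `refresh_copy` are hypothetical, and SHA-256 is simply one reasonable choice of checksum, not a requirement.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source, destination, expected_digest):
    """Copy an archive file to fresh media, verifying it before and after.

    Raises ValueError if the source no longer matches the digest recorded
    at archive time, or if the new copy does not match it.
    """
    if sha256_of(source) != expected_digest:
        raise ValueError("source archive no longer matches its recorded digest")
    shutil.copyfile(source, destination)
    if sha256_of(destination) != expected_digest:
        raise ValueError("refreshed copy does not match the recorded digest")
    return destination
```

In this sketch the digest would be stored with the archive's metadata when the file is first written. A mismatch at refresh time signals that deterioration has already begun, which is a cue to fall back to another copy rather than propagate damaged data.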
Data archiving may appear to be relatively simple, but the real concern is data retrieval. Any organization that archives its data must take into consideration how (or perhaps if) it will be able to retrieve and understand that data in the future. After all, the ability to analyze historical data, for example customer lifetime purchase history, requires that the data be available and processable when needed.