Introducing Active Data Archiving: 4 Goals Every Enterprise Must Know
Long ignored, data archiving in most organizations today needs to be rethought and upgraded so that it achieves modern goals.
- By Philip Russom, Ph.D.
- July 22, 2014
There are many problems with data archiving in the average enterprise today. Many organizations don't do archiving in any form. Others mistakenly think that mere data backups can serve as archives, whereas tape is actually the final burial of data, from which it rarely returns. Equally off base, others think a data warehouse is an archive. Though it's true that data archiving processes exist today in some organizations, these are rarely formalized or policy-driven, so data is archived in an ad hoc fashion (typically per application or per department) without an enterprise standard or strategy.
Even when an organization makes an honest attempt at an enterprise data archive, the result is usually not trustworthy (because data is easily altered), not auditable (due to poor metadata and documentation), not compliant (due to inadequate usage monitoring or the inability to purge data at specified milestones), and not properly secured (lacking encryption, masking, and security standards). Furthermore, with most existing data archives, it's hard to get data in with integrity and out with speed because the primary platform is not online, active, and highly available.
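Two of the shortcomings above, easily altered data and the inability to purge at specified milestones, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `ArchivedRecord` type, its field names, and the use of a SHA-256 checksum and a retention period in days are hypothetical and do not describe any particular archiving product.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivedRecord:
    """Hypothetical archive entry; all field names are illustrative."""
    record_id: str
    payload: bytes
    archived_on: date
    retention_days: int
    checksum: str  # SHA-256 hex digest of the payload, taken at archive time


def archive(record_id: str, payload: bytes, archived_on: date,
            retention_days: int) -> ArchivedRecord:
    """Create an entry and fix its checksum at the moment of archiving."""
    digest = hashlib.sha256(payload).hexdigest()
    return ArchivedRecord(record_id, payload, archived_on, retention_days, digest)


def is_unaltered(record: ArchivedRecord) -> bool:
    """Return True if the payload still matches the stored checksum."""
    return hashlib.sha256(record.payload).hexdigest() == record.checksum


def is_due_for_purge(record: ArchivedRecord, today: date) -> bool:
    """Return True once the retention milestone has been reached."""
    return today >= record.archived_on + timedelta(days=record.retention_days)
```

A checksum alone only detects alteration when the checksum itself is stored where it cannot be edited alongside the data, so a real archive would need additional controls beyond this sketch.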
Why don't more organizations invest in formal archiving processes and technical solutions? The reason is most likely the common belief that archives provide little or no return on investment (ROI) for an organization because users rarely (if ever) access the data of the archive. Without prominent and frequent usage, a respectable ROI is unlikely.
A data archive can achieve ROI when deployed on an online, active platform that serves multiple uses and multiple users. Organizations do indeed need to retain data; that's not in question. However, archived data is not just insurance for compliance, audit, and legal contingencies. Those are important goals, but a data archive should also be treated as an enterprise asset to be leveraged, typically via controlled access for analytics. Hence, a data archive can be more than a cost center; it can achieve ROI when it serves multiple uses (archiving, compliance, and analytics of deep historical data sets) and manages data online for active access at any time by a wide range of users.
Organizations need to define a mandate for modern archiving based on the following goals:
Archived data must be leveraged. Two typical use cases include fast, documented auditing (for compliance) and data for analytic applications, data exploration, and information lookups.
Some data will come out of the archive to be used elsewhere. To enable a broad range of users, tools, and purposes, the archive should support both query and search mechanisms. Furthermore, the archive should serve as a source for other data platforms, especially those for business intelligence and analytics.
A growing constituency of users will have access to archived data. This is a sticking point in organizations that define data governance and compliance as the process of limiting data access. The challenge is to balance access and control, typically through well-defined user types that are controlled via role-based user access and strong security features in the archival platform.
Accessing archived data will be timely. First, to be truly active, the archive must be online like a database -- not offline like magnetic tapes and optical disks or any media that demand a distracting and time-consuming data restore process. Second, data access mechanisms should perform at or near real time for the sake of user productivity.
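The balance between access and control described in the third goal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the role names, the permission sets, and the `ArchiveGateway` class are hypothetical, and a real archival platform would supply its own security and monitoring features.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical user types and the actions each may perform on the archive.
ROLE_PERMISSIONS = {
    "auditor": {"search", "query", "export"},
    "analyst": {"search", "query"},
    "business_user": {"search"},
}


@dataclass
class ArchiveGateway:
    """Illustrative front door to an archive: checks roles, logs every request."""
    audit_log: list = field(default_factory=list)

    def request(self, user: str, role: str, action: str, dataset: str) -> bool:
        # Unknown roles get an empty permission set, so they are always denied.
        allowed = action in ROLE_PERMISSIONS.get(role, set())
        # Record both granted and denied requests to support usage monitoring.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "action": action,
            "dataset": dataset,
            "allowed": allowed,
        })
        return allowed
```

Logging denied requests as well as granted ones is a deliberate choice in this sketch, since the article lists inadequate usage monitoring as one reason archives fall short on compliance.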
For more information about new directions in data archiving, read the 2014 TDWI Checklist Report, Active Data Archiving for Big Data, Compliance, and Analytics, available online.