Backups vs. Archives: Understanding the Difference
An information archive is a collection built to stay usable over the long term. It can serve many different purposes: research, compliance with the law, reference, documentation, and the simple satisfaction of curiosity. A usable archive isn’t just a lot of files kept together; it needs organization and context.
An archive serves a different purpose from a backup. Backups exist for the usually short-term purpose of restoring files that are lost or damaged. A good backup is exhaustive and mirrors the original file structure. Both backups and archives require quality data management to satisfy their purposes.
An archive isn’t necessarily exhaustive. Some files have no long-term purpose and don’t need archiving; others contain information that should be kept out of the archive, for reasons of security and confidentiality.
Good organization makes a good archive. This may mean a different form of organization from its original environment. It will usually be indexed in some way, with a database or a set of index files.
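As a rough illustration of a database index, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (path, title, year) and the function names are assumptions made for the example; a real archive would choose fields to match its own metadata.

```python
import sqlite3

def build_index(records):
    """Create an in-memory SQLite index from (path, title, year) tuples.

    The schema is hypothetical; a real archive would index whatever
    metadata its policies require.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (path TEXT PRIMARY KEY, title TEXT, year INTEGER)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", records)
    conn.commit()
    return conn

def find_by_year(conn, year):
    """Return the paths of all indexed items from the given year, sorted."""
    rows = conn.execute(
        "SELECT path FROM items WHERE year = ? ORDER BY path", (year,)
    )
    return [row[0] for row in rows]
```

The files themselves can stay in whatever layout suits the storage; the index is what lets people find them later.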
An archive needs long-lasting storage media. This could be a disk drive or tape. Any storage medium has a finite life, so it’s necessary to migrate long-term archives to new media every five years or so. Less durable media, such as recordable DVDs, aren’t a good choice.
An archive needs policies for accepting, depositing, retaining, and accessing data.
The acceptance policy answers the questions: What goes into the archive? What kinds of files are qualified? What integrity checks need to be performed? The policy usually will accept only certain file formats and not others, in order to maintain uniformity and maximize long-term usability. It will require certain metadata with the submitted files, to permit a record of their origin, purpose, and content. This could be external metadata, supplied in addition to the files, or internal metadata such as an image file’s XMP record.
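An acceptance check along these lines could be sketched as follows. The accepted extensions and the required metadata fields are hypothetical placeholders, not a recommendation; each archive would set its own.

```python
from pathlib import Path

# Hypothetical policy values; a real archive would define its own.
ACCEPTED_EXTENSIONS = {".pdf", ".tiff", ".txt", ".xml"}
REQUIRED_METADATA = {"origin", "purpose", "description"}

def check_submission(filename, metadata):
    """Return a list of reasons to reject a submission.

    An empty list means the file passes this (simplified) acceptance policy.
    """
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"format not accepted: {ext or '(none)'}")
    missing = sorted(REQUIRED_METADATA - set(metadata))
    if missing:
        problems.append("missing metadata: " + ", ".join(missing))
    return problems
```

This only looks at the file name and the external metadata supplied with it. Checking that a file's contents really match its extension, or reading internal metadata, would need format-specific tools.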
The deposit policy determines how to store the files. This includes how to organize and index them, and how to store their metadata. Many archives store a checksum for each file, such as an MD5 or SHA-256 digest, so that later checks can detect alteration or damage. The policy might include classification among several levels of archiving, with some files more readily accessible, and others put offline into a “dark archive.”
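The per-file integrity check can be sketched with Python's standard hashlib module. This example defaults to SHA-256 rather than MD5, which is a substitution on our part; passing algorithm="md5" would work the same way where MD5 is available. The function names and the manifest layout (relative path mapped to hex digest) are assumptions for illustration.

```python
import hashlib
from pathlib import Path

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root, algorithm="sha256"):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_manifest(root, manifest, algorithm="sha256"):
    """Return the relative paths that are missing or whose digest changed."""
    root = Path(root)
    damaged = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p, algorithm) != expected:
            damaged.append(rel)
    return damaged
```

The manifest would be built at deposit time and stored with the archive's metadata. Running verify_manifest on a schedule, and again after each migration to new media, gives early warning of silent damage.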
The retention policy addresses how long to store data and when to purge it. The policy could be to retain everything forever, or to delete all files beyond a certain age, or to treat different classes of files differently. It could include moving older files from the accessible archive to the dark one when they meet certain conditions.
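A retention rule of this kind might be expressed roughly as below. The record classes and the year thresholds are invented for the example; they are not drawn from any standard or legal requirement.

```python
from datetime import date

# Hypothetical thresholds, chosen only to illustrate the idea.
DARK_AFTER_YEARS = 5
# Classes not listed here are never purged.
PURGE_AFTER_YEARS = {"temporary": 2, "business": 10}

def retention_action(record_class, deposited, today):
    """Decide what to do with a record: 'purge', 'move_to_dark', or 'keep'."""
    age_years = (today - deposited).days / 365.25
    limit = PURGE_AFTER_YEARS.get(record_class)
    if limit is not None and age_years >= limit:
        return "purge"
    if age_years >= DARK_AFTER_YEARS:
        return "move_to_dark"
    return "keep"
```

For example, with these thresholds a "business" record deposited seven years ago would be moved to the dark archive, while a "temporary" record of the same age would be purged.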
The access policy covers who can get information from the archive and how authorization and authentication work. An archive might be open to everyone, or only to authorized users. Different parts of the archive may be restricted to different sets of users. Public records might be available to everybody, while only trusted employees would have access to confidential business records.
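The authorization side of such a policy can be sketched as a lookup from collection to permitted roles. The collection names and roles here are hypothetical, and the sketch assumes the user has already been authenticated by some other means.

```python
# Hypothetical collections and roles, for illustration only.
# None means the collection is open to everyone.
ACCESS_RULES = {
    "public_records": None,
    "business_records": {"trusted_employee", "archivist"},
}

def can_read(collection, user_roles=frozenset()):
    """Return True if a user holding user_roles may read the collection."""
    if collection not in ACCESS_RULES:
        return False  # deny by default for collections with no rule
    allowed = ACCESS_RULES[collection]
    if allowed is None:
        return True
    return bool(allowed & set(user_roles))
```

Denying access to any collection without an explicit rule is a deliberate choice in this sketch: a forgotten entry then fails closed instead of exposing confidential records.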
For a small archive these policies can be simple: Use common sense about what to add, use an understandable file organization and naming convention, worry about retention when the volume fills up, and grant access on a case-by-case basis. For a large institution, like the Library of Congress, elaborate policies making every point clear are necessary.
A good archive involves a lot of work and is a valuable property, so it needs its own secure storage and backup. The backup copy should reside at a remote location, so that it will survive if something catastrophic happens at the data center.
Backup, fortunately, is a lot simpler. The main points are to do it regularly, make sure everything’s covered, and have at least one offsite backup. An archive is a serious undertaking that takes careful planning.
Please contact us to learn how we can help with your data management needs.