Defining backup is both simple and complex. Google says a backup is “the procedure for making extra copies of data in case the original is lost or damaged.” This definition describes the action (extra copies) and the purpose (in case the original is lost or damaged), and backup is a procedure, not a thing. So in order for something to be a backup, it must be an actual copy, made for the purpose of restoring the original when damaged, and there needs to be a procedure. Let’s take a look at these elements of the definition.
For something to be a backup, it must be an actual copy. A copy, by definition, is independent of the original. The two most common data protection techniques that are very helpful but do not meet the definition of a backup are RAID and snapshots. RAID and its close cousin, erasure coding, are not backup because neither of them protects against data corruption at the block, file, or object level. They protect only against media failure. If a drive fails, RAID and erasure coding can automatically rebuild the lost data using parity; however, if you delete a file or drop a table in a database, RAID will be of no help. This is why RAID is not a backup. File system and volume snapshots, most commonly found in NAS filers, are the opposite of RAID. If you delete or corrupt a file, they make it very easy to restore that file to its original condition; however, if you lose three drives in the RAID 6 volume that the snapshots reside on (RAID 6 tolerates only two drive failures), those snapshots will be worthless. This is why snapshots that have not been replicated to another volume are not backup.
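To make the RAID point concrete, here is a minimal sketch of single-parity protection using XOR. It is a toy model, not how any particular RAID implementation works: the three "drives" are just byte strings, and `xor_blocks` is a hypothetical helper. It shows parity rebuilding a failed drive, and then shows that a deletion is faithfully protected too, leaving nothing to restore.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data "drives" and one parity "drive".
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Media failure: drive 1 dies. Its contents are rebuilt from the
# surviving drives plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])

# Logical loss: the contents of drive 1 are deleted (zeroed here),
# and the array dutifully updates parity to match.
data[1] = b"\x00\x00\x00\x00"
parity = xor_blocks(data)

# A rebuild now returns the deleted state, not the original data.
rebuilt_after_delete = xor_blocks([data[0], data[2], parity])

print(rebuilt)               # the original contents of drive 1
print(rebuilt_after_delete)  # zeros: parity protected the deletion
```

The same reasoning applies to erasure coding: redundancy that tracks every write also tracks every unwanted write.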
A backup is made for the purpose of restoring data when it is lost or damaged, which means that an archive is not a backup. Archives are made for multiple reasons, none of which is to restore data when it is lost or damaged. Archives store data so that it can be retrieved for a different purpose. A backup of Exchange is used to restore the Exchange server to the way it looked yesterday. An archive of Exchange is used to search for emails across a period of time, such as all emails from a given user or containing a certain phrase. It is possible for a single system to serve as both a backup and an archive, but it is rare.
In order for a backup to accomplish its purpose (restoring data), there needs to be some kind of procedure to make sure the backup happens. For example, a bunch of DVDs containing copies of your laptop’s hard drive would meet two of the elements in the previous definition of backup, but the lack of an automated procedure calls into question their status as a backup. In addition, the lack of some kind of database recording what is on those DVDs will make it very hard to use them to restore files.
One popular system also used for backup is replicated snapshots. Unlike snapshots alone, replicated snapshots do meet the first part of the definition – they are an actual copy. They also meet the second part of the definition, as they can be used to restore files or entire volumes if the original files or volumes were damaged or corrupted. The question is whether they match the procedure portion of the definition. The concern is the frequent lack of a management and reporting system to make sure that the copies happen at regular intervals and to make it easy to search the backups for the purpose of restore. If that system is in place then most people would consider replicated snapshots backups; if not, they are just copies.
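As a rough illustration of the reporting piece, the sketch below checks whether replicated copies actually arrived at regular intervals. Everything in it is an assumption made for the example: `find_missed_intervals` is a hypothetical function, and the timestamps are made up rather than read from a real storage system.

```python
from datetime import datetime, timedelta

def find_missed_intervals(copy_times, expected_interval, now):
    """Return (start, end) pairs where no copy landed within expected_interval.

    A start of None means no copies exist at all.
    """
    if not copy_times:
        return [(None, now)]
    # Append "now" so a stale most-recent copy is also reported.
    times = sorted(copy_times) + [now]
    gaps = []
    for earlier, later in zip(times, times[1:]):
        if later - earlier > expected_interval:
            gaps.append((earlier, later))
    return gaps

# Hypothetical replication history: the January 3 copy never happened.
copies = [
    datetime(2024, 1, 1),
    datetime(2024, 1, 2),
    datetime(2024, 1, 4),
]
gaps = find_missed_intervals(
    copies,
    expected_interval=timedelta(days=1),
    now=datetime(2024, 1, 4, 12),
)
for start, end in gaps:
    print(f"No copy between {start} and {end}")
```

Without a check of this kind, a replication job that silently stopped last month looks exactly like one that is working.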
What, then, is a backup in today’s world? Any system that schedules the copying of data from one system to another (and, ideally, from one location to another) and manages a catalog of that data to facilitate easy restore. This is a more limiting definition than the one that started this blog post, but it’s one that backup experts can agree on. The additional requirement of copying data from one location to another is more about disaster recovery, but one could argue that if a backup is stored in a box right next to the thing it’s backing up, it may not survive whatever destroys the original, and so won’t be available for a restore. Backup software products that back up data from a server or VM to other storage meet the definition. So do products that manage the scheduling, replication, and cataloging of snapshots from storage systems that support them.
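The copy-plus-catalog part of that definition can be sketched in a few lines. This is a toy under stated assumptions, not a description of any backup product: `back_up` and `search_catalog` are hypothetical names, the catalog is a plain JSON file, and scheduling is left out (in practice something like a scheduler would call `back_up` at regular intervals).

```python
import hashlib
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def back_up(source_dir, dest_dir, catalog_path):
    """Copy every file under source_dir to dest_dir and record it in a JSON catalog."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    run_time = datetime.now(timezone.utc).isoformat()
    entries = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = src.relative_to(source_dir)
        target = dest_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # the actual, independent copy
        entries.append({
            "path": relative.as_posix(),
            "size": src.stat().st_size,
            "sha256": hashlib.sha256(src.read_bytes()).hexdigest(),
            "backed_up_at": run_time,
        })
    Path(catalog_path).write_text(json.dumps(entries, indent=2))
    return entries

def search_catalog(catalog_path, name_fragment):
    """Find catalog entries whose path contains name_fragment."""
    entries = json.loads(Path(catalog_path).read_text())
    return [e for e in entries if name_fragment in e["path"]]

# Demo with throwaway directories standing in for two systems.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "laptop"
    dest = Path(tmp) / "backup_target"
    catalog = Path(tmp) / "catalog.json"
    (source / "docs").mkdir(parents=True)
    (source / "docs" / "report.txt").write_text("quarterly numbers")
    (source / "notes.txt").write_text("todo")

    back_up(source, dest, catalog)

    # Restore starts with the catalog, not with guessing where files are.
    matches = search_catalog(catalog, "report")
    restored_text = (dest / matches[0]["path"]).read_text()
    print(matches[0]["path"], "->", restored_text)
```

The catalog is what separates this from a pile of copies: it records what was copied and when, so a restore begins with a lookup.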
There are a lot of things that meet the definition of backup today that would be unrecognizable to someone performing backups 20 years ago. Three things are still definitely not backup: RAID, unreplicated snapshots, and archives. They each have their purpose, but they are not backup. If you are using replicated snapshots as your backup system, you need some type of management and reporting system to verify that the backup is doing its job. Learn the risks that your data faces and make sure you are protecting against all of them.
Sponsored by Commvault