Backup, Replication and Snapshot – When to Use Which?

“Backups” is the universal term for what is really a much broader set of tools that make up the entire data protection process. RAID, backup, replication and snapshots each offer a different degree of data protection, and each helps the IT professional recover from a different type of data loss event. While most organizations have access to RAID, replication and snapshots, most do not have a formal, written plan for when to use which tool and under what circumstances.

When to Use RAID?

Some type of protection from media failure is an absolute requirement. The earliest form of media protection was mirroring, known as RAID 1, which makes a real-time copy of data on a separate drive. The problem is that mirroring delivers its high level of redundancy at a high cost: an effective doubling of capacity requirements. RAID 3 through RAID 6, as well as their parity-based derivatives, alleviate this concern and reduce the media protection overhead by as much as 75%. My colleague, Joseph Ortiz, provides a detailed look at RAID and its various levels in his “What is RAID?” blog series.
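To make the overhead comparison concrete, here is a rough sketch of the arithmetic. It assumes overhead is measured as redundant capacity relative to usable capacity, and it uses a 4+1 parity group as an example layout; neither assumption comes from the article, and the function names are purely illustrative.

```python
# Hypothetical helpers: overhead = redundant capacity / usable capacity.
MIRROR_OVERHEAD = 1.0  # RAID 1 stores one full extra copy of every usable drive


def parity_overhead(data_drives: int, parity_drives: int) -> float:
    """Redundant capacity as a fraction of usable capacity for a parity group."""
    return parity_drives / data_drives


def reduction_vs_mirroring(data_drives: int, parity_drives: int) -> float:
    """Fraction by which a parity group cuts the overhead of mirroring."""
    return 1 - parity_overhead(data_drives, parity_drives) / MIRROR_OVERHEAD


# Assumed layouts: single parity over 4 data drives, double parity over 4 data drives.
print(f"4+1 single parity: {reduction_vs_mirroring(4, 1):.0%} less overhead than mirroring")
print(f"4+2 double parity: {reduction_vs_mirroring(4, 2):.0%} less overhead than mirroring")
```

Under these assumptions a 4+1 group lands on the 75% figure, while double parity over the same four data drives gives back part of that saving in exchange for surviving a second drive failure.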

When to Use Snapshots?

The next type of data protection that most organizations have at their disposal is snapshots. A snapshot is a virtual view of a volume or file system from a particular point in time. After a snapshot is taken, the snapshot system preserves that view by retaining the original contents of any blocks that change afterward.

This protection mechanism is built into most modern storage systems. Snapshot technology takes advantage of the way arrays organize data. Most arrays maintain a catalog that maps to the actual location of the data they store. When a user or application requests data, it is in reality making a request of the catalog, which then points the request to the actual data.

A snapshot makes a copy of the catalog, or a section of the catalog, and then sets all the data that it is mapped to as read-only. The storage system creates new entries in the catalog for new or changed data blocks but preserves the old catalog, which results in a point-in-time view of the snapshotted data. Modern storage systems can maintain hundreds of snapshots with minimal performance loss.

A snapshot is the opposite of RAID: it is totally dependent on the media being intact in order to work, so it offers no protection from media failure. What snapshots are excellent for is quick recovery from data corruption or accidental deletion, since the old catalog can be promoted to replace the current one.
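The catalog mechanism can be sketched in a few lines. This is a toy model, not how any particular array implements it: the `Volume` class and its method names are hypothetical, and it assumes a redirect-on-write design in which every write lands in a new physical location, so blocks referenced by an old catalog are never overwritten.

```python
class Volume:
    """Toy volume: a catalog maps logical block addresses to physical locations."""

    def __init__(self):
        self.blocks = {}     # physical location -> data (never overwritten)
        self.catalog = {}    # logical block address -> physical location
        self.snapshots = {}  # snapshot name -> frozen copy of the catalog
        self._next_loc = 0

    def write(self, lba, data):
        # Redirect-on-write: new or changed data always gets a new location,
        # so anything an older catalog points to stays intact.
        loc = self._next_loc
        self._next_loc += 1
        self.blocks[loc] = data
        self.catalog[lba] = loc

    def delete(self, lba):
        # Only the live catalog entry goes away; the block itself remains.
        self.catalog.pop(lba, None)

    def read(self, lba, snapshot=None):
        # Every read is really a catalog lookup followed by a block fetch.
        catalog = self.catalog if snapshot is None else self.snapshots[snapshot]
        return self.blocks[catalog[lba]]

    def snapshot(self, name):
        # Taking a snapshot copies the catalog, not the data.
        self.snapshots[name] = dict(self.catalog)

    def rollback(self, name):
        # Promote the old catalog to replace the current one.
        self.catalog = dict(self.snapshots[name])


vol = Volume()
vol.write(0, "report v1")
vol.write(1, "notes")
vol.snapshot("monday")
vol.write(0, "corrupted")   # corruption after the snapshot
vol.delete(1)               # accidental deletion after the snapshot
vol.rollback("monday")      # both are undone by promoting the old catalog
print(vol.read(0), "|", vol.read(1))
```

Note that everything lives in the same `blocks` store. If that store is lost, every snapshot is lost with it, which is why snapshots complement RAID rather than replace it. A real system would also reclaim blocks that no catalog references any longer; that is omitted here.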

When to Use Replication?

Most modern replication tools leverage snapshots and use the same block tracking. The difference is that when a data segment changes, the blocks representing that change are replicated to another storage system or another site, instead of, or in addition to, being tracked on the same system. As a result, replicated data can survive not only a media failure but also a storage system failure.
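As a rough illustration of snapshot-based replication, the sketch below compares two point-in-time views of a volume and sends only the difference to a second system. The dictionaries standing in for the volume and the replica, and the function names, are hypothetical; real products track changed blocks as they are written rather than diffing full views.

```python
def changed_blocks(previous: dict, current: dict) -> dict:
    """Blocks that are new or differ between two point-in-time views."""
    return {lba: data for lba, data in current.items() if previous.get(lba) != data}


def replicate(previous: dict, current: dict, replica: dict) -> int:
    """Apply only the changes since the previous view to the replica.

    Returns the number of blocks that had to be sent.
    """
    delta = changed_blocks(previous, current)
    replica.update(delta)
    # Blocks removed on the primary are removed on the replica as well.
    for lba in previous.keys() - current.keys():
        replica.pop(lba, None)
    return len(delta)


monday = {0: "a", 1: "b", 2: "c"}    # view at the last replication cycle
tuesday = {0: "a", 1: "B", 3: "d"}   # block 1 changed, 2 deleted, 3 added
replica = dict(monday)               # second system, in sync as of Monday

sent = replicate(monday, tuesday, replica)
print(sent, replica == tuesday)
```

Only the changed and new blocks cross the wire, which is what makes frequent replication cycles practical.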

The challenge with replication, of course, is the expense of buying a second or even third storage array. IT professionals should take care that only data and applications needing very fast recovery and minimal data loss are assigned to the replication process. Doing so reduces the capacity requirements of the second or third system.

When to Use Backup?

Backup should be applied to all data, including data being snapshotted and replicated. And since, as described above, most data should not be replicated, backup is the primary recovery method for most data sets. Backup, of course, involves copying data to some secondary storage device. The type of secondary storage device, and the infrastructure that connects to it, dictate how fast backups and recoveries will occur. The maturity of the backup software largely determines how often backups occur; the more often they occur, the less data will be lost.

StorageSwiss Take

In reality, most data centers will want to use all four techniques: RAID on all storage systems, snapshots on most data, replication on critical data and backups on all data. The key is how to manage these processes, since they often come from different sources. Some backup software solutions are taking the lead by integrating snapshot and replication management, enabling organizations to set cascading policies that provide a multi-tier protection strategy.

