There are two very different ways to create snapshots: copy-on-write and redirect-on-write. If an IT team is considering using the snapshot functionality of its storage system, it is essential to understand which type of snapshot the system creates and the pros and cons of each method.
Rather than the more common term volume, this column will use the term protected entity to refer to the entity being protected by a given snapshot. While it is true that the protected entity is typically a RAID volume, it is also true that some object storage systems do not use RAID. Their snapshots may be designed to protect other entities, such as containers or a NAS share. In such cases, the protected entity may reside on a number of disk drives, but it does not reside on a volume in the RAID or LUN sense.
What all snapshot types have in common is that they are virtual copies, not physical copies. If something happens to the protected entity, the snapshot will be useless. For example, if there is a triple disk failure on a RAID 6 volume, snapshots will not help. An object storage system should also protect against a certain number of simultaneous failures, but if the number of failures exceeds that limit, snapshots will not help. A snapshot has two primary purposes: easy recovery of deleted or corrupted files, and a source for replication or backup. For a snapshot to protect against media failure, you must replicate it or back it up to some other device. In other words, you must make a physical copy.
When a snapshot is taken, nothing significant happens on the collection of hard drives where the protected entity resides. The storage system merely takes note that the protected entity, as it looks at that moment, needs to be preserved. The difference between copy-on-write and redirect-on-write snapshots is how they store the previous version of a modified block, and the two methods have serious performance ramifications.
Consider a copy-on-write system, which copies blocks before they are overwritten with new information (i.e., it copies on writes). In other words, if a block in a protected entity is to be modified, the system will copy that block to a separate snapshot area before it is overwritten with the new information. This approach requires three I/O operations for each write to a protected block: one read and two writes. Prior to overwriting a block, its previous value must be read and then written to a different location, followed by the write of the new information. If a process attempts to read the snapshot at some point in the future, it accesses it through the snapshot system, which knows which blocks have changed since the snapshot was taken. If a block has not been modified, the snapshot system will read that block from the original protected entity. If it has been modified, the snapshot system knows where the previous version of that block is stored and will read it from there. This decision process for each block also comes with some computational overhead.
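The copy-on-write behavior described above can be sketched in a few lines of Python. This is a toy in-memory model, not any vendor's implementation: the class name CowVolume, its methods, and the io_ops counter are all hypothetical names invented for illustration. It also assumes that the old value only needs to be preserved the first time a block is overwritten after the snapshot, so later overwrites of the same block cost a single write.

```python
class CowVolume:
    """Toy copy-on-write model (hypothetical; for illustration only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data of the protected entity
        self.snap_area = None       # block index -> preserved old value
        self.io_ops = 0             # simulated block reads and writes

    def take_snapshot(self):
        # Nothing is copied yet; the system only notes that a snapshot exists.
        self.snap_area = {}

    def write(self, index, data):
        # Assumption: only the first overwrite after the snapshot must
        # preserve the old value (one read plus one extra write).
        if self.snap_area is not None and index not in self.snap_area:
            old_value = self.blocks[index]
            self.io_ops += 1                    # read the previous value
            self.snap_area[index] = old_value
            self.io_ops += 1                    # write it to the snapshot area
        self.blocks[index] = data
        self.io_ops += 1                        # write the new value in place

    def read_snapshot(self, index):
        if self.snap_area is None:
            raise RuntimeError("no snapshot has been taken")
        # Per-block decision: preserved copy if modified, else the live block.
        if index in self.snap_area:
            return self.snap_area[index]
        return self.blocks[index]


vol = CowVolume(["a", "b", "c"])
vol.take_snapshot()
vol.write(1, "B")
print(vol.io_ops)            # 3: one read and two writes
print(vol.read_snapshot(1))  # b (served from the snapshot area)
print(vol.read_snapshot(0))  # a (unmodified, served from the live volume)
```

The branch inside read_snapshot is the per-block decision the column refers to: every snapshot read has to check whether the block was modified before it knows where to look.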
A redirect-on-write system uses pointers to represent all protected entities. If a block needs modification, the storage system merely redirects the pointer for that block to another block and writes the data there (i.e., it redirects on writes). The snapshot system knows where all of the blocks that make up a given snapshot are located; in other words, it has a list of pointers and knows the location of the blocks those pointers refer to. If a process attempts to access a given snapshot, it simply uses these pointers to access those blocks where they originally resided. The fact that some of those blocks were replaced and are now represented by other pointers is irrelevant to the snapshot process. Reading a snapshot in a redirect-on-write system therefore carries no additional computational overhead.
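A comparable sketch of redirect-on-write, again as a toy in-memory model under stated assumptions: the class name RowVolume, its methods, the io_ops counter, and the snapshot name "hourly" are hypothetical, and pointer updates are treated as free metadata changes rather than counted as block I/O.

```python
class RowVolume:
    """Toy redirect-on-write model (hypothetical; for illustration only)."""

    def __init__(self, blocks):
        self.store = list(blocks)             # physical blocks, never overwritten
        self.live = list(range(len(blocks)))  # one pointer per logical block
        self.snapshots = {}                   # snapshot name -> saved pointer list
        self.io_ops = 0                       # simulated block writes

    def take_snapshot(self, name):
        # A snapshot is only a copy of the pointer list, not of the data.
        self.snapshots[name] = list(self.live)

    def write(self, index, data):
        # Write the new data to a fresh location and redirect the pointer.
        self.store.append(data)
        self.io_ops += 1                      # the single block write
        self.live[index] = len(self.store) - 1

    def read(self, index):
        return self.store[self.live[index]]

    def read_snapshot(self, name, index):
        # Follow the saved pointer; no check for whether the block changed.
        return self.store[self.snapshots[name][index]]


vol = RowVolume(["a", "b", "c"])
vol.take_snapshot("hourly")
vol.write(1, "B")
print(vol.io_ops)                      # 1: a single write
print(vol.read(1))                     # B (live view)
print(vol.read_snapshot("hourly", 1))  # b (old block is still in place)
```

Because the old block is never touched, the snapshot read path is the same pointer lookup as a live read. The cost shows up as space instead: the store keeps growing until snapshots that reference old blocks are deleted.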
The redirect-on-write system uses one-third the number of I/O operations when modifying a protected block, and it adds no computational overhead when reading a snapshot. Copy-on-write systems can therefore have a big impact on the performance of the protected entity. The more snapshots are created and the longer they are stored, the greater the impact on the performance of the protected entity. This is why copy-on-write snapshots are typically used only as temporary sources for backup; they are created, backed up, and then immediately deleted. Redirect-on-write snapshots, however, are often created every hour, or even every few minutes, and stored for days or even months; they are deleted only for space reasons. (The longer a snapshot is stored, the more extra space is required to hold the previous versions of changed blocks.)
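To put the one-third figure in concrete terms, the following back-of-the-envelope calculation counts block I/O for a batch of modifications. The function name write_io_ops is hypothetical, and the model is deliberately simplified: it assumes every modification is the first overwrite of its block since the snapshot and ignores metadata and pointer updates.

```python
def write_io_ops(modified_blocks: int) -> dict:
    """Estimate block I/O needed to modify protected blocks (simplified model)."""
    return {
        # Copy-on-write: read old value, write it aside, write new value.
        "copy_on_write": 3 * modified_blocks,
        # Redirect-on-write: write new value to a new location.
        "redirect_on_write": 1 * modified_blocks,
    }


ops = write_io_ops(10_000)
print(ops["copy_on_write"])      # 30000
print(ops["redirect_on_write"])  # 10000
```

Under these assumptions, modifying 10,000 protected blocks costs 30,000 block I/O operations with copy-on-write and 10,000 with redirect-on-write.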
Redirect-on-write snapshots are the preferred snapshot method if the plan is to use snapshots for medium-to-long-term protection against file deletions and corruptions. If a vendor is using copy-on-write snapshots and is recommending them for anything other than temporary sources for backups, make sure to ask them how they overcome the inherent performance penalties of copy-on-write.