Snapshot 101: Copy-on-write vs Redirect-on-write

Posted on April 1, 2016 by wcurtispreston

There are two very different ways to create snapshots: copy-on-write and redirect-on-write. If an IT organization is considering using the snapshot functionality of its storage system, it is essential to understand which type of snapshot the system creates and the pros and cons of each method.

Rather than the more common term volume, this column will use the term protected entity to refer to the entity being protected by a given snapshot. While it is true that the protected entity is typically a RAID volume, it is also true that some object storage systems do not use RAID. Their snapshots may be designed to protect other entities, including containers, a NAS share, etc. In this case, the protected entity may reside on a number of disk drives, but it does not reside on a volume in the RAID or LUN sense.

What all snapshot types have in common is that they are virtual copies, not physical copies. If something happens to the protected entity, then the snapshot will be useless. For example, if there is a triple disk failure on a RAID 6 volume, snapshots will not help. An object storage system should also protect against a certain number of simultaneous failures, but if the number of failures exceeds that limit, snapshots will not help. A snapshot has two primary purposes: easy recovery of deleted or corrupted files, and a source for replication or backup. For a snapshot to protect against media failure, you must replicate it or back it up to some other device. In other words, you must make a physical copy.

When a snapshot is taken, nothing significant happens on the collection of hard drives where the protected entity resides. The storage system merely takes note that the protected entity, as it looks at that moment, needs to be preserved. The difference between copy-on-write and redirect-on-write snapshots is how they store the previous version of a modified block, and the two methods have serious performance ramifications.

Consider a copy-on-write system, which copies any blocks before they are overwritten with new information (i.e. it copies on writes). In other words, if a block in a protected entity is to be modified, the system will copy that block to a separate snapshot area before it is overwritten with the new information. This approach requires three I/O operations for each write: one read and two writes. Prior to overwriting a block, its previous value must be read and then written to a different location, followed by the write of the new information. If a process attempts to read the snapshot at some point in the future, it accesses it through the snapshot system that knows which blocks changed since the snapshot was taken. If a block has not been modified, the snapshot system will read that block from the original protected entity. If it has been modified, the snapshot system knows where the previous version of that block is stored and will read it from there. This decision process for each block also comes with some computational overhead.
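The copy-on-write behavior described above can be sketched as a toy in-memory model. This is only an illustration, not how any particular storage product is implemented: the class name, the `snap_area` dictionary, and the `io_ops` counter are all hypothetical, and the sketch assumes the old block is copied only the first time it is overwritten after the snapshot, which the article does not state.

```python
class CopyOnWriteVolume:
    """Toy model of a protected entity with a single copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, overwritten in place
        self.snap_area = None       # block index -> preserved old value
        self.io_ops = 0             # counts simulated disk reads and writes

    def take_snapshot(self):
        # Nothing is copied yet; the system only notes that the current
        # state must be preserved from now on.
        self.snap_area = {}

    def write(self, index, data):
        # Assumption: copy only on the first overwrite after the snapshot.
        if self.snap_area is not None and index not in self.snap_area:
            old = self.blocks[index]     # read the previous value
            self.io_ops += 1
            self.snap_area[index] = old  # write it to the snapshot area
            self.io_ops += 1
        self.blocks[index] = data        # write the new value in place
        self.io_ops += 1

    def read_snapshot(self, index):
        if self.snap_area is None:
            raise ValueError("no snapshot has been taken")
        # Per-block decision: changed blocks come from the snapshot area,
        # unchanged blocks come from the live protected entity.
        if index in self.snap_area:
            return self.snap_area[index]
        return self.blocks[index]


vol = CopyOnWriteVolume(["a", "b", "c"])
vol.take_snapshot()
vol.write(1, "B")  # one read plus two writes
```

After the single `write` call, `vol.io_ops` is 3, the live volume holds "B" in block 1, and `vol.read_snapshot(1)` still returns the preserved "b".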

A redirect-on-write system uses pointers to represent all protected entities. If a block needs modification, the storage system merely redirects the pointer for that block to another block and writes the data there (i.e. it redirects on writes). The snapshot system knows where all of the blocks that comprise a given snapshot are located; in other words, it has a list of pointers and knows the location of the blocks those pointers refer to. If a process attempts to access a given snapshot, it simply uses these pointers to access those blocks where they originally resided. The fact that some of those blocks were replaced and are now represented by other pointers is irrelevant to the snapshot process. There is no extra computational overhead in reading a snapshot in a redirect-on-write system.
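The pointer-based approach can be sketched in the same toy style. Again, this is a simplified illustration under stated assumptions: the class and attribute names are hypothetical, a snapshot is modeled as a full copy of the pointer list (real systems usually avoid copying the whole map), and space is never reclaimed.

```python
class RedirectOnWriteVolume:
    """Toy model of a protected entity with redirect-on-write snapshots."""

    def __init__(self, blocks):
        self.store = list(blocks)                 # physical blocks, never overwritten
        self.live = list(range(len(self.store)))  # one pointer per logical block
        self.snapshots = []                       # each snapshot is a saved pointer list
        self.io_ops = 0                           # counts simulated data reads and writes

    def take_snapshot(self):
        # Assumption: a snapshot is just a copy of the current pointers.
        self.snapshots.append(list(self.live))
        return len(self.snapshots) - 1

    def write(self, index, data):
        self.store.append(data)                 # single write, to a new location
        self.io_ops += 1
        self.live[index] = len(self.store) - 1  # redirect the pointer (metadata only)

    def read(self, index):
        return self.store[self.live[index]]

    def read_snapshot(self, snap_id, index):
        # Follows the saved pointer directly; no per-block
        # "has this changed?" decision is needed.
        return self.store[self.snapshots[snap_id][index]]


vol = RedirectOnWriteVolume(["a", "b", "c"])
snap = vol.take_snapshot()
vol.write(1, "B")  # one write, no copy of the old block
```

Here the single `write` costs one data I/O, `vol.read(1)` returns the new "B", and `vol.read_snapshot(snap, 1)` still returns "b" because the old block was never touched.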

The redirect-on-write system uses 1/3 the number of I/O operations when modifying a protected block, and it uses no extra computational overhead reading a snapshot. Copy-on-write systems can therefore have a big impact on the performance of the protected entity. The more snapshots are created and the longer they are stored, the greater the performance impact on the protected entity. This is why copy-on-write snapshots are typically used only as temporary sources for backup; they are created, backed up, and then immediately deleted. Redirect-on-write snapshots, however, are often created every hour, or even every few minutes, and stored for days or even months, at which point they are deleted only for space reasons. (The longer a snapshot is stored, the more extra space is required to hold the previous versions of changed blocks.)
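The 1/3 figure follows directly from the per-write counts given earlier (three I/O operations for copy-on-write, one for redirect-on-write). A quick check of the arithmetic, with hypothetical variable names:

```python
COW_OPS = 3  # one read + two writes per modified block, per the article
ROW_OPS = 1  # one write per modified block, per the article

ratio = ROW_OPS / COW_OPS                         # redirect-on-write as a fraction of copy-on-write
fewer_pct = (COW_OPS - ROW_OPS) / COW_OPS * 100   # how much less I/O redirect-on-write does
more_pct = (COW_OPS - ROW_OPS) / ROW_OPS * 100    # how much more I/O copy-on-write does

print(f"ratio={ratio:.2f}, fewer={fewer_pct:.0f}%, more={more_pct:.0f}%")
```

In other words, redirect-on-write does about 67% fewer I/O operations per modified block, which is the same as saying copy-on-write does 200% more.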

StorageSwiss Take

Redirect-on-write snapshots are the preferred snapshot method if the plan is to use snapshots for medium-to-long-term protection against file deletions and corruptions. If a vendor is using copy-on-write snapshots and is recommending them for anything other than temporary sources for backups, make sure to ask them how they overcome the inherent performance penalties of copy-on-write.

About wcurtispreston

W. Curtis Preston (aka Mr. Backup) is an expert in backup & recovery systems, a space he has been working in since 1993. He has written three books on the subject: Backup & Recovery, Using SANs and NAS, and Unix Backup & Recovery. Mr. Preston is a writer and has spoken at hundreds of seminars and conferences around the world. Preston’s mission is to arm today’s IT managers with truly unbiased information about today’s storage industry and its products.

Tagged with: Backup, Object Storage, performance, RAID, Replication, Snapshot
Posted in Blog

4 comments on “Snapshot 101: Copy-on-write vs Redirect-on-write”

Ulrich G. says:

December 28, 2016 at 8:17 pm

What about read performance on a volume that has many redirect-on-write snapshots? The data for a file may then come from many snapshots.

For shadow copies:
“Like the copy-on-write method, the redirect-on-write method is a quick method for creating a shadow copy, because it copies only changes to the data. The copied blocks in the diff area can be combined with the unchanged data on the original volume to create a complete, up-to-date copy of the data. If there are many read I/O requests, the redirect-on-write method can become expensive.”
(https://technet.microsoft.com/de-de/library/ee923636(v=ws.10).aspx)

wcurtispreston says:

December 28, 2016 at 8:57 pm

FWIW I’ve never heard of what MS is warning about. I HAVE experienced the performance impact of even a few copy-on-write snapshots, though.

republic says:

January 16, 2017 at 3:51 am

“200% fewer I/O”? If one operation uses 200% more operations than the other, the other uses 67% less than the first.

200% fewer is nonsense.

wcurtispreston says:

January 16, 2017 at 1:03 pm

You know what, you’re right. LOL. I don’t normally make that math error. 😉

Comments are closed.
