The Problems with VMware Snapshots

There are generally three ways we create snapshots in this world: copy-on-write, redirect-on-write, and VMware’s way. VMware’s way carries a penalty the other two do not, so you should delete VMware snapshots as soon as possible, preferably within a few minutes of their creation.

A previous post discussed the difference between copy-on-write and redirect-on-write snapshots, and it is well worth the read if you are not familiar with those terms. VMware’s snapshot style is completely different from either of them.

Once VMware takes a snapshot, all writes to the VMDKs that comprise that VM stop. Instead, new writes go to an alternate area. As long as that snapshot exists, all new writes will go to the alternate area, and all reads will have to consult both the original VMDK and the alternate area in order to supply the current version of all blocks.
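To make that read and write path concrete, here is a minimal toy model in Python. It is only a sketch of the behavior described above, not VMware's actual on-disk format; the class name `SnapshottedDisk` and the dictionary-of-blocks representation are assumptions made for illustration.

```python
class SnapshottedDisk:
    """Toy model of a VMDK with one open VMware-style snapshot (illustrative only)."""

    def __init__(self, base_blocks):
        self.base = dict(base_blocks)  # original VMDK, frozen while the snapshot exists
        self.delta = {}                # the "alternate area" that receives new writes

    def write(self, block_no, data):
        # New writes never touch the base VMDK while the snapshot exists.
        self.delta[block_no] = data

    def read(self, block_no):
        # Current view: prefer the alternate area, fall back to the base.
        if block_no in self.delta:
            return self.delta[block_no]
        return self.base[block_no]


disk = SnapshottedDisk({0: "a0", 1: "b0"})
disk.write(1, "b1")
print(disk.read(0), disk.read(1), disk.base[1])
```

Block 1 is served from the alternate area while block 0 still comes from the base, and the base copy of block 1 is untouched. That untouched base is what makes the original VMDK the point-in-time image.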

First, this is completely opposite to the way all other snapshots work. Other snapshot systems use an alternate area only to preserve the before image of changed blocks. In the case of a copy-on-write system, the before image is copied out to the alternate area before overwriting a block. In a redirect-on-write system, the before image is left in place and its pointer preserved, while the current view of the active volume is given a pointer to a new block. Again, I cover this in detail in my prior post. But in VMware snapshots, the actual VMDK becomes the “snapshot” of the previous point in time, and the primary volume is forced to read what it needs from the alternate area.
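For contrast, the two conventional approaches can be sketched in the same toy style. This is a simplified illustration under assumed names (`CopyOnWriteVolume`, `RedirectOnWriteVolume`), with one snapshot taken at construction time; real storage systems track far more state.

```python
class CopyOnWriteVolume:
    """Before images are copied out; the primary is overwritten in place."""

    def __init__(self, blocks):
        self.primary = list(blocks)  # live data, always read from here
        self.preserved = {}          # alternate area: before images only

    def write(self, block_no, data):
        # Copy the before image out once, then overwrite the block in place.
        if block_no not in self.preserved:
            self.preserved[block_no] = self.primary[block_no]
        self.primary[block_no] = data

    def read(self, block_no):
        return self.primary[block_no]

    def read_snapshot(self, block_no):
        return self.preserved.get(block_no, self.primary[block_no])


class RedirectOnWriteVolume:
    """Before images stay in place; the live view repoints to new blocks."""

    def __init__(self, blocks):
        self.store = list(blocks)                   # physical blocks, never overwritten
        self.active = list(range(len(self.store)))  # pointers for the live view
        self.snapshot = list(self.active)           # pointers frozen at snapshot time

    def write(self, block_no, data):
        self.store.append(data)                     # new data lands in a new block
        self.active[block_no] = len(self.store) - 1

    def read(self, block_no):
        return self.store[self.active[block_no]]

    def read_snapshot(self, block_no):
        return self.store[self.snapshot[block_no]]
```

In both models, discarding the snapshot only means throwing away the preserved before images or the frozen pointers. Nothing has to be copied back into the primary volume.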

If that were the only difference, one could argue that we are talking about semantics or worrying too much about the trees and not the forest. Unfortunately, there is a much bigger problem awaiting VMware administrators who leave their snapshots in place too long. Once they delete a snapshot, all writes made since the software took the snapshot must be copied from the alternate area to where they should be in the primary volume. Depending on the length of time the snapshot has existed and the number of writes that have happened since it was created, this could be quite a bit of I/O that has to happen all at once.
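The cost of deletion can be sketched the same way. The function below is a hypothetical illustration, not VMware code: it copies everything in the alternate area back into the base and counts the block writes that took. In this toy model, repeated writes to the same block overwrite one another in the alternate area, so the count reflects distinct changed blocks; how a real delta disk behaves depends on its format.

```python
def consolidate(base, delta):
    """Copy every block in the alternate area back into the base disk.

    Returns the number of block writes the snapshot deletion performed.
    """
    copied = 0
    for block_no, data in delta.items():
        base[block_no] = data
        copied += 1
    delta.clear()  # the alternate area is empty once the snapshot is gone
    return copied


base = {n: "old" for n in range(8)}
delta = {}
for block_no in (1, 3, 3, 5, 6):  # writes made while the snapshot existed
    delta[block_no] = "new"

copied = consolidate(base, delta)
print(copied)
```

The longer the snapshot lives, the more blocks accumulate in `delta`, and every one of them has to be written again when the snapshot is deleted.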

Unfortunately, many VMware administrators are completely unaware that VMware snapshots act in this way and try to use VMware snapshots like regular snapshots. For example, they might take a snapshot before an upgrade of the OS or an application, giving them the ability to easily roll back from that change. But if they are unaware of this difference, they might think they can leave that snapshot there for days or even weeks, until they are sure the new version of the OS or application isn’t causing any problems. Then when the snapshot is deleted, VMware has to redo all the writes that happened to that volume during that entire time!

StorageSwiss Take

VMware snapshots are bassackwards. Do not use them for anything other than a very temporary source for backup or another snapshot. A great thing to do, for example, is to take a VMware snapshot just before taking a storage-level snapshot — then immediately delete the VMware snapshot. Now you have a snapshot of a snapshot, but without the severe performance penalties of VMware’s snapshots.
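That sequence can be sketched as a small Python wrapper. The three callables are hypothetical placeholders for whatever your hypervisor and storage array tooling provide; none of them is a real VMware or vendor API. The only point is the ordering, and that the VMware snapshot is deleted even if the storage snapshot fails.

```python
def storage_snapshot_via_vm_snapshot(create_vm_snapshot,
                                     create_storage_snapshot,
                                     delete_vm_snapshot):
    """Take a short-lived VMware snapshot around a storage-level snapshot.

    All three arguments are caller-supplied functions (assumptions for
    this sketch), so the ordering logic stays independent of any tool.
    """
    vm_snap = create_vm_snapshot()
    try:
        return create_storage_snapshot()
    finally:
        # Always remove the VMware snapshot right away, success or failure.
        delete_vm_snapshot(vm_snap)


# Usage with stand-in functions that just record what was called.
calls = []
result = storage_snapshot_via_vm_snapshot(
    lambda: calls.append("vm-create") or "vm-snap-1",
    lambda: calls.append("storage-create") or "array-snap-1",
    lambda snap: calls.append("vm-delete:" + snap),
)
print(calls, result)
```

Keeping the VMware snapshot open only for the duration of the storage snapshot call keeps the alternate area, and therefore the copy-back work on deletion, as small as possible.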

W. Curtis Preston (aka Mr. Backup) is an expert in backup & recovery systems, a space he has been working in since 1993. He has written three books on the subject: Backup & Recovery, Using SANs and NAS, and Unix Backup & Recovery. Mr. Preston is a writer and has spoken at hundreds of seminars and conferences around the world. Preston’s mission is to arm today’s IT managers with truly unbiased information about today’s storage industry and its products.
