Avoiding Cache Rebuilds in VMware

Using a flash-based solid state disk (SSD) or PCIe SSD inside a virtualization host as a high-speed local cache brings many performance benefits, but it also brings some challenges. One of those challenges is what to do with the cache when a virtual machine (VM) using it is migrated to another host. Do you invalidate the cache, or do something else?

Moving data into a flash storage area is fast. Getting the right data into the cache, a process we call warming, can take some time. Data being accessed by the VM has to be analyzed by the caching software for I/O intensity so that the premium flash storage area is used as efficiently as possible. You don’t want stale data consuming capacity that can cost as much as 15X the price of hard drive capacity.
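To make the warming process concrete, here is a minimal Python sketch of a heat-based admission policy; the class name, threshold, and capacity are illustrative assumptions, not any particular vendor’s implementation. The idea is that a block only earns space on the premium flash tier after the software has observed enough I/O against it.

```python
from collections import Counter

class WarmingCache:
    """Illustrative heat-based admission: only I/O-intensive blocks get flash."""
    def __init__(self, capacity_blocks=1024, promote_after=3):
        self.capacity = capacity_blocks
        self.promote_after = promote_after  # accesses before a block earns flash space
        self.heat = Counter()               # per-block access counts (the analysis step)
        self.flash = set()                  # blocks currently resident on the SSD

    def access(self, block):
        self.heat[block] += 1
        if block in self.flash:
            return "flash hit"
        # Cold or stale data never consumes the premium tier; it stays on
        # disk until it proves itself I/O intensive.
        if self.heat[block] >= self.promote_after and len(self.flash) < self.capacity:
            self.flash.add(block)
            return "promoted to flash"
        return "served from hard disk"
```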

An increasing number of server-side caching products support VM migrations. When a VM is moved from host A to host B, the VM’s cache is invalidated prior to the migration. If write caching was in use, the dirty data is first flushed out to the shared hard disk system. When the migrated VM restarts on host B, its cache is recreated on the new host. The problem is that the cache now has to be re-warmed.
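As a rough sketch of that invalidate-and-flush sequence, assume a write-back cache that tracks which blocks are dirty; the names here are hypothetical, not a real caching product’s API:

```python
class WriteBackCache:
    def __init__(self, shared_storage):
        self.shared_storage = shared_storage  # the shared hard disk system
        self.blocks = {}                      # block -> cached data
        self.dirty = set()                    # blocks not yet written to shared storage

    def prepare_for_migration(self):
        # Flush: push dirty writes out so the shared array holds the only
        # valid copy before the VM leaves this host...
        for block in sorted(self.dirty):
            self.shared_storage.write(block, self.blocks[block])
        self.dirty.clear()
        # ...then invalidate: the destination host starts with an empty,
        # cold cache that has to be re-warmed.
        self.blocks.clear()
```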

Re-warming means that when the VM arrives at its new host, the caching software has to begin its analysis all over again. The impact is that the application in the VM, and the users of that VM, will see a sudden increase in response time, since all accesses will initially come from the hard disk system until the cache is warmed.

Cache re-warming also means that the migrated VM may need to operate under a new set of constraints. The cache on the second host is likely already full of other VMs’ hot data, and room now has to be made for the newly arriving VM’s hot data. This can have a cascading negative effect on performance across the entire host, as the new VM and the existing VMs now compete for caching space.

There are two technologies becoming available that avoid cache re-warming and the reshuffling of cache resources caused by a migrated VM. The first involves targeted mirroring of cache resources. With this technique, each VM’s cache, or the hypervisor’s cache, is targeted at another host, and inbound writes are synchronously mirrored between the two. Then, if a migration is needed, the VM can be moved to that host and services pick up exactly where they left off. Some products leverage VMware FT (Fault Tolerance) to automate the process.
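A minimal sketch of the mirroring idea, with an in-process call standing in for the low-latency network link a real product would use (all names are illustrative):

```python
class MirroredCache:
    def __init__(self, partner=None):
        self.blocks = {}
        self.partner = partner  # the one designated migration target

    def write(self, block, data):
        self.blocks[block] = data
        # Mirror synchronously: the write is not acknowledged until the
        # partner's cache also holds it, so a migrated VM resumes warm.
        if self.partner is not None:
            self.partner.blocks[block] = data
        return "ack"

host_a = MirroredCache()
host_b = MirroredCache()
host_a.partner = host_b          # the cache is "targeted" at exactly one host
host_a.write(42, b"hot data")    # lands in both caches before the ack
assert 42 in host_b.blocks       # host B is already warm after a migration
```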

The downside to this technique is that it constrains where a VM can be moved: the VM can only go to the one targeted host, or it suffers the re-warming issues described above.

The second technique essentially builds a network just for cache memory between the servers. We are seeing this manifest itself in two ways. In the first method, as we discussed in our article “The Benefits of a Flash Only, SAN-less Virtual Architecture”, the flash devices in the servers are pooled into a single shareable LUN. While this technique adds network latency to every flash I/O operation, it allows RAID-protected, redundant flash storage to be built without replacing the existing SAN. Since the pool is shared between hosts, every host can reach the caching area and no special migration processes are required.
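In sketch form, the pooled approach simply has every host’s caching layer point at the same shared flash pool; the classes below are a toy model, with the network hop noted only in a comment:

```python
class SharedFlashPool:
    """One RAID-protected LUN built from the flash devices in the servers."""
    def __init__(self):
        self.blocks = {}

class Host:
    def __init__(self, name, pool):
        self.name = name
        self.pool = pool  # every host sees the same pool

    def read(self, block):
        # Every flash I/O crosses the network, but any host can hit the cache.
        return self.pool.blocks.get(block, "read from hard disk")

pool = SharedFlashPool()
host_a, host_b = Host("A", pool), Host("B", pool)
pool.blocks[7] = b"hot data"          # warmed while the VM ran on host A
assert host_b.read(7) == b"hot data"  # still a cache hit after migration
```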

The second method, just coming to market, may provide the most interesting approach thus far. Essentially, the location of the cache remains static and active even when the VM is migrated. For example, if a VM is migrated from host A to host B, the caching software on host B leverages the storage network to access the VM’s cache data on host A, thereby avoiding cache re-warming on host B.
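Here is a minimal sketch of that lookup path, assuming the origin host’s cache stays live and reachable over the storage network; the hosts and functions are illustrative stand-ins for a vendor implementation:

```python
class HostCache:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def lookup(self, block):
        return self.blocks.get(block)

def read(block, local, remote):
    data = local.lookup(block)
    if data is not None:
        return f"local hit on host {local.name}"
    # Local miss: reach back to the origin host's still-warm cache. The hop
    # adds some latency, but far less than re-warming from hard disk.
    if remote.lookup(block) is not None:
        return f"remote hit on host {remote.name}"
    return "read from hard disk"

host_a, host_b = HostCache("A"), HostCache("B")
host_a.blocks[99] = b"hot data"               # warmed before the VM migrated
print(read(99, local=host_b, remote=host_a))  # -> remote hit on host A
```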

Since most VM migrations tend to be temporary, with the VM eventually migrated back to its original host, this technique may be ideal: it not only eliminates the cache rebuild on host B, it also eliminates the cache rebuild when the VM is migrated back to host A.

This technique does introduce some latency for the cache communication back to host A; however, over a Fibre Channel network that latency should be minimal. The key is that this technique avoids the overhead associated with cache re-warming in both directions.

Storage Swiss Take

Server-side caching leveraging a local SSD or PCIe SSD has become a popular way to solve performance problems. These caches allow for surgical, automated performance improvements. The challenge is how to maintain those improvements in a virtualized environment. Cache rebuilds are a safe way to allow migrations, but they cause a temporary loss in performance. Building a network for cache is the next logical step: it delivers local-server performance while avoiding most cache rebuilds.

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
