Overcoming the Recovery-In-Place challenge

Posted on June 30, 2014 by George Crump

The advanced capabilities of virtualization-specific backup applications, such as Veeam, have brought new flexibility to the protection and recovery of virtual servers. In fact, these improvements, plus better availability, may be justification enough to virtualize a workload. Two of the most sought-after VM backup features are recovery-in-place and Changed Block Tracking, but these features may dramatically change the selection criteria for disk backup devices. If organizations can’t find a better backup storage target, they won’t be able to optimize backup and recovery operations.

What is Recovery-in-Place and Changed Block Tracking?

Recovery-in-place is a capability several backup applications have added that allows virtual machines (VMs) to be launched and executed directly from a disk backup device. This provides an almost instant recovery of a failed VM since no data has to be copied from the backup device across the network and onto primary storage. The technology leverages the Changed Block Tracking (CBT) functionality available in many of today’s hypervisors, such as VMware vSphere.

With CBT, the hypervisor tracks which blocks of a VM have changed since the last backup. The backup application then copies the VM, block by block, to the disk backup device, taking only the latest versions of the changed blocks. This allows for more frequent backups because each backup completes more quickly: only changed blocks are transferred at each protection event. CBT makes backups a throughout-the-day, random I/O transfer instead of the once-a-day, sequential transfer that legacy backup applications use.
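To make the idea concrete, here is a minimal Python sketch of changed-block tracking. It is a toy model, not how any hypervisor or backup product implements CBT: the `TrackedDisk` class, the `backup` function, and the 4-byte block size are all hypothetical names and values chosen for illustration.

```python
BLOCK_SIZE = 4  # bytes per block; unrealistically small so the example is easy to read


class TrackedDisk:
    """Toy virtual disk that remembers which blocks were written (hypothetical)."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks
        # Every block counts as changed until the first backup has run.
        self.changed = set(range(num_blocks))

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)

    def drain_changed(self):
        """Return the latest content of each changed block, then reset tracking."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


def backup(disk, image):
    """Apply only the changed blocks to the backup image; return blocks sent."""
    delta = disk.drain_changed()
    image.update(delta)
    return len(delta)


disk = TrackedDisk(num_blocks=8)
image = {}                      # backup copy: block index -> data

first = backup(disk, image)     # initial pass sends all 8 blocks
disk.write(2, b"AAAA")
disk.write(2, b"BBBB")          # block 2 written twice; only the latest is kept
disk.write(5, b"CCCC")
second = backup(disk, image)    # second pass sends just blocks 2 and 5

print(first, second, image[2])
```

Note that after each pass the backup image holds the same content as the source disk, which is what makes running a VM from the backup target plausible in the first place.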

CBT also allows the VM-specific backup application to have a near mirror image of the VM on the backup target. This makes recovery or testing of a VM from the actual backup target a logical next step for data protection software vendors to add to their products.

The Impact on Disk Backup Appliances

Both CBT and recovery-in-place require a random access device in order to work properly; in other words, a disk-based backup target. When a VM is recovered in place, it is dependent on the storage I/O performance of the actual disk target, frequently a disk backup appliance. The problem is that, to keep costs under control, most backup devices are loaded down with high-capacity hard disks and further encumbered with data deduplication. The combination makes those systems ideal for large, sequential data transfers, but they may not be well suited for CBT backups and recovery-in-place, which involve smaller, random transactions.

The CBT Problem

As mentioned above, CBT enables data centers to perform backups more frequently. It is not uncommon for backup administrators, armed with this capability, to perform backups several times an hour instead of once per night. CBT is typically performed per VM, which means that three to four times per hour, dozens, if not hundreds, of VMs are sending thousands of I/O blocks to the disk backup device. Suddenly, data protection I/O begins to look like database I/O. A disk backup appliance equipped with a small number of high-capacity disks may not be able to keep up.
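A rough back-of-the-envelope calculation shows how quickly this adds up. Every number below is an assumption picked for illustration, not a measurement: 200 VMs, 5,000 changed blocks per VM per pass, a five-minute backup window, and roughly 100 random IOPS per high-capacity hard drive. Both function names are hypothetical.

```python
import math


def backup_write_iops(vm_count, changed_blocks_per_vm, window_seconds):
    """Average random-write IOPS the target must absorb during one backup pass."""
    return vm_count * changed_blocks_per_vm / window_seconds


def drives_needed(required_iops, iops_per_drive):
    """Drives required to meet the IOPS target, ignoring RAID and dedup overhead."""
    return math.ceil(required_iops / iops_per_drive)


# Illustrative inputs only; substitute figures from your own environment.
iops = backup_write_iops(vm_count=200, changed_blocks_per_vm=5000, window_seconds=300)
drives = drives_needed(iops, iops_per_drive=100)

print(f"{iops:.0f} IOPS -> at least {drives} drives")
```

Under these assumptions the target would need dozens of spindles for performance alone, far more than a small appliance built around a handful of high-capacity drives typically provides.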

Finally, deduplication is less effective on these devices since the only data being transferred is net new blocks. For deduplication to achieve the highest reduction rate there typically needs to be redundant data being sent to the device. With CBT, after the initial transfer, there is very little redundant data involved.
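The effect can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any vendor's deduplication engine: it uses fixed-size blocks identified by SHA-256 hash, a hypothetical 100-block VM, and two blocks changing before each subsequent backup.

```python
import hashlib


def dedup_ratio(streams):
    """Blocks received by the target divided by unique blocks it has to store."""
    seen = set()
    received = 0
    for stream in streams:
        for block in stream:
            received += 1
            seen.add(hashlib.sha256(block).digest())
    return received / len(seen)


# Hypothetical VM of 100 blocks; blocks 0 and 1 change before each later backup.
base = [f"block-{i}".encode() for i in range(100)]


def changed_blocks(day):
    return [f"block-{i}-day{day}".encode() for i in range(2)]


def full_image(day):
    """Everything a legacy full backup would send on the given day."""
    return list(base) if day == 0 else changed_blocks(day) + base[2:]


legacy_fulls = [full_image(day) for day in range(5)]
cbt_passes = [list(base)] + [changed_blocks(day) for day in range(1, 5)]

print(f"legacy full backups: {dedup_ratio(legacy_fulls):.2f}:1")
print(f"CBT backups:         {dedup_ratio(cbt_passes):.2f}:1")
```

Both approaches end up storing the same 108 unique blocks. The legacy stream sends 500 blocks to get there, so deduplication has plenty to remove; the CBT stream sends only those 108, so there is nothing left for deduplication to eliminate.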

The Recovery-In-Place Problem

Recovery-in-place allows for the rapid return to operations of a failed VM, but for that to be of value, the performance of the VM in its recovered state has to be acceptable. Certainly some performance loss is an appropriate trade for rapid recovery, but there is a concern that the performance of the recovered-in-place VM will make it unusable.

The recovery-in-place performance problem is a direct result of the hardware chosen for that VM recovery. If that hardware is a disk backup appliance with a low number of high-capacity hard drives and features like deduplication enabled, then performance will suffer dramatically. This is compounded by the fact that data centers of all sizes are using flash-enhanced storage systems to improve the storage performance of VMs while in production, which makes the performance gap between the production VM and the recovered VM even wider.

Finally, the disk backup appliance must also have better redundancy than in the past. While it is hosting a VM, the disk backup appliance is no longer just storing a second copy of data; it is storing the unique data that the VM is creating. RAID redundancy and quality components become a requirement.

The Solution – A Hybrid Disk Backup

The solution to the CBT and recovery-in-place problems may be to use a hybrid disk backup similar to what is used in production, but at a more cost-effective price point. The first step in striking the right performance and cost balance is to allow backup designers to use their own off-the-shelf storage media. While they would still need to follow best practices of similar drive types and quality components, using off-the-shelf hardware would drive costs down significantly.

The second step would be to leverage an inexpensive network connection, like iSCSI, so that multiple backup servers could access the storage device across the network. iSCSI can integrate into any Fibre Channel or NFS-hosted virtualized environment with ease.

The final step would be to allow part of the storage capacity to consist of a flash component such as Solid State Drives (SSDs). The device could then respond quickly to the random I/O created by CBT backups by directing those writes to flash first. It would also allow recovery-in-place operations to occur from higher performing flash.

The device would need the ability to automatically move data between the flash and hard disk drive (HDD) areas so as not to burden the backup administrator with manually moving that data. Writes could automatically be moved to the HDD area once they complete and the system sees idle time.

A VM that has been recovered in place would have its data moved to the flash area as users begin to access that VM’s application. This use case allows for a smaller than usual flash area, which, when combined with off-the-shelf flash drives, brings the cost of high performance backup and recovery well within the reach of most data centers.
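The data movement described in the last two paragraphs can be sketched in a few lines of Python. This is a simplified illustration under assumptions of my own, not the design of any particular product: the `HybridTarget` class, its method names, and the least-recently-used eviction policy are all hypothetical.

```python
from collections import OrderedDict


class HybridTarget:
    """Toy backup target with a small flash area in front of a large HDD area."""

    def __init__(self, flash_capacity):
        self.flash_capacity = flash_capacity
        self.flash = OrderedDict()  # block id -> data, least recently used first
        self.hdd = {}               # block id -> data
        self.dirty = set()          # flash blocks not yet copied to HDD

    def write(self, block_id, data):
        """Incoming backup writes always land in flash first."""
        self.flash[block_id] = data
        self.flash.move_to_end(block_id)
        self.dirty.add(block_id)
        self._evict()

    def read(self, block_id):
        """Serve from flash when possible; otherwise promote the block from HDD."""
        if block_id in self.flash:
            self.flash.move_to_end(block_id)
            return self.flash[block_id], "flash"
        data = self.hdd[block_id]
        self.flash[block_id] = data
        self._evict()
        return data, "hdd"

    def destage_idle(self):
        """Called when the system is idle: move completed writes down to HDD."""
        for block_id in sorted(self.dirty):
            self.hdd[block_id] = self.flash.pop(block_id)
        self.dirty.clear()

    def _evict(self):
        """Keep flash within capacity, saving unwritten data to HDD on the way out."""
        while len(self.flash) > self.flash_capacity:
            block_id, data = self.flash.popitem(last=False)
            if block_id in self.dirty:
                self.hdd[block_id] = data
                self.dirty.discard(block_id)


target = HybridTarget(flash_capacity=2)
for i in range(4):                      # a burst of CBT writes
    target.write(i, f"data-{i}".encode())
target.destage_idle()                   # idle time: everything settles on HDD

first_read = target.read(0)             # recovered VM touches block 0: served from HDD
second_read = target.read(0)            # block 0 was promoted: now served from flash
print(first_read, second_read)
```

Because only the blocks a recovered VM actually touches are promoted, the flash area can stay small relative to the total backup capacity, which is the cost argument made above.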

Conclusion

Changed Block Tracking and recovery-in-place are two features that can dramatically simplify the backup administrator’s job while also improving the value of the data protection process by providing more frequent backup events and more rapid recoveries. But these features require that the backup device be reconsidered. The modern backup device needs to be able to handle the database-like random I/O of CBT backups and deliver the performance and protection required when hosting a VM.

About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
