Designing Backup to replace Primary Storage

Users and application owners expect that the systems they use will never go down, and that if they do, they will be returned to operation quickly with little data loss. In our article “Designing Primary Storage to Ease the Backup Burden” we discussed how to architect a primary storage infrastructure that helps meet these challenges. We call this design Protected Primary Storage. But this design can be expensive, especially if it is applied to every application in the data center. For data centers that are not at the point of a storage refresh, or simply cannot afford that type of commitment, another design is required: Protected Storage as Standby. This design is less expensive, but its recovery point objectives (RPO) and recovery time objectives (RTO) are longer, though still acceptable.

Data Transfer – The Enemy of RTO/RPO

The stricter an organization’s RPO and RTO requirements become, the more important it is to eliminate data transfers as much as possible, both in terms of data protection and recovery. For data protection this typically means some form of block level data protection. This type of data protection allows for more frequent data captures so that less data is lost if there is a failure.

For data recovery, it means being able to instantiate the data store for an application directly on the secondary disk storage so that the entire data store does not need to be transferred back to production storage. Protected Primary Storage does this quite well, but backup and replication software can provide similar results.
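To make the cost of a data transfer concrete, here is a minimal sketch of the arithmetic. The function names, the 10 TB data store and the 500 MB/s sustained restore rate are all hypothetical figures chosen for illustration; they are not measurements from any product.

```python
def full_restore_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy an entire data store back to production storage.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes the restore
    sustains the given throughput for the whole transfer.
    """
    data_mb = data_tb * 1_000_000
    return data_mb / throughput_mb_s / 3600


def worst_case_data_loss_minutes(capture_interval_minutes: float) -> float:
    """Worst-case data loss with periodic captures.

    If a failure happens just before the next capture, everything written
    since the previous capture is lost, so the worst case is one interval.
    """
    return capture_interval_minutes


if __name__ == "__main__":
    # Hypothetical example: a 10 TB data store restored at 500 MB/s.
    print(f"Full restore: {full_restore_hours(10, 500):.1f} hours")
    # Hypothetical example: block-level captures every 15 minutes.
    print(f"Worst-case loss: {worst_case_data_loss_minutes(15):.0f} minutes")
```

Under these assumptions the full restore alone takes more than five hours before the application can start, which is why instantiating the data store in place on secondary storage matters so much for RTO.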

Protected Storage as a Primary Standby

The infrastructure that protects the primary storage infrastructure has evolved a lot over the last five years. For most data centers the front line of this protection is now disk, and it is managed by sophisticated software that does far more than just back data up. Today’s data protection software has evolved into availability software. These solutions can not only back up data but also replicate it. Both capabilities can be leveraged to recover the data volume for a server without having to transfer data.

Replication can be used to store the data in a native state on a secondary disk array. Storing data in its native state is important because it means that the data is immediately accessible without needing conversion from a backup format. Also important is that this secondary disk is typically more suitable for running production workloads than traditional backup storage, yet the secondary disk array can be much less expensive than the primary array. Corners can be cut on performance, features and availability to lower the price of the secondary storage system. The extent to which these corners can be cut is limited, though, since there is an expectation that this secondary array may be called on to run production applications at some point.

Backup software can provide the next tier of recovery, lowering costs at the expense of a few more minutes of RTO/RPO. Some backup software solutions can leverage disk backup to instantiate a server’s data store directly from the disk backup system. We refer to this as recovery in place. Typically this requires some processing to prepare the data to be readable by the application, and there may be a performance penalty when accessing a server’s data set from a disk backup appliance that is designed specifically for low cost data storage. For example, many of these systems deduplicate and compress data to save cost, but these processes can hinder the performance of a production application.

Protected Primary Storage vs. Replication vs. Backup

Most data centers should use a combination of replication and backup, as well as add in the protected primary storage capabilities discussed in the last article. Protected Primary Storage meets the strictest of all RPO and RTO requirements. It should allow an application to return to operation in the shortest period of time, a few minutes, with the least amount of data loss (less than an hour). Replication has the advantage of providing a more rapid recovery time than backup’s recovery in place, typically about a dozen minutes of downtime instead of 20-30 minutes. With modern applications both offer similar data loss, since the backup and replication processes will often feed off each other. And of course both offer faster recovery than traditional backup recovery, which has to transfer all the data across the network and could take hours to return the application to service.
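The comparison above can be sketched as a simple lookup that picks the least expensive tier able to meet an application's RTO. This is only an illustration: the `Tier` class, the `cheapest_tier` function and the cost ranking are hypothetical, "a few minutes" is assumed to be 5, recovery in place uses the upper end of the 20-30 minute range, and "hours" for a traditional restore is assumed to be 240 minutes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    rto_minutes: float  # typical downtime before the application is back
    cost_rank: int      # 1 = least expensive (assumed ordering)


# RTO figures follow the article where it gives them; the rest are assumptions.
TIERS = [
    Tier("traditional backup restore", 240, 1),  # "hours"; 240 is assumed
    Tier("backup recovery in place", 30, 2),     # upper end of 20-30 minutes
    Tier("replication", 12, 3),                  # "a dozen minutes"
    Tier("protected primary storage", 5, 4),     # "a few minutes"; 5 is assumed
]


def cheapest_tier(required_rto_minutes: float) -> Optional[Tier]:
    """Return the least expensive tier whose RTO meets the requirement.

    Returns None when no tier is fast enough.
    """
    candidates = [t for t in TIERS if t.rto_minutes <= required_rto_minutes]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.cost_rank)


if __name__ == "__main__":
    for rto in (5, 15, 45, 480):
        tier = cheapest_tier(rto)
        print(f"RTO {rto} min -> {tier.name if tier else 'no tier qualifies'}")
```

A real assignment would also weigh RPO, performance while running in the recovered state and retention needs, which this sketch leaves out.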

Performance in the recovered state is also important. Protected Primary Storage should provide similar if not identical performance during a failover. Replication should also provide better performance than backup while it is acting as production, since the secondary array is not typically burdened with deduplication and compression. But while the replication software may be included with the backup software solution, the hardware of course is not. More than likely the storage array that is being replicated to will be more expensive than backup storage, and it will not have the data efficiency capabilities of a backup appliance.

In either case a backup solution is still going to be a requirement for long term data retention. Both replication and protected primary storage are only suitable for short term recovery demands, typically recovering the last known good copy of data. A backup solution provides cost-efficient retention of this data, can disperse copies to different types of media (tape or cloud) and enables the off-site movement of that data.

Conclusion

Protected Primary Storage and Protected Storage as a Standby are to some extent competing designs. Which design makes the most sense depends on the data center and where that data center is in its storage refresh cycle. That said, protected primary storage, protected storage as standby and straight backup can also be used together. In this design the most critical applications are assigned to a small protected primary storage tier, important servers are assigned to replication and the remaining servers are assigned to the recovery in place capabilities of the backup solution.

Twelve years ago George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS and SAN, Virtualization, Cloud and Enterprise Flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.

5 comments on “Designing Backup to replace Primary Storage”
  1. Wim Provoost says:

    Another thing to note is that Protected Primary Storage and (x-way) replication will also not help in case of human error (someone deleting a VM by mistake). With replication, the VM will be gone from all other sites instantaneously. Not most, but all data centers should use a combination of protected primary storage and backup or asynchronous replication (of snapshots). Of course, make sure to test your secondary site regularly!

  2. George Crump says:

    Good point Wim. Of course, you could also leverage snapshots on the other side. That snapshot should protect you if the VM is deleted from the primary.

    • Aleksey says:

      Is that true for storage snapshots but not for VMware snapshots? I thought that when you delete a VM that has VMware snapshots, you have nothing left to recover. And snapshots are not backups anyway – if two disks fail in a RAID-5 set, you have no data anymore.

      • agonzalez says:

        Aleksey, yes, VMware snapshots go away when you delete the VM.

        Storage snapshots are a good option for backup and fast recovery. I have worked with many different storage arrays in my career and have never lost any data because of disk failures. But in any case you will be replicating the storage snapshot to another DR storage system. It is also recommended to have backup to disk or to a backup appliance, like Unitrends Recovery Series Appliances, that can back up, replicate and do recovery in place to meet your goals, as explained in the article.
