Users and application owners expect that the systems they use will never go down and that, if they do, they will be returned to operation quickly with little data loss. In our article “Designing Primary Storage to Ease the Backup Burden” we discussed how to architect a primary storage infrastructure that is able to help meet these challenges. We call this design Protected Primary Storage. But this design can be expensive, especially if it is applied to every application in the data center. For data centers that are not at the point of a storage refresh, or simply cannot afford that type of commitment, another design is required: Protected Storage as Standby. This design is less expensive but has longer, yet still acceptable, recovery point objectives (RPO) and recovery time objectives (RTO).
Data Transfer – The Enemy of RTO/RPO
The stricter an organization’s RPO and RTO requirements become, the more important it is to eliminate data transfers as much as possible, in both data protection and recovery. For data protection this typically means some form of block-level data protection. Because only the blocks that have changed need to be captured, this type of data protection allows for more frequent data captures, so less data is lost if there is a failure.
For data recovery, it means being able to instantiate the data store for an application directly on the secondary disk storage so that the entire data store does not need to be transferred back to production storage. This is something that Protected Primary Storage does quite well, but backup and replication software can provide similar results.
Protected Storage as a Primary Standby
The infrastructure that protects the primary storage infrastructure has evolved a lot over the last five years. For most data centers the front line of this protection is now disk, and it is managed by sophisticated software that does far more than just back data up. Today’s data protection software has evolved into availability software. These solutions include the ability not only to back up data, but also to replicate that data. Both capabilities can be leveraged to recover the data volume for a server without having to transfer data.
Replication can be used to store the data in a native state on a secondary disk array. Storing data in its native state is important because it means that the data is immediately accessible without needing conversion from a backup format. Also important is that this secondary disk is typically more suitable for running production workloads than traditional backup storage, yet the secondary disk array can be much less expensive than the primary array. Corners can be cut on performance, features and availability to lower the price of the secondary storage system. The extent to which these corners can be cut is limited, though, since there is an expectation that this secondary array may be called on to run production applications at some point.
Backup software can provide the next tier of recovery, lowering costs at the expense of a few more minutes of RTO/RPO. Some backup software solutions that leverage disk backup can instantiate a server’s data store directly from the disk backup system. We refer to this as recovery in place. Typically this requires some processing to prepare the data to be readable by the application, and there may be a performance penalty when accessing a server’s data set from a disk backup appliance that is designed specifically for low cost data storage. For example, many of these systems deduplicate and compress data to save cost, but these processes can hinder the performance of a production application.
Protected Primary Storage vs. Replication vs. Backup
Most data centers should use a combination of replication and backup, as well as add in the protected primary storage capabilities discussed in the last article. Protected Primary Storage meets the strictest of all RPO and RTO requirements. It should allow an application to return to operation in the shortest period of time, a few minutes, with the least amount of data loss (less than an hour). Replication has the advantage of providing a more rapid recovery time than backup’s recovery in place, typically about a dozen minutes of downtime instead of 20-30 minutes. With modern applications both offer similar data loss, since the backup and replication processes will often feed off each other. And of course both offer faster recovery than traditional backup recovery, which has to transfer all the data across the network and could take hours to return the application to service.
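To see why a traditional restore can take hours, it helps to run the arithmetic on the data transfer itself. The following is a minimal sketch, not a sizing tool: the function name, the 10 TB data set, the 10 Gb/s link and the 70% link efficiency are all illustrative assumptions rather than figures from any particular environment.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to copy data_tb terabytes over a link_gbps link.

    efficiency is an assumed fraction of the raw link speed that a restore
    actually sustains; real values depend on the network and the storage.
    """
    bits_to_move = data_tb * 1e12 * 8              # decimal terabytes to bits
    usable_bits_per_sec = link_gbps * 1e9 * efficiency
    return bits_to_move / usable_bits_per_sec / 3600


# Hypothetical example: a 10 TB data store restored over a 10 Gb/s network.
print(f"{transfer_hours(10, 10):.1f} hours")       # prints "3.2 hours"
```

Under these assumptions the copy alone takes a little over three hours before the application can restart, which is the transfer that both replication and recovery in place avoid.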
Performance in the recovered state is also important. Protected Primary Storage should provide similar if not identical performance during a failover. Replication should also provide better performance than backup during the time that it acts as production, since the secondary array is not typically burdened with deduplication and compression. But while the replication software may be included with the backup software solution, the hardware is of course not. More than likely the storage array that is being replicated to will be more expensive than backup storage, and it will not have the data efficiency capabilities of a backup appliance.
In either case, a backup solution is still going to be a requirement for long term data retention. Both replication and protected primary storage are only suitable for short term recovery demands, typically recovering the last known good copy of data. A backup solution provides cost efficient retention of this data, can disperse those copies to different types of media (tape or cloud) and can enable the off-site movement of that data.
Protected Primary Storage and Protected Storage as Standby are to some extent competing designs. Which design makes the most sense depends on the data center and where that data center is in its storage refresh cycle. That said, protected primary storage, protected storage as standby and straight backup could all be used in combination. In this design the most critical applications are assigned to a small protected primary storage system, important servers are assigned to replication and the remaining servers are assigned to the recovery in place capabilities of the backup solution.
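The combined design can be sketched as a simple rule: give each application the least expensive tier whose typical downtime still fits its RTO. The sketch below is illustrative only. The downtime values are rough readings of the figures quoted earlier (5 minutes stands in for "a few minutes", 12 for "a dozen", 30 for the upper end of 20-30), and the cost ranking and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    rto_minutes: float   # assumed typical downtime for this tier
    relative_cost: int   # assumed ranking, 1 = least expensive


# Ordered from least to most expensive; all values are illustrative.
TIERS = [
    Tier("recovery in place (backup)", 30, 1),
    Tier("replication", 12, 2),
    Tier("protected primary storage", 5, 3),
]


def choose_tier(required_rto_minutes: float) -> Optional[Tier]:
    """Return the cheapest tier that meets the RTO, or None if no tier does."""
    candidates = [t for t in TIERS if t.rto_minutes <= required_rto_minutes]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.relative_cost)


# Hypothetical applications and their RTO targets in minutes.
for app, rto in [("order database", 5), ("mail server", 15), ("file share", 60)]:
    tier = choose_tier(rto)
    print(app, "->", tier.name if tier else "no tier meets this RTO")
```

A real assignment would also weigh RPO, performance in the recovered state and retention needs, but the same cheapest-tier-that-qualifies logic applies.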