In IT there is a tendency to treat all disasters the same, at least from a planning perspective. But the reality is that data is protected in a variety of ways: snapshots, replication, backup to disk, backup to cloud and archive to tape, to name a few. Documenting how and when to use each of these data protection strategies to recover an application or data set is critical, so that the right data is recovered from the right backup source in the right situation. Without that documentation, older data than necessary may be recovered, causing a recovery point objective (RPO) to be missed.
A disaster, in the broadest terms, is the inability of users to access data or an application. Disasters come in many forms. We’ll highlight a few here, but we detail more of them and discuss how to recover from them in our Backup 2.0 Technical Workshop that we are putting on in conjunction with TechTarget in cities across the U.S. and Europe.
At these workshops we offer a vendor-agnostic, deep dive discussion on designing the next generation backup architecture to meet even tighter Service Level Objective (SLO) requirements. You can learn about the basics of an SLO-driven data protection strategy by reading our article, “Backup Basics: What do SLO, RPO, RTO, VRO and GRO Mean?”.
Situational Recovery requires situational protection, which may mean having multiple data protection processes operating at the same time in the data center. In our workshop we outline a three-phase approach that uses replicated snapshots, recovery-in-place backups and a cloud or tape archive. The goal of replicated snapshots is to strike a nice balance between rapid recovery and affordability. You can read more about leveraging replicated snapshots to meet strict RPOs in our article, “Designing Primary Storage to Ease the Backup Burden”. The goal of recovery in place is to sacrifice some recovery time to save additional budget dollars. We offer a white paper on using recovery in place to meet RPOs and RTOs exclusively to workshop attendees. Finally, the goal of the cloud/tape archive is to dramatically cut the overall cost of both primary and protected disk storage.
When to Use a Replicated Snapshot Recovery
The application or data, the type of protection available and the type of disaster will all impact where a restore is triggered from. For example, if an application has a very strict RPO/RTO (say, less than 15 minutes), all of the above data protection schemes may be applied to it. The typical recovery from data corruption or storage system failure will probably be from a local snapshot or from a snapshot on the secondary storage system. Again, we walk you through this process step-by-step in the workshop. But if the disaster is actually a virus that has worked its corruption through all the snapshots and replicated snapshots, then a recovery from backup, cloud or even tape may be in order.
The problem is that if these different recovery situations are not documented and trained for, a well-meaning IT staffer might recover an older set of data than necessary. That causes the RPO to be missed and can lengthen the recovery as well. Documentation and testing will prevent that. In the workshop I relay a story where the oldest of 14 available copies was recovered. I’ve had attendees tell me that the lessons learned from that story alone are worth attendance!
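To make the "right copy in the right situation" rule concrete, here is a minimal Python sketch of the selection a runbook might document: recover from the newest copy that is not affected by the corruption, and check whether it still meets the RPO. The names (RecoveryCopy, pick_recovery_copy), the sample sources and the timestamps are all hypothetical, invented for illustration. They are not taken from the workshop or from any product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryCopy:
    """One recoverable copy of the data (hypothetical model, for illustration)."""
    source: str          # e.g. "local snapshot", "disk backup", "tape archive"
    taken_at: datetime   # point in time the copy represents
    clean: bool          # False if the copy is known to contain the corruption


def pick_recovery_copy(copies, rpo, failure_time):
    """Return (copy, meets_rpo) for the newest clean copy taken at or before
    the failure, or (None, False) if no clean copy exists."""
    candidates = [c for c in copies if c.clean and c.taken_at <= failure_time]
    if not candidates:
        return None, False
    best = max(candidates, key=lambda c: c.taken_at)
    return best, (failure_time - best.taken_at) <= rpo


# Assumed scenario: a virus has corrupted both snapshot tiers.
failure = datetime(2024, 1, 1, 12, 0)
copies = [
    RecoveryCopy("local snapshot", failure - timedelta(minutes=5), clean=False),
    RecoveryCopy("replicated snapshot", failure - timedelta(minutes=10), clean=False),
    RecoveryCopy("disk backup", failure - timedelta(hours=6), clean=True),
    RecoveryCopy("tape archive", failure - timedelta(days=7), clean=True),
]
best, meets_rpo = pick_recovery_copy(copies, timedelta(minutes=15), failure)
print(best.source, meets_rpo)  # disk backup False
```

In this made-up scenario the disk backup is chosen over the tape archive because it is the newest clean copy, and the result flags that the 15-minute RPO cannot be met, which is worth knowing before the restore starts rather than after.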
When to Use Recovery in Place
If the application has a less strict RPO/RTO, say two hours, or if the budget simply won’t allow for a rapid recovery design, then a recovery-in-place solution is the next option. In most cases, recovery-in-place solutions should be able to comfortably meet a one-hour RPO. But there are times when recovery in place may actually be slower than moving the data back into production. For example, if a failure would require 10 or more virtual machines to all be instantiated at the same time on the disk backup appliance, that appliance may not be able to support the production I/O required by that many VMs simultaneously. Taking the outage and moving the VMs back to production storage may be a faster path to recovery.
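The trade-off described here can be sketched as a simple capacity check. This is a rough illustration only: the function name, the IOPS figures and the 80% headroom factor are assumptions made up for the example, and a real decision would rest on measured appliance performance.

```python
from typing import Sequence


def choose_recovery_path(vm_iops_demand: Sequence[int],
                         appliance_iops: int,
                         headroom: float = 0.8) -> str:
    """Compare the combined I/O demand of the failed VMs against the usable
    I/O capacity of the backup appliance (all inputs are assumed estimates)."""
    usable_iops = appliance_iops * headroom  # keep some capacity in reserve
    if sum(vm_iops_demand) <= usable_iops:
        return "recover in place"
    return "restore to production storage"


# Hypothetical numbers: each VM needs about 1,500 IOPS, the appliance delivers 20,000.
print(choose_recovery_path([1500] * 3, appliance_iops=20000))   # recover in place
print(choose_recovery_path([1500] * 12, appliance_iops=20000))  # restore to production storage
```

Documenting a threshold like this ahead of time means the choice between running on the appliance and taking the outage does not have to be improvised during the disaster.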
In our workshop we cover setting RPOs/RTOs, as well as architecting specific real world designs to meet them, while staying within budget. Again, all this is done in a vendor-neutral way.