Disasters rarely strike at a convenient time, when recovery policies are all up to date and plenty of IT staff are on hand to deal with any issues that come up. In most cases, the opposite is true: disasters strike when the organization is least prepared and staffing is at its scarcest. As a result, most disaster recovery (DR) efforts, if they work at all, take longer than they should, which means they rarely meet recovery point and recovery time objectives.
Disaster Recovery is Different
DR in the modern data center is changing. Instead of a handful of mission-critical applications, most data centers today run dozens or even hundreds of business-critical applications. These applications, and the virtual machines within them, have interdependencies with other applications as well as with infrastructure components like Active Directory, DNS and web services. Recovery means not only frequently capturing a high-quality backup and having access to features like instant recovery, but also ensuring that the right applications are recovered in the right order.
When the data center has a handful of mission-critical applications, remembering application interdependencies is difficult but possible. When there are dozens of applications, it is impossible. Organizations need to maintain up-to-date documentation of each application and its dependencies at all times, so that when disaster inconveniently strikes, IT has something to turn to and can execute recoveries in the right order.
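One way to keep that documentation usable under pressure is to store it as data rather than prose. The sketch below is purely illustrative, with hypothetical application names, but it shows how a machine-readable dependency map can generate a valid recovery order automatically using Python's standard-library topological sorter:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each application lists the components that
# must be recovered before it. Names are illustrative, not prescriptive.
dependencies = {
    "active-directory": [],
    "dns": [],
    "database": ["active-directory", "dns"],
    "web-frontend": ["database", "dns"],
}

def recovery_order(deps):
    """Return an order that brings up every dependency before its dependents."""
    return list(TopologicalSorter(deps).static_order())

print(recovery_order(dependencies))
```

Because the order is derived from the map rather than memorized, updating the documentation and updating the recovery sequence become the same task.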
Recovery order isn’t critical only because of application interdependencies; it also ensures that IT recovers the highest-priority applications first. Again, the sheer number of applications makes setting recovery priorities more critical than ever, and the ever-growing size of data sets means that the capacity of each application is a factor as well. While features like instant recovery do speed recovery, a human still has to interact with each application to make sure it is online and ready for users.
Another concern is user and organizational expectations. Disasters that take an entire data center offline typically also make the entire building inaccessible. If users can’t get to their office, they can’t access data, making rapid recovery less of a concern. Today, though, with many users and customers accessing an organization’s applications remotely via a mobile app or website, users may never know that a disaster struck the primary data center, and they don’t want to know.
Errors caused by recovering applications without their required supporting infrastructure don’t typically lead to data loss, but they do lead to wasted time and effort, and to lengthier outages. Whether time is lost to these errors or lower-priority applications are brought online before business-critical ones, users will see IT’s recovery as a failure.
Getting Disaster Recovery Right, Every Time
For decades, the key to getting DR right has been pre-planning. The problem is that in the modern data center, pre-planning requires a luxury most organizations don’t have: a set of IT personnel dedicated to creating, validating and continuously updating the plan. This group must also make sure the documentation is readily available when disaster strikes. IT can’t afford to spend hours looking for it.
The real key is automation and analysis. Organizations should look for DR solutions that provide an automated way to create an orchestrated recovery workflow. The workflow should intelligently group and order applications according to their interdependencies. It should also recognize that later application recoveries can leverage infrastructure components the backup solution has already recovered.
The automation portion of the DR orchestration workflow enables an IT professional to predefine application interdependencies and set the order and timing of recoveries. The testing and analysis portion should continuously determine whether each recovery policy meets RPO/RTO requirements and warn IT if any recovery workflow is trending toward a violation.
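The "trending toward violation" check can be as simple as comparing recent DR test timings against the RTO. The following is a minimal sketch, with hypothetical thresholds and data, of how such a warning might be computed from recorded test-recovery durations:

```python
# Illustrative sketch: flag a workflow whose recovery-test durations are
# approaching an RTO violation. The 90%/75% margins are assumptions.
def trending_toward_violation(durations_min, rto_min, margin=0.9):
    """Warn when recent test recoveries creep toward the RTO.

    durations_min: recovery times (minutes) from past DR tests, oldest first.
    Returns True when the latest run exceeds `margin` of the RTO, or the
    times are rising while already at 75% of the RTO.
    """
    latest = durations_min[-1]
    rising = len(durations_min) >= 2 and durations_min[-1] > durations_min[0]
    return latest > margin * rto_min or (rising and latest > 0.75 * rto_min)

# Example: a 60-minute RTO with test recoveries creeping upward.
print(trending_toward_violation([40, 48, 55], rto_min=60))  # warns
print(trending_toward_violation([20, 21, 22], rto_min=60))  # comfortable
```

Running this kind of check after every scheduled DR test turns the RPO/RTO requirement from a document into a continuously verified property.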
Recovery orchestration also enables organizations to leverage all of their various backup targets. Not all applications are created equal, and replication to a second set of high-performance storage isn’t necessary for every application. Recovering from a higher-performing backup target is acceptable for many applications, and for some recoveries, a third-tier backup storage device, like deduplicated disk or the cloud, is acceptable.
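In practice, that tiering can be captured as a simple policy mapping application priority to a recovery source. The tier names and priority scheme below are hypothetical, intended only to show how an orchestration policy might encode the choice:

```python
# Hypothetical mapping of application priority to backup/recovery tier.
TIERS = {
    1: "replicated high-performance storage",  # mission-critical apps
    2: "high-performance backup target",
    3: "deduplicated disk or cloud",           # lowest-priority apps
}

def recovery_target(priority):
    """Pick the recovery tier for an app; priorities beyond 3 use the cheapest tier."""
    return TIERS[min(priority, 3)]

print(recovery_target(1))
print(recovery_target(5))
```

Encoding the policy this way lets the orchestration workflow select the right target automatically instead of relying on an operator's memory during a disaster.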
In blog 2, we look at how backup, with orchestration, can play an expanded role in the recovery process, driving down DR costs without putting RTO/RPO obligations at risk.