No IT organization can recover every application instantly, with no data loss, after a disaster. But that is exactly what users expect. In some organizations, we find that IT avoids the necessary conversation that brings users and application owners back down to reality. We call this mutual mystification: users think IT can do the impossible, and IT knows it can’t. Instead, IT needs to communicate reality to the organization. That is why Storage Switzerland has built its Backup and Disaster Recovery Training on the foundation of Service Level Objectives (SLOs).
How to Eliminate Mutual Mystification
The best way to eliminate mutual mystification is to establish a service level objective (SLO) for each application or data set within your environment. Each SLO conveys how long it will take IT to recover the given application, how much data will be lost after recovery, how long data will be retained, what performance will be in the recovered state, and what special considerations will be in effect during a major disaster in which the data center is unavailable for a long period of time.
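As a rough illustration, the elements of a service level listed above could be captured in one small record per application. This is a minimal Python sketch, not a prescribed format: the class name, the field names and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ServiceLevelObjective:
    """One record per application or data set (all names here are illustrative)."""
    application: str
    recovery_time: timedelta         # how long IT needs to recover the application
    recovery_point: timedelta        # maximum data loss, expressed as a window of time
    retention: timedelta             # how long protected copies are kept
    recovered_performance: float     # fraction of normal performance after recovery (1.0 = full)
    extended_outage_notes: str = ""  # special considerations if the data center is down long term

    def summary(self) -> str:
        """Return a one-line statement suitable for sharing with application owners."""
        return (
            f"{self.application}: recover within {self.recovery_time}, "
            f"lose at most {self.recovery_point} of data, "
            f"retain for {self.retention.days} days, "
            f"run at {self.recovered_performance:.0%} of normal performance"
        )


# Hypothetical sample values, not recommendations.
order_entry = ServiceLevelObjective(
    application="order-entry",
    recovery_time=timedelta(hours=4),
    recovery_point=timedelta(hours=24),
    retention=timedelta(days=90),
    recovered_performance=0.5,
    extended_outage_notes="Recover at a secondary site; defer batch reporting.",
)
print(order_entry.summary())
```

Writing each objective down as an explicit value, rather than leaving it implied, is what gives IT something concrete to put in front of users.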
The first step in creating an SLO is for IT to decide which applications are critical to the organization and to establish recovery point, recovery time, recovered performance and retention objectives for each. IT should base these objectives on the software, hardware, facilities and personnel currently available. These metrics are the current reality.
The second step is to meet with key users and stakeholders to discuss the first draft. If there is agreement over recovery times and data loss, then the SLO is final. But that doesn’t often happen in the first meeting. Normally, there is some disagreement. The users or stakeholders will want shorter recovery times and less data loss than IT is currently equipped to deliver.
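The comparison that happens in that meeting can be sketched as a simple gap check between what IT can deliver today and what stakeholders ask for. The function name, the dictionary keys and the sample figures below are hypothetical; the only rule assumed is that a smaller time window is a stricter objective.

```python
from datetime import timedelta


def slo_gaps(current: dict, requested: dict) -> dict:
    """Return the objectives where the request is stricter than current capability.

    Both arguments map an objective name to a timedelta; smaller means stricter.
    The value returned for each objective is the shortfall to be closed.
    """
    gaps = {}
    for objective, can_deliver in current.items():
        # If stakeholders did not ask for anything, assume they accept the draft.
        wanted = requested.get(objective, can_deliver)
        if wanted < can_deliver:
            gaps[objective] = can_deliver - wanted
    return gaps


# Hypothetical first-draft figures from IT and requests from stakeholders.
current = {"recovery_time": timedelta(hours=8), "data_loss": timedelta(hours=24)}
requested = {"recovery_time": timedelta(hours=1), "data_loss": timedelta(hours=24)}

gaps = slo_gaps(current, requested)
if gaps:
    for objective, shortfall in gaps.items():
        print(f"{objective}: stakeholders want {shortfall} less than IT can deliver today")
else:
    print("Agreement reached: the SLO is final")
```

An empty result corresponds to the case where the draft is accepted as is; anything else is the list of disagreements still to be worked through.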
Believe it or not, at this point, the primary mission is accomplished. There is no more mutual mystification. Users and application owners now know what IT can deliver. They may not like it, but they do know.
How to Achieve Mutual Agreement
The next big step is to move to mutual agreement. There are plenty of solutions that drive down recovery time and reduce data loss. For example, a traditional once-per-night backup sent to cost-efficient backup disk is relatively affordable, but it also delivers longer recovery times and more data loss. Improving on traditional backup requires more frequent data captures and high-performance protection storage. And there is an almost overwhelming number of options for IT to consider.
Divide and Conquer
How do you eat an elephant? One bite at a time.
The same strategy holds for improving SLO deliverables. First, try to group the organization’s assets into logical buckets. We find that three work well.
First, there is a priority recovery tier for applications that need to recover in minutes. This tier requires near-continuous backups or replication to a higher-performance data protection storage device. But the number of applications in this bucket should be relatively small.
The second recovery tier is for applications that need to recover in about an hour or two. These need frequent but not continuous backup. And the performance of the recovery device, while still important, is not as critical as in the priority tier. A larger percentage of an organization’s applications and data will fall into this category, but it is also far less expensive to build than the priority tier.
The third tier is for applications that the organization needs at some point but that are not vital to the business. In most organizations, the large majority of applications may fall into this tier. These applications can typically be protected by traditional backup, and recovery within a few days is often acceptable.
Despite this simplification of recovery tiers, organizations often struggle to determine which tier an application belongs in. For example, some applications are used all the time but are not absolutely critical to the business. Our rule of thumb is that the priority tier is for applications where, if they are down, work has literally stopped. The middle tier is for applications or data where, if access is lost, work does not stop but does become more difficult. The final tier is for applications that are nice to have: they are needed, but the organization can comfortably move forward without them.
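The rule of thumb above amounts to a lookup from business impact to recovery tier. The sketch below is one way to write that down; the enum, the tier labels and the example application names are illustrative assumptions, and deciding the impact of a given application remains a judgment call for IT and its owners.

```python
from enum import Enum


class Impact(Enum):
    """What happens to the business while the application is down."""
    WORK_STOPS = "work has literally stopped"
    WORK_HARDER = "work continues but becomes more difficult"
    NICE_TO_HAVE = "needed, but the organization can move forward without it"


# Tier descriptions follow the three buckets described in the text.
TIER_FOR_IMPACT = {
    Impact.WORK_STOPS: "priority tier (recover in minutes)",
    Impact.WORK_HARDER: "middle tier (recover in about an hour or two)",
    Impact.NICE_TO_HAVE: "third tier (recover within a few days)",
}


def assign_tier(impact: Impact) -> str:
    """Map the impact of an outage to a recovery tier."""
    return TIER_FOR_IMPACT[impact]


# Hypothetical applications and impact assessments.
applications = {
    "order-entry": Impact.WORK_STOPS,
    "team-wiki": Impact.WORK_HARDER,
    "archive-reports": Impact.NICE_TO_HAVE,
}

for name, impact in applications.items():
    print(f"{name}: {assign_tier(impact)}")
```

Note that the input is the effect of an outage, not how often the application is used, which is the distinction the rule of thumb is meant to enforce.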
It is critical that IT communicates its recovery capabilities on a regular basis. While those conversations may be uncomfortable, they are essential. As those communications continue, IT can adjust priorities based on feedback from users and application owners.
To learn more about setting expectations and the tiers of recovery, watch our live webcast, “How to ‘Future Proof’ Data Protection for Organizational Resilience”.