Being in charge of the data protection process is a thankless job. The process you create can run perfectly 99% of the time, but everyone will remember the one time it fell short, and they will blame you. The task of protecting an organization’s data is getting harder too: data keeps growing, and users expect no downtime. The key, we believe, is to manage the data protection process from a service level perspective instead of a job perspective, by setting a service level objective (SLO) for each application.
Within an application’s SLO, there are specific demands for a recovery point objective, a recovery time objective, version retention, and geographical redundancy. Each application will have different requirements for each of these SLO components, so it is important to understand what they are. The goal of this article is to set the foundation for understanding these components; in future articles, we will cover how to select solutions that meet these various objectives.
Recovery Point Objective
The recovery point objective (RPO) is the amount of data you can afford to lose if an application has to be recovered. In most cases, data protection is not a continuous activity; there is a time window between protection events. During this window, data is created, modified and deleted. The protection process does not capture those changes until the next protection event occurs. That means if there is a failure, those additions, changes and deletions are lost.
For example, suppose you back up once per night while changes are made throughout the day. If you have a storage system failure at 4pm, all the data that was added, changed or deleted from the time the business opened for the day until 4pm would be lost. For user files this is an inconvenience; for a high-transaction application it could be catastrophic.
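To make that exposure concrete, here is a minimal Python sketch that measures the window of unprotected changes at the moment of a failure and compares it to an RPO. The function names, the 2:00 a.m. backup time and the one-hour RPO are assumptions chosen for illustration; they do not come from any particular backup product.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return the span of time whose changes are unprotected when the failure hits."""
    if failure < last_backup:
        raise ValueError("failure must not precede the last backup")
    return failure - last_backup


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True if the unprotected window is no larger than the RPO."""
    return data_loss_window(last_backup, failure) <= rpo


# Assumed schedule: one nightly backup at 2:00 a.m., storage failure at 4:00 p.m.
last_backup = datetime(2024, 1, 15, 2, 0)
failure = datetime(2024, 1, 15, 16, 0)

window = data_loss_window(last_backup, failure)
print(window)                                               # 14:00:00
print(meets_rpo(last_backup, failure, timedelta(hours=1)))  # False
```

With a nightly backup, the window can never be guaranteed to stay under one hour, so a one-hour RPO would call for more frequent protection events.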
The more important the data is, the more important it is to provide a narrow RPO. The problem is that the narrower the RPO, the more expensive the data protection solution typically becomes. Also, capturing an application’s transaction logs cannot be considered a way to fulfill the RPO of an application. While it does help, since information does not need to be manually re-keyed, it takes time for those logs to be re-played. The more changes represented in a transaction log, the longer it will take to re-play them and the harder it will be to meet a narrow RPO. In a future column, we will look at solutions that narrow the RPO.
Recovery Time Objective
The recovery time objective (RTO) is the time within which you commit to bringing an application back online following a failure; in other words, the maximum length of time a given application can be offline. The RPO described above impacts the RTO, since an application can’t truly be considered back online until all the data that was lost is re-keyed or the associated application transaction logs are re-played. In general though, the key component of RTO is the time it takes to move information from the data protection storage tier back into a production state.
In the backup use case, this could mean copying data from a backup storage device (disk or tape) across the network and onto production storage. Not only does the transfer across the network take time, but it also takes time to write all the data. Remember that writes are typically slower than reads, even on flash storage. Writes are further slowed down when the data is written to a RAID-protected primary storage device, because RAID parity must be calculated as well.
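A rough way to sanity-check an RTO against this kind of restore is to divide the amount of data by the slowest stage in the path. The sketch below is a simplification under stated assumptions: the function name and the throughput figures are hypothetical, 1 GB is treated as 1,000 MB, and the estimate ignores log replay, re-keying and any overlap between stages.

```python
def estimate_restore_hours(data_gb: float, network_mb_s: float, write_mb_s: float) -> float:
    """Estimate restore time in hours, limited by the slower of network and write speed."""
    if min(data_gb, network_mb_s, write_mb_s) <= 0:
        raise ValueError("all inputs must be positive")
    bottleneck_mb_s = min(network_mb_s, write_mb_s)  # the slowest stage sets the pace
    seconds = data_gb * 1000 / bottleneck_mb_s       # assumes 1 GB = 1,000 MB
    return seconds / 3600


# Assumed figures: 3,600 GB to restore, 100 MB/s over the network,
# 250 MB/s sustained writes on the production array.
print(estimate_restore_hours(3600, 100, 250))  # 10.0
```

In this example the network, not the production storage, is the bottleneck, so an RTO shorter than ten hours could not be met by this restore path alone.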
As with RPO, there are plenty of methods to reduce RTO including the increasingly popular method of recovery in place where virtual instances of servers are launched directly from the recovery device. But as we discussed in our article, “VM Recovery In Place vs. Changed Block Recovery”, recovery in place solutions have their limitations and you should make sure you understand those limitations before counting on them to meet your RTO.
Version Retention Objective
The next component of a protection SLO is the version retention objective (VRO). The VRO defines how many copies of a given version of a file or application data set will be maintained and how long those copies will be maintained. Almost every organization has data with specific retention requirements based on various industry or government regulations. There may also be a requirement for how quickly that data needs to be retrieved and delivered to the requesting party. An increasing number of laws, for example, require that discovery requests be fulfilled within a given timeframe.
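The two halves of a VRO, how many copies and how long, can be sketched as a simple selection rule. This is only an illustration under assumed values: the function name, the weekly copies, the 30-day limit and the three-copy limit are all hypothetical, and real regulatory retention rules are usually more involved.

```python
from datetime import date, timedelta


def versions_to_keep(versions, today, max_versions, max_age):
    """Keep the newest max_versions copies that are no older than max_age."""
    recent = sorted((v for v in versions if today - v <= max_age), reverse=True)
    return recent[:max_versions]


# Assumed weekly copies taken during January 2024.
versions = [date(2024, 1, d) for d in (1, 8, 15, 22, 29)]

kept = versions_to_keep(
    versions,
    today=date(2024, 2, 1),
    max_versions=3,
    max_age=timedelta(days=30),
)
print([v.isoformat() for v in kept])  # ['2024-01-29', '2024-01-22', '2024-01-15']
```

The January 1 copy is dropped for age (31 days old) and the January 8 copy is dropped by the count limit.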
The VRO is not only for protection against litigation, however, as more organizations than ever are mining the data in their environment to help with various decision support systems and even new product design.
Geographic Redundancy Objective
Finally, there is the geographic redundancy objective (GRO), sometimes considered the disaster recovery requirement. This objective encompasses what data needs to be replicated off site, how often and how far away. It typically will have its own set of RPO and RTO requirements that are often less stringent than the standard RPO and RTO. Simply put, users are more patient when you are in the midst of a regional disaster than they are if you are having a server failure.
The GRO may also include multiple geographic requirements per application. For example, a mission-critical application may first have to be synchronously replicated, which means a relatively close location, because every write must wait for the remote copy to acknowledge it. Then it may have to be asynchronously replicated to a second, more distant location to protect against a regional disaster.
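A two-site layout like this can be written down and checked with a few lines of Python. Everything named here is hypothetical: the `ReplicaSite` record, the `check_gro` function, and especially the two distance thresholds, which are placeholder values to replace with whatever your own storage vendor and risk assessment dictate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSite:
    name: str
    mode: str           # "sync" or "async"
    distance_km: float  # distance from the primary site


def check_gro(sites, max_sync_km=100.0, min_regional_km=300.0):
    """Return a list of problems; an empty list means these illustrative checks pass.

    Both thresholds are placeholder assumptions, not industry limits.
    """
    problems = []
    for site in sites:
        if site.mode not in ("sync", "async"):
            problems.append(f"{site.name}: unknown mode {site.mode!r}")
        elif site.mode == "sync" and site.distance_km > max_sync_km:
            problems.append(f"{site.name}: too far for synchronous replication")
    if not any(site.distance_km >= min_regional_km for site in sites):
        problems.append("no site far enough away to survive a regional disaster")
    return problems


two_sites = [ReplicaSite("metro", "sync", 40), ReplicaSite("remote", "async", 900)]
print(check_gro(two_sites))      # []
print(check_gro(two_sites[:1]))  # ['no site far enough away to survive a regional disaster']
```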
As with RPO, RTO and VRO, there are multiple ways to meet the GRO as well as to broaden its reach beyond a single disaster.
RTO, RPO, VRO and GRO may sound like a bowl of alphabet soup, but understanding these terms is important. What is more important, though, is understanding the requirements of each application or data set for each of those three-letter acronyms. Once understood, they form the service level objective, which is critical to creating a backup architecture that meets the needs of the organization. The benefit for the backup manager is that as you begin to align your data protection strategies with these various objectives, you will find that the process of protecting data becomes less time-consuming and less stressful.
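One way to capture these objectives per application is as a small record that a proposed design can be compared against. The sketch below is a starting point under assumptions, not a standard schema: the `ProtectionSLO` fields, the `gaps` function, the application names and every value shown are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectionSLO:
    application: str
    rpo: timedelta               # maximum tolerable data loss
    rto: timedelta               # maximum tolerable time offline
    retained_versions: int       # VRO: how many copies
    retention_period: timedelta  # VRO: how long they are kept
    offsite_copies: int          # GRO: number of remote locations


def gaps(slo, backup_interval, estimated_restore):
    """Return the objectives that a proposed design would miss."""
    missed = []
    if backup_interval > slo.rpo:
        missed.append("RPO")
    if estimated_restore > slo.rto:
        missed.append("RTO")
    return missed


orders = ProtectionSLO("order-entry", timedelta(minutes=15), timedelta(hours=1),
                       30, timedelta(days=7 * 365), 2)
files = ProtectionSLO("file-shares", timedelta(hours=24), timedelta(hours=8),
                      14, timedelta(days=365), 1)

# Assumed design: nightly backups with a four-hour restore.
nightly, restore = timedelta(hours=24), timedelta(hours=4)
print(gaps(orders, nightly, restore))  # ['RPO', 'RTO']
print(gaps(files, nightly, restore))   # []
```

The same nightly backup satisfies the file shares but misses both objectives for the order-entry system, which is the argument for setting objectives per application.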