Organizations tend to treat all backup data the same. Generally, IT stores backup data on a relatively low-performing, high-capacity disk backup system. The problem is that this type of device doesn’t fit the restore requests that most IT teams need to service. Even though backup applications can move data to different storage devices based on policy, surprisingly few data centers are tiering their backup data. Using a public cloud or on-premises object store for older backups can bring significant cost savings to the organization and improve its ability to respond to restore requests.
Understanding the Realities of Backup Data
Most organizations retain backup data for a surprisingly long time, even though most restore requests are for data that is within a few days of the most recent backup. Most studies show that 95% of restores are from the most recent backup. It is rare for an organization to need to recover data from any backup older than last week, but it does happen. Many times, these restores respond to specific requests: legal or regulatory issues are in play, and the organization needs to recover data to meet a regulatory requirement or discovery request. As a result, the need to recover older data may be rare, but it is critical when it does occur.
Another backup reality is that the most frequent recoveries, those that come from the most recent copy of data, have the most pressure for rapid recovery. Backup vendors today provide “instant recovery” features, where the data for a virtual machine is instantiated directly on backup storage, to help IT meet rapid recovery demands. The problem is that if the backup storage system is going to act, at least temporarily, as primary storage, then it needs to provide performance acceptable for the task.
No Matter What, Tier Backup Data!
Rapid recovery demands and long-term retention requirements mean that backup data is used to service two dramatically different use cases. A restore from the latest copy of data is likely time-sensitive and needs to deliver production-quality performance. An older backup set needs to be accessible, but “instant” recovery is not a requirement, and in most cases the data does not need to be instantiated on backup storage.
Given the difference between the two types of restores and the amount of data each restore type represents, IT planners need to consider dividing their backup storage into two repositories. The first repository should be relatively small, large enough to store the last few backups, but on all-flash or hybrid-flash storage so that it can deliver the performance that an instantly restored virtual machine may require. The second repository should be a high-capacity, scalable, and cost-effective storage area, with the cloud being an excellent option.
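As a rough illustration of how small the first repository can be, the sketch below estimates its capacity from one full backup plus a few days of incrementals. The function name, the change rate, and the headroom factor are all hypothetical inputs chosen for the example, and the estimate ignores deduplication and compression, which would shrink the result.

```python
def performance_tier_tb(full_backup_tb, daily_change_rate, recent_days, headroom=0.2):
    """Estimate performance-tier capacity in TB.

    Assumes one full backup plus `recent_days` daily incrementals, each
    sized at `daily_change_rate` of the full backup, with extra headroom.
    All inputs are illustrative assumptions, not vendor guidance.
    """
    if min(full_backup_tb, daily_change_rate, recent_days, headroom) < 0:
        raise ValueError("inputs must be non-negative")
    incrementals_tb = full_backup_tb * daily_change_rate * recent_days
    return (full_backup_tb + incrementals_tb) * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical environment: 50 TB full backup, 5% daily change, 7 days kept on flash.
    size = performance_tier_tb(full_backup_tb=50, daily_change_rate=0.05, recent_days=7)
    print(f"Performance tier estimate: {size:.1f} TB")
```

Everything older than the chosen window would live on the second, high-capacity repository, so the flash footprint stays fixed while retention grows.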
It is essential that the backup software automatically move data between these two repositories based on policies set by the users. At their recent analyst summit, Veeam explained how their CloudTier feature automatically moves older backups from a performance backup storage area to a high-capacity storage area, including object storage or cloud storage.
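The idea of a policy-driven move can be sketched in a few lines. This is a generic illustration of age-based tiering, not Veeam's implementation or API; the `Backup` class, the tier names, and the seven-day threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Backup:
    """Hypothetical record describing one backup and the tier it lives on."""
    name: str
    created: date
    tier: str = "performance"


def apply_tiering_policy(backups, today, keep_recent_days=7):
    """Move backups older than `keep_recent_days` to the capacity tier.

    Returns the names of the backups that were moved. A real product would
    copy the data and verify it before releasing space on the fast tier.
    """
    cutoff = today - timedelta(days=keep_recent_days)
    moved = []
    for backup in backups:
        if backup.tier == "performance" and backup.created < cutoff:
            backup.tier = "capacity"
            moved.append(backup.name)
    return moved


if __name__ == "__main__":
    catalog = [
        Backup("nightly-jan30", date(2024, 1, 30)),
        Backup("nightly-jan20", date(2024, 1, 20)),
    ]
    print(apply_tiering_policy(catalog, today=date(2024, 1, 31)))
```

Running the policy on a schedule keeps only the most recent backups on the performance tier, which is the behavior the two-repository design depends on.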
Effectively managing backup data can not only dramatically reduce the cost of retaining this information, but it can also help cost-justify the investment in a higher-performance backup storage area for the most recent backups. Instead of investing in a middle-of-the-road backup storage system, the organization can invest in a very low-cost, long-term backup repository and a small high-performance recovery tier.
Tier to the Cloud? Why or Why Not?
Companies like Veeam give customers two options to consider for the secondary backup repository: an on-premises object storage system or a public cloud provider like Microsoft Azure, Amazon AWS, or Wasabi Hot Cloud Storage. While the cloud has a small upfront cost, the fee to rent storage in the cloud over the long term adds up. A few hundred terabytes stored for five years eventually becomes more expensive than investing in an object storage system.
Investing in an on-premises object storage system, though, does have some disadvantages. There are the obvious upfront investment costs. Most object storage systems don’t make economic sense until the initial storage capacity reaches 100TB or more. Buying 100TB of capacity on day one may be a significant capital investment for many organizations. The object storage system also has to be unboxed and installed into racks, which takes up data center floor space, as well as power and cooling resources. Each additional upgrade to the object storage system also consumes more of those resources, and of course, IT needs to integrate each upgrade into the architecture.
A cloud-based storage system, by comparison, has a low upfront cost and requires no data center floor space, power, or cooling, either for the initial implementation or for future upgrades.
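The trade-off between recurring cloud fees and an upfront purchase can be made concrete with a simple break-even calculation. The sketch below is a simplified model: every price in it is a hypothetical placeholder rather than a quote from any provider, capacity is held constant, and cloud egress and request fees are ignored.

```python
def cloud_cost(capacity_tb, price_per_tb_month, months):
    """Cumulative cloud storage rental cost (no egress or request fees)."""
    return capacity_tb * price_per_tb_month * months


def on_prem_cost(upfront, opex_per_month, months):
    """Cumulative on-premises cost: purchase price plus power, space, and admin."""
    return upfront + opex_per_month * months


def break_even_month(capacity_tb, price_per_tb_month, upfront, opex_per_month,
                     horizon_months=120):
    """First month in which cumulative cloud cost reaches on-prem cost, or None."""
    for month in range(1, horizon_months + 1):
        if cloud_cost(capacity_tb, price_per_tb_month, month) >= on_prem_cost(
                upfront, opex_per_month, month):
            return month
    return None


if __name__ == "__main__":
    # Hypothetical figures: 300 TB at $6 per TB-month versus a $60,000 object
    # store that costs $500 per month to run.
    month = break_even_month(300, 6, 60_000, 500)
    print(f"Cloud becomes the more expensive option in month {month}")
    print(f"Five-year cloud total: ${cloud_cost(300, 6, 60):,}")
    print(f"Five-year on-prem total: ${on_prem_cost(60_000, 500, 60):,}")
```

Plugging in an organization's own capacity, pricing, and operating costs shows how long it would take for the upfront investment to pay off, or whether it pays off at all within the retention period.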
In the end, the organization needs to decide which is the better path for it, but either path is better than staying with the status quo. Over time, an on-premises object storage system should be the most cost-effective approach, and it enables the organization to leverage the object store for more than just backup data. An on-premises object store does, however, bring a higher operational overhead that the organization needs to take into account.