When integrating tape storage into their infrastructure, IT planners must consider the realities of how data is accessed. The goal is to play to tape’s strengths, storing as much data as possible on tape because of its lower cost and inherent air-gap advantages, without sacrificing recovery performance when it matters most.
Understanding Recovery Realities
The natural inclination is to limit tape’s role in the data center to the archive function. Storage Switzerland believes that tape has a significant role to play in the backup process; a case can be made that tape is at least as well suited for backup as it is for archive. The reason for our strong position on tape’s role in backup is the reality of recovery requests. Storage Switzerland consistently finds that the overwhelming majority of recovery requests, as much as 95%, are served from the most recent backup. In practice, that means roughly 95% of backup storage capacity is rarely called on for a recovery. We also find that backup storage capacity is approximately 5X to 10X the size of production. In an environment with 100TB of production data, there may be a petabyte of backup data, but about 950TB of that backup data will rarely be accessed.
Why keep that 950TB of data around? The primary reason is so that the organization can meet compliance and regulatory requirements. The other reason is that 5% of recoveries do come from this data set, and it is almost impossible to know in advance which part of that 950TB will be needed in response to a request. The organization is forced to keep all the data. Essentially, capacity requirements are inversely related to the number of recovery requests: the data that consumes the most capacity receives the fewest requests.
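The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration of the example above, not a sizing tool: the function name `backup_footprint` and its parameters are hypothetical, and the 95% default is simply the rarely accessed share quoted in the text.

```python
def backup_footprint(production_tb, multiplier, rarely_accessed_pct=95):
    """Return (total backup capacity, rarely accessed capacity) in TB.

    multiplier is the backup-to-production ratio (5 to 10 in the text).
    rarely_accessed_pct is an assumption taken from the 95% figure above.
    """
    total_tb = production_tb * multiplier
    # Integer math keeps the result exact for whole-TB inputs.
    rarely_accessed_tb = total_tb * rarely_accessed_pct // 100
    return total_tb, rarely_accessed_tb


# 100 TB of production data at a 10X multiplier
total, rarely = backup_footprint(100, 10)
print(f"total backup: {total} TB, rarely accessed: {rarely} TB")
```

With the article's inputs this reproduces the petabyte of backup data and the 950TB that is rarely read; at the low end of the range (5X), the same 100TB of production yields 500TB of backup data.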
It is worth noting, though, that recoveries from older backups tend not to be as time-sensitive as recoveries from the most recent backup. Generally, when an organization recovers from the most recent backup, it is because a primary system has failed and the organization needs to get that system back into production as quickly as possible. Recoveries from older backups are typically in response to discovery requests, regulatory responses, or a need for data to analyze. While these recovery requests still need to be served promptly, the data does not need to be instantly available.
Another use case for older backup sets is when a major disaster strikes the organization and destroys the data center. An additional and more pressing threat that can cause an equal amount of damage is a cyber-attack such as ransomware. In these situations, the organization needs data that is both offsite and “air-gapped” so that the disaster or cyber-attack cannot reach the backup copy. Organizations can provide an “air gap” by copying data onto removable media and sending that media offsite and offline, ensuring it is electronically disconnected from the network.
Designing a Backup Architecture for the Recovery Realities
Organizations should take advantage of the reality that not all backup data requires instant recovery and design their backup architecture accordingly. An ideal architecture has a small and slowly growing disk (or even flash) front end coupled with a tape library-based back end. The flash or disk tier should store the most recent backup to serve the 95% of recovery requests that are time-sensitive, and the tape tier should store the rest of the data, again taking advantage of the inverse relationship between capacity and recovery requests. In our example above, we’d allocate roughly 100TB of backup data to the flash or disk tier and roughly 900TB to the tape tier.
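A minimal Python sketch of this placement rule follows. It assumes, purely for illustration, that the petabyte of backup data consists of ten 100TB backup sets; the `BackupSet` type, the `place_backups` function, and the `disk_copies` parameter are hypothetical names and do not correspond to any particular backup product.

```python
from typing import NamedTuple


class BackupSet(NamedTuple):
    name: str
    age_days: int
    size_tb: int


def place_backups(backup_sets, disk_copies=1):
    """Keep the newest `disk_copies` sets on disk or flash; the rest go to tape."""
    newest_first = sorted(backup_sets, key=lambda b: b.age_days)
    return {"disk": newest_first[:disk_copies], "tape": newest_first[disk_copies:]}


def tier_capacity_tb(placement):
    """Sum the capacity assigned to each tier."""
    return {tier: sum(b.size_tb for b in sets) for tier, sets in placement.items()}


# Assumed workload: ten weekly backup sets of 100 TB each (1 PB in total)
sets = [BackupSet(f"weekly-{i}", age_days=7 * i, size_tb=100) for i in range(10)]
placement = place_backups(sets)
print(tier_capacity_tb(placement))
```

Under these assumptions the split comes out to 100TB on the disk or flash tier and 900TB on tape. Raising `disk_copies` keeps more recent backups on the fast tier at the cost of a larger front end.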
The cost ramifications of this design are significant. It eliminates the need for multiple or scale-out disk backup appliances, and we can make a case that even deduplication is no longer a requirement. Instead, the organization can invest in a more expensive, high-performance flash tier, while the more economical tape library provides extremely cost-effective storage capacity with almost limitless scalability and far lower power and resource consumption.
This architecture simplifies storage capacity requirements at the remote site as well. The flash backup appliance or the backup software can replicate data to an alternate disk tier at a remote location, and IT can easily contract to have tapes shipped to a dedicated tape vault. In a disaster, organizations will want the latest copy of data, not data that is six months old, so the organization can justify waiting a few hours for the vaulted tapes to be shipped back while it is busy recovering critical systems from disk. Additionally, storing these tape copies offsite further leverages the air-gap advantage of tape as well as its long shelf life of 30 years.
Once IT understands the reality of recoveries, the benefits of cost-efficient and power-efficient tape storage become immediately apparent. Combined with its naturally air-gapped design, tape becomes the ultimate protection from disasters and cyber-attacks. Modern data protection software now includes the ability to automatically move data from the backup set to tape media as it ages, so IT planners can move data from disk to tape without adding management overhead.