Almost every backup solution on the market today claims some form of cloud storage support. IT planners need to understand what type of cloud storage their backup solution provides and if the architecture of the solution can genuinely take advantage of cloud storage. Surprisingly, most data protection solutions, while they may support cloud storage, actually deliver very little value as a result. Cloud storage can undoubtedly be a valuable complement to the backup infrastructure, but only if the backup software is architected in a way that it can take full advantage of it.
Why Use Cloud Storage as Part of the Backup Infrastructure?
Before understanding how backup software solutions are using cloud storage, IT professionals must understand the “why” behind using cloud storage. The primary reason to leverage cloud storage as part of the backup infrastructure is to reduce backup storage costs.
Surprisingly, a cost-per-gigabyte comparison with on-premises backup storage may not immediately reveal cost savings. Backup storage systems like high-capacity deduplicated appliances and tape libraries may offer a price advantage over the subscription model offered by the cloud. However, these initial comparisons are deceptive. They don’t factor in that the organization needs to pay for on-premises storage upfront, so it doesn’t realize the advertised cost-per-gigabyte savings until it uses the storage system’s capacity completely. Once the total cost of ownership is factored in, many organizations find public cloud storage to be a more economical way to store backup data, especially if the data protection software uses the cloud storage intelligently.
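The effect of paying upfront can be shown with simple arithmetic. The sketch below uses purely hypothetical figures (an appliance priced at 50,000 with 100,000 GB of capacity); the function name and the numbers are assumptions for illustration, not vendor pricing.

```python
def effective_cost_per_gb(purchase_price, capacity_gb, used_gb):
    """Return the cost per gigabyte actually in use for storage paid for upfront."""
    if not 0 < used_gb <= capacity_gb:
        raise ValueError("used_gb must be greater than 0 and at most capacity_gb")
    # The full purchase price is paid regardless of how much capacity is used.
    return purchase_price / used_gb


# Hypothetical appliance: 50,000 (any currency) for 100,000 GB of capacity.
PRICE, CAPACITY_GB = 50_000, 100_000

nominal = effective_cost_per_gb(PRICE, CAPACITY_GB, CAPACITY_GB)   # system completely full
quarter_full = effective_cost_per_gb(PRICE, CAPACITY_GB, 25_000)   # only 25% of capacity used

print(f"nominal: {nominal:.2f}/GB, at 25% utilization: {quarter_full:.2f}/GB")
```

With these assumed numbers, the nominal price is 0.50 per gigabyte, but the organization is effectively paying 2.00 per gigabyte while the system is only a quarter full.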
Another advantage of using cloud storage is that it automatically creates an off-site copy of data. The organization doesn’t need to operate or pay for equipping a disaster recovery data center or maintaining another backup storage system in that data center. Cloud storage also scales and refreshes automatically. IT no longer needs to worry about upgrading its current backup storage hardware or, if a scale-out backup storage system is in use, adding a node. The cloud, at least from the customer’s viewpoint, automatically scales.
Understanding How Backup Software Uses Cloud Storage
There are three common ways that backup software uses cloud storage. The first is Cloud Mirroring, which essentially copies all backup data from on-premises to the cloud. The on-premises backup data is never archived. The cloud copy of the backup data is for disaster recovery. As a result, the customer sees only a modest reduction in backup storage costs; the savings come from eliminating the second backup storage system and the cost of maintaining the second site.
The second way that backup software uses cloud storage is Cloud-Only Storage, the polar opposite of Cloud Mirroring. This method stores backup data exclusively on cloud storage. The Cloud-Only method of storing backup data reduces on-premises backup storage infrastructure cost significantly. The organization also gains all the benefits of cloud storage described above. The problem is that the Cloud-Only method introduces cloud latency, which impacts both queries of backup data and recovery times.
The third way that backup software is using cloud storage is Cloud Tiering, which provides a better balance of leveraging on-premises data and the cloud. The idea behind this method is to move older backup jobs to the cloud as they age. The problem is that Cloud Tiering makes sense only for specific data sets, and the backup software must intelligently leverage the two tiers for the customer to see the benefit.
Will Your Backup Software Extract Full Value from Cloud Tiering?
Storage Switzerland finds that over 85% of recovery requests are from last night’s backup. In theory, if the organization stored only the two or three most recent copies of backup data on-premises, these two or three versions should satisfy almost all recovery requests. Storage Switzerland also finds that over 65% of organizations have backup retention policies of more than five years. If the organization archived or moved any backup data older than a few weeks to cloud storage, it could significantly reduce on-premises backup infrastructure and costs without sacrificing its ability to meet various service level objectives.
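The tiering idea described above can be sketched as a simple placement rule: keep the newest few copies on-premises and mark everything older for the cloud. This is a minimal illustration, not any product's policy engine; `BackupCopy`, `plan_tiering`, and the job names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BackupCopy:
    job_id: str    # hypothetical identifier for a backup job
    created: date  # date the backup copy was taken


def plan_tiering(copies, keep_on_prem=3):
    """Split copies into (stay on-premises, move to cloud), newest first."""
    newest_first = sorted(copies, key=lambda c: c.created, reverse=True)
    return newest_first[:keep_on_prem], newest_first[keep_on_prem:]


# Seven nightly copies; the three most recent stay local.
copies = [BackupCopy(f"job-{day:02d}", date(2024, 1, day)) for day in range(1, 8)]
on_prem, to_cloud = plan_tiering(copies, keep_on_prem=3)

print("on-premises:", [c.job_id for c in on_prem])
print("cloud tier: ", [c.job_id for c in to_cloud])
```

A real policy would likely also consider age in weeks and retention rules; the count-based rule here only mirrors the "two or three most recent copies" reasoning.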
What Type of Data Can Be Cloud Tiered?
To extract full value from the Cloud Tiering concept, IT needs to apply it on the right data set, and the backup software needs to store information in such a way that it can intelligently archive it to the cloud. There are two data sets of concern in most organizations today: databases and unstructured data. Databases are made up of records (or rows), and users have not accessed some of those records in years, if not decades. The problem is that if IT needs to fulfill a restore request for the database, IT needs to restore the entire database. In most cases, a database can’t run with 20% of its data on-premises and 80%+ of its data resting in the cloud. Since restoring the database from the cloud is slow, most organizations need to keep 100% of their database backup copies on-premises.
The second type of data is unstructured data, which consists of the files created by users and machines. Since each file is a stand-alone entity, other files are generally not dependent on it, so storing the older backup versions of files in a cloud archive makes sense. Unstructured data provides a higher return on the Cloud Tiering investment since the capacity that unstructured data consumes is generally much larger than the capacity that structured data consumes, and in the event of emergency recovery, the organization only needs the most recently created or changed versions of unstructured data.
How Will the Backup Software Support Cloud Tiering?
The Image Backup Challenge
How well the backup software supports Cloud Tiering largely depends on how the backup software stores data. Most backup applications store both structured and unstructured data as images of the volumes they are protecting. Image-based backup is an especially popular method for protecting unstructured data because the backup software doesn’t need to go through the very time-consuming process of scanning through potentially millions of files to find new or changed files. The backup software backs up the whole volume on which the unstructured data resides as one big blob, creating a baseline, foundational copy. Subsequent backups scan the volume for changed blocks. If new or changed blocks appear during that scan, those blocks are stored together in an incremental backup job. If the backup administrator wants to restore a volume to the latest version, the backup software restores the baseline copy of the volume first and then overwrites the changed blocks from the incremental backups.
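The restore sequence described above (baseline first, then changed blocks from each incremental) can be sketched as follows. The representation is an assumption made for illustration: a volume is modeled as a dictionary mapping block numbers to bytes, and `restore_volume` is a hypothetical helper, not a real product API.

```python
def restore_volume(baseline, incrementals):
    """Rebuild the latest volume state from a baseline and ordered incrementals.

    baseline:     dict mapping block number -> block contents (bytes)
    incrementals: list of dicts with the same shape, oldest first, each holding
                  only the blocks that changed since the previous backup
    """
    volume = dict(baseline)       # start from the foundational copy
    for changed_blocks in incrementals:
        volume.update(changed_blocks)  # later jobs overwrite earlier block contents
    return volume


baseline = {0: b"A0", 1: b"B0", 2: b"C0"}
incrementals = [
    {1: b"B1"},             # first incremental: block 1 changed
    {1: b"B2", 2: b"C1"},   # second incremental: blocks 1 and 2 changed
]

restored = restore_volume(baseline, incrementals)
print(restored)
```

Note that every restore needs the baseline plus every incremental in order, which is why the baseline cannot simply be pushed to the cloud while it is still in active use.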
Most image-based backup solutions require IT to create a new baseline image periodically after a relatively small number (4-6) of incremental jobs are complete. The software can create a new baseline image by consolidating the incremental jobs into the baseline image, or IT can run a complete full backup. Either method, however, is time-consuming.
The only component within an image-based backup job that is eligible for archiving is the older baseline copy, once the software or IT creates a new baseline. The problem is that the new baseline and the old baseline are very similar in size. While the organization does not save as much total capacity as the name Cloud Tiering implies, it does save on some capacity expenses. Usually, the organization stores older foundational backup copies on-premises, but now it can store them in the cloud. The drawback is that the cloud storage tier then has a high level of redundancy between the foundation copies, and so the organization is paying extra per month to store identical copies of data.
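To make the redundancy concrete, the sketch below measures how much of a new baseline is byte-identical to the old one. The block-dictionary model, the function name, and the example change rate (2 of 100 blocks) are all assumptions chosen for illustration.

```python
def redundant_fraction(old_baseline, new_baseline):
    """Fraction of blocks in new_baseline identical to the same block in old_baseline."""
    if not new_baseline:
        return 0.0
    same = sum(
        1 for block_no, data in new_baseline.items()
        if old_baseline.get(block_no) == data
    )
    return same / len(new_baseline)


# Hypothetical volume of 100 blocks where only 2 blocks changed between baselines.
old_baseline = {i: f"block-{i}".encode() for i in range(100)}
new_baseline = dict(old_baseline)
new_baseline[3] = b"changed"
new_baseline[42] = b"changed"

fraction = redundant_fraction(old_baseline, new_baseline)
print(f"{fraction:.0%} of the new baseline duplicates the archived one")
```

In this assumed case, archiving the old baseline to the cloud means paying every month for a copy that overlaps 98% with the baseline still held on-premises.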
The Next Generation File-by-File Backup Solution
By comparison, a file-by-file backup does scan the entire unstructured data set. These scans are more time-consuming than the image-based approach. However, next-generation unstructured backup solutions are making significant improvements in how long both the original and, especially, subsequent scans take. Legacy file-by-file backups, in addition to slow scan times, also still store each file-by-file backup job as a single blob. The primary reason is to make saving the data to tape more efficient. Writing 10,000 one-megabyte files to tape is a prolonged operation. Next-generation unstructured backup solutions write data to disk both on-premises and in the cloud and, as such, can store files individually, which gives them granular access to each file. Not only does the granular storage of files enable these next-generation solutions to provide deep insight into the types of files backed up, but it also allows much more intelligent use of the cloud.
The next-generation solutions do not need to perform consolidation jobs to create new fulls. After the initial scan, the software can continue to back up the file systems, incrementally, forever. The incremental-forever strategy plus the granular insight means the next-generation unstructured data protection software can also granularly move (not copy) specific files based on change rate, type, or classification from on-premises backup storage to public cloud storage. It can use the same information to leave more critical files on-premises for more rapid recovery.
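A granular move decision of this kind might look like the sketch below. It is an illustration under stated assumptions: the `FileRecord` fields, the "critical" classification label, and the 30-day idle threshold are hypothetical, and a real product would apply its own criteria.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    path: str
    last_changed: date
    classification: str  # hypothetical labels such as "critical" or "general"


def select_for_cloud(files, today, min_idle_days=30, keep_on_prem=("critical",)):
    """Return files to move: idle for min_idle_days and not in a protected class."""
    return [
        f for f in files
        if (today - f.last_changed).days >= min_idle_days
        and f.classification not in keep_on_prem
    ]


today = date(2024, 6, 1)
files = [
    FileRecord("/share/report.docx", date(2024, 1, 10), "general"),   # idle, general
    FileRecord("/share/ledger.xlsx", date(2023, 11, 2), "critical"),  # idle, but critical
    FileRecord("/share/notes.txt", date(2024, 5, 28), "general"),     # changed recently
]

to_move = select_for_cloud(files, today)
print([f.path for f in to_move])
```

Because the decision is made per file, the critical ledger stays on-premises for rapid recovery even though it has not changed in months.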
The granular move capability of these next-generation solutions can significantly reduce the size of the on-premises backup data set. It can also significantly decrease the size requirements of the cloud storage tier, in comparison to image-based backup solutions, since it is not continuously copying the same data to the cloud.
A vital component of these next-generation unstructured data protection solutions is maintaining the metadata about the backup on-premises. An on-premises copy of the metadata means that the organization can execute all queries without accessing cloud copies. The result is that the backup software responds to queries quickly, without incurring egress fees from the cloud provider, and the organization can use the deepest archive tiers available from cloud providers.
Using public cloud storage for backup storage makes sense for a lot of organizations because of its potential to reduce costs. The challenge for IT is making sure the backup software tiers the right data sets to the cloud. Typically, organizations won’t see any savings by tiering structured data, and if they do, they may put meeting recovery expectations at risk. With unstructured data, if the backup software can protect it and move it to the cloud granularly, organizations can realize significant savings by leveraging public cloud storage intelligently.
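The role of the on-premises metadata can be illustrated with a small in-memory catalog. The structure below is an assumption for illustration (real products use their own catalog formats); the point is only that a query is answered from local records, and no cloud object is read until a restore is actually requested.

```python
from fnmatch import fnmatchcase

# Hypothetical local catalog: one entry per backed-up file version.
catalog = [
    {"path": "/share/report.docx", "version": 3, "location": "on-prem"},
    {"path": "/share/report.docx", "version": 1, "location": "cloud-archive"},
    {"path": "/share/notes.txt", "version": 2, "location": "on-prem"},
]


def find_versions(catalog, pattern):
    """Answer a query from local metadata only; nothing is fetched from the cloud."""
    return [entry for entry in catalog if fnmatchcase(entry["path"], pattern)]


matches = find_versions(catalog, "*/report.docx")
for entry in matches:
    print(entry["path"], "version", entry["version"], "stored in", entry["location"])
```

The query reports that an older version lives in a cloud archive tier without touching that tier, so no retrieval delay or egress charge is incurred until someone decides to restore it.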