The cloud is increasingly becoming the disaster recovery option of choice. In fact, it's hard to argue with the cost and functionality the various cloud vendors offer today. Recovering to the cloud seems even better when you consider what happened in August and September of 2017. Hurricanes ravaged Texas, Florida, other Gulf states, Puerto Rico and the Virgin Islands. Fires destroyed many buildings in California and other western states, and an earthquake ravaged Mexico City.
Storage Switzerland and its partners are doing our best to respond however we can to the incredible humanitarian crisis caused by the hurricanes and earthquakes. Our hearts go out to the individuals personally impacted by these crises. If you'd like to know more about our clothing drive, search the #Storage4Houston tag. With the acknowledgment that there is real human suffering happening at this moment, Storage Switzerland still feels these tragedies give other companies a chance to re-examine how they would respond to such a disaster.
As this article is being written, there are many data centers that have not recovered from the disasters that hit them days or weeks ago. The companies that are hit the worst are those that relied on regional solutions, where even the recovery data center or tape vaulting company were also hit by the disaster.
It is true that the devastation caused by Harvey, Irma, José, and Maria is greater than that of previous hurricanes, especially if you consider them together. The damage caused by the Mexico City earthquake is also hard to fathom. But there have always been hurricanes, floods, and earthquakes that take out entire regions. In addition, there are man-made disasters such as what happened on 9/11. Sadly, there were companies that had their hot site in the second tower. Those companies ceased to exist on 9/11. It's always been a good idea to get your recovery data as far away from your data center as possible; it's just historically been hard to do.
Disasters like these are why we have disaster recovery plans, and such plans have always needed to take significant amounts of physical damage and flooding into account. The reason the cloud is becoming so popular for disaster recovery is the old ways of doing things have many limitations that are addressed by the cloud.
The Old Days And The Old Ways
In the old days we made tapes and handed them to an Iron Mountain driver (or similar company) who safely stored them off-site. Those tapes never went very far, and most of us never had to use them to recover from a disaster. If we did have to do so, the recovery would likely take a lot longer than would be acceptable – even by the standards of the time.
Recovering data via removable media required multiple steps before you could even begin the recovery. First, you had to restore the data center to some type of normalcy. Floods had to recede, fires had to die down, and roofs or walls needed to be put back in place. In most disasters, hardware also needed to be replaced. Once you had a fully powered, air-conditioned data center with fully functioning servers and storage, you could actually begin the restore. That is, if you remembered to contact Iron Mountain while you were rebuilding your data center. Otherwise, you had to wait for that as well.
Once you had fully functioning servers, storage, and the media to restore from, you could begin the restore. That, of course, would take a significant amount of time. You would most likely need to prioritize critical servers, and a recovery of your entire data center would most likely take weeks or months.
Things have certainly come a long way since the entire world used removable media. Modern backup and disaster recovery now utilize disk-based protection and replication. But most companies still only replicate to another building within their own campus, and the disasters of the last few weeks illustrate how risky that is. Each of the recent disasters was capable of taking out both copies of data for companies using that method.
What is needed is a way to get applications backed up to a much more remote location that is immediately available when a disaster strikes. For most companies, this is only possible with the public cloud.
The Public Cloud
Recovering to the cloud solves many problems. Not only can you make sure your data is far away from any disasters that might affect your company headquarters, you can even make sure that it is in multiple locations. If all of the companies affected by recent disasters had used the cloud for recovery, they could have immediately resumed operations, even as their regions were still recovering.
Using the cloud for recovery ensures that your company can function even if your headquarters are unable to do so. Modern VPN technologies can allow your employees and contractors to work wherever they can get an Internet connection and power. While it is true that some of your employees will be impacted by the disaster and will not be immediately available for work, those who are able can work as long as they can connect to the Internet.
Cloud Recovery Options
There are three primary ways to use the cloud for recovery: replicated traditional backups, instant recovery from deduplicated backups, and instant recovery from non-deduplicated backups. Each comes with advantages and disadvantages.
The first method, replicated traditional backups, is a lot like the old way of doing backups. Customers back up their data to some type of deduplication system that replicates those backups to another location. In some cases, the data is replicated to another deduplication appliance located in a different data center of the same company. But some products are able to store these deduplicated backups in the cloud and use them for recovery in the case of disaster.
The downside of the replicated traditional backups method is that traditional backups require a traditional restore. The cloud helps in that it can easily provide the infrastructure for the VMs you will restore. However, the company will still experience significant downtime while it waits for the VMs to be restored.
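To make the mechanics concrete, here is a minimal sketch of deduplicated backup and traditional restore. It is an illustration of the general technique, not any vendor's actual product; the chunk size and in-memory store are assumptions for the example (real systems use larger, often variable-size chunks and persistent storage).

```python
import hashlib

CHUNK = 4096  # illustrative chunk size; real products typically use larger chunks

def backup(data, store):
    """Deduplicate: store each unique chunk once, keyed by its fingerprint.
    Returns the ordered fingerprint list (the 'recipe') for this backup."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        piece = data[i:i + CHUNK]
        fp = hashlib.sha256(piece).hexdigest()
        store.setdefault(fp, piece)   # repeated chunks cost no extra space
        recipe.append(fp)
    return recipe

def restore(recipe, store):
    """A traditional restore: every chunk must be rehydrated, in order,
    before the restored VM can boot -- this step is the downtime."""
    return b"".join(store[fp] for fp in recipe)

store = {}
disk = b"A" * CHUNK + b"B" * CHUNK + b"A" * CHUNK  # third chunk repeats the first
recipe = backup(disk, store)
print(len(store))                       # 2 unique chunks for 3 logical chunks
print(restore(recipe, store) == disk)   # True
```

The point of the sketch is the `restore` step: no matter how compact the deduplicated copy is, every byte must be rehydrated before the VM is usable, which is why this method still means significant downtime.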
The second method, instant recovery from deduplicated backups, is a lot like the first method in that deduplicated backups are sent to a cloud storage provider and are used in a recovery. The difference here is that backups are stored in such a way that VMs can run directly from the deduplicated backup data. VMs don't need to be restored; they simply need to be restarted from the backups.
The advantage of this method over replicated traditional backups is that it does not require a restore in order to restart the data center. Assuming the proper steps took place prior to the disaster, the VM infrastructure can simply be restarted at the recovery location. One disadvantage of this method is that there may be performance challenges when running multiple VMs from a deduplicated backup store. In fact, depending on the deduplication technology, you may even struggle to start a single VM with decent performance.
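The performance challenge can be seen in a sketch of the read path. When a VM runs directly from a deduplicated store, every guest block read becomes a fingerprint lookup plus a chunk fetch that may land anywhere in the store, so sequential guest I/O turns into scattered accesses. This is an illustrative model, not any vendor's implementation; the chunk size and dict-backed store are assumptions.

```python
import hashlib

CHUNK = 4096  # illustrative chunk size

def guest_read(recipe, store, offset, length):
    """Serve a guest block read straight from deduplicated backup data.
    Each logical chunk touched costs an index lookup plus a chunk fetch;
    the fetch counter shows the amplification."""
    out, fetches = bytearray(), 0
    while length > 0:
        idx, within = divmod(offset, CHUNK)
        piece = store[recipe[idx]]          # fingerprint -> chunk lookup
        take = min(length, CHUNK - within)
        out += piece[within:within + take]
        fetches += 1
        offset += take
        length -= take
    return bytes(out), fetches

# A tiny two-chunk "disk" and its recipe:
chunks = (b"A" * CHUNK, b"B" * CHUNK)
store = {hashlib.sha256(c).hexdigest(): c for c in chunks}
recipe = [hashlib.sha256(c).hexdigest() for c in chunks]

data, fetches = guest_read(recipe, store, 2048, 4096)  # read across a chunk boundary
print(fetches)  # 2 -- one 4 KiB guest read needed two separate chunk fetches
```

Multiply that amplification across dozens of VMs booting at once and it becomes clear why, depending on the deduplication technology, even a single VM can be hard to run at decent performance.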
Another disadvantage is that the backups contain VM images for a particular hypervisor (e.g. VMware VMDKs). This means recovery requires the same hypervisor infrastructure at the recovery site. Historically, this meant a customer could not use a vendor like AWS or Azure as their recovery site, since neither offered VMware. But this should be less of a problem moving forward, as VMware is now available on AWS.
The final method, instant recovery from non-deduplicated backups, is when a vendor continually updates a non-deduplicated block image in the cloud. This solves the performance challenges of the deduplicated backups method, because the recovery image is stored in a native block format. Theoretically, this method could recover an entire data center without a performance issue.
It can also solve the second issue of hypervisor limitations, because the recovery image can be converted to whatever format a different hypervisor needs. For example, it would be possible to back up a VMware environment and convert the images to AMI format to be usable in AWS.
Companies that support this idea of continually updating an image usually do so incrementally, meaning each backup copies only the new blocks to the recovery image so that it is up to date as of the most recent backup. The downside to this method is that it requires storing one or more copies of your data on cloud block storage, versus using the less-expensive object storage that cloud vendors offer. In Amazon terms, this means storing at least one copy of your data in EBS, instead of storing all of it in S3.
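The incremental update described above can be sketched in a few lines. This is a simplified model of the idea, not any vendor's replication protocol; the block size and the dict standing in for a cloud block volume are assumptions for illustration.

```python
def incremental_sync(prev, curr, cloud_volume, block=4096):
    """Push only the blocks that changed since the last backup to the
    cloud-side block image, keeping it current as of the latest backup."""
    changed = 0
    for off in range(0, len(curr), block):
        new = curr[off:off + block]
        if prev[off:off + block] != new:
            cloud_volume[off // block] = new  # write into block storage (EBS-style)
            changed += 1
    return changed

volume = {}                                  # stand-in for a cloud block volume
prev = b"\x00" * 4096 * 4                    # image as of the last backup
curr = b"\x00" * 4096 + b"\x01" * 4096 + b"\x00" * 4096 * 2  # one block changed
print(incremental_sync(prev, curr, volume))  # 1 -- only the changed block ships
```

Because only changed blocks cross the wire, each backup is small, but the full image must live on block storage at all times so it can be booted the moment disaster strikes. That always-resident block copy is exactly the EBS-versus-S3 cost trade-off noted above.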
Cloud recovery will become the standard recovery mechanism for most companies. It simply makes too much financial sense. Companies that only support replicated traditional backups will be forced to adapt to instant recovery in the cloud. Companies that only support instant recovery from deduplicated backups will be forced to either address the performance limitations of doing so, or convert to the final method of updating a non-deduplicated image that is only used in a disaster. In a few years, the only companies left running their own recovery infrastructure will be the largest enterprises, which can build and manage it for less money than the alternatives.