One way organizations can quickly leverage the cloud is through a caching appliance or gateway. These solutions cache data locally, so users accessing active data don't contend with internet latency. When data is created or modified, the gateway "instantly" replicates it to the cloud. Providers of these solutions claim the data doesn't need further backup, since every modification is replicated to the cloud almost immediately and then replicated again to another site within the cloud. Do they need to be backed up? Maybe…maybe not.
I recently had a chance to attend Nasuni’s annual customer event to talk to a few of their customers. Nasuni’s product is an appliance that presents an NFS or SMB mount to the customer and copies all data stored on that appliance to the cloud storage vendor of your choice. The main partners are Amazon, Microsoft, and IBM. Initially, the appliance stores all data in both locations, on-premises and in the cloud. But at some point, the appliance runs out of capacity and needs to decide which files to keep on the local cache, and which files to store only in the cloud.
Once you reach this point, the appliance stores some of your data only in the cloud: specifically, files not accessed in a customer-definable number of days. (You can also specify rules that "pin" certain data sets to the cache, so they are always resident there.) So the question is: does the data that lives only in the cloud need to be backed up? Put another way, does the value of data that was created more than n days ago and hasn't been accessed in n days exceed the cost of backing it up? With caching appliances, data accessed within the last n days is stored in two places: on the local appliance and in the cloud. So if something catastrophic happens to the data in either place, the other copy will still be available. But the same cannot be said of the older data stored only in the cloud.
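To make the eviction behavior concrete, here is a minimal sketch of an age-based cache policy with pinning. Everything here is hypothetical: the function name, the file-record fields, and the path-prefix pinning rule are my own illustration, not Nasuni's actual implementation.

```python
import time

def select_evictable(files, max_age_days, pinned_paths, now=None):
    """Return files eligible for eviction from the local cache:
    anything not accessed within max_age_days, unless it falls
    under a pinned path prefix (pinned data always stays local).

    Each file is a dict with hypothetical 'path' and 'last_access'
    (epoch seconds) fields.
    """
    now = now if now is not None else time.time()
    cutoff = now - max_age_days * 86400  # seconds in a day
    return [
        f for f in files
        if f["last_access"] < cutoff
        and not any(f["path"].startswith(p) for p in pinned_paths)
    ]
```

Files the policy selects would then exist only in the cloud, which is exactly the set of data whose backup status is in question.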
One way to avoid answering this question is to size the local caching appliance to hold all of your primary copies of data. Previous versions of those copies, and historical versions of deleted files, would be stored only in the cloud. Then the only things not being protected (beyond the built-in protection afforded by the cloud vendor) are essentially backups. One could argue these backups are no less protected than backups stored with a cloud backup vendor, and one would be right.
But let’s not let ourselves off the hook so easily. Sizing the local caching appliance as previously described will significantly increase the cost of the appliance, which is why Nasuni representatives tell me their typical customer has an appliance big enough to hold only “active” data. Active, of course, is defined by every customer, but I inferred from their comment that it is a small subset of the data discussed in the previous paragraph.
Back to the question: Does data that resides only with a cloud storage vendor need to be backed up? Cloud storage, in general, is built to be self-healing, and the durability numbers for the major cloud vendors are quite impressive. To my knowledge there has not been any data loss caused by these cloud vendors themselves, such as from a rolling code bug. There have been outages, to be sure, but as far as I know they have not corrupted any user data. This doesn't mean that nothing could happen (or has happened) to data in the cloud. Even if you assume data stored in an object storage system – regardless of manufacturer or service provider – will protect itself against corruption or natural disasters, it will never be completely safe from hackers or rogue administrators. Ask the people at codespaces.com about that: hackers deleted their entire company when they wouldn't pay ransom. The only copy of the data was in Amazon, and that was the end of that.
There are ways to send data to multiple providers, but unfortunately all of the methods of which I am aware only mirror data between like systems, which doubles the cost. What would be very beneficial is a WORM copy into cold storage, such as Amazon Glacier, which is ¼ the price of Amazon's standard offering. Glacier also has a WORM offering that can protect your data from accidental erasure, or purposeful erasure by hackers or rogue admins.
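As a sketch of what such a WORM copy could look like on AWS: S3 supports Object Lock, and in COMPLIANCE mode even the root account cannot delete an object before its retain-until date. The helper below only builds the parameter dict; an actual upload would pass it to `boto3.client("s3").put_object(Body=data, **params)`. The bucket and key names are hypothetical, and Object Lock must have been enabled when the bucket was created.

```python
from datetime import datetime, timedelta, timezone

def worm_put_params(bucket, key, retention_days):
    """Build put_object parameters for an immutable cold-storage copy.

    GLACIER is the low-cost archival storage class; COMPLIANCE-mode
    Object Lock makes the object undeletable (by anyone, including
    admins) until the retain-until date passes.
    """
    retain_until = datetime.now(timezone.utc) + timedelta(days=retention_days)
    return {
        "Bucket": bucket,
        "Key": key,
        "StorageClass": "GLACIER",               # cold, ~1/4 the cost
        "ObjectLockMode": "COMPLIANCE",          # immutable until the date
        "ObjectLockRetainUntilDate": retain_until,
    }
```

A hacker or rogue admin who compromises the account can still run up your bill, but cannot erase a COMPLIANCE-locked copy before its retention period expires.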
Another feature I'd like to see added to any such cloud caching product is two-factor delete for anything other than moving files into a trash bin. Do you want to delete an entire bucket or folder, or permanently delete some files? I think you should not be able to do that without two or maybe three authorized people turning the key, just like the two keys in a missile silo or a safe deposit box at your bank. This would be an easy feature to add, in my opinion.
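The two-keys idea is simple enough to sketch in a few lines. This is a toy illustration of the control flow, not any vendor's feature: a destructive delete only proceeds once the required number of distinct authorized operators have approved the same request.

```python
class TwoFactorDelete:
    """Toy sketch: destructive deletes require approvals from
    multiple distinct authorized operators before they execute."""

    def __init__(self, authorized, required=2):
        self.authorized = set(authorized)
        self.required = required
        self.approvals = {}  # request_id -> set of approving operators

    def approve(self, request_id, operator):
        if operator not in self.authorized:
            raise PermissionError(f"{operator} is not authorized")
        # A set means one person approving twice still counts once.
        self.approvals.setdefault(request_id, set()).add(operator)

    def can_delete(self, request_id):
        return len(self.approvals.get(request_id, set())) >= self.required
```

The key property is that no single person, not even an administrator listed in `authorized`, can satisfy the check alone.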
I know I still haven't answered the question. If money were free, then my answer would be a definite yes. I've said it before and I will say it again: If it is important enough to store once, it is important enough to store twice. (And storing it in a single object storage system doesn't count as twice, even if that system uses replication.) If it's not important enough to store twice, then I'm not sure why we would store it in the first place. But that's a purist point of view, and it doesn't account for the classes of data we now keep around for a very long time that have very low actual value to the company. It may very well be that the value of data stored with a single cloud vendor is lower than the cost of storing it with multiple cloud vendors. If the cost to the company of losing a particular data set is less than the cost of backing it up, then even I would say you don't need to back it up. And given that, to date, I can't find a case of a cloud provider losing a client's data, my concern over data loss may be considered old-fashioned.
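One way to frame that cost comparison is as an expected-value calculation. The probability term is my own addition to the argument, and the numbers below are made up for illustration; the point is only that "cost of losing it" should be weighted by how likely the loss actually is.

```python
def backup_worthwhile(data_value, annual_loss_probability, annual_backup_cost):
    """Back up the second copy if the expected annual loss
    (value of the data times the chance of losing it in a year)
    exceeds the annual cost of keeping that second copy."""
    expected_annual_loss = data_value * annual_loss_probability
    return expected_annual_loss > annual_backup_cost
```

With a $1M data set and even a 0.1% annual chance of loss, a $500/year second copy pays for itself; for a low-value archive, the same math can honestly come out the other way.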
But if the data you store only with a single cloud vendor has value greater than the cost of storing it with a second cloud vendor, I would urge you to consider doing so. Activate the mirroring function of your cloud caching appliance. Better yet, look into third-party solutions to do it for you, such as a cloud backup company that will back up data as soon as it lands on the local caching appliance. That way you are protecting yourself from a rolling code bug in both the cloud vendor and the cloud caching appliance vendor. In addition, I think all storage vendors should look into two-factor delete. It would also protect against malware such as ransomware, provided previous versions of any changed file or object are stored somewhere and those saved versions cannot be deleted by any single person — including an all-powerful administrator. Problems solved; world peace obtained.