A lot of people worry that a deep archive in the cloud will be about as useful as the boxes in the deepest part of your attic. The data will be stored in an economical way and the cloud vendor will ensure its preservation – but whether or not the data actually serves any future purpose is highly questionable. The deeper and cheaper the archive, the less likely someone will use or recall it.
This concern that data will go unused is one of the reasons people tend not to archive in the first place. They worry that if they put data anywhere other than online, immediately accessible storage, they will never use it again. Projects that might be able to use the data to bring value to the company will be unable to find it and therefore either re-create the data or go without it.
IT must overcome this fear in order to accomplish the very important task of archiving data to less-expensive storage, and unfortunately the fear is greatest when the archiving destination is the cloud. Simply moving the data to the cloud with no gateway, management interface, or searchability justifies the fear. But what if a company could make cloud storage appear the same as local, onsite storage? That would go a long way toward assuaging these fears.
The fear of the cloud is twofold: object storage is still an unknown to many people and applications, and the performance of data in the cloud varies widely with the throughput and latency of the connection to the cloud provider. This is why cloud gateways are becoming so popular: they solve both problems.
Cloud gateways typically speak NFS and SMB, allowing regular users and applications to write data to a POSIX-compliant file system they already understand. But a gateway is just the first step; it merely makes the connection. What is really needed is Cloud NAS. In this implementation, recently written and recently read data is stored in the local cache, and all data is asynchronously copied to the cloud provider of your choice. The Cloud NAS seamlessly translates between the S3 API (or other cloud protocols) and NFS/SMB. It then presents a global file system that provides transparent access to data whether it is stored on the local NAS or archived in the cloud.
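A minimal sketch of that write path, with the cloud object store stubbed as an in-memory dict (a real gateway would issue S3 PUTs over the network; the class and paths here are purely illustrative):

```python
import threading
import queue

class CloudNASCache:
    """Toy sketch: writes land in the local cache at local-disk speed,
    then a background worker replicates them to the (stubbed) cloud."""

    def __init__(self):
        self.cache = {}            # local NAS cache: path -> bytes
        self.object_store = {}     # stand-in for the cloud provider (e.g. S3)
        self._uploads = queue.Queue()
        threading.Thread(target=self._upload_loop, daemon=True).start()

    def write(self, path, data):
        self.cache[path] = data    # synchronous local write (NFS/SMB side)
        self._uploads.put(path)    # queue asynchronous replication to the cloud

    def read(self, path):
        if path in self.cache:             # cache hit: served locally
            return self.cache[path]
        data = self.object_store[path]     # cache miss: pull back from the cloud
        self.cache[path] = data
        return data

    def _upload_loop(self):
        while True:
            path = self._uploads.get()
            self.object_store[path] = self.cache[path]  # an S3 PUT in a real system
            self._uploads.task_done()

nas = CloudNASCache()
nas.write("/projects/q3/report.dat", b"results")
nas._uploads.join()    # wait for the async upload to land
print(nas.object_store["/projects/q3/report.dat"])  # b'results'
```

The application sees only the synchronous local write; replication to the provider happens behind the scenes, which is what keeps perceived performance at local-NAS levels.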
A Cloud NAS system gives customers high-performance access to recent data while ensuring that all data is stored in the seemingly infinite capacity of the cloud. Data in the cloud remains accessible through the same mount point and pathname under which it was originally stored, regardless of which cloud provider holds it. If a particular set of data were archived to the cloud and then went unused for a long time, it would eventually be moved out of the cache to make room for more recent data. But as soon as users start accessing that older data via the same mount point where they placed it long ago, it is immediately copied back to the local cache for easy and quick access.
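The eviction-and-recall behavior can be sketched as a simple LRU cache in front of the cloud copy. This is only an illustration of the policy, not any vendor's implementation; the cloud is again stubbed as a dict, and the paths are made up:

```python
from collections import OrderedDict

class LRUCache:
    """Toy sketch: when the local cache fills, the least-recently-used
    entry is dropped (its cloud copy remains); touching it later pulls
    it transparently back into the cache."""

    def __init__(self, capacity, cloud):
        self.capacity = capacity
        self.cloud = cloud             # the cloud holds a full copy of everything
        self.cache = OrderedDict()     # local cache, ordered least- to most-recent

    def get(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)   # mark as recently used
            return self.cache[path]
        data = self.cloud[path]            # recall from the archive on a miss
        self._insert(path, data)
        return data

    def _insert(self, path, data):
        self.cache[path] = data
        self.cache.move_to_end(path)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict LRU entry; cloud copy survives

cloud = {"/old/archive.tar": b"cold", "/new/a": b"1", "/new/b": b"2"}
lru = LRUCache(capacity=2, cloud=cloud)
lru.get("/new/a"); lru.get("/new/b")        # cache is now full of recent data
assert "/old/archive.tar" not in lru.cache  # old data aged out of the cache
print(lru.get("/old/archive.tar"))          # recalled transparently: b'cold'
assert "/new/a" not in lru.cache            # oldest entry evicted to make room
```

The user-visible point is that `get` always succeeds through the same path, whether the data happens to be cached locally or only in the cloud.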
Most cloud providers deliver multiple classes of storage; Amazon, for example, offers S3 and Glacier. The Cloud NAS system should also support tiering data within the cloud. Data could be written first to a high-performance tier with lower access fees and then eventually pushed to a cold tier with very low storage costs.
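On Amazon, this kind of tiering can be expressed as an S3 lifecycle configuration that transitions objects to colder storage classes over time. A sketch of what such a rule looks like (the bucket name, prefix, and day thresholds are illustrative; the boto3 call is shown as a comment rather than executed):

```python
# Illustrative S3 lifecycle rule: age data from the standard tier into
# infrequent-access storage, then into Glacier for deep archive.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-cold-data",
            "Filter": {"Prefix": "archive/"},   # only applies to archived objects
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # cheaper, slower tier
                {"Days": 90, "StorageClass": "GLACIER"},      # coldest, cheapest tier
            ],
        }
    ]
}

# Applied with boto3 (not executed in this sketch):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-archive-bucket", LifecycleConfiguration=lifecycle)
print(lifecycle["Rules"][0]["Transitions"])
```

Whether the Cloud NAS drives these transitions itself or relies on the provider's lifecycle engine, the effect is the same: storage cost drops as data ages, without any change to the mount point users see.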
Cloud NAS makes a lot of sense: it virtualizes much of the complexity of using cloud providers, giving customers a simple mount point to write to via the POSIX-compliant interfaces they already know, NFS and SMB. Customers do need to size the local cache appropriately to hold all concurrent workloads, but that is relatively easy compared with managing the total amount of storage they would need if they were not using the cloud as a deep archive.