The cloud seems like an ideal place to archive data. It provides a pay-as-you-grow model and enables the organization to begin shrinking its on-premises storage footprint. The problem, though, is that the three big providers (Amazon AWS, Microsoft Azure and Google Cloud) don’t provide a turnkey archive experience. It is hard to get data to them and even harder to get data out of them. To some extent, this is as it should be. These services are focused on providing Infrastructure as a Service, not Solutions as a Service. The problem is IT needs a solution.
The On-Premises Archive Problem
Every vendor in the archive market can create a return-on-investment scenario that shows the archive paying for itself within months of implementation. The problem is that all of these vendors are somewhat guilty of fuzzy math. The typical ROI scenario is based on the fact that 80% or more of an organization’s data has not been accessed in years and is eligible for archiving.
The vendor will then suggest moving all of this data to less expensive archive storage, with the pitch that the organization will save “millions”. The organization’s number one objection to the archive strategy is a concern over access. Users will revolt if they don’t get instant access to their data, no matter how long it has been since they last touched it. This concern was legitimate when most archives were tape-based and most primary storage systems were hard drive based. Now, though, most archive storage is disk-based, or at least disk-fronted, so the difference in recall performance between the archive and primary storage is almost negligible.
Disk-based archives have been around for over a decade, which means the recall problem has been solved for at least that long. Why, then, has there not been an overwhelming embrace of archiving? There is another issue that needs addressing: the big challenge is the storage system required to hold the archived data.
To achieve the required price point and meet theoretical future capacity and scaling demands, most archive storage hardware is scale-out, which means a relatively large initial system investment. Most vendors want the initial purchase to be 100TB or more. The problem is that no IT professional in their right mind is going to archive 100TB of data on day one. Not only is this jumping into the deep end of the pool, it is also unnecessary. Remember, the organization already has the storage; it’s bought and paid for. The reality is that a large capital investment in 100TB of new archive storage is simply unattractive, because it means mostly empty arrays that will take years for most businesses to fill. Moreover, those systems will likely reach end of life long before that happens.
This 100TB archive storage system also takes up data center floor space. It needs power, cooling and maintenance. It may take up less space than the primary storage systems it offloads, but remember that those probably didn’t get thrown out immediately. In many cases, embracing archiving is a net loss of data center floor space. Finally, the archive storage system itself needs some form of protection, which means either a tape backup or a second archive storage system in another location.
The Cloud Archive Appeal
An optimal storage strategy involves gradually archiving data to secondary storage as it ages. The economics are compelling, because the savings compound as primary storage systems come off maintenance or are fully amortized. Gradual data movement also means that IT only sends a few TBs at a time to the archive tier proactively, not hundreds of TBs in a reactive panic. The cloud is the storage “system” that can be purchased a few TBs at a time instead of hundreds of TBs at a time. Further, that storage is off-site, which means no additional data center floor space, power or cooling is required. Maintenance is handled by the cloud provider, and replication is a simple checkbox option.
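To make the tiering mechanics concrete, the sketch below shows one way this kind of age-based movement can be automated with an Amazon S3 lifecycle policy, using Python and boto3. The bucket name and the 30- and 180-day thresholds are hypothetical placeholders; treat this as a minimal illustration, not a complete archiving strategy.

```python
# Minimal sketch: an S3 lifecycle rule that tiers aging data toward
# archive storage automatically. The bucket name and day thresholds
# are illustrative placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "age-out-to-archive",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to all objects
                "Transitions": [
                    # Move to infrequent-access storage after 30 days...
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # ...and to the Glacier archive tier after 180 days.
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```

Azure and Google Cloud offer comparable lifecycle management on their object storage, so the same pattern applies across providers.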
The Cloud Archive Problem
Unfortunately, all is not perfect with using cloud storage for archiving. The big public cloud providers are good at delivering the infrastructure, and while there are programmatic tools to store and retrieve data, those tools are difficult to integrate into existing data center processes. The missing link is a software solution that can drive the process for IT in a way that seamlessly extends current storage investments, whether they are Windows, SMB or NFS, and connects them to cloud storage in a convenient and secure manner. Essentially, it is the software component of a traditional archive, built for the cloud era.
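To see the friction first-hand, consider what the raw programmatic path looks like. The sketch below, again in Python with boto3 against Amazon S3’s Glacier storage class (the bucket and key names are hypothetical), shows that writing to the archive tier is a one-line call, while reading back requires an asynchronous restore job that can take hours; this is exactly the gap the missing software layer has to bridge.

```python
# Minimal sketch of raw, programmatic cloud archiving with boto3.
# Bucket and key names are illustrative placeholders.
import boto3

s3 = boto3.client("s3")
bucket, key = "example-archive-bucket", "shares/finance/q3-report.docx"

# Writing to the archive tier is the easy part: one call.
s3.upload_file("q3-report.docx", bucket, key,
               ExtraArgs={"StorageClass": "GLACIER"})

# Reading it back is not. The object must first be restored, an
# asynchronous job that can take hours to complete.
s3.restore_object(
    Bucket=bucket,
    Key=key,
    RestoreRequest={
        "Days": 7,  # keep the thawed copy readable for a week
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)

# Only once the restore has finished (IT must poll or wire up a
# notification) can the object actually be downloaded.
status = s3.head_object(Bucket=bucket, Key=key).get("Restore", "")
if 'ongoing-request="false"' in status:
    s3.download_file(bucket, key, "q3-report.docx")
```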
The other problems with cloud archiving have been lock-in and a low-value storage experience. How can you manage your data in the cloud? How can you export or recover it efficiently? New archiving solutions born in the cloud are finding ways to simplify how we archive data to the cloud, as well as how we recall and recover from it. Furthermore, they take advantage of cloud capabilities, using elastic compute to address the data management problem with content indexing, storage analytics, video and audio transcription, sophisticated compliance and data governance, and other services.
If the cloud seems like the right place to archive your data, join Storage Switzerland and HubStor for our on-demand webinar as we compare the public cloud providers’ archive capabilities and provide IT professionals with a checklist of what to look for in a software solution that completes the picture.
All registrants receive a copy of Storage Switzerland’s latest white paper “Cloud Archiving – Amazon Glacier vs. Microsoft Azure Archive Blob Storage.”