Moving data from one tier of storage to another to drive down costs is the foundation of data management, but there comes a point when too many tiers become a problem, especially when each tier of storage comes with strings attached. In the cloud, those strings take the form of extra charges to access the data and time spent waiting for data to become available.
Betting Against the House
Most cloud providers offer instant and unlimited, or nearly unlimited, access to their primary storage tier and primary object storage tier. The major cloud providers, including Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure, have all expanded the number of storage tiers they offer to their customers, with a focus on low-cost archive storage. The challenge is that each successively lower-cost tier carries somewhat odd service levels to enable that low cost. First, this class of storage is slower; it doesn’t perform as well as the providers’ standard object store or production storage tier. Second, when users need to access data from these archive tiers, they may have to wait two hours or more for the data to become available. Third, and potentially most troubling of all, there is a significant surcharge for accessing data from these tiers.
To take advantage of the lower pricing of these deep archive tiers, customers need a detailed analysis tool that not only identifies the oldest data but also uses some form of machine learning to determine the likelihood and cost of future access. Alternatively, they could rely on a cloud-based software as a service (SaaS) solution that makes these decisions for them and absorbs the cost if there is a mistake. The advantage for the SaaS provider is that it controls the software code and can build into the application the ability to analyze which data to place on which tier. For example, a full-service backup solution that stores data in the cloud could automatically control movement to these lower tiers.
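To make the placement decision concrete, here is a simplified sketch of the kind of logic such an analysis tool might apply. The tier names, the day thresholds, and the access-probability cutoffs are invented for illustration; they are not any vendor's actual algorithm, and a real tool would derive the probability estimate from access history rather than take it as an input.

```python
from datetime import datetime, timedelta

# Hypothetical placement heuristic: thresholds and probability cutoffs
# below are illustrative placeholders, not a real vendor's rules.

def tier_for(last_accessed, age_days_threshold=180, access_prob=0.0):
    """Pick a storage tier for an object given its last access time and an
    (externally estimated) probability it will be read again this year."""
    idle_days = (datetime.utcnow() - last_accessed).days
    if idle_days < age_days_threshold or access_prob > 0.2:
        return "standard"        # hot or likely-to-be-read data stays put
    if access_prob > 0.05:
        return "infrequent"      # middle tier: cheaper, mild retrieval fee
    return "deep_archive"        # cold data: cheapest storage, costly recall

print(tier_for(datetime.utcnow() - timedelta(days=30)))                     # standard
print(tier_for(datetime.utcnow() - timedelta(days=400), access_prob=0.01))  # deep_archive
```

The key design point is that age alone is not enough: an old object that is still likely to be recalled should stay out of the deep archive, because the retrieval surcharge can outweigh the storage savings.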
Essentially, these SLAs amount to a bet between the customer and the cloud provider. The customer is betting they won’t access their data, and the cloud provider is betting they will. This is a bet that customers often lose. Cloud providers don’t disclose much detail about what infrastructure drives these lower-cost tiers. It is safe to assume that hard disk-based systems still play a role, but most insiders believe tape and tape libraries play an even more significant one. Given the SLAs these vendors use and the price points, tape seems the likely candidate. Even with tape in the mix, the surcharges customers pay when they need data they never expected to access should be a significant source of profit for the providers. In other words, the house always wins in the end.
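A rough back-of-the-envelope calculation shows how the bet plays out. All prices below are illustrative placeholders, not quotes from any provider; the point is the shape of the math, not the exact figures.

```python
# Illustrative break-even sketch for archive tiering.
# All per-GB prices are hypothetical placeholders, not actual provider rates.

STANDARD_PER_GB_MONTH = 0.023   # standard object storage, $/GB-month
ARCHIVE_PER_GB_MONTH = 0.001    # deep archive storage, $/GB-month
RETRIEVAL_PER_GB = 0.02         # surcharge to read data back from archive, $/GB

def monthly_cost(tb_stored, tb_retrieved, archive=True):
    """Total monthly cost in dollars for a given storage and retrieval mix."""
    gb_stored = tb_stored * 1024
    gb_retrieved = tb_retrieved * 1024
    if archive:
        return gb_stored * ARCHIVE_PER_GB_MONTH + gb_retrieved * RETRIEVAL_PER_GB
    return gb_stored * STANDARD_PER_GB_MONTH  # standard tier: no retrieval surcharge

# If almost nothing is read back, the archive tier wins by a wide margin.
print(monthly_cost(100, 0, archive=True))    # archive tier, no access
print(monthly_cost(100, 0, archive=False))   # standard tier

# But one unexpected restore of the full 100 TB costs more than a year and
# a half of archive storage fees in a single month.
print(monthly_cost(100, 100, archive=True))
```

With these placeholder rates, leaving 100 TB untouched in the archive costs a small fraction of the standard tier, while a single full restore incurs a retrieval charge equal to roughly twenty months of archive storage. That asymmetry is exactly the bet the customer is making.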
How About No Tiers?
Creating multiple tiers of storage, with each tier priced lower than the last, is the typical way to reduce overall storage costs. The challenge is that even one additional tier adds complexity. Unless the application handles it, the customer has to decide which data to place on which tier. The operational overhead of analyzing and placing data makes the low-cost storage much less attractive. Each of the major cloud storage providers has at least four storage tiers and, again, one mistake may wipe out all the cost savings of moving to the less expensive tier. For these reasons, many customers decide not to take the bet and leave their data on the more expensive tier.
An alternative approach is to use just one tier. To make a single tier work, the solution has to deliver excellent performance while remaining cost effective. From a provider’s point of view, the storage system also needs to scale to exabytes of capacity. Requiring all three capabilities in a single storage tier almost automatically rules out block storage, since it can’t meet customers’ cost-effectiveness expectations and won’t meet the provider’s need for scale. Object storage is a more likely candidate, but the provider needs to rewrite the object storage system so that it is very fast. Given that the object stores used in AWS, GCP and Azure were likely all created over ten years ago, it is reasonable to assume that an object storage system written from the ground up using the latest technology could deliver on the performance requirement.
This is the approach, for example, that Wasabi took. Starting a little over four years ago, the company built an object storage system from the ground up that delivers excellent performance; Wasabi claims better performance than Amazon S3. In addition to meeting the performance requirement, it still provides the scale and cost effectiveness of traditional object storage.
In a recent episode of Storage Intensity, Storage Switzerland sat down with Wasabi’s Director of Product Marketing, David Borland, to discuss a wide range of cloud storage subjects, including our dislike of egress fees and cloud storage tiering.