The low upfront cost of cloud storage has great appeal. Almost any organization can provision 100TB of storage instantly and pay only for the first billing period, whether that is a month, a quarter, or a year. However, there is a problem with cloud storage that most cloud vendors hope you overlook. First, unlike processors, which can scale up and down as needed, cloud storage is more permanent. That 100TB placed in the cloud stays there month after month, year after year, and eventually decade after decade. Multiply that cost per TB out over a decade and cloud storage gets expensive. Second, storage grows. That 100TB doesn’t stay 100TB; it will grow 50% or more year after year at a compounded rate. In a decade, your periodic bill may have to cover 3.8PB (3,800TB) of storage.
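The compounding above is easy to sketch. Assuming the article's figures of 100TB in year one growing 50% annually, nine further years of growth bring the year-ten bill to roughly 3,800TB:

```python
# Compounded storage growth, using the article's figures as assumptions:
# 100TB in year one, growing 50% per year thereafter.

def projected_capacity_tb(initial_tb: float, annual_growth: float, years: int) -> float:
    """Capacity in year `years`, with growth compounding after year one."""
    return initial_tb * (1 + annual_growth) ** (years - 1)

for year in (1, 5, 10):
    print(f"Year {year}: {projected_capacity_tb(100, 0.50, year):,.0f} TB")
```

Year ten works out to about 3,844TB, which is where the 3.8PB figure comes from.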
Don’t Forget Soft Costs
Of course, the cost of cloud storage is more than just hard costs. There is also a storage management and administration element to think about, also known as soft costs. When considering the cloud as a store for cold data, two on-site processes must be addressed. First, there is the process of identifying the cold data that will be moved to the cloud. The cloud has nothing to do with that; it is an on-premises function. Second, there is the actual movement of data to the cloud, which is typically done by an appliance that sits on site and acts as a gateway to the cloud. It speaks NFS or CIFS on one side and integrates with the cloud on the other.
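The first of those two processes, identifying cold data, can be sketched in a few lines. This is a minimal illustration only, assuming "cold" simply means files not accessed in a given number of days; real identification tools also weigh file type, owner, and access patterns, and the paths and threshold here are placeholders:

```python
# Minimal sketch of on-premises cold-data identification, assuming
# "cold" = not accessed in `days_idle` days. Threshold and paths are
# illustrative, not a real product's policy.

import os
import time

def find_cold_files(root: str, days_idle: int = 365):
    """Yield (path, size_bytes) for files idle longer than `days_idle`."""
    cutoff = time.time() - days_idle * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanish or are unreadable mid-scan
            if st.st_atime < cutoff:
                yield path, st.st_size

# Example: total up candidate cold data under a hypothetical NAS mount.
# cold_bytes = sum(size for _path, size in find_cold_files("/mnt/nas"))
```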
What Can Go Wrong With Cloud Storage?
It is important to remember that cloud storage providers are organizations that run data centers, and like any other data center, things can go wrong. While protecting the stored data is the cloud provider’s responsibility, it is the organization’s IT professionals who need to scrutinize the provider’s processes and make sure they can meet their service level agreements. Most low-cost cloud storage providers, as opposed to data protection providers, offer almost no assurances about protecting the assets stored in their cloud. Considering that the “archive” may be the last good copy of a data set, making sure it is well protected is critical. The IT professional has to verify that the cloud provider has proper backup and replication procedures. The alleged cost savings of not having to protect data quickly go out the window.
A Cloud Alternative
Given the long-term cost issue of the cloud, does it make sense to consider an on-site alternative? An on-site alternative to cloud storage for cold data has to be more cost effective, be just as reliable, and provide strong data durability. It also has to be easy to use and operate.
Some Quick Cost Comparisons
While we will detail the cost of a cloud archive vs. an on-site archive in a future column, let’s take a quick glance. Once the initial investment is past, on-site costs should be lower, no matter what the archive target is.
That said, if the on-site storage repository is disk only, it will be difficult to make a strong economic case against the cloud. But a tape-based cold storage system, front-ended by disk for ease of access, may have a distinct advantage. It is substantially cheaper per TB than cloud storage and far more power efficient than on-site disk.
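The shape of that comparison can be sketched with a simple model. The per-TB prices and upfront figure below are placeholders for illustration, not quotes from any vendor or from the future column; the point is only that a recurring per-TB fee compounds with capacity growth, while an on-site system front-loads its cost:

```python
# Illustrative cost model only. Prices are hypothetical placeholders:
# a recurring cloud fee per TB per month vs. an on-site archive with a
# large upfront purchase and a smaller ongoing per-TB operating cost.
# Capacity compounds 50% annually, per the article's growth assumption.

def cumulative_cost(months: int, start_tb: float, annual_growth: float,
                    upfront: float, per_tb_month: float) -> float:
    """Total spend over `months`, with capacity growing each full year."""
    total = upfront
    tb = start_tb
    for m in range(months):
        total += tb * per_tb_month
        if (m + 1) % 12 == 0:
            tb *= 1 + annual_growth
    return total

# Hypothetical inputs: cloud at $20/TB/month with no upfront cost vs.
# on-site at $150,000 upfront and $2/TB/month to operate.
cloud = cumulative_cost(120, 100, 0.50, 0, 20.0)
onsite = cumulative_cost(120, 100, 0.50, 150_000, 2.0)
print(f"10-year cloud:   ${cloud:,.0f}")
print(f"10-year on-site: ${onsite:,.0f}")
```

With these made-up inputs the on-site total comes out far lower over a decade, but the crossover point obviously depends entirely on the real prices plugged in.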
Can Tape Win the Day?
The question is, can tape technology win the day in your data center? The short answer is no; tape technology by itself cannot. But combining tape with a solution that can identify cold data and seamlessly migrate it to an appliance makes the technology far more compelling. At the same time, the tape integration solution should not make on-site tape and cloud storage an either/or proposition. It should allow the user to choose; there may be cases where tape and cloud can work together.
With the right software and hardware, interacting with tape becomes no more complex than interacting with the cloud or a file share. Mere mortals could implement and manage this system and save lots of money in cloud costs.
Sponsored by Fujifilm Dternity, Powered by StrongBox
Well, it is true that storing data in a provider’s “bit barn” generates a “forever” expense, but so does keeping it on-premises, where you pay the fully burdened life-cycle cost of owning and storing it. Identifying what data to retain for the long term is an issue. You obviously need a way to analyze the data you are keeping in order to determine what to do with it. Compression and deduplication can only do so much. A startup called Komprise has an interesting approach to this problem, and DataGravity is also addressing this issue in a number of ways.
An on-premises approach to archive or cold data storage that involves both disk and tape would be a good solution for data that needs to be kept longer than, say, seven years. There should be a rule of thumb to help decide how long to keep archive or cold data on disk. The “glue” that holds this architecture together is the AWS S3 API. The software that runs the analysis against the primary data tier can use the S3 API to move data to an AWS S3-compatible object-based storage (OBS) tier. The OBS tier can run software that analyzes data buckets to determine what to move to the archive or cold storage tier, which would be tape. What you get is data life cycle management that keeps data protected on the most economical storage platform(s) over time.