As my colleague George Crump discussed in a previous article, “What is Better than Cloud Storage for Cold Data”, cloud storage is great for processing active data but becomes increasingly expensive for storing cold data that is seldom accessed. We have previously examined weaknesses of cloud storage such as latency and bandwidth, but until now we have not examined in any detail what it actually costs to store large quantities of cold and archive data in the cloud long term, or to retrieve any of that archived data. There is a reason that many organizations are now starting to question their decision to keep large quantities of cold and archive data in the cloud.
Doing the Math
First, we will examine what a relatively large amount of cold data archives would cost if stored long term in the cloud.
Checking long-term storage pricing from several major Cloud Service Providers (CSPs), we find that the lowest price is $.007 per GB per month for archive class storage.
If an organization needed to store just 100TB of data in the cloud for the next 10 years, what would it cost? At $.007/GB per month, the numbers would look like this:
- Average Monthly Cost for 100TB stored in the cloud is $700.00
- Annual Cost for 100TB: $700.00 x 12 = $8,400.00
- Cost for 100TB for 5 years: $8,400.00 x 5=$42,000.00
- Cost for 100TB for 10 years: $8,400.00 x 10=$84,000.00
If the same organization needed to store 500TB of data in the cloud, the numbers would look like this:
- Average Monthly Cost for 500TB stored in the cloud would now be $3,500.00
- Annual Cost for 500TB would be $42,000.00
- At 5 years, the same 500TB would have cost $210,000.00
- At 10 years, that 500TB would have cost $420,000.00
Additionally, if that same organization needed to store 1PB of data in the cloud, the costs would be:
- Average Monthly Cost for 1PB stored in the cloud would be $7,000.00
- Annual Cost for 1PB would be $84,000.00
- At 5 years, the same 1PB would have cost $420,000.00
- At 10 years, that 1PB would have cost $840,000.00
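The arithmetic above can be reproduced with a short sketch (my illustration, not from the article), using the quoted $.007/GB/month archive rate and decimal units (1TB = 1,000GB, 1PB = 1,000TB), as the article's round numbers imply:

```python
# Storage-only cloud cost at the quoted archive rate; excludes the
# transaction, retrieval and egress fees discussed below.
RATE_PER_GB_MONTH = 0.007

def cloud_cost(tb: float, months: int) -> float:
    """Cost in dollars to store `tb` terabytes for `months` months."""
    return tb * 1_000 * RATE_PER_GB_MONTH * months

for tb in (100, 500, 1_000):  # 100TB, 500TB, 1PB
    print(f"{tb} TB: ${cloud_cost(tb, 1):,.2f}/mo, "
          f"${cloud_cost(tb, 12):,.2f}/yr, "
          f"${cloud_cost(tb, 120):,.2f} over 10 years")
```

Note that this is a flat projection; it does not model the 50%+ annual compounded data growth mentioned below, which would push the real totals considerably higher.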
These base costs are simply for storing the data quantities listed above and do not take into account that data typically continues to grow at a compounded rate of 50% or more each year. Also not included are the additional transaction costs incurred when data is accessed, deleted, downloaded or transferred to another region or location.
Costs to retrieve archive data are often complicated to compute and can result in substantial charges depending on the quantity of data to retrieve, how the data was stored (individual files or large archive files) and how it is retrieved.
Ordinarily cold and archive data is seldom accessed but unexpected e-discovery requests triggered by litigation from private or government entities can result in the need to retrieve significant amounts of data in a very short time frame. Additionally, many organizations are beginning to see potential value in their historical data and this is leading to data mining operations to monetize that historical data, which can also require the retrieval of substantial amounts of data.
There are also some other considerations such as cloud side encryption; who controls the encryption keys, the actual chain of custody of your data and security. Data that is stored electronically on any network is always potentially susceptible to hacking attacks.
A good example of this from a couple of years ago was Code Spaces, a code services hosting company that was well known in the IaaS (Infrastructure as a Service)/DEVOPS community, and that was forced out of business by a hacking attack. This company was state of the art in 2014 and had the bulk of its resources and infrastructure fully in the cloud. This included its backups, some of which it considered as being off-site because multiple copies were distributed throughout their CSP’s network. However, a hacker was able to gain control of their CSP account and when Code Spaces tried to regain control, the hacker proceeded to destroy almost all their data, machine configurations, virtual machines and their backups. Their backups may have been off-site but they were still online. Unable to recover their data, the company was forced out of business. This incident underscores the importance of having off-line backups of your data as well as proper security and backup protocols.
But What about Tape?
A check of the web shows a current average price for LTO-6 tapes at approximately $30.00 each. At native capacity (2.5TB), this works out to roughly $.012 per GB. There is also one other key difference between tape and cloud storage. The cost of cloud storage is a recurring one that you keep paying month after month, year after year, decade after decade, but the acquisition cost for tape is a one-time expense, so after only a couple of months at the $.007/GB cloud rate, tape is already the cheaper option.
Storing 1PB of data on LTO-6 tapes at their native capacity of 2.5TB each would require 400 tapes. At $30 each, that would be a one-time cost of $12,000.00. At the compressed capacity of 6.25TB each, you would only need 160 tapes for a one-time cost of $4,800.00.
Another cost factor with tape would be for offsite vaulting at a vendor that provides a secure climate controlled facility with proper chain of custody safeguards. The cost of this type of service will vary depending on the vendor, number of tapes being stored, how they are stored, and charges for each pickup and delivery trip as well as any fuel surcharges.
Using figures for one secure off-site vaulting service, storing our 1PB of data on 160 LTO-6 tapes, in containers, with one pickup/delivery trip per day yielded the following costs:
- Storing 160 containerized tapes at $0.89 per tape is 160 tapes x $0.89=$142.40 per month
- Storing 160 containerized tape for one year would cost $142.40 x 12 = $1,708.80 per year
- Storing 160 containerized tapes for 5 years would cost $1,708.80 x 5 = $8,544.00
- Storing 160 containerized tapes for 10 years would cost $1,708.80 x 10 = $17,088.00
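The tape acquisition and vaulting figures can be sketched the same way (my illustration, using the article's $30/tape and $0.89/tape/month quotes and decimal units):

```python
import math

PB_IN_TB = 1_000  # decimal units, matching the article's round numbers

def tapes_needed(data_tb: float, capacity_tb: float) -> int:
    """Number of tapes to hold `data_tb`, rounding up to whole tapes."""
    return math.ceil(data_tb / capacity_tb)

native = tapes_needed(PB_IN_TB, 2.5)       # LTO-6 native capacity
compressed = tapes_needed(PB_IN_TB, 6.25)  # LTO-6 compressed capacity
print(f"One-time media cost (native):     ${native * 30:,}")
print(f"One-time media cost (compressed): ${compressed * 30:,}")

vault_month = compressed * 0.89  # containerized vaulting, per month
print(f"Vaulting: ${vault_month:,.2f}/mo, ${vault_month * 12:,.2f}/yr, "
      f"${vault_month * 120:,.2f} over 10 years")
```

Even adding the one-time media cost to ten years of vaulting, the total stays well under the equivalent cloud figure, which is the article's core point.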
Additional ongoing costs that would also need to be considered are for tape hardware and maintenance as well as new media. Since LTO-6 has been out for a few years now, most if not all organizations using tape libraries have already upgraded their drives so the main cost here would be for maintenance on their existing hardware. These costs will vary by organization depending on their service providers and the hardware being covered and response times the organization requires.
Another soft cost will be for the worker that handles loading and rotation of tapes to and from the library and the vaulting facility. This will also vary by organization depending on their hardware setup and tape rotation requirements.
Something else to consider is that well-established large organizations, and some of the larger SMBs (Small and Medium-sized Businesses) that have been around for a decade or more and have large quantities of data to store, already have data centers, infrastructure and storage, including tape libraries and cloud gateway appliances. They have already invested in the things that cloud storage might otherwise save an organization from buying, and it is fairly certain they are not going to rip out and throw away all this equipment.
Tape today provides very compelling features such as low acquisition cost, backwards compatibility, scalability, high performance, longevity, high capacity, and portability. The LTFS open tape format and backward compatibility with earlier LTO versions help ensure that you will be able to read and restore data in the future without the need for proprietary applications.
More importantly, tape can also provide a final line of defense against hacking attacks and data corruption that may affect copies of data stored on disk in the enterprise or in the cloud.
Ultimately, each organization will need to examine closely the costs of storing their cold data and archive data locally versus in the cloud to determine which strategy, “rent vs buy”, will be the most cost effective for them.
Sponsored by Fujifilm Dternity, Powered by StrongBox
Well, the case for storing archive data in a public cloud doesn’t hold up once you get beyond the apparent “cheapness” of it. The seductive part of the proposition is the way people get sucked into gradually parking a lot of archive data in the public cloud. Then one day they realize they are going to be paying a non-trivial amount of money to their public cloud storage provider forever. At that point, or hopefully before it, people will get a clue and plan on keeping their archive data on-premises.
A local storage cloud used for warm and cold data should be able to make use of an archive tier based on LTO/LTFS tape libraries. I’m not keen on shipping tapes to some mountain or salt mine for long term storage. Tapes can and do get lost in shipment, and tapes can be stolen or damaged in transit.
More to the point, the Spectra Logic DS3 Black Pearl, introduced in October 2013, provides for tiering data from an object store to an LTFS tape library using the DS3 Black Pearl appliance. The interesting thing about this is Spectra Logic created a couple of extensions to the AWS S3 API to deal with tape operations, so that their tape libraries can act as an object store. This is a good way to treat archive data. Spectra Logic makes an SDK available for anyone who wants to create a DS3 Black Pearl client along with a simulator for testing it.
Storiant, formerly Sage, is focused on storing archive data using their software, HDDs and the ZFS file system, all packaged in PB+ scale cabinet enclosures. Storiant can keep nearly all of the HDDs in a powered-off state until they need to retrieve some data, which will presumably greatly extend the life of the HDDs and save on electricity costs. They also claim to prevent the “bit rot” associated with the long term use of HDDs for data storage. They don’t rely on LTFS tape libraries, but compete with them.
Archive data will only be increasing as a relative percentage of all the new data being generated. Having useful and affordable ways to keep this data on-premises is a good thing.
You have to remember that the cloud storage providers lower their prices periodically, sometimes more than once per year, a fact that should be material to a 10-year TCO discussion. You also need to remember there’s a soft cost to keeping your long-term archived data on current-generation (or relatively current) tapes. If I backed something up 10 years ago on LTO2 tapes, can I restore it now? I either need to maintain a restore environment with older hardware (which adds tons of admin costs) or I need to move the backed-up data to newer tapes.
I am investigating a hybrid cloud approach with gateways that add deduplication and compression. For my financial clients that are skittish about the cloud, their data archives to on-prem object storage and replicates offsite (to meet the offsite data storage requirements). For everyone else, warm data goes to Amazon S3 and colder data goes to Glacier. Remember, after deduplication and compression, that $.01/GB/mo becomes more like $.003/GB/mo.
Looking at HDS HCP, Scality, Cloudian, Netapp Storage Grid for the object storage
Looking at Netapp altavault, Panzura, Nasuni and others for the gateway.
Looking at Amazon, Google and Oracle for the cloud storage.
Dave, I agree that you need to be aware that a given LTO drive can usually only read LTO tapes going back 2 generations. Your LTO-2 tapes could be readable by an LTO-2, LTO-3 or LTO-4 drive. The LTO-2 tape could not be read by an LTO-5, LTO-6 or LTO-7 drive. You would need an LTO tape duplicator to keep your older tapes readable by newer LTO drives. Starting with LTO-5 you can use the LTFS format on the tapes, which may be more useful over the long term when dealing with archive data.
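The two-generations-back read rule described in this comment is simple enough to express as a sketch (my illustration, covering the LTO-2 through LTO-7 drives discussed here; later generations tightened the rule):

```python
def drive_can_read(drive_gen: int, tape_gen: int) -> bool:
    """True if an LTO drive of generation `drive_gen` can read a tape of
    generation `tape_gen`: its own generation plus the two before it."""
    return 0 <= drive_gen - tape_gen <= 2

# An LTO-4 drive is the newest drive that still reads an LTO-2 tape;
# an LTO-5 drive cannot, so migration (duplication) is needed by then.
print(drive_can_read(4, 2))  # True
print(drive_can_read(5, 2))  # False
```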
In terms of your “looking ats” list for object storage, gateways and cloud storage, I can give you a somewhat biased recommendation in one word…Cloudian. Cloudian is the only fully AWS S3-compliant (all 51 S3 operations) object storage vendor in the market, outside of AWS itself. Cloudian will also be releasing a “Panzura-like” global file management gateway called HyperStore Connect for Files in March. It will cost substantially less than Panzura’s gateway. And you can tier data from a Cloudian cluster to AWS S3 or Glacier. One vendor, one throat to choke, and complete confidence that any service or appliance that works with AWS S3 will work with Cloudian. Enough said.
Nothing new: https://thestoragetank.wordpress.com/2016/02/04/cloud-vs-tape-keep-the-kittens-off-your-data/
Strange tape pricing here. No drives, no frames, no maintenance, no software, assumes your tapes will always be 100% utilized, no assumptions that you would need like hardware in a DR site, no labor for tape generation migrations, etc. etc. Ever seen the picture of the iceberg with the tip exposed and the bulk of the ice under the surface?
Technology has progressed over time from a mechanical to a digital state. Tape library systems are grounded in the mechanical world and retain the limitations of mechanics, and that is coming from a former tape bigot. The primary purpose of tape has been to store and protect data more cheaply than using disk.
With the data durability of cloud technology (on or off prem or hybrid), one has to wonder why we still struggle with all of the hidden costs associated with tape operations and media mgmt., or the very concept of backup, for that matter. It would seem that archiving with versioning would cover a lot of the same ground here without multiple copies, saving considerable sums of money.
If your employees work for free, and your customers don’t mind waiting, tape is definitely a bargain. If you don’t mind finding out your backups are corrupt during an emergency restore while your web based order system is down, tape is fine. If you don’t mind having one current copy of things in transit, tape is wonderful.
Brad Jensen, CEO, LaserVault
Tape people shouldn’t have to resort to the same distortions that disk people have used over the years.
The savings for tape are significant but there have been a large number of expenses left out of the calculations.
1. Without adequate tracking mechanisms that integrate between the backup software, the robotic libraries and the offsite vendor, there is always a 10% error rate between where people think the tapes are and where they actually are. Without this tracking, and the associated costs, there is a high chance of data loss.
2. There is the cost of maintaining the backup software licenses to be able to restore the data. These costs are not trivial.
3. Backup software is often decommissioned, and the cost of performing re-inventories on these tapes can be as high as $400 a tape.
Only with the correct inventory, chain of custody and backup software licenses can the true cost benefit of tape be realized. These costs are not huge, but they need to be added, if for no other reason than people knowing they need to spend this money.
I agree with Gerald, but his accounting of backup related costs barely scratches the surface.
The problem with the post is it ignores the fundamental differences between backup data – data stored for the purposes of disaster recovery – and archival data – data stored for long term retention. These formats are opposites. The argument holds only if those differences are ignored.
Take only one obvious issue. Redundancy has great value when the goal is to restore data to a specific point in time as quickly as possible, but creates enormous waste when data is kept under a long term retention policy.
I’ve outlined in detail the difference between backup formats and archive formats here: https://www.linkedin.com/pulse/real-difference-between-backup-archive-tim-williams, and how differences impact the relative TCO of cloud storage vs tape storage for long term retention here: https://www.linkedin.com/pulse/seven-reasons-why-tape-isnt-cheaper-than-disk-tim-williams.
I think you’ll find the nature of data is changing very quickly.
The days of ASCII and EBCDIC text as the only important data are all but over and this will bring tape back into the mix of primary data again.
If you consider a hospital storing X-Ray or CT Scan data, this data will be cached to disk as it is captured but very quickly written to tape. It makes absolutely no sense to keep this on disk for more than a day.
Moving this data off to tape isn’t backup, and it isn’t archive, it is hierarchical storage.
The costs aren’t astronomical to manage tape properly, but they are a factor and they should be included.
As I said, tape people don’t need to twist reality to push their arguments. I’m a tape guy, that’s not to say I don’t use disk, but my comments were not intended to support your one sided narrative.
A lot of the problems that have been associated with tape have been caused by people not properly understanding the technology, the need for the technology and the future applications of the technology. The blame for this falls equally between those who have trashed tape as a technology and those who have failed to competently advocate for tape.
Thank you for all the comments. We are working on a complete response. Short answer is this was designed to get people thinking and talking. Mission accomplished. Second, I’m really getting tired of the “tape corruption” comment. When properly managed, tape does not have any more corruption issues than hard disks. Waiting is also not an issue if expectations are set correctly; it depends on the use case. On the strange pricing, the post clearly says that we did not factor that in. If you are using the cloud for archive, deduplication had better not work, or you have poor data management policies. Compression, yes, if the cloud vendor passes the compression savings on to you AND if the data we are talking about is compressible. We can’t predict the periodic lowering of cloud prices, and even if we could, tape would still have a compelling advantage.
In short tape is not perfect for all use cases, nor is cloud. IT has to make the choice based on their SLAs and budgets.
Well, there are clear and meaningful differences in costs and operations when it comes to using tape for backup and using tape for archive. BTW, Google’s last resort for restoring Gmail boxes is tape.
Your backup product should be able to move data off LTO2 to LTO3 as a background job, I operate backup software that has data contained within it that was originally written to DLT8000 and now sits happily on LTO6.
I’ve seen corruption both on disk and tape… the difference is it does not cost much to have two copies on tape, two disk systems… disk systems exposed to firmware/software bugs at the same time….
Yes cloud storage gets cheaper over time….. so does tape…. I can buy a LTO6 tape now for half the price I purchased an LTO5 tape for 3 years ago…. and remember the value of my $ has dropped due to inflation…. so that’s less than half the cost to store xTB than it was 3 years ago.
My backup software and tape library maintenance have gone up a bit but the move from LTO5 tapes/drives to LTO6 tapes/drives did not double the maintenance costs, yet I got double the space, in the same negligible power footprint… DC’s love my low power/cooling libraries…. and I love that they double in capacity with every LTO format change.
Your tape math doesn’t accurately take into account the generational upgrades needed to keep refreshing the LTO technology to keep it backwards readable. Industry average is an upgrade on alternate generations, e.g. 2->4->6 or 3->5->7. This is a considerable cost that is not included versus cloud.
Also, costs and *time* to retrieve from a physical vaulting service. How are you tracking those assets that are offsite? Where is the cost of the asset management system, database, or spreadsheet maintenance?
The list goes on.
Math from a tape company makes cloud look expensive
Math from a cloud company makes tape look expensive
Forgot to add, using Compressed capacities for the tape math further skews the numbers. Bulk data archives, particularly in M&E, will already be ‘compressed’ prior to archive. So you need to use raw capacity on LTO for a fairer comparison.