The use of tape has certainly declined in today’s data center, but it has not disappeared – much to the chagrin of some people who are self-professed “tape haters”. They simply don’t understand why anyone would use tape in a modern data center.
So this latest entry in our backup basics series will explain the reasons why tape continues to be a smart choice for certain applications. The short list includes speed, cost, and data integrity – especially over long periods of time.
Before saying good things about tape, it’s appropriate to mention tape’s downfall: the mismatch between a serial access device (e.g. tape) and a random access device (e.g. disk). Modern tape drives require a constant, high-speed stream of data (over 700 MB/s with compression for LTO-7) in order to function properly. Disk drives, and especially incremental backups from those drives, are not capable of supplying a stream anywhere near what tape needs. As a result, the tape drive performs sub-optimally and takes all the blame for the resulting poor performance and reliability. Unfortunately, the thing tape is worst at, incremental backups, is exactly what we have been using it for over a really long time.
Setting aside what tape is bad at for the moment, let’s talk about the tasks tape is good at performing. Tape is better than disk at writing data and at holding onto data for long periods of time. The reader may be surprised to learn that modern tape drives have an uncorrectable bit error rate (UBER) lower, and therefore better, than that of any modern disk or flash drive. Consider the following table, which demonstrates that tape is literally 10,000 to 100,000 times better at writing data than disk.
Tape is also much better at holding on to data for long periods of time. Any research into coercivity, the property that determines how well a magnetic medium resists the degradation colloquially referred to as bit rot, will show you that data stored on disk for longer than five years is not reliably retrievable. In contrast, tape is able to hold on to data for up to 30 years with no bit rot.
Tape’s limitation of needing a very fast stream of data is also one of its greatest strengths, leading to the phrase “never underestimate the bandwidth of a truck full of tapes.” 700 MB per second is over 2.4 TB per hour, or over 60 TB per day. A bank of ten LTO-7 tape drives can copy a petabyte (PB) of data onto about 67 cartridges (at 15 TB of compressed capacity each) in just over 1.5 days. Those cartridges fit into a box that is 12” X 12” X 8” (31 cm X 31 cm X 21 cm) and weighs under 30 pounds (13 kg). FedEx says you can overnight that box to anywhere in the United States for under $300, after which the receiving organization can read the data off those tapes with another bank of ten tape drives, also in about 1.5 days. Including writing the data, shipping the tapes, and reading the data, that’s about 100 hours to transfer 1 PB of data. Even a very expensive 10 Gb/s network connection, if you were actually able to fully utilize it, would take over nine days to do the same task.
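If you want to check that arithmetic yourself, a few lines of Python will do it. This is a rough sketch, assuming decimal units (1 PB = 1,000,000,000 MB), a sustained 700 MB/s per drive, and 24 hours for the overnight shipment. The constant and function names are mine, purely for illustration.

```python
# Back-of-the-envelope check of the tape-versus-network transfer times.
# Assumptions (labeled, not measured): decimal units, every drive sustains
# 700 MB/s for the whole job, and overnight shipping takes 24 hours.

MB_PER_PB = 1_000_000_000   # decimal units: 1 PB = 10^9 MB
DRIVE_MB_PER_S = 700        # per-drive streaming rate quoted in the article
DRIVES = 10                 # size of the tape drive bank
SHIPPING_HOURS = 24         # assumed overnight delivery time
LINK_GBIT_PER_S = 10        # network link speed used for comparison


def tape_copy_hours(petabytes, drives=DRIVES, mb_per_s=DRIVE_MB_PER_S):
    """Hours for a bank of drives to write (or read) the given data."""
    seconds = petabytes * MB_PER_PB / (drives * mb_per_s)
    return seconds / 3600


def network_days(petabytes, gbit_per_s=LINK_GBIT_PER_S):
    """Days to move the data over a fully utilized network link."""
    mb_per_s = gbit_per_s * 1000 / 8   # gigabits/s -> megabytes/s
    seconds = petabytes * MB_PER_PB / mb_per_s
    return seconds / 86400


write_hours = tape_copy_hours(1)                     # writing the tapes
total_hours = 2 * write_hours + SHIPPING_HOURS       # write + ship + read

print(f"Write 1 PB to tape:   {write_hours / 24:.2f} days")
print(f"Write + ship + read:  {total_hours:.0f} hours")
print(f"Same data over 10 Gb: {network_days(1):.2f} days")
```

Change the drive count or the link speed to see where the crossover happens for your own environment. Real-world numbers will be lower on both sides, since neither tape drives nor network links run at their rated speed all the time.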
The final area where tape has always excelled is cost. Unlike disk or flash, tape allows you to separate the media from the drive. The higher your tape-to-drive ratio is, the more money this is likely to save you. A properly managed tape system should cost you less than the cheapest disk system, even when you include cloud services such as Amazon Glacier and Google Coldline. In fact, there are vendors that compete with Amazon by leading with tape as their primary storage system, and they offer competitive solutions at less than half the cost of Glacier.
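To see why the tape-to-drive ratio matters so much, here is a minimal sketch of the cost per terabyte as you add cartridges to a single drive. The prices are hypothetical placeholders that I made up for illustration, not vendor quotes. The only real figure is the 6 TB native capacity of an LTO-7 cartridge.

```python
# Cost per TB as a function of the tape-to-drive ratio.
# DRIVE_COST and CARTRIDGE_COST are HYPOTHETICAL placeholder prices.

DRIVE_COST = 3000.0      # hypothetical price of one tape drive, in USD
CARTRIDGE_COST = 60.0    # hypothetical price of one cartridge, in USD
CARTRIDGE_TB = 6.0       # LTO-7 native capacity per cartridge


def cost_per_tb(cartridges_per_drive):
    """Total hardware cost divided by total capacity for one drive."""
    total_cost = DRIVE_COST + cartridges_per_drive * CARTRIDGE_COST
    total_tb = cartridges_per_drive * CARTRIDGE_TB
    return total_cost / total_tb


for ratio in (1, 10, 100, 1000):
    print(f"{ratio:>5} cartridges per drive: ${cost_per_tb(ratio):.2f} per TB")
```

The drive's cost gets spread across more and more media, so the cost per terabyte falls toward the bare cost of the cartridges. A disk system cannot do this, because every disk carries its own drive mechanism with it.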
As I mentioned in the opening paragraphs, tape is really bad at receiving incremental backups. I would add that it’s not that great at receiving full backups, either, because tape drives are still faster than what most disk arrays are able to supply, even in a full backup scenario. This leaves two applications that play to tape’s strengths: large-scale data transfers and long-term archiving.
Industries that do large-scale data transfers as a day-to-day operation can find a lot of benefit from tape. Direct transfer from disk to disk is certainly easier, but will only scale to a certain point, given the available bandwidth and your ability to pay for it. Other companies can take advantage of tape’s ability to move lots of data by using tape to send a copy of their backups offsite. If backups have already been transferred to some type of disk system, that system should be able to reliably transfer that data to tape at a speed that keeps the tape drive happy. These tapes can then be transferred offsite using a traditional vaulting vendor. This does not match the elegance of a backup system that uses deduplicated disk and replication, but it should be less expensive.
Finally, the longer you store data, the more tape makes sense. If there is data that you write once and then store for 30 years, there is no way that any disk-based system can approach the cost of storing that data on tape. You would, of course, store such data on multiple tapes and store that data in multiple locations, just as you would with disk. Tape may be better than disk at storing data, but it is not perfect. So make multiple copies.
Someone once said that after a nuclear apocalypse, you will be able to find a tape salesman and a mainframe salesman. Hopefully this blog article helped you to understand why. If properly used, tape is more reliable than disk at storing data and way more reliable at holding onto it for long periods of time. It’s also less expensive. Until someone designs a device that matches the reliability, speed, and cost of tape, it’s here to stay.
Sponsored by Commvault