Thanks to the requirements of Big Data, compliance and regulatory controls, organizations now face the daunting challenge of storing far more of the data they create for significantly longer periods of time. The need to keep multiple copies of that data for protection has also increased, because old data may have new value in the future. In many cases, retention and protection demands now span decades, which in IT storage terms might as well be forever.
Scalable disk storage systems that leverage a cluster of servers to “scale out” have the potential to meet these raw capacity demands, but are they the right solution for storing what could be petabytes of information for decades? Tape would seem the ideal, more affordable alternative, but concerns about its performance, interchangeability and reliability have kept it on the data center sidelines.
With the fifth generation of the LTO format, tape is increasingly being included in the data center game plan. LTO-5, along with hard work from tape library vendors like Spectra Logic, has addressed the issues of reliability while also improving performance. LTO-5 also introduced LTFS (the Linear Tape File System). As we discussed in our article “What is LTFS”, this new format for LTO allows USB thumb-drive-like interchangeability between systems: tapes can be mounted on any workstation or server with an attached tape drive.
LTO’s next generation, LTO-6, promises to move tape from merely being included in the data center storage game plan to a starring role. Storage Switzerland believes that most organizations will benefit greatly from a hybrid strategy that leverages scale-out storage systems and tape-based systems, as articulated by the members of the Active Archive Alliance.
What is LTO-6?
LTO-6 is the next generation of LTO tape drives and media, available from vendors in the LTO Consortium and from library manufacturers like Spectra Logic that choose to integrate the technology. It is expected to be released late in 2012. Unlike the previous generation, whose upgrade was mostly capacity-focused, LTO-6 delivers the now-expected doubling of capacity along with a significant performance improvement.
LTO-6 will store 3.2 TB of information native and 8 TB with compression, and it will transfer data at 210 MB/s native and 525 MB/s with compression. LTO has a significant density advantage over the typical hard disk drive, offering more capacity per form factor: roughly twice the capacity per slot that a disk drive can now deliver. It also has a significant throughput advantage on a per-device basis.
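The capacity and speed figures above hang together once you notice that LTO quotes its compressed numbers at a fixed compression ratio. A minimal sketch, assuming that ratio is simply compressed throughput divided by native throughput (525 / 210, a ratio we infer here rather than one stated in the text), recovers the native capacity implied by the 8 TB compressed figure. The helper names are hypothetical.

```python
# Hypothetical helpers: infer the compression ratio behind LTO-6's quoted
# figures and the native capacity that ratio implies.
NATIVE_MBPS = 210.0       # native transfer rate quoted in the text
COMPRESSED_MBPS = 525.0   # compressed transfer rate quoted in the text
COMPRESSED_TB = 8.0       # compressed capacity quoted in the text


def implied_ratio(native: float, compressed: float) -> float:
    """Compression ratio assumed by the vendor's 'compressed' figures."""
    return compressed / native


def native_capacity_tb(compressed_tb: float, ratio: float) -> float:
    """Native capacity implied by a compressed capacity at a given ratio."""
    return compressed_tb / ratio


ratio = implied_ratio(NATIVE_MBPS, COMPRESSED_MBPS)
print(f"implied ratio: {ratio:.1f}:1")  # implied ratio: 2.5:1
print(f"implied native capacity: {native_capacity_tb(COMPRESSED_TB, ratio):.1f} TB")  # 3.2 TB
```

Real data rarely compresses at exactly this ratio (already-compressed video or images may barely compress at all), so the native figures are the safer basis for capacity planning.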
As mentioned above, LTO-6 will continue to support the LTFS tape format introduced with LTO-5. LTFS tapes can be read by foreign systems without the software that originally created them. This brings flexibility when LTO-6 is the core of a storage strategy, because individual tapes can be transported between systems and environments.
LTFS is a significant milestone for tape initiatives. With it, there is no lock-in to a particular vendor or archive software strategy, which is critical when data may be stored for decades. The cold reality is that the archive system chosen today may not be around in 20 or 30 years; LTFS solves this problem by allowing data to move between different archive systems.
The density, transportability, interchangeability and performance of LTO-6 with LTFS can significantly change how data centers deal with the data growth explosion, should they choose to take advantage of it. It should allow them to keep all data for decades (or forever) without breaking the IT budget. In short, with LTO-6, tape should move beyond a backup-only role and play a key role in all facets of storage, including primary data storage.
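Because an LTFS cartridge appears to the operating system as an ordinary mounted file system, archiving to it needs nothing more than standard file operations. A minimal sketch, assuming the cartridge is already mounted (the mount point and the helper name `archive_to_mount` are hypothetical), copies a directory tree and writes a plain-text SHA-256 manifest beside it, so any future system that reads the tape can check the files without the original archive software.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 in 1 MiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def archive_to_mount(src: Path, mount_point: Path) -> Path:
    """Copy src to mount_point/src.name and write a manifest next to it.

    mount_point is assumed (hypothetically) to be an already-mounted LTFS
    volume. Nothing here is LTFS-specific: any tool that understands files
    and directories can later read what was written.
    """
    dest = mount_point / src.name
    shutil.copytree(src, dest)  # raises FileExistsError if dest already exists
    manifest = mount_point / f"{src.name}.sha256"
    lines = [
        f"{sha256_of(p)}  {p.relative_to(mount_point).as_posix()}"
        for p in sorted(dest.rglob("*"))
        if p.is_file()
    ]
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

For example, `archive_to_mount(Path("project-data"), Path("/mnt/ltfs"))` (both paths hypothetical) would write `project-data.sha256` at the root of the mounted tape. The manifest uses the `hash  path` layout, so on systems with GNU coreutils it can be checked with `sha256sum -c` run from the mount point.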
Comparing LTO-6 To Scalable Disk For Long-Term Retention
Disk system architectures such as scale-out storage and object storage allow for seemingly limitless capacity per system. In addition, technologies like deduplication and compression have made storing data on those systems more efficient. In other words, more data now fits in the same physical capacity, and as more space is required, the system can scale to virtually any size.
Unlike disk, tape has not needed foundational architectural changes to meet today’s capacity demands. Instead of requiring a potentially complex cluster of servers to act as a storage farm, tape has maintained the same evolutionary track, steadily increasing capacity and speed per cartridge. In addition, innovative tape automation vendors like Spectra Logic have leveraged advances in robotics to create tape libraries that support larger, denser configurations while improving fetch times and cartridge selection accuracy.
As each capacity challenge arises, disk requires creative innovations like scale-out storage, object storage, compression and deduplication. Tape, on the other hand, has followed a more natural, Moore’s Law-like evolution, roughly doubling capacity with each new LTO generation while also increasing performance.
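Deduplication works by storing each unique chunk of data only once and keeping references for repeats. A minimal sketch, using fixed-size chunks and SHA-256 fingerprints (many real products use variable-size, content-defined chunking instead; the chunk size and the function name `dedup_ratio` are hypothetical), shows how repeated content shrinks the physical footprint.

```python
import hashlib


def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Logical size divided by the size of the unique chunks actually stored."""
    store: dict[str, bytes] = {}  # fingerprint -> first chunk seen with it
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        # Only the first occurrence of each fingerprint consumes space.
        store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    physical = sum(len(c) for c in store.values())
    return len(data) / physical if physical else 1.0


# Ten identical 4 KiB blocks are stored once: a 10:1 reduction.
print(dedup_ratio(b"A" * 4096 * 10))  # 10.0
```

The ratio depends entirely on how repetitive the data is; archives of unique or already-compressed content may see little benefit.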
The storage infrastructure that implements a data retention strategy can no longer be a ‘lazy’ storage area that returns data eventually. While it doesn’t need to outperform high-end storage systems such as solid-state arrays, it does need to provide consistent and predictable performance. This requirement is what has led many organizations to consider a disk-only strategy initially.
The scale-out systems described above can increase performance as nodes are added, because each node contributes both capacity and performance. That added performance, however, is designed mainly to keep performance consistent as capacity grows. Traditional disk systems, by contrast, lose performance as the same controller pair has to drive more disks. Scale-out architectures are designed to hold PBs of information for decades while delivering the same performance on day one as they do a decade later.
Tape offers much the same performance consistency as scale-out storage. Tape performance does not degrade as more cartridges are added to the environment, because the library does not need to power, or keep online, every piece of media.
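The contrast between a traditional dual-controller array and a scale-out cluster can be expressed as throughput per terabyte. A minimal sketch with entirely hypothetical numbers (a controller pair capped at 2,000 MB/s; scale-out nodes that each add 100 TB and 500 MB/s) shows the first figure falling as capacity grows while the second holds steady.

```python
import math

# Hypothetical figures for illustration only.
CONTROLLER_PAIR_MBPS = 2_000.0  # fixed ceiling of a traditional dual-controller array
NODE_TB = 100.0                 # capacity added per scale-out node
NODE_MBPS = 500.0               # throughput added per scale-out node


def dual_controller_mbps_per_tb(capacity_tb: float) -> float:
    """One controller pair serves all capacity, so per-TB throughput shrinks."""
    return CONTROLLER_PAIR_MBPS / capacity_tb


def scale_out_mbps_per_tb(capacity_tb: float) -> float:
    """Every node brings its own throughput along with its capacity."""
    nodes = math.ceil(capacity_tb / NODE_TB)
    return nodes * NODE_MBPS / capacity_tb


for tb in (100, 400, 1600):
    print(tb, dual_controller_mbps_per_tb(tb), scale_out_mbps_per_tb(tb))
```

With these assumed numbers, the dual-controller array drops from 20 to 1.25 MB/s per TB as it grows from 100 TB to 1.6 PB, while the scale-out cluster stays at 5 MB/s per TB throughout.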
It is tape’s up-front performance that typically causes concern. Yet tape is already faster than disk for many operations, especially streaming (sequential) reads and writes, and LTO-6 widens that lead: it can transfer data at 210 MB per second native or 525 MB per second compressed.
Where disk does have a performance advantage is random I/O. In these situations tape must wind sequentially through the media to reach the requested data, whereas disk has random access. Clearly this could have an impact when delivering data from the more active sections of an archive. Even so, tape can typically reach the requested data on a cartridge in about a minute, and most environments will be well served by a hybrid approach that pairs a smaller scale-out storage system with a large tape library.
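Those transfer rates translate directly into how long it takes to write or read a full cartridge. A minimal sketch, assuming the drive streams continuously at its rated speed (real throughput can drop if the host cannot keep the drive fed) and using decimal units, as tape vendors do; the function name is hypothetical.

```python
def hours_to_stream(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours to move capacity_tb at a sustained rate_mb_s (decimal TB and MB)."""
    seconds = capacity_tb * 1e12 / (rate_mb_s * 1e6)
    return seconds / 3600


# A full 8 TB (compressed) LTO-6 cartridge at 525 MB/s compressed.
print(f"{hours_to_stream(8, 525):.2f} h")  # 4.23 h
```

The same function can be fed any disk system's sustained rate, which makes for a more meaningful comparison than headline speeds alone.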
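The case for a hybrid design can be quantified as an expected retrieval latency. A minimal sketch, assuming hypothetical latencies (10 ms for a hit on the disk tier, and the roughly one minute cited above for tape) and a disk-tier hit rate that you would measure in practice, shows that average retrieval time is dominated by how often requests fall through to tape.

```python
def expected_latency_s(disk_hit_rate: float,
                       disk_latency_s: float = 0.010,
                       tape_latency_s: float = 60.0) -> float:
    """Average time to first byte for a disk tier in front of a tape library.

    Both default latencies are hypothetical placeholders.
    """
    if not 0.0 <= disk_hit_rate <= 1.0:
        raise ValueError("disk_hit_rate must be between 0 and 1")
    return disk_hit_rate * disk_latency_s + (1 - disk_hit_rate) * tape_latency_s


for rate in (0.0, 0.9, 0.99):
    print(rate, round(expected_latency_s(rate), 3))
```

Under these assumptions, sizing the disk tier so that 99% of requests hit it brings the average under a second, even though any single tape recall still takes about a minute.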
Comparing Data Protection
All types of media degrade over time and with use, and recording media can degrade faster as density increases. So as the capacity per disk drive increases, so does the likelihood of an error on that drive. This means that data protection schemes like RAID 5 and RAID 6 will become increasingly important as drive densities continue to increase. As the chart below shows, while tape also has a bit error rate, it is not increasing at the pace of disk’s error rates.
Tape is actually becoming a more reliable medium than disk as data densities increase.
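The effect of bit error rates on large devices can be estimated directly. A minimal sketch, assuming independent bit errors and commonly quoted specification values (one unrecoverable error per 10^14 bits read for desktop-class disk, one per 10^17 bits for LTO tape; check your own drives' data sheets), computes the chance of at least one unrecoverable error while reading an entire 3 TB device. The function name is hypothetical.

```python
import math


def p_unrecoverable(capacity_tb: float, bits_per_error: float) -> float:
    """Probability of at least one unrecoverable error over a full read.

    Assumes independent errors at a fixed rate of 1 / bits_per_error.
    """
    bits_read = capacity_tb * 1e12 * 8
    ber = 1.0 / bits_per_error
    # 1 - (1 - ber) ** bits_read, computed stably for very small ber
    return -math.expm1(bits_read * math.log1p(-ber))


print(f"disk: {p_unrecoverable(3, 1e14):.1%}")  # disk: 21.3%
print(f"tape: {p_unrecoverable(3, 1e17):.3%}")  # tape: 0.024%
```

The model is deliberately simple (real errors are not perfectly independent), but it illustrates why parity schemes matter more as disk drives grow.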
Tape’s key advantage is its offline nature. For one, tape is significantly easier to transport physically than disk, which in practice requires data replication to move data off-site. This helps when dealing with large data sets. While shipping PBs of data in a FedEx package may not be elegant, it is certainly faster than sending it electronically over a WAN connection. With LTO-6, a single cartridge slipped into an overnight shipping box can deliver up to 8 TB of information (assuming compressible data) in less than 24 hours. Not many organizations have a WAN connection capable of doing the same electronically.
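The shipping comparison is easy to check. A minimal sketch computes the sustained WAN bandwidth needed to move the same 8 TB within 24 hours, using decimal units and ignoring protocol overhead and retransmissions, so the real requirement would be somewhat higher. The function name is hypothetical.

```python
def required_mbit_s(payload_tb: float, hours: float) -> float:
    """Sustained megabits per second needed to move payload_tb within `hours`."""
    bits = payload_tb * 1e12 * 8
    return bits / (hours * 3600) / 1e6


print(f"{required_mbit_s(8, 24):.0f} Mbit/s")  # 741 Mbit/s
```

That is roughly three-quarters of a fully saturated 1 Gbit/s link, sustained around the clock, for a single cartridge's worth of data.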
Meeting the cost reality
Finally, and probably most important, is the cost of storing all data for decades. This is where disk, despite help from technologies like deduplication and compression, has a fundamental disadvantage compared with tape: the electronics required to control each drive have to be repurchased with every piece of storage media, which adds significantly to the cost.
Tape, on the other hand, spreads the expense of drive electronics over hundreds of pieces of media. The media itself carries no significant electronics and is therefore less expensive per unit. LTO-6 increases this advantage thanks to its ability to store up to 8 TB (compressed) per cartridge.
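The amortization argument can be sketched as a cost-per-terabyte model. All prices below are hypothetical placeholders, not quotes, and the model ignores library chassis, floor space and power; to isolate the effect of drive electronics, it assumes the same 3 TB per disk drive and per cartridge. Disk pays for electronics with every unit of capacity, while tape pays for a fixed pool of drives and then adds only media.

```python
import math


def disk_cost_per_tb(capacity_tb: float, drive_tb: float, drive_price: float) -> float:
    """Every disk drive bundles its own electronics with its media."""
    drives = math.ceil(capacity_tb / drive_tb)
    return drives * drive_price / capacity_tb


def tape_cost_per_tb(capacity_tb: float, cart_tb: float, cart_price: float,
                     tape_drives: int, tape_drive_price: float) -> float:
    """A fixed pool of tape drives is shared across every cartridge."""
    carts = math.ceil(capacity_tb / cart_tb)
    return (carts * cart_price + tape_drives * tape_drive_price) / capacity_tb


# Hypothetical prices: $300 per 3 TB disk, $80 per 3 TB cartridge,
# four tape drives at $5,000 each.
for pb in (0.1, 1, 10):
    tb = pb * 1000
    print(pb, round(disk_cost_per_tb(tb, 3, 300), 2),
          round(tape_cost_per_tb(tb, 3, 80, 4, 5000), 2))
```

With these assumed prices, tape costs more per TB at 100 TB, where the fixed drive cost dominates, but far less at 10 PB, which is the scale at which the long-term retention problem described here applies.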
A second cost disadvantage of disk is the very online nature that makes it so desirable for random I/O: every drive has to be online with power attached. Even though disk systems targeted at the archive market can grow very large and potentially store limitless amounts of information, the entire storage system typically needs to be powered. It also means that all the physical systems need to be housed in the same facility. While there are some power-down techniques, like MAID (massive array of idle disks) and even nodal sleep within a cluster, these systems still need to be “awakened” frequently for reliability analysis.
Tape, on the other hand, requires no power unless a cartridge is loaded in a drive and actually being used. Compared with the sleep methods described above, fully off is far more power-efficient. Tapes can sit on a shelf inside the tape library for years without ever needing power, except for the occasional integrity check.
Data centers should stay out of the disk-versus-tape vendor battles, which are propagated mostly by disk vendors. Instead, IT managers should focus on the job at hand: finding a way to retain as much information as possible, for as long and as reliably as possible. A mixed approach, in which a smaller scale-out disk storage system is backed by (and potentially integrated with) tape, may be the ideal combination for a retention area that is both responsive and cost-effective.
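The power difference can be sketched the same way. A minimal sketch with hypothetical figures (8 W per always-on disk drive, 30 W per tape drive while active, and idle cartridges drawing nothing) compares annual energy for 1 PB held on 334 disks against a library with four tape drives busy a quarter of the time. It ignores controllers, robotics and cooling.

```python
HOURS_PER_YEAR = 24 * 365


def disk_kwh_per_year(drives: int, watts_per_drive: float = 8.0) -> float:
    """Every disk stays powered around the clock (hypothetical 8 W each)."""
    return drives * watts_per_drive * HOURS_PER_YEAR / 1000


def tape_kwh_per_year(tape_drives: int, duty_cycle: float,
                      watts_per_drive: float = 30.0) -> float:
    """Only tape drives draw power, and only while in use; cartridges draw none."""
    return tape_drives * watts_per_drive * duty_cycle * HOURS_PER_YEAR / 1000


# 1 PB on 3 TB units: 334 disks vs. 4 tape drives busy 25% of the time.
print(round(disk_kwh_per_year(334)), round(tape_kwh_per_year(4, 0.25)))
```

Under these assumptions the disk pool uses roughly 23,400 kWh a year against about 263 kWh for the tape drives. The important property is structural: disk energy scales with capacity, while tape energy scales only with drive activity.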
Spectra Logic is a client of Storage Switzerland