Enterprises want more value from their archived data assets, but tape can’t deliver the real-time archive that organizations need. The challenge is that any enterprise in existence for 10 years or longer more than likely has terabytes, even petabytes, of data on tape media, and tape-based archives can’t really be queried or indexed in real time. Enterprises need to de-tape the archive so they can get more value from their archived data.
Is it Worth it?
The first question when starting any initiative, but especially a migration project, is whether it is worth the effort. There are two parts to that equation: what is the data worth, and how much effort will the move take? The worth part will vary by organization. Is the data in the archive because of a compliance or regulatory requirement? Or was it stored just in case the organization might need it again? Can old data be compared against today’s data to help with decision making? Or can that data be repurposed so that it has some monetary value?
One example is old test results data that the organization can compare against new test data to determine long-term wear and tear on a design. Another is a recording of a decades-old sporting event that the organization can sell to avid fans.
Still another example is companies that store archives of seismic data for oil companies. When fracking came about, the oil companies recharacterized their reserves using the new technology and the old data. The data was on tape, but customers wanted faster access to it, so the companies are now migrating to disk. Fifty years of seismic data is on the move, not because the data changed but because the technology to analyze that data changed.
The other part of the worth question is the value of querying this information at a moment’s notice. For example, a sports hero announces retirement. Does quickly creating a montage of their greatest highlights increase the chances someone will buy that montage? Another example is a doctor walking into a patient’s room with medical charts instantly available, even ones that are decades old.
The other side of the “is it worth it” question is how much effort it will take to move the data from tape storage to a modern disk target. It is safe to assume the software that created the tape archive is still accessible, but can modern storage present itself in such a way that this software can access it? Can the data be organized in some way on its way to the new destination? The faster and more organized the process, the better.
The Weakness of Tape for Active Archive
Tape will, for many organizations, continue to play a key role in their overall archive strategies. But as these archives become more active, tape’s role probably shifts to being the backup for the archive, not the archive’s primary store. The problem is not that tape is slow; tape is actually quite fast on bulk transfers. The problem is that data on tape is hard to index, hard to query and slow to respond to thousands of small, random requests. Disk archive excels at all of these operations.
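To see why thousands of small requests hurt far more than one bulk transfer, here is a rough back-of-the-envelope model. The positioning delays and the streaming rate below are illustrative assumptions, not measurements of any particular tape drive or disk system, and the function name is made up for this sketch.

```python
def total_seconds(requests, bytes_per_request, positioning_seconds, bytes_per_second):
    """Time to serve requests one after another: each request pays a
    positioning delay (locate/seek) plus the transfer time for its payload."""
    transfer = bytes_per_request / bytes_per_second
    return requests * (positioning_seconds + transfer)


# Hypothetical figures, chosen only to show the shape of the problem.
TAPE_POSITIONING_S = 60.0   # assumed average locate time per random tape request
DISK_POSITIONING_S = 0.01   # assumed average seek time per random disk request
THROUGHPUT_BPS = 200e6      # assume both media stream at 200 MB/s

# One bulk transfer of 1 TB versus 10,000 random reads of 1 MB each.
bulk_tape = total_seconds(1, 1e12, TAPE_POSITIONING_S, THROUGHPUT_BPS)
bulk_disk = total_seconds(1, 1e12, DISK_POSITIONING_S, THROUGHPUT_BPS)
random_tape = total_seconds(10_000, 1e6, TAPE_POSITIONING_S, THROUGHPUT_BPS)
random_disk = total_seconds(10_000, 1e6, DISK_POSITIONING_S, THROUGHPUT_BPS)

print(f"bulk transfer, tape vs disk: {bulk_tape / bulk_disk:.2f}x")
print(f"random reads, tape vs disk: {random_tape / random_disk:.0f}x")
```

Under these assumptions the bulk transfer takes about the same time on either medium, while the random workload is dominated by the per-request positioning delay, which is where tape falls behind.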
While most organizations can start archiving new data to disk, the real challenge is how to bulk load decades of existing data from tape to disk.
How to Move Data?
Most organizations have been migrating the data in their tape-based archive for years. Standard best practice is to move data from one generation to another to protect against media degradation and bit rot. Most organizations perform this migration every other media generation. They can leverage that same process to move data from tape media to a disk archive.
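Because the point of these migrations is to protect against degradation, each copy is worth verifying. The sketch below assumes the data has already been restored from tape to a staging directory and that the disk archive is reachable as an ordinary mounted directory; the function names and that layout are assumptions for illustration, not part of any archive product.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_tree(source_dir, target_dir):
    """Copy every file under source_dir to target_dir, keeping relative paths.

    Returns the relative paths whose copy did not match the source checksum.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    failures = []
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        # Re-read both sides so a bad copy is caught before the tape is retired.
        if sha256_of(source) != sha256_of(target):
            failures.append(str(relative))
    return failures
```

An empty list returned from migrate_tree means every file matched; anything else is a list of files to copy again.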
Many archiving systems that originally supported only tape for the archive store have since added support for the Amazon S3 protocol. This S3 support was originally intended to move archives to Amazon cloud storage. While cloud storage is very compelling upfront, the long-term cost of renting petabytes of storage is a concern for many organizations.
The good news is that a number of modern object storage solutions also support the S3 protocol. That means the organization can purchase an on-premises cloud storage solution based on object storage and avoid the recurring, never-ending costs.
Of course, not all archiving software solutions support S3, but most support NFS or SMB mount points, which object storage systems tend to support as well.
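The rent-versus-buy trade-off comes down to simple arithmetic. The sketch below computes how many months it takes for cumulative rental spending to reach the cost of owning. The function name and every figure in the usage lines are hypothetical and would need to be replaced with real quotes.

```python
import math


def break_even_months(purchase_cost, monthly_operating_cost, monthly_rental_cost):
    """Months until cumulative rental spending reaches the cost of owning.

    Returns None if renting never becomes more expensive than owning.
    """
    monthly_saving = monthly_rental_cost - monthly_operating_cost
    if monthly_saving <= 0:
        return None
    return math.ceil(purchase_cost / monthly_saving)


# Hypothetical figures for illustration only.
months = break_even_months(
    purchase_cost=500_000,         # assumed one-time cost of on-premises storage
    monthly_operating_cost=5_000,  # assumed power, space and support per month
    monthly_rental_cost=25_000,    # assumed cloud storage bill per month
)
print(months)
```

For an archive that is kept indefinitely, any finite break-even point eventually favors ownership, which is the concern raised above.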
Selecting the Right Disk Target
The competitor to object storage is the network attached storage (NAS) system. NAS systems are well known to IT professionals, so there may be less resistance to their use. In addition, these systems do support high-capacity hard disk drives. The problem is that they typically support only NFS and SMB as protocols, not S3. While there are certainly scale-out NAS solutions, they typically aren’t designed for the massive scale of object storage. And the NAS systems that do scale out tend to do so on proprietary hardware rather than the commodity hardware that object storage uses, so their cost per node is typically much higher.
Active Means Active
NAS does have its challenges when trying to be the active archive store. The key difference is how the data will be processed once it is moved to the active store. The first active archive challenge facing legacy NAS systems is the sheer number of files (objects) that they will store. Many of these active archives will store billions, if not trillions, of files. Most NAS operating systems weren’t designed to store that quantity of files. Object storage systems, though, because of their flat design, can store an almost unlimited number of objects.
The second aspect of active archive is how the analysis will be performed. Most data analysis is done by modern applications such as Hadoop, Splunk and Spark, among many others. For the most part, these applications need to connect to data through either an S3 or an object storage interface. Both of these interfaces are provided by object storage.
There is a perfect storm heading for the data center. First, the ability to capture data points, thanks to sensors and machines, is at an all-time high. Second, there is a legitimate need not only to retain old information indefinitely but also to access and process that information in near-real time. Tape has a role, but the active archive needs to be on disk. Of the available options, object storage provides the ideal match to the requirements of these active archives.
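The “flat design” can be pictured as a single map from key to object, where anything that looks like a folder is really just a shared key prefix. The toy in-memory class below is a conceptual sketch only; the class name and example keys are invented, and a real object store distributes this map across many nodes.

```python
class FlatObjectStore:
    """Toy in-memory stand-in for an object store: one flat key-to-bytes map."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # No directories are created; the key is just an opaque string.
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are simulated by filtering keys on a shared prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
store.put("seismic/1975/survey-001.dat", b"old survey")
store.put("seismic/2015/survey-900.dat", b"new survey")
store.put("charts/patient-42.pdf", b"chart")
print(store.list_keys("seismic/"))
```

Because there is no directory tree to walk or rebalance, adding another object is only another entry in the map, which is why this layout copes well with very large object counts.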
Sponsored by HGST