Historically, tape as a storage medium in the open systems world has had issues with performance, complexity, reliability and cost compared with disk systems. The mainframe market has encountered many of the same issues with tape, not only as a backup target but also as a storage platform for batch processing, where specialized, high-performance tape formats are used. Deduplication, combined with disk backup systems, has alleviated many of these issues in the open systems world. Now incorporated into disk libraries, deduplication can provide the same benefits to mainframe systems, along with some additional capabilities.
Compared with disk arrays, tape is slower because of seek latency and tape handling. The mechanical nature of tape drives and cartridges makes it less reliable, and in most cases tape doesn’t support data redundancy techniques like RAID. Handling large numbers of cartridges is inefficient and adds to operating costs and management overhead, especially with the large libraries common in mainframe environments. Tape format changes, the primary way tape performance is improved, add costs for new media, drive hardware and data migration. They also add operational complexity, since the old format must remain readable for the full data retention period. Finally, disaster recovery using tape can be more complex, more costly and more prone to problems than disaster recovery with disk-based data protection systems.
The mainframe’s use of tape as a semi-active storage platform for batch processing has led to the development of high-speed, high-duty-cycle tape formats. While these fast-access, high-throughput tape drives are certainly capable of storing backup data, backup is hardly a cost-effective use of such specialized assets. In some data centers this has led to running both Linear Tape-Open (LTO) drives and these high-performance drives. Even so, the net effect of this dual role for tape has been an increase in the overall cost per GB of data stored on tape.
Functionally, tape in a mainframe environment has most of the disadvantages it has in open systems. And because mainframe environments often have some of the largest tape library infrastructures, the cost issues inherent in tape can be even greater. With data growth still a fact of IT life, adding more backup data to the mainframe tape infrastructure would only make these problems worse.
Disk improves tape backup
Disk backup systems have addressed many of tape’s inherent shortcomings for years. As a front-end cache, or Virtual Tape Library (VTL), disk provides a random-access storage pool that can accept multiple backup data streams at once and then serialize them for transfer to tape. Adding deduplication to these disk-based systems has extended their capacities significantly, improving their ability to support larger enterprise environments, like those that also have mainframes. This capacity increase also gave them a ‘restore from disk’ capability that made them a compelling alternative to tape-only infrastructures. Another option is a ‘disk-only’ backup system, or disk library. These systems can also leverage deduplication, as VTLs did, to increase backup performance and provide some additional functionality.
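To make the capacity effect of deduplication concrete, here is a minimal Python sketch of hash-based deduplication, assuming fixed-size 4 KiB chunks and SHA-256 fingerprints. The names `DedupeStore` and `make_chunk` are hypothetical, and this is not how any particular VTL or disk library implements deduplication; commercial systems generally use variable-length segmentation and much more elaborate indexing. What it does show is why repeated full backups shrink so much: an unchanged chunk is stored once, and each later backup adds only the chunks that changed.

```python
import hashlib

# Hypothetical fixed chunk size. Real deduplication systems typically use
# variable-length segmentation; fixed-size chunks keep this sketch short.
CHUNK_SIZE = 4096


class DedupeStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}  # digest -> chunk bytes (physical storage)
        self.logical_bytes = 0              # total bytes sent by backup jobs

    def write(self, data: bytes) -> list[str]:
        """Ingest one backup stream; return its recipe (ordered chunk digests)."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only chunks not seen before consume physical space.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Restore a backup by reassembling its chunks in order."""
        return b"".join(self.chunks[d] for d in recipe)

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

    def reduction_ratio(self) -> float:
        return self.logical_bytes / self.physical_bytes()


def make_chunk(tag: int) -> bytes:
    """Build a distinct, deterministic 4 KiB chunk from an integer tag."""
    return tag.to_bytes(8, "big") * (CHUNK_SIZE // 8)


# Ten nightly full backups of a 100-chunk data set; each night one chunk changes.
store = DedupeStore()
backups, recipes = [], []
for night in range(10):
    chunks = [make_chunk(i) for i in range(100)]
    chunks[night] = make_chunk(1_000_000 + night)  # that night's changed data
    data = b"".join(chunks)
    backups.append(data)
    recipes.append(store.write(data))

print(f"logical: {store.logical_bytes} bytes, physical: {store.physical_bytes()} bytes")
print(f"reduction: {store.reduction_ratio():.1f}x")
```

The ten fulls total 4,096,000 bytes logically but occupy only 450,560 bytes (110 unique chunks), a reduction of about 9.1x, and any night’s backup can still be restored byte-for-byte from its recipe. The ratio here is an artifact of the made-up change pattern, not a prediction for real workloads.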
Dedupe for mainframes
Systems like the EMC Disk Library for mainframe (DLm960) with the Deduplication Storage Expansion option provide an effective data reduction of well over 10x for backups, significantly reducing the cost per GB of storage and increasing effective capacity by the same factor. Disk libraries also use capacity-oriented disk drives, such as SATA, which can reduce operational costs (power consumption and data center footprint) as well as capital expenditures compared with traditional mainframe-class disk.
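The claim that data reduction cuts cost per GB and raises effective capacity "by the same factor" comes down to one multiplication and one division. The sketch below assumes a flat reduction ratio across all backup data; the capacity and price figures are hypothetical placeholders, not DLm960 specifications, and the function names are illustrative only.

```python
# Hypothetical figures for illustration only; substitute real quotes and measured ratios.


def effective_capacity_tb(usable_tb: float, reduction_ratio: float) -> float:
    """Logical backup data a system can hold after deduplication."""
    return usable_tb * reduction_ratio


def effective_cost_per_gb(raw_cost_per_gb: float, reduction_ratio: float) -> float:
    """Cost per GB of backup data actually protected."""
    return raw_cost_per_gb / reduction_ratio


usable_tb = 100.0       # hypothetical usable SATA capacity
raw_cost_per_gb = 2.00  # hypothetical acquisition cost per usable GB
ratio = 10.0            # the "10x" data reduction cited above

print(effective_capacity_tb(usable_tb, ratio))        # 1000.0
print(effective_cost_per_gb(raw_cost_per_gb, ratio))  # 0.2
```

Because both results scale with the same ratio, a workload that deduplicates at 20x rather than 10x halves the effective cost again; conversely, data that deduplicates poorly (already compressed or encrypted streams, for example) gets little of this benefit.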
Deduplication helps make a disk-only backup option feasible. Rather than simply serving as a front-end cache for a tape-based system, a disk library with 10x effective data reduction can hold enough backups to replace a tape library altogether. In many environments, eliminating tape brings another set of benefits: lower operational and support expenses once tape libraries and tape drives are taken out of the equation. As with a VTL, a disk library provides faster backups, since multiple data streams can run directly to disk. But the disk-only approach also eliminates the complexity of scheduling and running tape copies on the back end, as well as the handling of tape cartridges.
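Whether a disk library can hold "enough backups" to retire tape is a sizing question. The sketch below uses a deliberately simplified, hypothetical model: the first full backup is stored in its entirety, and each later daily full adds only its changed data. It ignores local compression, metadata overhead and free-space reserves, and the function name and input figures are illustrative assumptions rather than vendor guidance.

```python
import math


def retained_fulls(capacity_tb: float, full_tb: float, daily_change: float) -> int:
    """Estimate how many daily full backups fit on a deduplicating disk library.

    Hypothetical model: the first full is stored in its entirety and each later
    full adds only full_tb * daily_change of new unique data.
    """
    if daily_change <= 0:
        raise ValueError("daily_change must be positive")
    if capacity_tb < full_tb:
        return 0  # not even one full backup fits
    per_extra_full = full_tb * daily_change
    return 1 + math.floor((capacity_tb - full_tb) / per_extra_full)


# Hypothetical environment: 100 TB usable, 20 TB nightly full, 3% daily change.
print(retained_fulls(100.0, 20.0, 0.03))  # 134
```

Under these assumptions a 100 TB system retains 134 nightly fulls, more than four months of daily recovery points; a real sizing exercise would use measured change rates and the vendor’s own capacity planning tools.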
Disk libraries improve other areas as well. Because there are no delays or bottlenecks from tape handling, tape mounting or linear file seeking, restores from disk are significantly faster than restores from a tape system. And disk-only backup can address the reliability issues inherent in tape media by using RAID data protection, along with other sophisticated data integrity processes.
The same data reduction that deduplication provides also makes off-site replication more efficient by reducing bandwidth requirements, since only data not already stored at the remote site needs to be sent. This makes DR more cost-effective and reduces the storage capacity needed at the remote site.
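A rough estimate of the bandwidth effect: if only deduplicated data crosses the WAN, the replication window shrinks by roughly the reduction ratio. The sketch below assumes decimal units (1 GB = 8,000 megabits), a fully utilized link and no protocol overhead; the backup size, ratio and link speed are hypothetical, and the ratio achieved on the replication stream may differ from the ratio achieved in local storage.

```python
def replication_hours(backup_gb: float, reduction_ratio: float, link_mbps: float) -> float:
    """Hours to replicate one backup when only post-dedupe data crosses the link.

    Assumes decimal units, a fully utilized link and no protocol overhead.
    """
    megabits = backup_gb / reduction_ratio * 8_000
    return megabits / link_mbps / 3_600


# Hypothetical: 2,000 GB nightly backup over a 1,000 Mbps WAN link.
print(round(replication_hours(2_000, 1.0, 1_000), 2))   # 4.44 (no deduplication)
print(round(replication_hours(2_000, 10.0, 1_000), 2))  # 0.44 (10x reduction)
```

Seen the other way, the same nightly window could be met with a link roughly one-tenth the size, which is where the DR cost savings come from.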
As with backup, the mainframe’s use of tape as a temporary data repository for batch processing can also be replaced by a disk library. And because this process is essentially a caching operation, using disk instead of specialized high-performance tape can produce even more dramatic results. Deduplication’s ability to expand the effective capacity of a disk library makes this approach even more cost-effective.
Deduplication has proven to be a game-changing technology for disk backup: it has increased effective capacities by an order of magnitude or more and dramatically simplified disk backup architectures. It has helped make disk-to-tape backup systems work in the open systems world and helped make ‘tapeless’ backup feasible. In the mainframe environment, deduplication offers the same efficiency and capacity expansion, making disk-only backup a more cost-effective alternative to tape and VTL environments. In mainframe disk libraries, deduplication can provide the fundamental improvements needed to make a tapeless infrastructure a better option for the mainframe data center.
Data Domain is a client of Storage Switzerland