The cost of disk capacity has come down dramatically over the last two decades and, thanks to technologies like scale-out NAS and object storage, managing petabytes of data is now entirely practical. But the cost to power and cool this storage, plus the cost of data center floor space, has increased dramatically. The net impact is that even if storage vendors gave disk capacity away, the cost to maintain that capacity could bankrupt you.
Even if Disk Were Free the System Would Not Be
Primary storage or even disk archives are not just a bunch of random drives plugged into an outlet. They are their own ecosystem of storage software, a compute engine to run software, hardware to house the compute engine and a series of hardware shelves into which the drives are inserted. Great care is taken to make sure that the system is reliable, serviceable and scalable. Finally, there is a cost for the system’s support and maintenance.
While it is true that the cost of disk capacity will continue to be driven down, the actual system cost will never be zero. The customer has to pay for the physical capacity as well as the physical system that surrounds the raw disk drives. A key feature of a storage system is its ability to gracefully expand. But this is also its biggest weakness. As the system scales, users will have to pay for the power, cooling and data center floor space that the storage system consumes.
The Power, Cooling and Floor Space Problem
The long-term care of a disk archive can be daunting. As described above, most of these systems are designed in a scale-out manner so that capacity can be added on a nodal basis. This provides the archive with a seamless scaling capability, and many of these systems can now deliver greater capacities than most data centers actually need. But each of these nodes occupies rack space and requires power and cooling.
While the disk archive has a scale-out architecture, most data centers are not “scale-out”. Expanding a storage system is not simply a matter of adding another ‘module’ of capacity; it requires more physical space, power and cooling. Building or expanding a data center to accommodate that growth is simply not a viable option for the overwhelming majority of companies, which means working within the limitations of the existing physical surroundings.
The floor space and power problems have become so severe that many data centers have a policy that whenever a new device is installed something has to be removed. That policy applied to disk archive would mean that a newer, high capacity node would have to replace an older node, making the scale-out architecture far less seamless and far more expensive.
The Deduplication / Compression Problem
Disk archive vendors may quickly point to deduplication and compression as a means to cure these ills. These storage efficiency technologies can effectively bring down the cost of storing data by reducing both capacity requirements and the physical footprint.
However, there are a number of problems with this “solution”. First, most archives don’t deduplicate well. If the archive is managed correctly there should be a limited number of redundant data sets so the archive will certainly not see the effective reduction rate that full backups do. Also, the type of data that consumes most of an archive’s capacity (images, audio and video) is already compressed so additional compression efforts have limited effectiveness.
Another challenge is that deduplication actually increases power consumption. This is because all nodes within a scale-out storage cluster need to be powered on so they can be checked for redundant data. While in theory this could be programmed around, we know of no system that currently does so.
Finally, the best practice for an archive is to have three independent copies of data; one in the local archive and two off-site, preferably in two different locations. If the archive is entirely disk based, this means multiplying the required disk capacity by a factor of 3.
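To make the multiplication concrete, here is a minimal sketch of the raw capacity math. The 500TB archive size and the 20% RAID capacity overhead are illustrative assumptions, not figures from the article:

```python
# Raw disk needed to keep the best-practice three full copies of an archive,
# each copy carrying its own RAID capacity overhead (assumed ~20% here).

def required_raw_tb(archive_tb: float, copies: int = 3, raid_overhead: float = 1.2) -> float:
    """Raw disk capacity for N complete copies, each with RAID overhead."""
    return archive_tb * copies * raid_overhead

# A hypothetical 500 TB archive, kept entirely on disk in three locations:
print(f"{required_raw_tb(500):.0f} TB")  # 1800 TB of raw spinning disk
```

Every terabyte of that tripled footprint also triples its share of power, cooling and floor space, which is the core of the problem described above.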
The 6TB Drive Rebuild Problem
Disk archive vendors might point to the upcoming release of high-performance 6TB drives as the potential solution. Again, while these drives will bring down the effective cost per GB somewhat, they still won’t be free. The larger concern, however, is that these high-capacity drives are going to aggravate a problem that is already haunting storage administrators – long RAID rebuilds.
If the archive uses a traditional RAID protection scheme like RAID 5 or RAID 6, the time it takes to recover from a failed high-capacity drive can stretch to days or even weeks. During this rebuild window performance suffers, and the system is exposed to data loss should additional drives fail (one more failure for RAID 5, two more for RAID 6).
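A back-of-the-envelope calculation shows why. The sustained rebuild rates below are assumptions for illustration; real systems serving production I/O during a rebuild often run far slower than a drive's streaming speed:

```python
# Rough lower bound on RAID rebuild time for a high-capacity drive:
# the entire replacement drive must be rewritten, so the best case is
# capacity divided by the sustained rebuild rate.

def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Best-case hours to rewrite an entire failed drive."""
    capacity_mb = capacity_tb * 1_000_000  # TB -> MB, decimal units
    return capacity_mb / rate_mb_s / 3600

# Idle system streaming at an assumed 100 MB/s to a 6 TB drive:
print(f"{rebuild_hours(6, 100):.1f} hours")  # ~16.7 hours, absolute floor
# Under production load the effective rate might drop to ~15 MB/s:
print(f"{rebuild_hours(6, 15):.1f} hours")   # ~111 hours, roughly 4.6 days
```

Parity reconstruction overhead and competing application I/O only push these numbers higher, which is how multi-day and multi-week rebuilds arise.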
The Data Protection Problem
The issues with RAID recovery have led disk-only archive customers to consider some additional form of data protection, but this is a difficult undertaking. The common vendor solution is to replicate the archive to a second location, which of course doubles the cost of the archive and exacerbates the power and floor space issues.
Customers, looking for a cheaper means to meet the protection requirement, often resort to backing the archive up to tape. This means mounting the disk archive on a backup server via NFS or SMB/CIFS and backing up all of its data on a regular basis. Not only is this process slow and arduous, it also places undue strain on the already-overburdened backup process.
The Cloud Problem
Increasingly, disk archive solutions have been leveraging the cloud as a way to keep on-site growth of the archive in check. While this can appear to solve the immediate power and data center floor space problem, capacity in the cloud isn’t free either. Cloud Storage Providers (CSPs), after all, must show a profit, and they make money by charging organizations for the same capacity every month. If you are paying $X per month for 1TB of storage space, you are paying for that capacity over and over again.
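The recurring-billing effect compounds quietly over an archive's long retention periods. A small sketch, using a hypothetical per-TB monthly rate rather than any real provider's pricing:

```python
# Cumulative spend on cloud capacity that is billed every month.
# Archive data is typically retained for years, so the same terabyte
# is paid for dozens of times over its lifetime.

def cumulative_cloud_cost(tb: float, rate_per_tb_month: float, months: int) -> float:
    """Total paid for holding a fixed capacity for a number of months."""
    return tb * rate_per_tb_month * months

# 100 TB at a hypothetical $30/TB/month, retained for five years:
print(f"${cumulative_cloud_cost(100, 30.0, 60):,.0f}")  # $180,000
```

Unlike a capital purchase, none of that spend buys down the cost of the next five years; the meter simply keeps running for as long as the data is retained.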
Disk Archive Should Support Tape Archive
Even with all these challenges facing a disk-based archive, disk still has a role within an archive strategy. But instead of being the sole performer in that strategy, it should exist to support tape, leveraging disk’s rapid-access advantage. As we detailed in our article “Unstructured Data Meet Tape Archiving Efficiency”, tape suffers none of the problems mentioned above.
The key is for disk archive vendors to develop a way to move data between their platform and a tape archive. While they may see tape as a threat to their business, the reality is that data centers are going to be forced to adopt something besides disk anyway. The physical constraints of power and space can’t be ignored.
The good news is that disk archive vendors don’t need to immediately create direct support for tape libraries. Thanks to tape-integrated network-attached storage solutions like Crossroads Systems’ StrongBox, all they have to do is support migration to an NFS or CIFS share. Once the data is on that share, StrongBox can seamlessly move it from disk to tape so that the disk area does not need to be expanded.
The problems associated with a disk-only archive are numerous and worsen as the infrastructure scales. Disk clearly has a role in the overall archive process, but it should be a supporting role where its advantage of rapid access can be leveraged. Tape’s role should be larger, acting as the long-term repository and the place to store a disaster recovery copy of data.
This combination should allow for a simpler disk archive solution that’s cost-effective enough to retain all of an organization’s data. The result should be not only a better archive but, as we discuss in our article “How Tape Can Fix The Unstructured Backup Problem”, one that can greatly simplify the overall data protection process.
Crossroads is a client of Storage Switzerland