Hardware-based deduplication legitimized the concept of backing up to disk. But deduplication alone is no longer as important as it used to be. The problem with hardware-based deduplication is that the data must be transferred to the backup appliance before any efficiency is applied. Many backup software solutions now provide deduplication, compression and changed block backup to reduce the size of the data transfer, as well as the amount of disk space the backup data set consumes.
Deduplication started as a method to optimize disk backup storage, bringing the price per GB of disk backup close enough to that of its nearest competitor, tape, to justify the advantages of backing up to disk. Moving deduplication to the software stack allows it to be combined with compression and changed block tracking. The combination not only makes space consumption more efficient, but also makes network transfers more efficient and shortens the time that the backup application impacts the application being protected.
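To make the combination concrete, here is a minimal Python sketch of source-side data efficiency. It is an illustration under stated assumptions, not how any particular backup product works: the fixed 4 KiB block size, the function names and the in-memory `store` dictionary standing in for the backup target are all hypothetical. Real changed block tracking is normally supplied by the hypervisor or file system rather than by comparing against a previous copy, as the sketch does for simplicity.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def changed_blocks(previous: bytes, current: bytes, size: int = BLOCK_SIZE):
    """Yield (offset, block) for every block that differs from the last backup.

    Stand-in for changed block tracking: a real tracker reports changed
    offsets directly instead of comparing two full copies.
    """
    for offset in range(0, len(current), size):
        block = current[offset:offset + size]
        if block != previous[offset:offset + size]:
            yield offset, block


def backup(previous: bytes, current: bytes, store: dict) -> dict:
    """Send only changed, not-yet-stored blocks to `store`, compressed.

    `store` is a hypothetical backup target mapping SHA-256 digest to
    compressed block. Returns the offsets that changed and the number of
    bytes that actually had to be transferred.
    """
    recipe = {}
    bytes_sent = 0
    for offset, block in changed_blocks(previous, current):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:             # deduplication: target already has it
            payload = zlib.compress(block)  # compression happens before transfer
            store[digest] = payload
            bytes_sent += len(payload)
        recipe[offset] = digest
    return {"recipe": recipe, "bytes_sent": bytes_sent}
```

Because all three steps run before the data leaves the backup client, a block that has not changed, or that the target already holds, never crosses the network at all.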
The Impact of Software-Defined Deduplication
Moving deduplication and other data efficiency techniques to software is part of the broader movement towards software-defined data center and software-defined storage (SDS) technologies. The same excess processing power that enables sophisticated SDS solutions to run on commodity servers also enables backup software solutions to provide these features. Furthermore, by moving data efficiency to the software, the organization also gains flexibility when selecting its backup hardware, a benefit that can significantly reduce overall costs.
Traditional hardware-based deduplication is locked into the hardware a vendor offers. In the modern data center, this hardware lock-in strategy is no longer acceptable. Data centers need the opportunity to leverage the hardware of their choosing, to adopt high-capacity hard disk drives as soon as they become available, and to even support flash media when the need justifies the investment.
If your organization is moving towards a cloud-centric operating model, here’s another key consideration: software-based data efficiency provides flexibility in where the data protection hardware can reside. It can be either on-prem or in the cloud. If your data efficiency relies on specific hardware, you’ve lost that option.
What Should Dedupe-less Hardware Look Like?
Deploying data efficiency within your backup servers provides several other benefits. First, it frees up the backup storage to focus on a single task: storage. Deduplication does exact a toll on CPU and RAM. If that function is deployed on the storage, it naturally increases the performance requirements and therefore the cost of the storage hardware. Locating that function on the media servers drives down overall costs.
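For a rough sense of the RAM side of that toll, the sketch below estimates the size of a naive deduplication index that keeps one entry per unique chunk in memory. The function name and all three input figures are assumptions chosen for illustration; production systems typically use techniques such as caching or filters to keep far less than this in RAM, so treat the result as an upper bound rather than a sizing guide.

```python
def naive_index_ram_bytes(unique_bytes: int, chunk_bytes: int, entry_bytes: int) -> int:
    """Estimate RAM for a fully in-memory index with one entry per unique chunk."""
    chunks = -(-unique_bytes // chunk_bytes)  # ceiling division
    return chunks * entry_bytes


TIB = 2 ** 40
GIB = 2 ** 30

# Hypothetical inputs: 100 TiB of unique data, 8 KiB chunks,
# 40 bytes per index entry (digest plus location metadata).
ram = naive_index_ram_bytes(100 * TIB, 8 * 1024, 40)
print(ram / GIB)  # 500.0
```

Whatever the real figure is for a given product, that memory and the CPU cycles spent hashing have to live somewhere, and commodity media servers are usually a cheaper place for them than a storage controller.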
The second benefit is that the organization can more easily resolve the major challenge that all backup architectures eventually face: scaling to meet capacity demands. The problem with most hardware-based backup appliances is their scale-up architecture: they are purchased with excess CPU and memory, and then dumb shelves are added to the main controller. Ultimately, you reach the limit of how much capacity the controller can support. At that point you have to either add and manage another system, upgrade the controller of the current system or replace the entire architecture.
Object storage is gaining traction as a backup target, and scalability is one important reason for this. Scaling is something that object storage systems provide more naturally. They were built to scale through a cluster of storage servers (nodes), enabling organizations to pay as they grow.
In addition, object storage that supports the S3 protocol can seamlessly interconnect to the cloud. The organization can leverage that connection for something as simple as a disaster recovery copy, or can subsequently leverage cloud compute to work on the cloud-based data.
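As a sketch of what that location independence can look like from the backup software's side, the Python function below writes deduplicated blocks to any S3-compatible target. The function name, bucket name and key prefix are hypothetical. The `client` argument is assumed to be any object exposing the S3 `put_object(Bucket=..., Key=..., Body=...)` call, such as a boto3 S3 client.

```python
import hashlib


def upload_blocks(client, bucket: str, blocks, prefix: str = "backup/blocks/"):
    """Write each block to an S3-compatible target under a content-derived key.

    Identical blocks map to the same key, so a repeated block overwrites
    itself instead of consuming additional capacity. Returns the keys written.
    """
    keys = []
    for block in blocks:
        key = prefix + hashlib.sha256(block).hexdigest()
        client.put_object(Bucket=bucket, Key=key, Body=block)
        keys.append(key)
    return keys
```

With boto3, for example, the same function could target an on-prem object store by creating the client with `boto3.client("s3", endpoint_url=...)` pointed at the local endpoint, or a public cloud by omitting `endpoint_url`. Only the client construction changes; the backup logic does not.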
Disk deduplication is something that may be best applied in software – the hardware that used to count on the deduplication feature to justify its existence can do so no longer. This fact, combined with the larger trend towards software-defined storage, explains why object storage is rapidly gaining acceptance as a backup target. It provides hardware flexibility, scaling and location independence, including the cloud.
Sponsored by Cloudian