Hardware-based deduplication legitimized the concept of backing up to disk. But deduplication alone is no longer as important as it used to be. The problem with hardware-based deduplication is that the data must be transferred to the backup appliance before the efficiency is applied. Many backup software solutions now provide deduplication, compression and changed block backup to reduce the size of the data transfer, as well as the amount of disk space the backup data set consumes.
Deduplication started as a method to optimize disk backup storage, bringing the price per GB of disk backup close enough to that of its nearest competitor, tape, to justify the advantages of backing up to disk. Moving deduplication into the software stack allows it to be combined with compression and changed block tracking. This not only makes space consumption more efficient, but also makes network transfers more efficient and shortens the time during which the backup application impacts the application being protected.
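To make the software-side approach concrete, here is a minimal sketch, assuming fixed-size 4 KiB chunks, SHA-256 fingerprints and zlib compression, of how a backup server could deduplicate and compress data before it crosses the network. The names `BackupTarget`, `backup`, `restore` and `CHUNK_SIZE` are hypothetical and not taken from any product; many real implementations use variable-size (content-defined) chunking instead of fixed-size chunks.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products often use variable-size chunking


class BackupTarget:
    """Hypothetical 'dumb' storage target: it only stores compressed chunks keyed by fingerprint."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> compressed chunk bytes

    def has(self, digest):
        return digest in self.chunks

    def put(self, digest, payload):
        self.chunks[digest] = payload


def backup(data: bytes, target: BackupTarget):
    """Deduplicate and compress on the backup server, sending only chunks the target lacks.

    Returns (recipe, bytes_sent): the ordered chunk digests needed to rebuild
    the data, and the number of compressed bytes actually transferred.
    """
    recipe = []
    bytes_sent = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        if not target.has(digest):  # duplicate chunks are never re-sent
            payload = zlib.compress(chunk)  # compress before the "network" transfer
            target.put(digest, payload)
            bytes_sent += len(payload)
    return recipe, bytes_sent


def restore(recipe, target: BackupTarget) -> bytes:
    """Rebuild the original data from the chunk recipe."""
    return b"".join(zlib.decompress(target.chunks[d]) for d in recipe)


# Usage: ten identical chunks plus a short unique tail.
target = BackupTarget()
data = b"A" * CHUNK_SIZE * 10 + b"unique tail"
recipe, first_sent = backup(data, target)
_, second_sent = backup(data, target)  # a repeat backup transfers nothing new
print(len(target.chunks), second_sent)  # 2 0
```

Because the target only stores opaque chunks, it needs no deduplication logic of its own, which is the flexibility the article attributes to moving efficiency into software.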
Capabilities such as changed block tracking and block-level incremental backups make even software-based deduplication less critical. Modern applications require better backup storage targets than the solutions that have been on the market for the last twelve years.
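The reason changed block tracking reduces the need for deduplication is that unchanged data is never read or sent in the first place. The following is a minimal sketch, assuming a 512-byte block size and an in-memory volume. `TrackedVolume`, `incremental_backup` and `apply_incremental` are hypothetical names for illustration, not the API of any hypervisor or backup product.

```python
BLOCK_SIZE = 512  # assumed block size for illustration


class TrackedVolume:
    """Hypothetical volume that records which blocks change between backups."""

    def __init__(self, num_blocks):
        self.data = bytearray(num_blocks * BLOCK_SIZE)
        self.dirty = set()  # indices of blocks written since the last backup

    def write(self, offset, payload):
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside volume")
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))  # mark every block the write touched

    def incremental_backup(self):
        """Return {block_index: block_bytes} for changed blocks only, then reset tracking."""
        changed = {
            i: bytes(self.data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
            for i in sorted(self.dirty)
        }
        self.dirty.clear()
        return changed


def apply_incremental(base: bytearray, changed):
    """Merge a block-level incremental onto a copy of the previous full image."""
    for i, block in changed.items():
        base[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] = block
    return base


# Usage: take a full image, then a 100-byte write that straddles two blocks.
vol = TrackedVolume(num_blocks=1000)
full = bytearray(vol.data)
vol.write(1000, b"x" * 100)
changed = vol.incremental_backup()
print(sorted(changed))  # [1, 2]
restored = apply_incremental(bytearray(full), changed)
```

Only 2 of the 1,000 blocks are copied, so the transfer is small before any deduplication or compression is applied.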
The Impact of Software Defined Deduplication
Moving deduplication and other data efficiency techniques into software is part of the broader movement towards software-defined data center and software-defined storage (SDS) technologies. The same surplus processing power that enables sophisticated SDS solutions to run on commodity servers also enables backup software solutions to provide these features. Furthermore, by moving data efficiency into software, the organization gains flexibility when selecting its backup hardware, a benefit that can significantly reduce overall costs.
Traditional hardware-based deduplication is locked into whatever hardware a vendor offers. In the modern data center, this hardware lock-in is no longer acceptable. Data centers need the freedom to use the hardware of their choosing, to adopt high-capacity hard disk drives as soon as they become available, and even to use flash media when the need justifies the investment.
If your organization is moving towards a cloud-centric operating model, here is another key consideration: software-based data efficiency provides flexibility in where the data protection hardware can reside. It can be either on-premises or in the cloud. If your data efficiency relies on specific hardware, you have lost that option.
What Should Dedupe-less Hardware Look Like?
Deploying data efficiency within your backup servers provides several other benefits. First, it frees the backup storage to focus on a single task: storage. Deduplication exacts a toll on CPU and RAM. If that function is deployed on the storage, it increases the performance requirements, and therefore the cost, of the storage hardware. Locating that function on the media servers drives down overall costs.
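A rough calculation shows why the RAM toll is real. Assuming, purely for illustration, 100 TiB of unique data, an 8 KiB average chunk size and 40-byte index entries (a 32-byte SHA-256 fingerprint plus an 8-byte chunk location), a fully in-memory fingerprint index works out to about 500 GiB. The function name `dedupe_index_bytes` and all three inputs are hypothetical; many real systems shrink this figure with sparse or partially on-disk indexes.

```python
TiB = 2 ** 40
GiB = 2 ** 30
KiB = 2 ** 10


def dedupe_index_bytes(unique_data_bytes, avg_chunk_bytes, entry_bytes):
    """Rough size of a fully in-memory fingerprint index: one entry per unique chunk."""
    num_chunks = unique_data_bytes // avg_chunk_bytes
    return num_chunks * entry_bytes


# Illustrative assumptions only: 100 TiB unique data, 8 KiB chunks,
# 40-byte entries (32-byte SHA-256 fingerprint + 8-byte location).
index = dedupe_index_bytes(100 * TiB, 8 * KiB, 32 + 8)
print(index / GiB)  # 500.0
```

Whichever tier hosts this index must be sized for it, which is why placing deduplication on the storage raises the cost of the storage hardware.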
The second benefit is that the organization can more easily resolve the major challenge that all backup architectures eventually face: scaling to meet capacity demands. The problem with most hardware-based backup appliances is their legacy architecture. One limitation is that they cannot scale efficiently, which forces customers into complex scale-out architectures to compensate for poor software design. As a result, they are purchased with excess CPU and memory, and then "dumb" disk shelves are added to the main controller. Eventually, you reach the limit of how much capacity the controller can support. At that point you have to add and manage another system, upgrade the controller of the current system, or replace the entire architecture.
Disk deduplication may be best applied in backup software (if at all), and the hardware that used to count on the deduplication feature to justify its existence can do so no longer. This fact, combined with the larger trend towards software-defined storage, explains why organizations need a different storage strategy, one that advances the state of backup storage targets to provide hardware flexibility, scalability and location independence, including the cloud.