Software Defined Deduplication is Critical to the Cloud

The goal of any cloud initiative is to create a cost-effective, flexible environment. These architectures typically store large data sets for long periods of time, so one of the challenges to being cost-effective is the physical cost of storage. Deduplication is critical to extracting maximum value from a cloud-first initiative, but the cloud requires a different, more flexible, software-defined implementation.

Why We Still Need Deduplication

While the cost per GB of hard disk and even flash storage continues to plummet, storage purchased in the quantities needed to meet a typical cloud architecture’s capacity demands remains the most expensive aspect of the design. And it’s not just the per-GB cost; it is also the physical space that each additional storage node consumes. Too many nodes can force the construction of a new data center, which is a much bigger cost concern than the price per GB of storage.

Deduplication provides a return on the investment by making sure that the architecture stores only unique data. That not only reduces the capacity requirement, it also reduces the physical storage footprint.
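To make "stores only unique data" concrete, here is a minimal sketch of fixed-size chunk deduplication. It is an illustration only, not how any particular product works: the names `dedup_store` and `restore`, the tiny 4-byte chunk size, and the in-memory dictionary standing in for the storage cluster are all assumptions made for the example. Real systems typically use much larger, often variable-size, segments.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical, unrealistically small so the example is easy to follow


def dedup_store(data: bytes, store: Dict[bytes, bytes],
                chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns a "recipe": the ordered list of chunk fingerprints needed to
    rebuild the original data.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        # Only the first occurrence of a chunk consumes capacity.
        store.setdefault(fingerprint, chunk)
        recipe.append(fingerprint)
    return recipe


def restore(recipe: List[bytes], store: Dict[bytes, bytes]) -> bytes:
    """Rebuild the original data from its recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


store: Dict[bytes, bytes] = {}
data = b"ABCDABCDABCDWXYZ"           # 16 bytes, but only two distinct chunks
recipe = dedup_store(data, store)
stored_bytes = sum(len(chunk) for chunk in store.values())
```

In this sketch the 16 bytes of input reference four chunks, but only two unique chunks (8 bytes) are actually stored, and `restore(recipe, store)` returns the original data.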

Organizations will have different cloud strategies. A few may only use the public cloud. Some may only use private cloud architectures. Most, however, will take a hybrid approach, leveraging the public cloud when it makes sense and a private cloud when performance or data retention concerns force them to. In the hybrid model, data should flow seamlessly and frequently between public and private architectures.

If the same deduplication technology is implemented in both the private and public cloud architectures, then the technology’s understanding of the data can be leveraged to limit the amount of data that has to be transferred. The network connection between the two becomes more efficient because only unique data segments need to cross it.
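The idea of transferring only unique segments can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of any vendor's protocol: `replicate`, the 4-byte segment size, and the dictionary standing in for the remote (public cloud) segment store are all hypothetical. The sender fingerprints each segment and ships only those the remote side does not already hold.

```python
import hashlib
from typing import Dict, List, Tuple


def replicate(data: bytes, remote_store: Dict[bytes, bytes],
              segment_size: int = 4) -> Tuple[List[bytes], int]:
    """Copy data to a remote site, sending only segments it lacks.

    Returns the recipe (ordered fingerprints) and the number of payload
    bytes that actually had to cross the network.
    """
    recipe = []
    bytes_sent = 0
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        fingerprint = hashlib.sha256(segment).digest()
        recipe.append(fingerprint)
        if fingerprint not in remote_store:
            # Unknown to the remote side: this segment must be transferred.
            remote_store[fingerprint] = segment
            bytes_sent += len(segment)
    return recipe, bytes_sent


remote: Dict[bytes, bytes] = {}
_, first_sent = replicate(b"AAAABBBBCCCC", remote)            # nothing known yet
second_recipe, second_sent = replicate(b"AAAABBBBDDDD", remote)  # mostly redundant
rebuilt = b"".join(remote[f] for f in second_recipe)
```

The first transfer sends all 12 bytes. The second data set shares two of its three segments with the first, so only the one new segment (4 bytes) is sent, yet the remote side can still rebuild the full 12 bytes.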

Why We Need Software Defined Deduplication

The other aspect of a cloud initiative is flexibility, so that IT can respond more quickly to any issues. Part of that flexibility is defined in the hybrid model itself. The storage architecture is split into two parts: the public cloud owns a section of it and a private cloud owns the rest. While the public cloud has the advantage of low upfront costs, IT cannot specify what types of storage hardware, if any, it uses.

The public cloud’s consumer-only model requires that all storage services be available as software only. This includes deduplication, so it has to be available as a software-defined component of the overall data management solution. Software-defined deduplication allows the data management software to execute and manage the data efficiency process, which should allow it to use anyone’s hardware.

Most private cloud solutions will leverage an object storage system as part of the architecture. It may or may not come with its own data deduplication feature, but it is unlikely to include a robust data management engine. Implementing a data management software solution that includes a deduplication capability on top of the object storage system provides more flexibility. The organization is free to select any storage hardware. And because it is software, IT can implement the same data efficiency in the public cloud. Redundant data between private and public cloud then does not need to be re-transmitted, which improves network efficiency.


Storage costs may eventually get low enough that deduplication is, well, redundant. But that day is not coming any time soon. In addition, even if storage costs drop to that point, deduplication will not become obsolete. The greater density that a deduplicated storage node achieves should reduce the physical footprint of the cloud storage cluster. A hybrid cloud model will also benefit from the network savings obtained by not transferring redundant data. Most critical, though, is that the technology be software defined so that it can provide the functionality regardless of hardware or location.

Sponsored by Commvault

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
