The difference between Retroactive Deduplication and Software Defined Storage

Posted on January 12, 2015 by George Crump

Short of buying a new storage system with storage efficiency (deduplication and compression) built in, it is difficult to add the technology to an existing storage array. Until recently there were two ways to accomplish this. First, you can wait for your storage vendor to add the capability and hope it is backward compatible. Second, you can add a software defined storage layer that replaces your current storage software but leverages your existing storage hardware. Now there is a third option, retroactive deduplication, which adds storage efficiency to the existing storage hardware without replacing its software.

Flash Motivations

Storage efficiency (deduplication, compression and thin provisioning) has been available for over a decade. While it has seen widespread adoption for data protection and WAN acceleration, its adoption in primary storage has been limited. Part of the reason is that storage platform vendors had no compelling motive to drive down the cost per gigabyte of hard drive based systems.

Flash has become popular in the data center because of its performance, which enables denser virtual infrastructures and more scalable database environments. But this performance comes at a premium price. As a result, there is a need to drive down the price per GB of flash to close the price gap with HDD. Storage efficiency, meaning deduplication and compression, was the obvious choice. Storage efficiency made flash storage more affordable, and flash storage made storage efficiency perform better. But until recently, implementing storage efficiency required either a new storage system with the technology integrated or a software defined storage (SDS) layer that adds the feature to existing hardware. Retroactive deduplication technology has now been introduced as an alternative.

The Difference Between SDS and Retroactive Dedupe

Retroactive deduplication adds storage efficiency features like deduplication and compression to an existing storage array, allowing it to remain competitive with newer storage systems that integrate flash and storage efficiency. But IT planners often assume that retroactive deduplication and SDS are the same thing. As this article outlines, they are not.

Instead of replacing all the features on the array, retroactive deduplication adds storage efficiency features to the current feature set, complementing what is already in place. SDS, on the other hand, abstracts the storage software from the storage hardware. While SDS can leverage existing storage array hardware, it typically replaces the storage software.

Keep Your Features

Most of the top storage vendors by market share do not offer storage efficiency (deduplication and/or compression) in their core product offerings. But the rest of their storage software (snapshots, cloning, and replication) is robust, application aware and well understood. It is also tightly integrated into the data center’s operational practices.

Many SDS solutions do offer some limited storage efficiency, but their other features are often neither as complete nor as well tested as the equivalent software on legacy storage arrays. Most of their robust features require a complete commitment to that platform. When SDS is added to an environment with existing arrays, the storage capabilities on those arrays need, for the most part, to be disabled.

Retroactive deduplication, on the other hand, typically runs on a dedicated appliance that sits between the storage array and the storage network. It adds storage efficiency to the existing storage platform feature set with which IT is already comfortable. There is no change to operational processes or to scripts that have already been written, and there is no additional processing load on the array because the processing is done on the appliance.

There is also no need to “backfill” for features that the SDS solution may not be providing yet. A good example is replication for disaster recovery. Many SDS solutions have yet to add this feature, or have added it only recently. By comparison, many legacy storage solutions have had this capability for over a decade. An SDS solution without this key capability forces the IT planner to go looking for a separate, replication-only software product to provide that functionality. And while many exist, buying another piece of software means additional acquisition costs, software licensing costs and another software management point.

Retroactive deduplication is, again, additive. There is no need to backfill with yet another software product, nor is there a need to learn and test another solution. Not only that, but retroactive deduplication actually makes the current replication software more efficient by eliminating redundant blocks that it would otherwise have to consider when performing replication.
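
To make the replication point concrete, here is a minimal sketch of block-level deduplication. It is only an illustration and not a description of how any particular appliance works: the fixed 4 KiB block size, the use of SHA-256 fingerprints, and the function names `dedupe_blocks` and `rebuild` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products may differ


def dedupe_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Returns (store, recipe): store maps fingerprint -> block bytes,
    recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # store a block only the first time it is seen
        recipe.append(fingerprint)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the unique blocks and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


# Hypothetical volume: six logical blocks, only three distinct ones.
volume = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE * 2 + b"C" * BLOCK_SIZE
store, recipe = dedupe_blocks(volume)
print(len(recipe), "logical blocks,", len(store), "unique blocks")
```

In this toy case a replication job working downstream of the deduplication step would only have three unique blocks to consider instead of six, which is the effect the paragraph above describes.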

Purpose Built Storage Efficiency

SDS runs storage efficiency along with all of its other tasks on the same processor. In the modern storage infrastructure, the cumulative impact on processor resources can be significant. Features like caching, auto-tiering, snapshots and thin provisioning are involved in every I/O, which means that the available processing power can quickly be consumed.

By comparison, retroactive deduplication is typically implemented on a dedicated appliance that is purpose built for performing deduplication and compression on I/O streams. This means the system has properly provisioned network bandwidth and plenty of processing resources. Most importantly storage efficiency is all the appliance does; it is not loaded down with storage features that would also compete for compute resources or impact their performance.

Permabit Technology’s SANblox storage efficiency appliances, for example, are capable of absorbing 180K IOPS each. Since most environments will have a pair installed for redundancy, that is 360K IOPS for the first pair. This performance can be expanded with a scale-out approach, leading to millions of potential IOPS.
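
The arithmetic behind those figures can be sketched as follows. The 180K IOPS per appliance comes from the text above; the assumptions that both appliances in a pair contribute their full rate and that throughput scales linearly with the number of pairs are simplifications for illustration, and the function names are hypothetical.

```python
IOPS_PER_APPLIANCE = 180_000  # per-appliance figure quoted in the article


def aggregate_iops(pairs: int, per_appliance: int = IOPS_PER_APPLIANCE) -> int:
    """Total IOPS for a number of appliance pairs, assuming linear scaling."""
    if pairs < 0:
        raise ValueError("pairs must be non-negative")
    return pairs * 2 * per_appliance


def pairs_needed(target_iops: int, per_appliance: int = IOPS_PER_APPLIANCE) -> int:
    """Smallest number of pairs whose combined IOPS meets the target."""
    per_pair = 2 * per_appliance
    return -(-target_iops // per_pair)  # ceiling division with integers


print(aggregate_iops(1))        # one pair
print(pairs_needed(1_000_000))  # pairs required to reach one million IOPS
```

Under these assumptions one pair delivers 360,000 IOPS and three pairs are enough to pass the one million mark; real-world scaling will depend on the workload and the network.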

Instant Storage Efficiency

The single biggest advantage retroactive deduplication has over SDS is its instantaneous nature. With a retroactive deduplication solution the appliance is installed into the storage infrastructure, new volumes are created and assigned to the appliance and storage efficiency begins. There is no need to learn a new set of software tools or features.

Conclusion

For data centers looking to move to a new way of managing storage or to integrate new storage with old software, SDS certainly has its usefulness. But there are plenty of data centers that like their current storage solutions and the capabilities they have. They are simply looking for a way to cost effectively add flash storage and/or to drive down the cost to store tier two data.

Storage efficiency is ideal for this use case, but implementing primary storage data efficiency has typically meant purchasing a new storage solution or adding an SDS layer. Retroactive deduplication allows IT planners who like their current storage array to add storage efficiency features with no performance impact or additional processor load on that array, while continuing to leverage existing vendor feature sets that are well understood and have been in use for many years.

Sponsored by Permabit Technology

About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
