The State of Deduplication in 2015

At its core, deduplication is an enabling technology. First, it enabled disk-based backup devices to become the primary backup target in the data center. Now it promises to enable the all-flash data center by driving down the cost of flash storage. Just as deduplication became table stakes for backup appliances, it is now a required capability for flash and hybrid primary storage.

The Changing State of Backup Deduplication

The implementation of deduplication in primary storage will follow a path similar to that of backup storage. An IT planner wouldn't think of buying a backup system without deduplication today, and the overwhelming majority of backup systems have the feature built in. In fact, backup appliances have moved on; the debate about if and how to implement deduplication is over, and these vendors are now focused on next-level features. For example, EMC/Data Domain's source-side application agent technology (DD Boost) reduces the data being transferred to the appliance. Another example is ExaGrid, which can now host a backup agent on the appliance to minimize network communication and optimize advanced features.

Deduplication in Primary Storage

The next big wave is, of course, flash/hybrid primary storage. Here, deduplication is becoming a table-stakes feature. The first group to implement this capability was all-flash array providers like Pure Storage, XtremIO, Tegile and SolidFire. Thanks partly to an aggressive push of deduplication, these vendors were successful enough to get the attention of established storage vendors, to the point that EMC bought XtremIO and HP has added deduplication to its systems.

We have not yet seen wholesale dedupe adoption among primary storage system providers. Dell, IBM, EMC, HDS and Oracle still dominate the market, but they all need to respond to the deduplication demand. There is still an opportunity, and a requirement, for these vendors to add deduplication because users are sometimes hesitant to abandon ship and switch to a startup. As we all know, IT professionals hate change just as much as anyone else. A change means learning new storage system software and redesigning data protection processes. But that window of opportunity is closing, and legacy vendors need to bring features like deduplication and compression to market quickly or lose their incumbent position in primary storage.

Will Legacy Vendors Develop Their Own Dedupe?

There is little doubt that legacy vendors will need to offer deduplication and compression, especially as flash and hybrid storage become more entrenched. The question is, "will these legacy vendors develop it themselves?" After all, most of them have some form of deduplication technology in house, but those implementations are mostly focused on backup, not primary storage. Primary storage deduplication's use case is very different: it must be safe, inline, fast, scalable and resource efficient.
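To make the inline requirement concrete, here is a minimal sketch of content-addressed, inline deduplication: each incoming block is fingerprinted before it lands on disk, and a block whose fingerprint is already in the pool is stored as a reference rather than a second copy. This is an illustration only; the `DedupeStore` class and fixed 4 KB block size are assumptions for the example, and a real array adds far more engineering around metadata safety, hash-collision handling and performance.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks, a common choice for primary storage

class DedupeStore:
    """Toy inline deduplication store: each unique block is kept once,
    and volumes hold only lists of block fingerprints."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block data (the unique-block pool)
        self.volumes = {}  # volume name -> ordered list of fingerprints

    def write(self, volume, data):
        refs = self.volumes.setdefault(volume, [])
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            # Inline dedupe: if the fingerprint already exists in the pool,
            # record only a reference instead of writing a duplicate block.
            self.blocks.setdefault(fp, block)
            refs.append(fp)

    def read(self, volume):
        return b"".join(self.blocks[fp] for fp in self.volumes[volume])

store = DedupeStore()
store.write("vm1", b"A" * 8192)   # two identical 4 KB blocks
store.write("vm2", b"A" * 4096)   # same content again, different volume
print(len(store.blocks))          # only one unique block is actually stored
```

The fingerprint lookup happens in the write path, before data hits media, which is what distinguishes inline dedupe from the post-process approach some earlier systems used.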

Another challenge facing legacy vendors is that 2015 is going to be a big year for flash deployment in the data center. It might not be “the” year, but it will be a big year. While there are a lot of smart people at these organizations, it simply becomes an issue of time vs. priorities. All these companies have more to develop than just deduplication.

Retroactive Deduplication

Legacy vendors need to quickly add inline deduplication and compression to their existing storage offerings to hold off the hard-charging startup storage vendors encroaching on their incumbent positions. Storage Switzerland has dubbed this technology retroactive deduplication. As we discussed in our article "Retroactive Deduplication and Compression", the technology allows legacy vendors to be instantly competitive with the startup market. Retroactive deduplication starts as an appliance that sits inline; the customer assigns specific volumes to that appliance for optimization. Permabit is a good example of a company providing this solution, and its SANblox offering is now sold by EMC, HDS and NetApp, with more to come.

Retroactive deduplication will more than likely evolve into embedded deduplication using the same code as the appliance offering, so that existing deduplicated volumes are compatible with new controllers. This evolution can happen at a pace that fits the vendor's product development and roll-out strategy. It is critical that vendors in the retroactive deduplication space provide an API set that allows this evolution to occur so that data access can be seamless, and Permabit has provided that in its SDK version.

A Deduplication Standard

This leaves the market wide open for a deduplication standard, one that can be added to existing arrays and potentially interoperates between arrays. Imagine performing a VM migration between two storage systems and only having to move 10% of the VM's data because the other 90% already exists on the second array.
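That migration scenario boils down to a fingerprint exchange: the source compares block hashes against the target's dedupe index and ships data only for blocks the target does not already hold. The sketch below is a simplified illustration under assumed names (`fingerprints`, `migrate`, a dict standing in for the target array's index); a real protocol would also need integrity verification and hash-collision handling.

```python
import hashlib

BLOCK = 4096  # fixed-size blocks for the example

def fingerprints(data):
    """Split data into fixed-size blocks and fingerprint each one."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def migrate(vm_data, target_pool):
    """Send only blocks whose fingerprints the target does not already hold.
    target_pool maps fingerprint -> block data, like an array's dedupe index.
    Returns the number of blocks that actually had to be transferred."""
    sent = 0
    for i, fp in enumerate(fingerprints(vm_data)):
        if fp not in target_pool:  # target already has this block? skip it
            target_pool[fp] = vm_data[i * BLOCK:(i + 1) * BLOCK]
            sent += 1
    return sent

# Target array already holds 9 of the VM's 10 unique blocks.
vm = b"".join(bytes([n]) * BLOCK for n in range(10))
target = {hashlib.sha256(bytes([n]) * BLOCK).hexdigest(): bytes([n]) * BLOCK
          for n in range(9)}
print(migrate(vm, target))  # only 1 block crosses the wire
```

The savings scale with how much the two arrays already share, which is why an interoperable fingerprint format matters: without a common hashing and blocking scheme, neither side can compare indexes at all.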

The All-Flash Response

How will flash/hybrid storage system vendors respond? It depends. In some ways they should welcome a de facto deduplication standard; they would benefit from the same migration scenario described above. In addition, deduplication is just one arrow in the flash/hybrid vendor's quiver; they may be more than happy not to have to worry about maintaining this complex code so they can focus on other, more glamorous features.


In 2015, deduplication will ride flash/hybrid storage's coattails right into the primary storage data center. The flash/hybrid startups had an early lead, but now, thanks to retroactive deduplication, legacy storage vendors have more than a fighting chance. They can rapidly add deduplication while continuing to leverage their head start in features like tiering, replication and snapshots, with a broader plan to embed it and deploy it across their storage portfolios. As we end 2015, IT professionals will begin to want primary storage deduplication technologies that work together to minimize data transfer between devices.

Sponsored by Permabit


Twelve years ago George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, virtualization, cloud and enterprise flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.

6 comments on "The State of Deduplication in 2015"
  1. Joe Ropar says:

I thought this was supposed to be Storage Switzerland? We have been running deduplication on our NetApp "primary" and secondary storage systems for at least 6 years. We run our VMs on deduplicated NFS volumes and save over 50% on disk space. NetApp FlashCache is also dedupe aware. Not a very vendor-neutral article. To read this one would think that NetApp did not even offer this functionality. If you had been running NetApp the capabilities you mention above would seem old hat. I'm very disappointed.

    • John Herlihy says:

      I agree with Joe – one of NetApp’s key advantages in Data ONTAP over the last several years has been its ability to dedupe primary SAN & NAS storage while other players dismissed it due to the performance degradation in their own storage efficiency features.

      Regardless of whether it’s post-processed rather than inline, the storage efficiency numbers are real and have been a big hit with customers for a long time.

  2. Gordon McKemie says:

    George: Dell has recently added “cold” dedup functionality to the Compellent product in the 6.5 product release. It works to compress inactive replay data in Tier 3 storage. I can send you a white paper on this if you want.
Testing some of the inline dedup in the market, we have seen about 30% efficiency, quite a bit less than the 2:1 claims vendors make.

    Gordon McKemie
    Ohio Valley Storage Consultants

  3. George Crump says:

NetApp has a role in deduplication's history, but does it have a role in its future?

  4. Jim Haberkorn says:

    Thanks, George, keep up the good work. I’m Jim from HP and live in Zurich. You covered a lot of ground in your article and just wanted to make you aware that HP 3PAR also has dedupe with a nice little twist – it does the dedupe via a custom ASIC and not on the system processors, so it doesn’t impact the system performance. 3PAR was late to the dedupe game and so it does sometimes get passed over in the dedupe discussion. However, it has had dedupe for going on a year now, and we like talking about it. Thanks again for having a forum where these things can be discussed.

