In a recent article, “The State of Deduplication in 2015,” we looked at how deduplication is impacting the data center. Deduplication continues to change how data protection is performed, and it is dramatically changing primary storage, most notably by accelerating the adoption of flash and all-flash arrays. As you can read in the comments on that article, a couple of readers took exception to our treatment of NetApp, so it seems fair to discuss NetApp’s role in the state of deduplication.
NetApp has an important place in deduplication’s history. They were one of the first vendors to introduce deduplication on a primary storage system, and given the state of deduplication at the time, the way they implemented it made a lot of sense. NetApp chose to apply deduplication in a post-process manner, meaning that after data had been written to disk and had aged a bit, it was scanned to determine whether it was unique and redundant copies were then eliminated.
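To make the mechanics concrete, here is a minimal sketch of generic post-process deduplication, not NetApp’s actual implementation; the fixed 4 KB block size and SHA-256 fingerprints are my own simplifying assumptions:

```python
import hashlib

def post_process_dedupe(blocks):
    """Minimal sketch of post-process deduplication.

    `blocks` is a list of fixed-size data blocks that have already been
    written to disk. A background job fingerprints each block and maps
    duplicates back to the first copy so the redundant space can be freed.
    """
    seen = {}            # fingerprint -> index of the first (kept) copy
    block_map = []       # logical block -> physical block actually kept
    freed = 0

    for index, block in enumerate(blocks):
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint in seen:
            block_map.append(seen[fingerprint])  # point at the existing copy
            freed += 1                           # this copy can be reclaimed
        else:
            seen[fingerprint] = index
            block_map.append(index)

    return block_map, freed

# Example: four 4 KB blocks, two of them identical
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"C" * 4096]
mapping, freed = post_process_dedupe(blocks)
print(mapping, freed)   # [0, 1, 0, 3] 1
```

The key point is that every block, duplicate or not, is written to the media first; the savings are only reclaimed later by the background job.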
Also, in that timeframe there was considerable debate about whether deduplication should be inline or post-process. At the time, inline deduplication carried a performance penalty, but that penalty has evaporated as dedupe techniques have advanced and processor performance has improved. In both backup and primary storage, a case could once be made either way for the technology. But in the current and future data center, for primary storage, I don’t believe that case can still be made.
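For contrast, a minimal sketch of the inline approach, again a generic illustration (SHA-256 fingerprints and an in-memory index are assumptions) rather than any particular vendor’s design. Duplicates are detected in the write path, so they never reach the media:

```python
import hashlib

class InlineDedupeStore:
    """Minimal sketch of inline deduplication: blocks are fingerprinted
    in the write path and only previously unseen blocks are written."""

    def __init__(self):
        self.index = {}       # fingerprint -> physical location
        self.storage = []     # stands in for the backing media
        self.writes = 0       # physical writes actually issued

    def write_block(self, block):
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint in self.index:
            return self.index[fingerprint]      # duplicate: no media write
        location = len(self.storage)
        self.storage.append(block)              # unique: write it once
        self.index[fingerprint] = location
        self.writes += 1
        return location

store = InlineDedupeStore()
for block in [b"A" * 4096, b"B" * 4096, b"A" * 4096]:
    store.write_block(block)
print(store.writes)   # 2 physical writes for 3 logical writes
```

The fingerprinting work happens before the write is acknowledged, which is where the historical performance concern came from and why faster processors have largely settled the debate.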
The future of primary storage is memory-based storage. Today, and for the foreseeable future, that means flash. Flash has two basic challenges that slow its adoption: price and durability. I believe that inline deduplication is what will extract maximum value from flash on both counts.
When it comes to durability, post-process deduplication offers no help. In theory, post-process deduplication would help maintain the performance of an all-flash or hybrid storage system, but it is very rare to hear a complaint about all-flash or hybrid performance, as I discuss in the column “Should You Be Able to Turn All-Flash Deduplication Off?” Again, for most data centers, price and durability are the top concerns when evaluating flash-based storage systems.
In fact, a case could be made that post-process deduplication actually makes durability worse, because data is written to flash first and optimized later. Optimization in this context means identifying redundancies and eliminating them. Reclaiming those redundant blocks means erasing them, and erase cycles wear NAND cells just as program (write) cycles do, so post-process deduplication on a highly redundant dataset could actually double the amount of wear the NAND flash cells incur.
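To put rough numbers on that claim, here is a back-of-the-envelope sketch; the 90% redundancy figure and the one-extra-cycle-per-reclaimed-block assumption are mine, chosen only to illustrate how the wear could roughly double:

```python
# Back-of-the-envelope comparison of flash wear. The 90% redundancy figure
# and the assumption that reclaiming a deduplicated block costs roughly one
# more program/erase cycle are illustrative, not measurements.

logical_blocks = 1_000_000                      # blocks the host writes
duplicate_blocks = int(logical_blocks * 0.9)    # "highly redundant" data

# Baseline: no deduplication, every logical block is programmed once.
baseline_writes = logical_blocks

# Inline: duplicates are caught before they ever reach the media.
inline_writes = logical_blocks - duplicate_blocks

# Post-process: everything is programmed first, then each duplicate is
# reclaimed later, adding roughly one more cycle of wear per duplicate.
post_process_writes = logical_blocks + duplicate_blocks

print(baseline_writes, inline_writes, post_process_writes)
# 1000000 100000 1900000 -> nearly double the baseline wear
```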
I’m glad that both readers were happy with their implementation of NetApp deduplication, and they were using it the way I would design it if I were deploying a NetApp system and wanted to leverage deduplication: the flash in their configurations was used only to serve reads. I’ve spoken to many NetApp customers who were not as happy.
Conclusion
NetApp clearly has a place in deduplication history, but the article was about the state and future of deduplication. In my personal opinion, NetApp can still have a role, but they will need to adopt a more modern deduplication approach, one that is more resource efficient and can be performed inline. That is not to say that the NetApp FAS and E-Series solutions are not excellent products, but I believe the deduplication technology they use in the future will come from a third-party deduplication supplier.

Greetings Mr. Crump,
Thank you for the response. I am sure I have no clue what innovations NetApp may have in store for us in the future, but based on their track record my opinion is that they will be significant. My future is typically the next five years, and I “can” see into that crystal ball. And what I see is 98% of my data still on disk, fronted by dedupe-aware flash and supported by a complete suite of “integrated” data protection and storage efficiency features.
Regards,
Joe
I’m working from memory here, but it seems Samsung says you can rewrite their entire Pro SSD seven times a day for ten years. So at least in that instance, post-process deduplication wouldn’t make a significant difference.
I think post-process deduplication can be valuable when applied to backup storage.
Our clients want to shrink their backup times as much as possible. And they want affordable deduplication.
For SMBs doing up to 3 or 4 TB of backup daily, Microsoft’s Windows Server 2012 R2 deduplication can be a good choice since it is included in the operating system. Using commodity hardware, a Dell 7020 i7 with hyperthreading turned off and an Akitio rackmount RAID 10 USB 3 storage module with WD Red NAS 4 TB drives, we see deduplication rates of 400 billion bytes an hour or more on second-day and later backups.
Replication to external drives or across the net makes it easy to keep several copies of their entire backups.
We are also very impressed with the Windows inline reflation (rehydrating deduplicated data on read), which does not seem to slow down restores.
There will always be a place for high-end, brand-name deduplication appliances, but Microsoft’s post-process deduplication is a capable solution too.