Those considering the purchase of a storage system that advertises deduplication as a feature need to know what real inline dedupe is, because it matters quite a bit. According to the online SNIA dictionary, there are two types of deduplication: inline & post-process. Inline dedupe is “data deduplication performed before writing the deduplicated data.” Post process dedupe is “data deduplication performed after the data to be deduplicated has been initially stored.”
This is a binary condition, kind of like pregnancy: one cannot be a little bit pregnant. A product is either inline or it is not. Either a product dedupes the data before it writes it to storage or it dedupes it after it writes it to storage. The problem comes from the negative brand equity attached to the term post process – especially in the backup world. While one can make a solid case that the post process architecture is clearly superior in many situations, its inefficiencies – such as the requirement for a large landing zone and the extra I/O it requires – proved easy targets for those marketing inline solutions, and so a lot of people feel that inline is better.
Fast forward to primary storage dedupe, where the advantages of inline dedupe become even more pronounced. People don’t want to buy an extra shelf of high-priced flash for a landing zone, and they don’t want to have to schedule large imports to the landing zone the way you can with a backup system. They also don’t want to schedule the dedupe process.
The first vendor to offer primary dedupe, NetApp, chose to implement a post-process approach because it fit well within the OnTap architecture. The landing zone was actually part of the volume so it didn’t require extra configuration planning from that perspective.
Other vendors chose to implement and market an inline approach. However, there is at least one vendor with a post process architecture marketing what it does as inline. This blog post will not mention the name of the vendor for two reasons. The first is that it is not the point; the point is to educate everyone about the differences and why it matters. The second is that Storage Switzerland did not research the details of the dedupe architectures of all primary storage vendors, so it would be unfair to expose only one vendor if there are others doing the same thing. (Having said that, the vendor in question claimed that everyone does what they do and that is definitely not the case. There are true inline vendors in primary storage.)
In the backup space, it’s easy to tell the difference between inline and post process products. Post process products require a large landing zone and require administrators to manage the scheduling of the dedupe process. Such a product must write each block needing deduplication to the landing zone, read it back from the landing zone, and then possibly write it to its final destination – a total of two to three IOPs for each block needing deduplication.
By contrast, an inline product dedupes a given block before it leaves primary memory, which tends to be NVRAM so the data is protected in case of a power outage. The dedupe process runs and decides whether the block is new or a duplicate. If it is new, it is written to storage (one IOP). If it is a duplicate, it is discarded (zero IOPs). Either way, metadata is stored – for a total of zero to one IOP for each block to be deduped.
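The per-block IOP accounting above can be sketched as a toy model. The function names and structure here are my own illustration, not any vendor's implementation:

```python
# Toy model of per-block IOP counts for the two dedupe architectures.
# All names are illustrative; metadata I/O is ignored for simplicity.

def post_process_iops(is_duplicate: bool) -> int:
    """Post process: write to the landing zone, read back to dedupe,
    then write unique blocks to their final destination."""
    iops = 1      # write the un-deduped block to the landing zone
    iops += 1     # read it back into memory for the dedupe pass
    if not is_duplicate:
        iops += 1  # write the unique block to its final location
    return iops

def inline_iops(is_duplicate: bool) -> int:
    """Inline: dedupe in primary memory, write only unique blocks."""
    return 0 if is_duplicate else 1

print(post_process_iops(False), post_process_iops(True))  # 3 2
print(inline_iops(False), inline_iops(True))              # 1 0
```

Unique blocks cost three IOPs post-process versus one inline, and duplicates cost two versus zero – which is where the "two to three" and "zero to one" figures come from.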
The challenge comes when looking at primary storage dedupe. If a vendor dedupes the data before it leaves primary memory, then it is clearly inline dedupe. But what if it transfers the data to flash before it makes the decision? It’s still not inline, and the reason it matters is the number of IOPs that the post process generates.
If the un-deduped data is first transferred from primary memory to flash, that is an IOP. It then requires another IOP to read it back into primary memory when it is time to dedupe the data. If the data is unique, it requires a third IOP to store it in its primary location. That means this approach requires 200 percent more IOPs than an inline approach. The fact that the landing zone doesn’t need to be very big, and that the process doesn’t need to be scheduled, is irrelevant. What is relevant is that the box will have less IOP capacity than competing solutions. In fact, the vendor whose architecture inspired this blog post says it sees a 75 percent reduction in IOP capacity if you turn on dedupe. If every write actually requires three IOPs, capacity drops to roughly one-third of the hardware maximum – add in the metadata updates and the dedupe processing itself, and a reduction of that magnitude is exactly what you would expect.
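As a sanity check on that figure, here is the arithmetic, under the simplifying assumption that each logical write costs a fixed number of back-end IOPs and nothing else:

```python
# If every logical write costs `iops_per_write` back-end IOPs instead of 1,
# effective write capacity shrinks to 1/iops_per_write of the hardware maximum.

def capacity_reduction(iops_per_write: int) -> float:
    """Fraction of IOP capacity lost versus a 1-IOP-per-write baseline."""
    return 1 - 1 / iops_per_write

print(f"{capacity_reduction(3):.0%}")  # 67%
```

Three IOPs per write works out to roughly a two-thirds reduction on its own; the extra metadata and processing overhead would account for the rest of a 75 percent figure.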
A true inline solution plays somewhat of a push-pull game between CPU cycles and IOPs. While inline dedupe requires more processing power to decide if a block needs to be deduped before it is written, it comes with the benefit that there are a lot of blocks that don’t get written at all.
VMware uses the post process approach in VSAN 6.2. Those creating the product knew it was not inline, but didn’t want to call it post process, so VMware chose to call it nearline dedupe. While it would have been preferable if VMware had stuck with the official SNIA term – rather than using another SNIA term that already means something else – at least VMware was honest enough not to call it inline. When asked about this, Cristos Karamanolis of VMware joked that they might have spent more time arguing about the term than the company did developing the product.
This post makes a similar point to the Backup Terminology Matters post: it matters what we call things. If vendors are going to differentiate based on certain features, it’s very important that we agree on what those features are called and what you must do in order to qualify for those features. It’s also important that prospective customers understand what those features are and do their due diligence. What do you think?
In this scenario:
1. Data comes into the storage array and runs through light compression and blank-space removal
2. Data is then written to a small NVMe staging area
3. The write is acknowledged back to the source
4. Data is then deduped and later stored on the permanent physical flash media
Is it inline or post? I think it’s post, because the write ack is done before the dedupe is run.
I think you just mean NVM, not NVMe – NVMe is a protocol, not a storage type.
If the NVM is in a DIMM slot, then it’s inline. If the NVM is on the other side of some kind of bus (e.g. PCI), then I’d say it’s not inline. In the official lexicon of SNIA it would be post. But some are taking to calling it many other names, like asynchronous, near-inline, parallel, etc.
Again, the reason the terminology is important is that a true inline architecture has to do 66%+ fewer IOPs than a non-inline architecture, and that has a performance advantage if IOPs are important to you.
I know this isn’t what you intended to address in your post – but is post-process deduplication bad at all? Especially for workloads that can’t be deduplicated much, and when controller resources are scanty?
Actually post-process dedupe has its own advantages, but they don’t tend to help much in primary storage. The advantages mainly show up in backup & recovery workloads.
Is Pure storage the vendor who actually does post process dedupe but markets it as inline dedupe ?
As I mentioned in the blog post, it would be unfair for me to call out one vendor without looking in depth at the deduplication algorithms of every one of them. But if you’re familiar with the architecture of a particular product, it should be pretty easy to figure out if they’re really inline or not.
If they write pre-deduplicated data to some kind of storage that is on the other side of the PCI bus, then they’re not inline. They may be asynchronous/parallel/nearline (as VMware calls it), but they’re not inline. And the reason why that’s important — especially in primary storage — is the IOP cost of anything less than true inline.
Don’t forget that post processing is bad with flash for two other reasons. First, you’re consuming Program/Erase cycles on the media that would be avoided with a truly inline deduplication architecture. Second, the data accumulating in the landing zone must eventually be processed and this often leads to periods of inconsistent and unpredictable performance in the form of lower IOPS and higher latency. Folks are investing in all-flash arrays precisely to avoid inconsistency and unpredictability of performance.
Full disclosure: I work on EMC’s XtremIO all-flash array. Our deduplication is 100% inline, 100% always on, 100% of the time.
Thanks for being 100% honest. 😉