In the early days of All-Flash Arrays (AFAs), deduplication was a key catalyst for adoption. Server and desktop virtualization environments benefited greatly from the technology because of the similarity between virtual machine images. Those environments also didn’t place the same performance demands on the system as scale-up databases and other applications. Using deduplication on database and other application environments doesn’t deliver the same capacity gains, so it wastes compute resources. Now IT planners need to reconsider whether they “must have” deduplication in their AFAs.
Deduplication is an investment. In a storage system, the vendor is investing CPU and memory resources hoping that the return on that investment is a reduction in capacity requirements. Deduplication on production storage is nowhere near as effective as it is in backup storage. While some vendors claim 5:1 deduplication ratios, most customers actually see 3:1 when workloads like databases and virtualized infrastructure are blended together.
In an era when flash capacity cost more than ten dollars per gigabyte, the investment made sense: at a 3:1 ratio, the customer got $30 of storage for every $10 purchased, a savings of $20. Today, flash storage costs less than $1 per gigabyte, so that 3:1 ratio only saves $2 per gigabyte.
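The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the article's numbers; the function name `savings_per_raw_gb` is made up for this example, and the prices are the round figures used in the text, not vendor quotes.

```python
def savings_per_raw_gb(price_per_gb: float, dedupe_ratio: float) -> float:
    """Dollar value of the extra effective capacity gained per raw GB purchased.

    Assumes the quoted deduplication ratio is fully realized, which the
    article notes is often not the case on production storage.
    """
    effective_value = price_per_gb * dedupe_ratio  # value of storage delivered
    return effective_value - price_per_gb          # minus what was actually paid


# Illustrative prices taken from the text: $10/GB then, $1/GB now, 3:1 ratio.
for label, price in [("early flash", 10.0), ("today", 1.0)]:
    print(f"{label}: ${savings_per_raw_gb(price, 3.0):.2f} saved per raw GB")
```

The same 3:1 ratio yields a tenth of the dollar savings once the raw price drops by a factor of ten, while the CPU and memory spent on deduplication do not shrink with it.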
If deduplication could be applied in a way that is 100% seamless to production storage and carries no cost, then even a fifty-cent-per-gigabyte savings might justify it. Deduplication, though, always has an impact. Storage vendors can hide the impact by using more powerful processors and more memory, but every attempt to hide deduplication’s impact increases the cost of the all-flash array.
Another challenge facing vendors is that there is less latency within the storage infrastructure to hide behind. First generation all-flash arrays used SAS-based flash drives and connected to servers via traditional SCSI protocols. SAS and SCSI added enough overhead that the impact of deduplication might be less noticeable, but in most cases it was still noticeable. As next generation flash and networking technology based on NVMe comes to market, latency is significantly reduced and the overhead of deduplication is more exposed than ever.
The final concern with deduplication is one of need. First generation flash drives were 128GB to 256GB; now flash drives are available in double-digit terabytes. Many organizations may meet their capacity needs with a single AFA and half a dozen drives. Deduplicating data across those drives means an organization may end up with 3X the capacity that it actually needs.
The Dedupe Tax
Another issue with deduplication is that vendors with the feature tend to charge more per raw GB than vendors without it. This is known as the dedupe tax. For example, if the cost of flash capacity is $3 per GB raw and deduplication lets the vendor drive that cost down to $1 per GB, yet the vendor charges $2 per GB, then the organization is paying $1 more per GB than it should, even though it is saving $1 per GB compared with the raw price.
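A minimal sketch of the dedupe tax example, using the figures from the paragraph above. The function `dedupe_tax` and its parameter names are hypothetical, and the 3:1 ratio is an assumption inferred from the $3 to $1 reduction rather than something the example states outright.

```python
def dedupe_tax(raw_cost_per_gb: float, dedupe_ratio: float, charged_per_gb: float):
    """Return (effective_cost, tax, savings) in dollars per GB.

    effective_cost: what the capacity costs the vendor after deduplication
    tax:            what the customer pays above that effective cost
    savings:        what the customer saves compared with the raw price
    """
    effective_cost = raw_cost_per_gb / dedupe_ratio
    tax = charged_per_gb - effective_cost
    savings = raw_cost_per_gb - charged_per_gb
    return effective_cost, tax, savings


# Figures from the text: $3/GB raw, $2/GB charged; 3:1 ratio is assumed.
effective, tax, savings = dedupe_tax(3.0, 3.0, 2.0)
print(f"effective cost ${effective:.2f}/GB, tax ${tax:.2f}/GB, savings ${savings:.2f}/GB")
```

In this example the customer and the vendor split the benefit evenly: $1 per GB goes to the customer as savings and $1 per GB is kept by the vendor as the tax.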
Life Without Deduplication
All-flash array vendors have conditioned the IT mind to demand data deduplication. IT professionals need to reconsider deduplication’s value. It may be a nice-to-have for certain environments, but it is certainly not a must-have, and it does not lower the price.
To learn more about why all-flash arrays cost so much and what to do about it, watch our on-demand webinar “How to Define a 92TB, 500k IOPS for less than $95k”.
Hi George! Interesting read! It actually made me think of our environment (mixed SSD/SATA). We have very heterogeneous data. On one of our 80TB SSD Volume Groups I have 3:1 savings, on the other 10:1. I find it extremely hard to estimate the savings before buying. This makes the decision difficult. Plus, vendors tend to make All-Flash arrays (or, even more so, NVMe) and two-digit TB disks so expensive that you don’t think twice about it. Also, you want a certain number of disks to spread the I/O. For us, deduplication is just an added bonus, not a reason for SSD.