At its core, deduplication is an enabling technology. First, it enabled disk-based backup devices to become the primary backup target in the data center. Now it promises to enable the all-flash data center by driving down the cost of flash storage. Just as deduplication became table stakes for backup appliances, it is now a required capability for flash and hybrid primary storage.
The Changing State of Backup Deduplication
The implementation of deduplication in primary storage will follow a path similar to that of backup storage. An IT planner wouldn’t think of buying a backup system without deduplication today, and the overwhelming majority of backup systems have the feature built in. In fact, backup appliances have moved on; the debate about if and how to implement deduplication is over, and these vendors are now focused on next-level features. For example, EMC/Data Domain’s source-side application agent technology (DD Boost) reduces the amount of data transferred to the appliance. Another example is ExaGrid, which can now host a backup agent on the appliance to minimize network communications and optimize advanced features.
Deduplication in Primary Storage
The next big wave is, of course, flash/hybrid primary storage. Here, deduplication is becoming a table-stakes feature. The first group to implement this capability was all-flash array providers like Pure Storage, XtremIO, Tegile and SolidFire. Thanks in part to an aggressive push on deduplication, these vendors were successful enough to get the attention of established storage vendors, to the point that EMC bought XtremIO and HP added deduplication to its systems.
We have not yet seen wholesale deduplication adoption among primary storage system providers. Dell, IBM, EMC, HDS and Oracle still dominate the market, but they all need to respond to the demand for deduplication. There is still an opportunity, and a requirement, for these vendors to add deduplication because users are often hesitant to abandon ship and switch to a startup. As we all know, IT professionals hate change just as much as anyone else; a change means learning new storage system software and redesigning data protection processes. But that window of opportunity is closing, and legacy vendors need to bring features like deduplication and compression to market quickly or lose their incumbent position in primary storage.
Will Legacy Vendors Develop Their Own Dedupe?
There is little doubt that legacy vendors will need to offer deduplication and compression, especially as flash and hybrid storage become more entrenched. The question is: will these legacy vendors develop it themselves? After all, most of them have some form of deduplication technology in house, but it is mostly focused on backup, not primary storage. Primary storage deduplication’s use case is very different: it must be safe, inline, fast, scalable and resource-efficient.
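To make “inline” concrete: an inline deduplication engine fingerprints each incoming block before it ever lands on disk, and stores only blocks it has not seen before. The sketch below is purely illustrative (the class, block size and in-memory index are our own simplifications, not any vendor’s implementation); real arrays add persistent metadata, collision handling and variable-length chunking.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size

class DedupeStore:
    """Toy inline-dedupe write path: store each unique block once."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block data (physical store)
        self.volume = []   # logical volume: ordered list of fingerprints

    def write(self, data: bytes):
        """Fingerprint each block inline; duplicates become references."""
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:   # new data: consume capacity
                self.blocks[fp] = block
            self.volume.append(fp)      # duplicate: metadata only

    def physical_bytes(self):
        return sum(len(b) for b in self.blocks.values())

store = DedupeStore()
store.write(b"A" * BLOCK_SIZE * 8)  # eight identical blocks
store.write(b"B" * BLOCK_SIZE * 2)  # two identical blocks of other data
# 10 logical blocks written, but only 2 unique blocks consume capacity
print(len(store.volume), len(store.blocks))
```

The point of the sketch is the trade-off the article describes: the fingerprint lookup sits directly in the write path, which is why primary storage dedupe must be fast and resource-efficient, not just space-efficient.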
Another challenge facing legacy vendors is that 2015 is going to be a big year for flash deployment in the data center. It might not be “the” year, but it will be a big one. While there are a lot of smart people at these organizations, it simply becomes an issue of time versus priorities; all of these companies have more to develop than just deduplication.
Legacy vendors need to quickly add inline deduplication and compression to their existing storage offerings to hold off the hard-charging startup storage vendors encroaching on their incumbent positions. Storage Switzerland has dubbed this technology retroactive deduplication. As we discussed in our article “Retroactive Deduplication and Compression”, the technology allows legacy vendors to become instantly competitive with the startup market. Retroactive deduplication starts as an appliance that sits inline; the customer assigns specific volumes to that appliance for optimization. Permabit is a good example of a company providing this solution, and its SANblox offering is now sold by EMC, HDS and NetApp, with more to come.
Retroactive deduplication will more than likely evolve into embedded deduplication that uses the same code as the appliance offering, so that existing deduplicated volumes remain compatible with new controllers. This evolution can happen at a pace that fits the vendor’s product development and roll-out strategy. It is critical that vendors in the retroactive deduplication space provide an API set that allows this evolution to occur so that data access remains seamless; Permabit has provided that in its SDK version.
A Deduplication Standard
This leaves the market wide open for a deduplication standard: one that can be added to existing arrays and potentially interoperates between arrays. Imagine performing a VM migration between two storage systems and only having to move 10% of the VM’s data because the other 90% already exists on the second array.
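The migration scenario above can be sketched in a few lines, assuming (hypothetically) that both arrays share a common fingerprint format: the source hashes its blocks and ships only the ones the target does not already hold. The function and data structures here are our own illustration, not any shipping product’s protocol.

```python
import hashlib

def migrate(source_blocks, target_index):
    """Dedupe-aware migration sketch.

    source_blocks: list of bytes objects (the VM's blocks)
    target_index:  dict mapping fingerprint -> block on the target array
    Returns the number of blocks actually transferred.
    """
    transferred = 0
    for block in source_blocks:
        fp = hashlib.sha256(block).hexdigest()
        if fp not in target_index:  # only ship data the target lacks
            target_index[fp] = block
            transferred += 1
    return transferred

# A VM whose blocks are 90% already present on the target array:
common = [bytes([i]) * 512 for i in range(9)]
unique = [b"\xff" * 512]
target = {hashlib.sha256(b).hexdigest(): b for b in common}
moved = migrate(common + unique, target)
print(moved)  # only the one new block crosses the wire
```

With an interoperable fingerprint format, the bulk of a migration reduces to exchanging fingerprints, which is the efficiency win a deduplication standard would unlock.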
The All-Flash Response
How will flash/hybrid storage system vendors respond? It depends. In some ways they should welcome a de facto deduplication standard, since they would benefit from the same migration scenario described above. In addition, deduplication is just one arrow in the flash/hybrid vendor’s quiver; they may be more than happy not to have to maintain this complex code so they can focus on other, more glamorous features.
In 2015, deduplication will ride flash/hybrid storage’s coattails right into the primary storage data center. The flash/hybrid startups had an early lead, but now, thanks to retroactive deduplication, legacy storage vendors have more than a fighting chance. They can rapidly add deduplication while continuing to leverage their head start in features like tiering, replication and snapshots, with a broader plan to embed deduplication and deploy it across their storage portfolios. As 2015 ends, IT professionals will begin to want primary storage deduplication technologies that work together to minimize data transfer between devices.
Sponsored by Permabit