All-Flash Storage Efficiency Is About More Than Deduplication

George Crump, Lead Analyst – Storage Switzerland

All-Flash Storage Arrays have quickly become the go-to option for solving storage performance challenges. Thanks to data efficiency technologies that reduce the effective cost per gigabyte (GB), the appeal of All-Flash Arrays now extends beyond the performance fringe and into the mainstream data center. Deduplication has captured most of the attention among data efficiency techniques, but IT planners should not assume that all deduplication is the same, nor that it is the only data efficiency technology available to them.

There are several technologies that provide increased efficiency in flash storage. Each provides unique capabilities, and together they can deliver a complete efficiency solution for flash storage. This article examines how these storage efficiency technologies can drive down the effective cost per GB of All-Flash Arrays and make them more attractive to mainstream data centers.
Thin Provisioning

Thin provisioning should be a core capability of a storage efficiency offering because it allows storage to be efficient from the outset. Without thin provisioning, storage is typically allocated at the expected maximum requirement of the application that needs it. The problem is that it can take years for that application to even come close to utilizing that storage, and in the meantime the capacity is held captive by the application and can't be used by other applications. This is especially troubling for All-Flash Arrays: not only is their capacity being wasted, it is sitting idle and driving up the cost per GB (and per IOPS) far more than idle HDD capacity would.

It is important to note that other forms of storage efficiency require data to be present in order for it to be optimized. If capacity is hard allocated to an application or server but there is no data within that capacity, there is nothing to optimize and the capacity remains captive. Thin provisioning is the only approach that frees up captive capacity and makes it globally available to the other data efficiency technologies.
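
To make the captive-capacity point concrete, here is a minimal, purely illustrative Python sketch of thin-provisioning accounting (the ThinPool and ThinVolume classes are hypothetical, not any vendor's API): two volumes each advertise 1 TB to their applications, but physical flash is consumed from a shared pool only when blocks are actually written.

```python
# Illustrative sketch of thin-provisioning accounting. Hypothetical classes,
# not any vendor's implementation.

BLOCK_SIZE = 4096  # bytes per allocation unit


class ThinPool:
    """A shared pool that hands out physical blocks only when data is written."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.allocated = 0

    def allocate(self, blocks):
        if self.allocated + blocks > self.physical_blocks:
            raise RuntimeError("pool exhausted: add capacity or reclaim space")
        self.allocated += blocks


class ThinVolume:
    """A volume that advertises a large logical size but consumes pool
    capacity only for blocks that have actually been written."""

    def __init__(self, pool, logical_blocks):
        self.pool = pool
        self.logical_blocks = logical_blocks   # what the application sees
        self.block_map = {}                    # logical block -> physical block

    def write(self, logical_block, data):
        if logical_block not in self.block_map:
            self.pool.allocate(1)              # physical space used only now
            self.block_map[logical_block] = self.pool.allocated - 1
        # data placement omitted; this sketch tracks accounting only

    @property
    def used_blocks(self):
        return len(self.block_map)


# Two volumes each "provisioned" at 1 TB share a 500 GB pool; capacity is
# consumed only as blocks are written, so neither holds idle flash captive.
pool = ThinPool(physical_blocks=500 * 2**30 // BLOCK_SIZE)
vol_a = ThinVolume(pool, logical_blocks=2**40 // BLOCK_SIZE)
vol_b = ThinVolume(pool, logical_blocks=2**40 // BLOCK_SIZE)
vol_a.write(0, b"x" * BLOCK_SIZE)
print(vol_a.used_blocks, pool.allocated)   # -> 1 1
```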
Compression

Another challenge with deduplication is that not all data is redundant. Some workloads, like virtualized servers and desktops, do create a lot of redundant data; others, like databases, do not. Compression is a better alternative in these situations because it works within the file to remove redundancy. Data that is optimized most by deduplication is often optimized least by compression, and data that is optimized most by compression is optimized least by deduplication. As a result, deduplication and compression are complementary technologies: deduplication eliminates redundancy across files and compression eliminates it within them.
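
A small, hedged illustration of this complementarity, using only Python's standard library and synthetic data (not a benchmark): deduplication collapses identical blocks repeated across copies, while compression shrinks blocks that are unique but internally repetitive.

```python
# Illustrative comparison of cross-block deduplication and intra-block
# compression on two kinds of synthetic data.
import hashlib
import os
import zlib

BLOCK = 4096


def savings(blocks):
    """Return (dedup_ratio, compression_ratio) for a list of data blocks."""
    raw = sum(len(b) for b in blocks)
    unique = {hashlib.sha256(b).digest(): b for b in blocks}
    deduped = sum(len(b) for b in unique.values())
    compressed = sum(len(zlib.compress(b)) for b in unique.values())
    return round(raw / deduped, 2), round(deduped / compressed, 2)


# "VM clone" style data: the same block repeated across many guests,
# but each unique block is essentially incompressible.
base = os.urandom(BLOCK)
vm_blocks = [base] * 7 + [os.urandom(BLOCK) for _ in range(3)]

# "Database" style data: every block unique, but repetitive inside each block.
db_blocks = [(b"row%05d,ACTIVE,2024;" % i) * 200 for i in range(10)]

print("VM-like:", savings(vm_blocks))   # large dedup gain, compression near 1x
print("DB-like:", savings(db_blocks))   # no dedup gain, large compression gain
```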
Deduplication – All Are Not Created Equal

Deduplication has an important role to play alongside thin provisioning and compression. Deduplication eliminates redundant data that exists across files. It generally operates at a sub-file level, comparing data segments to find matches and storing only a single instance of each segment, which later copies simply reference. As Storage Switzerland discussed in a recent webinar with Permabit Technology‘s CTO Jered Floyd, it is a sophisticated technology that requires more than just the management of these data segments. Deduplication, especially when used for All-Flash primary storage, requires a robust set of meta-data table handling routines that allow the table to be searched rapidly so that storage performance is not impacted.

These meta-data tables also need to be robust enough to handle the scale required for primary storage. Primary storage contains less redundancy than deduplication's beachhead market, backup, but there is still enough redundancy to make implementing the technology worth the effort, especially in virtualized environments. However, the lower deduplication ratio means that more unique data needs to be tracked, which leads to a larger meta-data table. That alone impacts deduplication performance, and it can also lead to meta-data table corruption and further degradation if the developer does not design the technology to scale (for example, by keeping the meta-data table in RAM with a small and efficient index). IT planners should look for deduplication technologies that have stood the test of time and demonstrated the ability to scale.
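
As a rough illustration of the mechanics described above, here is a minimal Python sketch of fixed-block deduplication with an in-memory fingerprint index. It is purely conceptual: production implementations use far more compact, purpose-built indexes (and often variable-length segments) to keep the meta-data table fast at scale.

```python
# Conceptual sketch of block-level deduplication with an in-memory
# fingerprint index; not representative of any production design.
import hashlib

BLOCK = 4096


class DedupStore:
    def __init__(self):
        self.index = {}    # fingerprint -> physical block id (the "meta-data table")
        self.blocks = []   # simulated physical block storage
        self.logical = 0   # logical bytes written by hosts

    def write(self, data):
        """Split incoming data into fixed-size segments and store each only once."""
        for off in range(0, len(data), BLOCK):
            segment = data[off:off + BLOCK]
            self.logical += len(segment)
            fp = hashlib.sha256(segment).digest()
            if fp not in self.index:           # unique segment: consumes flash
                self.index[fp] = len(self.blocks)
                self.blocks.append(segment)
            # duplicate segments only add a reference, not capacity

    @property
    def dedup_ratio(self):
        physical = sum(len(b) for b in self.blocks) or 1
        return self.logical / physical


store = DedupStore()
golden_image = b"".join(bytes([i]) * BLOCK for i in range(4))   # a guest OS template
for _ in range(10):          # ten virtual machines cloned from the template
    store.write(golden_image)
print(round(store.dedup_ratio, 1))   # ~10.0 for fully redundant clones
```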
Replication

Disaster recovery is a basic requirement for any data center, and array-based replication is generally considered one of the easiest ways to fulfill that requirement. While not always thought of as a data efficiency technology, the ability to replicate data to another location with minimal bandwidth utilization is a form of data efficiency.

Being inefficient with bandwidth can be just as impactful to the cost of an All-Flash deployment as lacking space efficiency locally. Many All-Flash storage vendors don't have replication built into their systems and instead count on third-party software tools to do the replication. The challenge with not leveraging array-based replication is that all the hard work performed by deduplication and compression is undone: the data has to be rehydrated, un-deduplicated and uncompressed, before the third-party software package can replicate it.

Beyond losing the benefit of the array's deduplication and compression, the lack of array-based replication means the storage manager has to implement and manage third-party replication software, typically installed on the host, to move data off-site and fulfill disaster recovery requirements. The challenge with host-based replication is that it runs on the hosts instead of the storage, consuming CPU processing power and on-premises network bandwidth that the application may very well need, especially in a virtual server environment.

By comparison, replication that leverages the deduplication and/or compression already performed on the array requires significantly less CPU and less bandwidth to reach the off-site location, since no work beyond copying the unique blocks needs to be performed. A once sophisticated process becomes merely the copying of data that has already been reduced by the integrated deduplication and compression capabilities.
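
To illustrate why this matters for bandwidth, here is a hedged, simplified Python sketch of deduplication-aware replication (the Site class and fingerprint exchange are hypothetical, not any vendor's protocol): the source sends fingerprints first and ships only the blocks the disaster recovery site does not already hold, so reduced data stays reduced on the wire.

```python
# Simplified sketch of deduplication-aware replication; hypothetical
# classes, not a real replication protocol.
import hashlib

BLOCK = 4096


class Site:
    """Stand-in for a remote array that already holds some unique blocks."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block data

    def missing(self, fingerprints):
        """Report which fingerprints this site does not yet hold."""
        return [fp for fp in fingerprints if fp not in self.blocks]

    def receive(self, block):
        self.blocks[hashlib.sha256(block).digest()] = block


def replicate(source_blocks, target):
    """Ship only unique, not-yet-present blocks; return bytes actually sent."""
    fingerprints = [hashlib.sha256(b).digest() for b in source_blocks]
    needed = set(target.missing(fingerprints))
    sent = 0
    for fp, block in zip(fingerprints, source_blocks):
        if fp in needed:
            target.receive(block)
            needed.discard(fp)   # send each unique block only once per pass
            sent += len(block)
    return sent


dr_site = Site()
vm_clone = [bytes([i % 4]) * BLOCK for i in range(40)]   # 40 blocks, only 4 unique
print(replicate(vm_clone, dr_site))   # 16384 bytes: only the 4 unique blocks travel
print(replicate(vm_clone, dr_site))   # 0 bytes: nothing new on the next cycle
```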
Conclusion

All-Flash Arrays promise to eliminate most storage performance concerns and, as a result, simplify the entire process of storage design and management. But the solutions have to be affordable to the data centers that need them, and that requires a portfolio of data efficiency technologies beyond simple deduplication: thin provisioning, compression, sophisticated deduplication and intelligent replication.

Companies like Permabit Technology are providing these capabilities to storage vendors. As we have shown in our ongoing lab studies, the technology performs very well and is extremely reliable. All-Flash vendors looking to take a leadership position in the data services they provide should take a hard look at technologies like Permabit's as an alternative to developing the technology in-house. IT end-users should begin to demand that a complete set of data services be present in their all-flash arrays.

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
