Flash Arrays need high performance compression

Startups like Nimble, Pure Storage, SolidFire and Tegile are starting to take business away from the traditional tier 1 storage vendors. Their key differentiator, and often the winning point, has been their ability to use flash storage efficiently. Making flash compelling to IT professionals requires a high performance architecture that also uses flash efficiently, delivering the right price point (effective cost) and effective capacity. Many tier 1 vendors have the high performance but lack the effective cost and effective capacity, a direct result of missing compression, deduplication and thin provisioning capabilities. That gap is enabling the independent all-flash array vendors mentioned above to encroach on their accounts with better cost and capacity capabilities.

Efficiency Isn’t Optional

The modern data center needs both performance and capacity. Flash-based systems, like all-flash arrays and hybrid arrays, deliver on the performance need, but tier 1 vendors in particular struggle to meet the capacity demand in a cost-effective way. In the same way that deduplication and compression were critical to the widespread adoption of disk backup as the primary backup target, deduplication and compression are critical to the adoption of flash-based systems as the primary production storage solution; potentially even more so. Without data efficiency, flash may never reach the price point where it can be leveraged across the data center.

More Than Just Dedupe

While much of the attention today focuses on unstructured data (data outside of a database), structured data is also growing quickly. Structured data typically benefits more from compression, while unstructured data benefits more from deduplication. The combination of compression and deduplication delivers the best overall data reduction.
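As a rough illustration of why the combination matters: the two reduction ratios multiply. The short sketch below is a hypothetical back-of-the-envelope calculation; the 2:1 ratios and the 100 TB figure are assumptions, not measurements from any particular array.

```python
# Hypothetical illustration: deduplication and compression ratios multiply,
# so modest individual ratios compound into a larger overall reduction.

def effective_reduction(dedupe_ratio: float, compression_ratio: float) -> float:
    """Overall logical-to-physical reduction when both features are applied."""
    return dedupe_ratio * compression_ratio

logical_tb = 100.0                        # data written by applications (assumed)
overall = effective_reduction(2.0, 2.0)   # assumed 2:1 dedupe and 2:1 compression
physical_tb = logical_tb / overall        # flash capacity actually consumed
print(f"{logical_tb:.0f} TB logical -> {physical_tb:.0f} TB physical ({overall:.0f}:1)")
```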

Along with deduplication and compression, thin provisioning and writable snapshots also provide additional data efficiency benefits. However, in many environments, compression can deliver a greater return on the efficiency investment than any other technology.

The Value of Compression

Compression has universal appeal and can deliver results even when storage administrators are carefully managing their storage. Deduplication, for example, delivers its efficiency by removing redundancy between files, but if a storage administrator leverages writable snapshots or clones, much of that redundancy has already been eliminated before deduplication runs. Thin provisioning delivers its efficiency by assuming that storage administrators will massively over-allocate capacity based on user demands instead of application reality.

Certainly deduplication has value, even in a well-managed environment, to catch the redundant data that is sure to creep into any storage infrastructure. But compression delivers efficiency even on unique data, reducing the size of individual files no matter how unique they may be.

All Compression Is Not Created Equal

Because compression has been available as a data efficiency technology for decades, there is a tendency to assume that all compression algorithms are the same. The reality is that there are differences in how that technology is implemented and how it is used alongside other data efficiency technologies.

First, there is the ever-present concern about the performance impact of compression technologies. This concern has led many tier 1 vendors, with the exception of IBM, to offer compression as a post-process operation. In these implementations, compression is executed during the nightly maintenance window, and only files that have not been accessed for a certain period of time are compressed. In effect, the storage vendor is trying to hide the performance impact of its compression implementation behind old data.

Now vendors like Permabit, with its HIOPS™ solution, and IBM, with its RealTime Compression solution, are offering compression inline, as data is written to and read from storage in real time. The value of inline compression is that all data is reduced immediately, so every resource in the storage infrastructure benefits. Assuming even a modest 2:1 compression ratio, storage resources like cache, drive interconnect and of course capacity are all effectively doubled.
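The contrast between inline and post-process shows up on the write path. The sketch below is deliberately simplified and not modeled on any vendor's implementation: zlib stands in for whatever algorithm an array actually uses, and a Python list stands in for the flash device.

```python
# Simplified sketch: inline compression reduces data before it reaches flash,
# while post-process compression writes raw data and shrinks it later.
import zlib

def inline_write(device: list, block: bytes) -> None:
    # Compressed in the write path: cache, interconnect and capacity all
    # carry the reduced form from the moment the block lands.
    device.append(zlib.compress(block))

def post_process_write(device: list, block: bytes) -> None:
    # Raw data is written first and consumes full capacity until a later
    # maintenance pass rewrites it in compressed form.
    device.append(block)

def nightly_compression_pass(device: list) -> None:
    for i, block in enumerate(device):
        device[i] = zlib.compress(block)

inline_dev, post_dev = [], []
payload = b"sample block of compressible application data " * 64
inline_write(inline_dev, payload)
post_process_write(post_dev, payload)
print(len(inline_dev[0]), len(post_dev[0]))   # reduced vs. raw footprint
nightly_compression_pass(post_dev)
print(len(post_dev[0]))                       # reduced only after the pass
```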

For inline compression to keep pace with the speed of storage and the level of I/O demand, it should be able to take advantage of the multi-core processors in modern storage controllers. Fast compression algorithms can be parallelized across those cores to achieve extremely high throughput, which allows compression to run inline even on high-speed flash arrays.
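Here is a minimal sketch of that kind of parallelism, under assumed parameters (64 KiB chunks, zlib at its fastest setting, one worker per core); a real array would use purpose-built chunk sizes, algorithms and scheduling.

```python
# Sketch of multi-core compression: split the stream into fixed-size chunks
# and compress each chunk independently, so throughput scales with cores.
import os
import zlib
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 64 * 1024  # illustrative chunk size, compressed independently

def compress_chunk(chunk: bytes) -> bytes:
    return zlib.compress(chunk, level=1)  # fast setting favours inline latency

def compress_stream(data: bytes) -> list[bytes]:
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        return list(pool.map(compress_chunk, chunks))

if __name__ == "__main__":
    data = b"repetitive database page contents " * 100_000
    compressed = compress_stream(data)
    print(sum(len(c) for c in compressed), "bytes after parallel compression")
```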

Another difference among compression technologies is when freed space becomes available. Many currently available compression products have to run a secondary garbage collection process before freed space can be reused. Since flash systems are typically run at much higher capacity utilization levels, the preference should be for freed space to be available instantly.
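One way to make freed space instantly reusable is per-block reference counting, where a block returns to the free pool the moment its last reference goes away. The sketch below is a hypothetical illustration of that idea, not a description of any shipping product.

```python
# Hypothetical block store with reference counting: space freed by an
# overwrite or delete is reusable immediately, with no garbage-collection pass.

class BlockStore:
    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.refcounts: dict[int, int] = {}
        self.free_blocks: set[int] = set()
        self._next_id = 0

    def _alloc(self) -> int:
        self._next_id += 1
        return self._next_id

    def write(self, data: bytes) -> int:
        block_id = self.free_blocks.pop() if self.free_blocks else self._alloc()
        self.blocks[block_id] = data
        self.refcounts[block_id] = 1
        return block_id

    def release(self, block_id: int) -> None:
        self.refcounts[block_id] -= 1
        if self.refcounts[block_id] == 0:
            # The block goes straight back to the free pool.
            del self.blocks[block_id]
            self.free_blocks.add(block_id)

store = BlockStore()
bid = store.write(b"compressed block")
store.release(bid)
print(store.free_blocks)  # the freed block is immediately available for reuse
```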

Leveraging Deduplication with Compression

Increasingly, vendors are offering compression and deduplication together for maximum storage efficiency. But how the two are combined can impact performance. For example, many of the startups compress all data before deduplicating it. The problem with this approach is that if the data is redundant, the storage system wastes cycles compressing data that is never going to be stored.

It makes more sense to do the deduplication first, before compressing the data. This lets the deduplication process ensure that only unique data is sent to the compression process, so no cycles are wasted compressing data that won't be stored. The challenge is that many vendors' deduplication processes are not efficient enough to keep pace with flash technologies and as a result must run post-process.

For this to work, the deduplication process must be inline and able to identify duplicates in real time with no impact on the overall storage system. Since duplicates are removed before compression occurs, less data remains to be compressed than was originally sent to the storage system, which in many cases can actually increase overall system performance.
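A minimal sketch of that ordering follows. The fingerprinting scheme (SHA-256), the in-memory index and zlib are assumptions for illustration; a production array would use its own fingerprinting, indexing and metadata structures.

```python
# Sketch of dedupe-then-compress: fingerprint each chunk first, and only
# compress and store chunks that are not already in the index, so no cycles
# are spent compressing data that would be discarded as a duplicate.
import hashlib
import zlib

index: dict[str, bytes] = {}   # fingerprint -> compressed unique chunk

def ingest(chunk: bytes) -> str:
    fingerprint = hashlib.sha256(chunk).hexdigest()
    if fingerprint not in index:               # unique data: compress and store
        index[fingerprint] = zlib.compress(chunk)
    return fingerprint                         # duplicates cost only a hash lookup

chunks = [b"block A" * 512, b"block B" * 512, b"block A" * 512]  # third is a duplicate
refs = [ingest(c) for c in chunks]
print(len(index), "unique chunks stored for", len(refs), "chunks written")
```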

Summary

Deduplication clearly has an important role to play in the data efficiency ecosystem, but it should be closely partnered with compression for maximum benefit. Delivered as a high performance solution that can run inline, the two together can provide effective data reduction at the speed flash arrays demand, with maximum efficiency and minimal performance impact. Companies like Permabit, with its HIOPS solution, are delivering exactly these types of solutions.

Permabit is a client of Storage Switzerland

Eight years ago, George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and a heavily sought-after public speaker. With 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS and SAN, virtualization, cloud and enterprise flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.

7 comments on "Flash Arrays need high performance compression"
  1. Disclosure – I work for HP Storage (which is obvious from my @HPStorageGuy Twitter handle).

    Pure Storage wrote a blog post in late November comparing AFAs. That post only focused on start-ups. I took what they had done and added 3PAR AFA and posted it on my blog. A Pure employee sent me a private tweet saying, “Your defense of HP is dutiful, but you are displaying a lack of understanding”. Of course I know a bit more about the HP 3PAR architecture but found this to be pretty humorous.

    But here we are a few months later and HP has added hardware enabled inline deduplication. Pure, who’ve probably used the word “FUD” in more blog posts than any storage vendor of late, have started to talk about HP 3PAR on their blog. They unfortunately still don’t understand the architecture and one of the HP 3PAR product managers just posted a blog setting the record straight. Here’s a link to that: http://hpstorage.me/1reHqAn. There are links to two technical white papers in the article that dive into the HP 3PAR architecture and how HP 3PAR is optimized for flash.

    While the start-ups try to lump HP 3PAR into the category of “traditional tier-1 storage”, it clearly isn’t and we’re pretty pleased with the results we’ve been seeing.

    • George Crump says:

      Calvin,

      In your case no disclosure needed :). Thanks for reading the article and for taking the time to respond in a professional manner.

      The decision to lump HP in with the legacy storage vendors was Storage Switzerland’s decision alone, and one that we stand by. Let’s face it, HP is not a startup. I have publicly stated many times that the HP 3PAR architecture is one of the few architectures originally designed for a disk-based system that is still appropriate for all flash arrays. Kudos to the original 3PAR designers and to HP for continuing the work.

      I also stand by the claim that HP and other legacy vendors need to begin delivering on primary storage deduplication and compression. You guys have excellent technology in house; put it in your AFA. For example, in the blog that you link to, your PM mentions that if HP can get to $2 per GB, the method it uses to get there should be “irrelevant”. I think it is relevant. If HP added deduplication and compression and could then drive the price even lower than $2 per GB, wouldn’t that be better? Yes, unless your method of delivering those technologies impacted performance, which is what we discuss in the above article.

      As a side note, I am always suspicious of X-dollars-per-GB claims on AFAs. What was the exact configuration that got you to that price point? Was that a minimal configuration or a maxed-out box? Most AFA purchases start by addressing a specific pain point, not as an enterprise sweep.

      I will be at both Flash Memory Summit and VMworld. Would love to get you guys on the calendar for a detailed discussion.

      George

      • John says:

        George, if you read the HP article you’ll see that HP have recently added inline dedupe to their 7450 AFA via the 3PAR ASIC’s inbuilt hashing functionality. They also have advanced sparing to release additional capacity for use per SSD drive, as well as all the thin stuff and zero detect, which already provides a 50% (2:1) usable guarantee below dedupe.

      • George Crump says:

        John, you are right, of course. I was speaking to compression; I get so used to using the terms together that both popped out. While the 7450 is an impressive box, I believe that HP needs to add compression to the unit in addition to deduplication, which is of course what the original article is all about. Unless I am missing something (I double-checked my notes from the briefing), compression is not a feature in the HP AFA solution. Certainly the combination of deduplication, thin provisioning and writable snapshots will all help, but compression tends to help when those other methods will not. I actually need to do a briefing note on the 7450; I forgot to do it when it was announced. I’ll get that posted with my thoughts this week. -George

  2. Jim-G says:

    “Companies like Permabit with their HIOPS solution are delivering these types of solutions”

    “Permabit is a client of Storage Switzerland”

    Kind of ruined the whole article for me…

    • George Crump says:

      Jim-G: First, thank you for taking the time to read the article and even more so to respond. About 50% of our content is sponsored by our clients, and we are very careful to note that in our disclosures, as you picked up on. That said, it does not mean that the information is not accurate or that it does not provide value as you make your various storage decisions. Much of our content is pre-written prior to the sponsorship decision being made, and the sponsor of the article has almost no input on its content.
