Startups like Nimble, Pure Storage, SolidFire and Tegile are starting to take business away from the traditional tier 1 storage vendors. Their key differentiator, and often the winning point, has been their ability to use flash storage efficiently. Making flash compelling to IT professionals requires a high performance architecture that uses flash efficiently: at the right price point (effective cost) and with the right effective capacity. Many tier 1 vendors have the high performance, but lack the effective cost and effective capacity. This is a direct result of missing compression, deduplication and thin provisioning capabilities, and it allows the independent all flash array vendors mentioned above to encroach on their accounts with better cost and capacity characteristics.
Efficiency Isn’t Optional
The modern data center needs both performance and capacity. Flash based systems, like all flash arrays and hybrid arrays, deliver on the performance need, but tier 1 vendors in particular are failing to meet the capacity demand in a cost effective way. In the same way that deduplication and compression were critical to the widespread adoption of disk backup as the primary backup target, they are critical, potentially even more so, to the adoption of flash based systems as the primary production storage solution. Without data efficiency, flash may never reach the price point where it can be leveraged across the data center.
More Than Just Dedupe
While much of the attention today focuses on unstructured data (data outside of a database), structured data is also growing rapidly. Structured data typically benefits more from compression, while unstructured data benefits more from deduplication. Combining compression and deduplication delivers the best overall data reduction.
Along with deduplication and compression, thin provisioning and writable snapshots also provide additional data efficiency benefits. However, in many environments, compression can deliver a greater return on the efficiency investment than any other technology.
The Value of Compression
Compression has universal appeal and can deliver results even when storage administrators are carefully managing their storage. Deduplication, for example, delivers its efficiency by removing redundancy between files, but if a storage administrator leverages writable snapshots or clones, much of that redundancy has already been eliminated. Thin provisioning delivers its efficiency by assuming that storage administrators will massively over-allocate capacity based on user demands instead of application reality.
Certainly deduplication has value, even in a well-managed environment, to catch the redundant data that is sure to creep into any storage infrastructure. But compression delivers efficiency even on unique data, reducing the size of individual files no matter how unique they may be.
All Compression Is Not Created Equal
Because compression has been available as a data efficiency technology for decades, there is a tendency to assume that all compression algorithms are the same. The reality is that there are differences in how that technology is implemented and how it is used alongside other data efficiency technologies.
First, there is the ever-present concern about the performance impact of compression technologies. This concern has led many tier 1 vendors, with the exception of IBM, to offer compression as a post-process operation. In these implementations, compression is executed during the nightly maintenance window, and only files that have not been accessed in a certain period of time are compressed. In effect, the storage vendor is trying to hide the performance impact of its compression implementation behind old data.
Now, vendors like Permabit with their HIOPS™ solution and IBM with their RealTime Compression solution are offering compression inline, as data is written to and read from storage in real-time. The value of inline compression is that all data is reduced immediately, so every resource in the storage infrastructure benefits. Assuming even a modest 2:1 compression ratio, storage resources like cache, drive interconnect and of course capacity are all effectively doubled.
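To put rough numbers on that claim, here is a minimal illustration of the arithmetic; the ratios and capacity figures are hypothetical assumptions, not measurements from any particular array:

```python
# Hypothetical illustration of how inline data reduction multiplies
# effective capacity; all figures below are assumptions, not vendor data.

raw_capacity_tb = 10.0      # usable flash capacity before any reduction
compression_ratio = 2.0     # the modest 2:1 inline compression mentioned above
deduplication_ratio = 1.5   # assumed additional savings from deduplication

combined_reduction = compression_ratio * deduplication_ratio
effective_capacity_tb = raw_capacity_tb * combined_reduction

print(f"Combined reduction: {combined_reduction:.1f}:1")
print(f"Effective capacity: {effective_capacity_tb:.0f} TB from {raw_capacity_tb:.0f} TB raw")
```

The same multiplier applies to cache and interconnect bandwidth, since compressed data occupies less of each.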
For inline compression to keep pace with the speed of storage and the level of I/O demand, it should be able to take advantage of the multi-core processors in modern storage controllers. Fast compression algorithms can be parallelized across those cores to achieve extremely high performance, which enables compression to run inline even on high speed flash arrays.
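As a rough sketch of that parallel approach, assuming fixed-size chunks and Python's standard zlib library purely for illustration (production array controllers use far faster, purpose-built code paths):

```python
import zlib
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 64 * 1024  # illustrative chunk size, not a vendor parameter


def compress_chunk(chunk: bytes) -> bytes:
    # Each chunk becomes an independent zlib stream, so chunks can be
    # compressed (and later decompressed) in parallel on separate cores.
    return zlib.compress(chunk, level=1)  # a fast setting suits inline use


def compress_parallel(data: bytes, workers: int = 4) -> list:
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_chunk, chunks))


if __name__ == "__main__":
    sample = b"highly compressible sample data " * 100_000
    compressed = compress_parallel(sample)
    ratio = len(sample) / sum(len(c) for c in compressed)
    print(f"{ratio:.1f}:1 across {len(compressed)} chunks")
```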
Another challenge with some compression technologies is when they make freed space available for reuse. Many currently available compression products have to run a secondary garbage collection process before freed space can be reclaimed. Since flash systems are typically run at much higher capacity utilization levels, freed space should instead be available instantly.
Leveraging Deduplication with Compression
Increasingly, vendors are offering compression and deduplication together for maximum storage efficiency, but how the two are combined can impact performance. For example, many of the startups compress all data before deduplicating it. The problem with this approach is that if the data is redundant, the storage system wastes cycles compressing data that will never be stored.
It makes more sense to deduplicate first, before compressing the data. This ensures that only unique data is sent to the compression process and that no cycles are wasted compressing data that won't be stored. The challenge is that many vendors' deduplication processes are not efficient enough to keep pace with flash technologies and as a result must run post-process.
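A minimal sketch of that ordering, assuming fixed-size chunks, SHA-256 fingerprints and an in-memory index (real arrays use far more sophisticated chunking and indexing): duplicates are identified first, and only unique chunks ever reach the compressor.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # illustrative block size, not a vendor parameter


def dedupe_then_compress(data: bytes, store: dict, index: set) -> list:
    """Split data into chunks, deduplicate by fingerprint, then compress.

    Only chunks whose fingerprints have not been seen before are compressed
    and stored, so no cycles are spent compressing redundant data.
    """
    recipe = []  # ordered list of fingerprints needed to rebuild `data`
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in index:       # unique chunk: compress and store it
            store[fingerprint] = zlib.compress(chunk)
            index.add(fingerprint)
        recipe.append(fingerprint)         # a duplicate costs only a reference
    return recipe


if __name__ == "__main__":
    store, index = {}, set()
    data = b"A" * 8192 + b"unique tail"  # two identical chunks plus a remainder
    recipe = dedupe_then_compress(data, store, index)
    print(f"{len(recipe)} logical chunks, {len(store)} unique chunks stored")
```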
For this to work successfully, the deduplication process must be inline and able to manage duplicate identification in real-time, with no impact on the overall storage system. Since duplicates are removed before compression occurs, the amount of data left to compress is smaller than the original data sent to the storage system, which in many cases can actually increase overall system performance.
Deduplication clearly has an important role to play in the data efficiency ecosystem, but it should be closely partnered with compression for maximum benefit. Delivered together as a high performance inline solution, these two efficiency processes can provide effective data reduction at the speeds flash arrays require, with maximum efficiency and minimum performance impact. Companies like Permabit, with their HIOPS solution, are delivering exactly this type of solution.
Permabit is a client of Storage Switzerland