IT professionals expect a lot from their storage systems: they want volume management, thin provisioning, snapshots, clones, automated tiering, replication, and more. Increasingly, they want deduplication and compression as well, so they can squeeze every ounce of capacity out of their storage investment. While all of these capabilities can make the storage administrator's life easier and reduce the cost of storage system ownership, they can also negatively impact performance.
What is Performance Impact?
Many vendors will claim that a given feature or set of features will not affect performance. Technically, that's not accurate: all features impact performance to some extent. The key is whether they can execute efficiently enough to make that impact unnoticeable to the user community. If the user or application doesn't notice a performance decline, then the feature essentially has no performance impact, at least in a real-world sense. And if a feature can be made to run efficiently, are there side benefits that actually improve the overall performance of the system?
Flash Makes It Worse
Many flash vendors will claim that they enable features by over-provisioning storage performance. However, that overcompensation sacrifices much of the performance potential of the flash system. Storage Switzerland has identified vendors that, as they added software services to their flash arrays, saw the performance of those systems decline by 50% or more. While these systems can still deliver plenty of IOPS, the cost of those IOPS goes up significantly thanks to all the added features.
The problem is that flash-assisted and all-flash storage systems remove the hard drive latency that storage software used to hide behind when features were added to the system. Now, thanks to flash's low latency, the overhead caused by inefficient storage software is immediately exposed as a drain on performance.
Incorrectly Applied Storage Compute Adds Little Value
To address the performance consumption issue, storage vendors have tried to throw more hardware at the problem in the form of more powerful storage processors and/or more RAM. Most storage systems run their storage software on the same class of server hardware as the compute platforms that host applications, so in an attempt to compensate for inefficient storage software, vendors often upgrade the compute capability of their storage servers. This can be an effective course of action, but at what cost, and how efficiently are those additional resources used?
The Core Problem
Most modern server hardware platforms consist of 4 to 8 processors with 2 or 4 cores per processor; these cores are essentially processors within a processor. Modern software should leverage them by running its code across all the cores, not vertically siloed on a single core. The problem is that most storage software is not aware of, or designed for, multi-core execution. In most cases, the best a storage vendor can do is isolate given functions on individual cores, which leads to some cores being used heavily and others not at all, undermining the investment made in multi-core processors. Striping the execution of code across cores ensures that the full power of every core is available for the task at hand. The result is that multi-core aware storage software can accomplish more work on fewer cores, which saves money and improves both overall system performance and core utilization.
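To make the difference concrete, here is a minimal, hypothetical Python sketch contrasting the two scheduling approaches: pinning each function type to its own core versus striping every request across all cores round-robin. The workload mix, core count, and function names are illustrative assumptions, not any vendor's actual implementation.

```python
# Hypothetical sketch: function-siloed vs. striped core assignment for a
# mixed stream of storage requests. All names and numbers are illustrative.
from collections import Counter
from itertools import cycle

CORES = 4
# A mixed workload: mostly reads, some writes, a few dedupe lookups.
workload = ["read"] * 70 + ["write"] * 25 + ["dedupe"] * 5

def siloed_assignment(requests):
    """Pin each function type to one core (read->0, write->1, dedupe->2)."""
    pin = {"read": 0, "write": 1, "dedupe": 2}
    return Counter(pin[r] for r in requests)

def striped_assignment(requests):
    """Stripe every request across all cores round-robin."""
    cores = cycle(range(CORES))
    return Counter(next(cores) for _ in requests)

print(siloed_assignment(workload))   # core 0 handles 70 requests; core 3 sits idle
print(striped_assignment(workload))  # each core handles exactly 25 requests
```

The siloed layout leaves one core saturated and another idle, while striping balances the same 100 requests evenly, which is the load-balancing effect the paragraph above describes.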
For example, Permabit claims that it can process deduplication and compression while maintaining 200K IOPS per core, and because its software can thread across multiple cores, it can scale those IOPS as more cores become available. This means that on a single quad-core processor, Permabit software can sustain over 800K IOPS while performing inline deduplication and compression, faster than most flash arrays on the market today. It is a good example of software designed for multi-core systems yielding impressive results.
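The scaling behind that claim is simple linear arithmetic, sketched below under the assumption of perfectly linear threading (real scaling always falls somewhat short of linear):

```python
# Linear IOPS scaling: claimed per-core throughput times available cores.
IOPS_PER_CORE = 200_000  # claimed inline dedupe+compression throughput per core

def scaled_iops(cores: int) -> int:
    """Idealized throughput, assuming perfectly linear multi-core scaling."""
    return IOPS_PER_CORE * cores

print(scaled_iops(4))  # 800000 on a quad-core processor
```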
The Data Efficiency Dividend
Data efficiency techniques like deduplication and compression can significantly reduce storage capacity expenditures. But data efficiency that is fast enough to execute inline without impacting performance can also pay dividends beyond simple capacity savings.
For example, if a deduplication process can eliminate data redundancy before the compression process has to deal with that data, it makes compression more efficient as well. If the combination of deduplication and compression reduces the amount of data (both the number of files and the average file size), it can improve performance by reducing the write load. Reducing the write load also reduces the number of RAID parity calculations that need to be performed, which should further improve performance. Finally, since there are fewer, smaller data objects to store, premium resources like cache and flash tiers will be used more efficiently, meaning fewer recalls from hard disk. These processes should also extend the life of the flash media itself, since it is written to less frequently. In short, efficiency in the design yields a cascading impact on overall system efficiency that goes beyond data reduction.
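As a rough illustration of the dedupe-before-compress ordering described above, here is a minimal Python sketch using fixed-size chunks, SHA-256 fingerprints, and zlib. Real products use content-defined chunking and far more sophisticated metadata handling; every name and parameter here is illustrative, not any vendor's design.

```python
# Minimal sketch of an inline dedupe-then-compress pipeline over
# fixed-size chunks. Illustrative only; not a production design.
import hashlib
import zlib

CHUNK = 4096  # illustrative fixed chunk size in bytes

def dedupe_then_compress(data: bytes):
    """Return (unique compressed chunks keyed by fingerprint, ordered recipe)."""
    store = {}    # fingerprint -> compressed chunk, written only once
    recipe = []   # ordered fingerprints needed to reconstruct the data
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:                    # dedupe first...
            store[digest] = zlib.compress(chunk)   # ...compress only what's left
        recipe.append(digest)
    return store, recipe

# Highly redundant data: dedupe removes repeats before compression ever runs.
data = b"A" * CHUNK * 10 + b"B" * CHUNK * 10
store, recipe = dedupe_then_compress(data)
print(len(recipe), len(store))  # 20 logical chunks, only 2 stored
```

Because only unique chunks reach the compressor, compression works on a fraction of the original write load, which is the cascading benefit the paragraph describes.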
Many vendors claim the reason they are not delivering deduplication and compression for primary storage is concern about the potential performance impact. However, if the software is effectively designed and built on a modern, multi-threaded architecture, the combined impact of inline deduplication and compression can be a reduction in capacity consumption, an increase in the life of the flash storage, and an increase in total system performance.
This Article Sponsored by Permabit