Storage caching uses a very fast storage area, typically flash, to hold the most frequently accessed data and improve application performance. Server-side caching uses flash-based SSDs installed inside the application server, which brings this high-performance storage closer to the CPUs and removes network latency from cached I/O. But there’s still a perception that caching can be risky and that it’s a less efficient use of expensive flash capacity. This article discusses ways to reduce that risk. An upcoming article will address ways to increase caching accuracy and make flash more efficient.
Reducing Cache Risk
With server-side caching, the perception of increased risk comes from the fact that the caching software and the flash supporting that cache are both inside the same server. In a sense, the server itself is a single point of failure. If a failure occurs after a write has been acknowledged to the application, but before the data is actually written to the primary storage systems, data loss could result.
But in the most common caching modes this can’t happen, because data is written to primary storage outside the server before the write is acknowledged. With “write-around caching”, writes bypass the cache and go directly to primary storage. Similarly, with “write-through caching”, data is written to the primary storage systems in parallel with being written to cache. In both cases the write is acknowledged to the application only once the data is safely on a disk system, typically outside the server.
In both of these methods write performance is governed by the speed of the disk write process. However, some applications and workloads are especially sensitive to write latency, or have intermittent write activity (such as flushing buffers) that hurts performance consistency. These workloads can benefit significantly from “write-back” caching in the server. This two-step method first acknowledges the write to the application as soon as the data has been written to cache, and later flushes the data from the cache to disk. While write-back caching is faster than write-through caching, it does introduce some risk. If the cache fails in the window between these two steps, data that has been acknowledged but not yet flushed can be lost. For environments that must have the added performance of write-back caching, there are solutions to mitigate this risk.
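The difference between the three write policies can be sketched in a few lines of Python. This is only an illustrative model, not how any particular caching product is implemented: the Backend and Cache classes, their method names, and the use of dictionaries in place of real devices are all assumptions made for the example, and the write-through path writes sequentially rather than in parallel for simplicity.

```python
class Backend:
    """Hypothetical stand-in for primary (disk) storage: a dict keyed by block address."""
    def __init__(self):
        self.blocks = {}
        self.writes = 0

    def write(self, addr, data):
        self.blocks[addr] = data
        self.writes += 1


class Cache:
    """Toy server-side cache that acknowledges writes according to one policy."""
    def __init__(self, backend, policy):
        assert policy in ("write-around", "write-through", "write-back")
        self.backend = backend
        self.policy = policy
        self.data = {}       # blocks currently held in flash
        self.dirty = set()   # acknowledged to the application but not yet on disk

    def write(self, addr, data):
        if self.policy == "write-around":
            # Bypass the cache; drop any stale cached copy of the block.
            self.backend.write(addr, data)
            self.data.pop(addr, None)
        elif self.policy == "write-through":
            # Cache and disk are both updated before the acknowledgement.
            self.data[addr] = data
            self.backend.write(addr, data)
        else:
            # write-back: acknowledge once the data is in flash, flush later.
            self.data[addr] = data
            self.dirty.add(addr)
        return "ack"

    def flush(self):
        """Second step of write-back: copy dirty blocks to primary storage."""
        for addr in sorted(self.dirty):
            self.backend.write(addr, self.data[addr])
        self.dirty.clear()

    def at_risk(self):
        """Blocks that would be lost if the cache failed right now."""
        return set(self.dirty)
```

In this model, at_risk() is always empty for write-around and write-through, and for write-back it is non-empty only between a write and the next flush, which is the window of exposure described above.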
Mirroring the cache between two flash devices can protect a write-back cache against data loss in the event of an SSD failure. Since application uptime is so important in performance-critical environments, the caching solution should provide this redundancy with minimal impact on write latency, and also have a process to recover from a failure with minimal disruption.
SanDisk’s FlashSoft software has a built-in logical volume manager that writes data to both SSDs simultaneously, providing a mirrored cache with the write latency of a single SSD. If one SSD fails, the cache switches to write-through mode until the SSD can be replaced and the mirror restored. During the restore, the software doesn’t have to rebuild the cache ‘from scratch’. It only needs to reconcile the data that wasn’t in the mirror before the primary cache went down.
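The failover behavior described above can be modeled roughly as follows. This is a sketch under stated assumptions, not FlashSoft’s actual implementation: the class and method names are hypothetical, dictionaries stand in for the SSDs and primary storage, only a single device failure is modeled, and the step that flushes dirty data from the surviving SSD at the moment of failure is an assumption added so that nothing depends on one device.

```python
class MirroredWriteBackCache:
    """Toy model of a write-back cache mirrored across two flash devices."""
    def __init__(self, backend):
        self.backend = backend        # dict standing in for primary storage
        self.ssds = [{}, {}]          # two mirrored flash devices
        self.alive = [True, True]
        self.dirty = set()            # acknowledged but not yet on disk
        self.missed = set()           # blocks written while the mirror was broken

    @property
    def mode(self):
        # Write-back only while both mirror halves are healthy.
        return "write-back" if all(self.alive) else "write-through"

    def write(self, addr, data):
        for i, ssd in enumerate(self.ssds):
            if self.alive[i]:
                ssd[addr] = data
        if self.mode == "write-back":
            self.dirty.add(addr)
        else:
            # Degraded: data reaches primary storage before the acknowledgement.
            self.backend[addr] = data
            self.missed.add(addr)
        return "ack"

    def fail(self, idx):
        """Mark one SSD as failed (assumes the other one is still healthy)."""
        self.alive[idx] = False
        survivor = self.ssds[1 - idx]
        # Assumption: flush dirty data at once so it no longer lives on one device.
        for addr in self.dirty:
            self.backend[addr] = survivor[addr]
        self.dirty.clear()

    def restore(self, idx):
        """Bring the device back, copying only the blocks it missed."""
        survivor = self.ssds[1 - idx]
        for addr in self.missed:
            self.ssds[idx][addr] = survivor[addr]
        copied = len(self.missed)
        self.missed.clear()
        self.alive[idx] = True
        return copied
```

The point of the restore() method is that the work is proportional to what changed while the mirror was broken, not to the size of the cache.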
Mixed Flash for Multiple Caches
It’s also important for the caching solution to support multiple cache volumes and multiple flash types. FlashSoft provides this as well, which widens the options for establishing the mirror and brings some additional benefits. For example, within a single server, an SLC-based PCIe flash device can be used to cache performance-sensitive workloads, while a lower-cost MLC-based SAS SSD can be used to cache other workloads. This use of multiple caches can benefit both application performance and storage efficiency. And write-back caching offers another benefit beyond high performance.
Write Cancellation and Write Coalescing
Most data access patterns are highly repetitive, meaning the same data is written and then overwritten, often multiple times. When writes are cached on flash, an overwrite can cancel the original write before it’s ever flushed to disk, so that multiple write operations at the compute tier correspond to only one write operation to storage. Additionally, when writes are finally flushed, the cache coalesces a number of them into a single write to storage. This further reduces both the write activity that reaches the backend storage system and the traffic on the network connecting the server to shared storage.
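A small Python sketch can make the two effects concrete. It is an illustration only: the function name flush_plan is hypothetical, and the rule that coalescing merges writes to adjacent block addresses is an assumption about one common way to do it, not a description of a specific product.

```python
def flush_plan(writes):
    """Turn a stream of cached writes into the writes actually sent to storage.

    writes: list of (block_addr, data) tuples in arrival order.
    Returns a list of (start_addr, [data, ...]) tuples, one per backend write.
    """
    # Write cancellation: a later write to the same block replaces the earlier one.
    latest = {}
    for addr, data in writes:
        latest[addr] = data

    # Write coalescing (assumed rule): merge runs of adjacent blocks into one write.
    runs = []
    for addr in sorted(latest):
        if runs and addr == runs[-1][0] + len(runs[-1][1]):
            runs[-1][1].append(latest[addr])
        else:
            runs.append((addr, [latest[addr]]))
    return runs


writes = [(10, "a"), (11, "b"), (10, "c"), (12, "d"), (20, "e"), (11, "f")]
plan = flush_plan(writes)
# Six application writes become two backend writes:
# [(10, ['c', 'f', 'd']), (20, ['e'])]
```

Blocks 10 and 11 are each written twice, so two of the six writes are cancelled; the surviving writes to blocks 10, 11 and 12 are then coalesced into a single sequential write.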
With write-back caching, a substantial amount of “dirty data” (data updated in cache but not yet written to storage) needs to be held in the cache in order to gain a significant benefit. The more dirty data, the greater the chances for multiple write cancellations (with large blocks of data), and the greater the number of writes to be coalesced when flushed.
So in addition to enabling better application performance by making write-back caching safer, caching solutions like FlashSoft also enable the benefits of write cancellation and write coalescing. The key to this capability is to maintain the appropriate amount of data in the cache for the appropriate amount of time to maximize storage efficiency, while minimizing the potential risk. With a mirrored cache, more data can be stored in cache for longer periods with less risk. This allows for more write cancellation and write coalescing and a greater reduction in write activity to backend storage.
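One way to picture “the appropriate amount of data for the appropriate amount of time” is as a flush policy with two limits: a maximum age for any dirty block and a maximum number of dirty blocks. The sketch below is a hypothetical policy written for illustration; the function name, both parameters, and the example values are assumptions, not settings taken from any product.

```python
def blocks_to_flush(dirty, now, max_age, max_dirty):
    """Pick which dirty blocks to flush, oldest first.

    dirty: dict mapping block address -> time the block first became dirty.
    A block is flushed if it has been dirty for at least max_age, or if it is
    among the oldest blocks in excess of max_dirty.
    """
    by_age = sorted(dirty, key=lambda addr: (dirty[addr], addr))
    expired = [addr for addr in by_age if now - dirty[addr] >= max_age]
    expired_set = set(expired)
    remaining = [addr for addr in by_age if addr not in expired_set]
    excess = max(0, len(remaining) - max_dirty)
    return expired + remaining[:excess]


dirty = {1: 0.0, 2: 5.0, 3: 8.0, 4: 9.0}

# Tight limits, as might suit an unmirrored cache: flush early and often.
tight = blocks_to_flush(dirty, now=10.0, max_age=6.0, max_dirty=2)

# Looser limits, as a mirrored cache might allow: nothing needs flushing yet.
loose = blocks_to_flush(dirty, now=10.0, max_age=30.0, max_dirty=8)
```

With the tight limits, block 1 is flushed for age and block 2 for count; with the looser limits all four blocks stay in cache, where further overwrites can still cancel them.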
Caching is an increasingly accepted technology for accelerating application performance, and for certain types of applications and workloads write-back caching may be part of that solution. But write-back caching can increase the possibility of data loss. In these situations, cache mirroring can reduce that risk and open the door to the additional benefits of write cancellation and write coalescing. By caching a sufficient number of writes in flash, storage network traffic and storage array overhead can be reduced considerably, conserving disk resources for other applications.
In an upcoming article we’ll discuss the problem of cache misses and ways to improve cache accuracy and efficiency.