Three Write Caching Considerations

Caching is a popular first step when data centers want to leverage high-performance flash storage. It eases the transition from traditional disk storage by automatically moving frequently accessed data to the high-performance pool. Most cache technologies that have come to market have essentially ignored writes and focused instead on read caching. But an increasing number of vendors have started delivering, or announcing, write caching solutions as well. While it sounds like a good idea to cache both reads and writes, there are some considerations that IT planners need to be aware of when implementing write-based caching.

What is Caching?

Storage Switzerland detailed fundamental caching technology in the recent article “What is Storage Caching?”. Essentially, it’s the automated movement of frequently accessed data from a slower storage type to a faster one. In the modern data center that tends to involve moving active data from hard disk drives to flash-based storage. As stated above, most caches are implemented for reads only, meaning they store a copy of the most active data on the faster storage. All writes are sent directly to the slower storage medium, typically a hard disk. In the event of a cache failure there’s no risk of data loss since the original data is on the hard disk.
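The read-only cache described above can be sketched in a few lines. This is a minimal, illustrative model (the class and variable names are assumptions, not any vendor's API): reads are served from the fast tier when possible, while writes go straight through to the backing store, which is why a cache failure loses no data.

```python
from collections import OrderedDict

class ReadCache:
    """Minimal sketch of a read-only (write-through) cache.
    The OrderedDict stands in for the flash tier; the backing
    store stands in for the hard disk. Illustrative only."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store      # slow storage holds the truth
        self.cache = OrderedDict()        # fast tier, LRU ordered

    def read(self, key):
        if key in self.cache:             # cache hit: fast path
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.backing[key]         # cache miss: go to slow disk
        self.cache[key] = value           # promote frequently read data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return value

    def write(self, key, value):
        self.backing[key] = value         # write goes straight to disk
        self.cache.pop(key, None)         # invalidate any stale copy
```

Because `write()` never acknowledges until the backing store is updated, losing the cache device here only costs performance, never data.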

The problem with read caching, though, is that a portion of the I/O stream is ignored: the data writes. How much of the I/O stream this turns out to be depends on the read/write mix in the environment. But for many workloads, write operations make up between 40% and 60% of total storage I/O activity. Also consider that writing data is a more resource-intensive operation and, in the typical enterprise use case, generates additional storage activity in the RAID parity process as well.

1. Write Back Caching Has Risks

Unlike read caching, write caching in the server has a level of risk associated with it. The primary reason is that for a period of time the cache holds the only copy of the data, at least until that write is safely stored on disk. The length of that period depends on a variety of factors, but there is no doubt it creates a window of risk. If the cache fails during that window, data could be lost, which could lead to application failure or data corruption.

Vendors have begun to deliver write caching solutions to the marketplace that attempt to address these shortcomings in a variety of ways. Each method can reduce exposure to data loss but can also increase cost or introduce storage latency.

2. Read Caching Can Improve Write Performance

While it may sound counter-intuitive, read caching can actually improve write performance to the point that a dedicated write cache may not be needed. This is because the read cache can act as a ‘read shock absorber’ for the storage system. In this scenario, almost all read requests are terminated at the cache so they don’t need to be handled by the storage system. This gives writes almost exclusive access to the storage interconnect and to the storage controller’s processors.

As an example let’s assume a typical virtual environment is 75% reads and 25% writes. If server-side caches are installed in each of the hosts and are able to achieve 80% hit ratios, then 4/5 of the reads generated by that virtual environment will be supported by the cache in the virtualization host. This translates into a 60% reduction in total I/O activity (80% * 75%) on the shared storage system. Since those reads are serviced locally they no longer impact the storage network or the storage controller. In the final analysis, alleviation of read requests from the storage infrastructure may deliver all the performance boost that is needed and the above risks of write caching never need to be incurred.

3. Write Caching Can be Made Safe

However, if the environment has a high enough write ratio, a read cache alone may not be sufficient to meet the performance demand. In this case it would be appropriate to consider a flash solution that can do read and write caching safely. There are typically two ways to accomplish this.

The first is to mirror the cache to a secondary flash device inside the server so that if one fails the other device can take over with no data loss. Doing this requires more than an operating system mirror, since that would add latency and negatively impact flash performance. Instead, the caching software should provide this mirror itself. The mirror presents a single “cache” to the operating system while internally confirming that data has been correctly written to both locations prior to acknowledging the write back to the application.
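The key property of that mirror is the acknowledgment rule: the write is not acked until both copies are confirmed. A minimal sketch, under the assumption that each flash device can be modeled as a simple key/value store (all names here are hypothetical):

```python
class MirroredWriteCache:
    """Sketch of a software-mirrored write cache. Every write must
    land on two independent flash devices before it is acknowledged,
    so a single device failure loses no data. Illustrative only."""

    def __init__(self, primary, secondary):
        self.devices = [primary, secondary]

    def write(self, key, value):
        for dev in self.devices:          # write to both cache devices
            dev[key] = value
        # confirm every copy before acknowledging to the application
        assert all(dev.get(key) == value for dev in self.devices)
        return "ack"

    def read(self, key):
        for dev in self.devices:          # either device can serve reads
            if key in dev:
                return dev[key]
        raise KeyError(key)
```

The cost of this safety is that each acknowledged write waits on the slower of the two devices, which is the latency trade-off mentioned earlier.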

While this approach does protect against a flash failure in the server, it does not provide protection from a failure of the physical server itself. Server failures are rare, but their impact on data loss could be severe enough that some data centers would want an extra layer of protection.

That extra protection would most likely come in the form of a remote mirror, where the cache is installed inside another server or on a shared disk storage system. This gets the unique write data out of the server and onto another form of storage. The caching solution would also need the intelligence to correctly flush or reassemble the cached data when the primary server comes back online.
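The recovery step that paragraph describes can be sketched as a replay of the surviving mirror's dirty data to disk (a hypothetical function, assuming the remote mirror can be read as a key/value map):

```python
def recover_cache(remote_mirror, backing_store):
    """Sketch of crash recovery with a remote cache mirror: when the
    primary server comes back, writes that were still dirty in its
    cache are replayed from the surviving mirror to disk before
    normal operation resumes. Names are illustrative."""
    for key, value in remote_mirror.items():
        backing_store[key] = value        # flush surviving dirty data to disk
    remote_mirror.clear()                 # mirror no longer holds unique data
    return backing_store
```

Only after this replay completes does the cache stop holding the sole copy of any data, so a real solution would fence new writes until recovery finishes.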


Caching, especially in the server-side use case, is a simple way to begin adopting solid-state storage and reaping the benefits of its performance. However, IT professionals are advised to consider carefully whether they should use read caching or read/write caching. Read caching alone is more cost effective and, for many, may deliver all the performance improvement they need. Because the benefits of read caching on write performance are difficult to estimate in advance, it’s a good idea to try a read cache first, even for workloads where the problem seems to be write performance. However, there will be cases where a write cache is really what’s needed. In those environments, consideration should be given to a read/write caching solution with redundancy built in to ensure data consistency in the event of a flash failure.

SanDisk is a client of Storage Switzerland


George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.

