Flash storage continues to be the “go-to” option for IT professionals looking to solve performance problems, but these infrastructure designers are struggling with how best to implement flash. Automated tiering and caching are becoming common answers to that question. Instead of IT administrators having to measure the flash worthiness of each data set, the storage system or storage software can now make data placement decisions automatically. While the two terms are often used interchangeably, the technologies are not the same, and administrators need to be aware of the differences.
What is Caching?
As we detail in our article “What is Storage Caching”, caching comes in three basic forms. Write-around caching, also known as “read-only” caching, only operates on data that has already been written to the slower tier of storage, presumably disk. Data is only copied into the cache after a certain threshold of read activity occurs on those files.
The second type, write-through caching, writes data to both the flash area and the hard disk area at the same time. This technique essentially pre-promotes data, working under the assumption that the most recently written data is also the most likely to be read next. In both write-around and write-through caching, write operations happen at hard disk speeds, since the application has to wait for the hard drive to acknowledge a successful write.
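The two behaviors can be illustrated with a minimal sketch. The class names and the read-count threshold below are illustrative assumptions, not any vendor's actual implementation:

```python
READ_THRESHOLD = 3  # illustrative: reads before a block is considered "hot"

class WriteAroundCache:
    """Write-around: writes bypass the cache entirely; blocks are copied
    into flash only after enough read activity against the disk tier."""
    def __init__(self):
        self.disk = {}
        self.cache = {}
        self.read_counts = {}

    def write(self, key, value):
        self.disk[key] = value          # write goes to disk only
        self.cache.pop(key, None)       # any cached copy is now stale

    def read(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit: flash speed
        self.read_counts[key] = self.read_counts.get(key, 0) + 1
        if self.read_counts[key] >= READ_THRESHOLD:
            self.cache[key] = self.disk[key]   # promote hot data into flash
        return self.disk[key]           # cache miss: disk speed

class WriteThroughCache(WriteAroundCache):
    """Write-through: writes land in flash and on disk at the same time,
    pre-promoting the most recently written data for the next read."""
    def write(self, key, value):
        self.disk[key] = value          # still waits on the disk write
        self.cache[key] = value         # but a flash copy exists immediately
```

Note that in both cases the write is not acknowledged until the disk copy exists, which is why both methods run writes at hard disk speed.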
What is Tiering?
Instead of making a copy of the most active data, tiering typically writes data in its entirety, first to one storage area (or tier), and then moves that data to different areas based on application or user performance requirements. Most automated tiering technology starts by making sure all new or modified data is written to the hard disk area first. The technology then migrates that data up to the flash tier based on access patterns.
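A hard-disk-first tiering pass might look like the following sketch. The promotion threshold and names are assumptions for illustration; the key point is that `migrate()` moves data up to flash rather than copying it:

```python
PROMOTE_AFTER = 5  # illustrative: reads before data is promoted to flash

class TieredStore:
    """Hard-disk-first automated tiering: new and modified data lands on
    disk, and a background pass moves hot data up to the flash tier."""
    def __init__(self):
        self.flash = {}
        self.disk = {}
        self.access = {}

    def write(self, key, value):
        self.flash.pop(key, None)       # modified data drops back to disk
        self.disk[key] = value

    def read(self, key):
        self.access[key] = self.access.get(key, 0) + 1
        return self.flash.get(key, self.disk.get(key))

    def migrate(self):
        """Background task: move (not copy) hot data up to flash."""
        for key, count in list(self.access.items()):
            if count >= PROMOTE_AFTER and key in self.disk:
                self.flash[key] = self.disk.pop(key)  # no disk copy remains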
Moves vs. Copies
A key difference between tiering and caching is that tiering moves data to the flash storage area instead of copying it, both from slower storage to faster storage and vice versa. Caching is essentially a one-way transaction. When the cache is done with the data it was accelerating, it simply invalidates its copy instead of moving the data back to the hard disk area. Write-back caching could be considered an exception, but typically the write-back technology has made its copy to the hard disk area long before the cache is done with the actual data.
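The difference shows up when data cools off. A sketch, with illustrative names: a cache can simply discard its copy because the disk copy is authoritative, while a tier holds the only copy and must move it back down:

```python
def cache_evict(cache, disk, key):
    """Cache eviction: the disk copy is authoritative, so the cached
    copy is simply invalidated -- nothing is written back."""
    cache.pop(key, None)

def tier_demote(flash, disk, key):
    """Tier demotion: the flash copy was the only copy, so it must be
    moved down to disk before it leaves the flash tier."""
    if key in flash:
        disk[key] = flash.pop(key)
```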
In theory, this should mean that caching has less performance impact than automated tiering. That would be correct if the vendor has chosen a hard-drive-first auto-tiering design strategy, since almost certainly some data will need to be copied “up” from the hard drive tier. If the vendor chose a flash-first strategy then the majority of the time data will be pushed “down” the storage tiers and extra copies will be unnecessary.
The important difference between moves and copies is that a cache does not need to have the redundancy that tiering does. Tiering stores the only copy of data for potentially a considerable period of time so it needs to have full data redundancy like RAID or mirroring. Because of the premium cost of flash storage that redundancy may be more than the IT budget can support.
Dealing with Write Performance
To address write performance concerns we have seen a few storage systems with caching technologies introduced that write to the flash area first, instead of the primary storage area. In caching terminology this is called “write-back caching” and it’s different from the other forms of caching in that writes are acknowledged to the user or application as soon as they are written to the flash area. While the application does not have to wait for hard-drive-speed acknowledgement there is the potential for data loss and corruption if the flash fails prior to flushing all its data to the hard disk area.
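A minimal sketch of the write-back behavior described above (names and structure are illustrative): the write is acknowledged as soon as it hits flash, and dirty blocks are destaged to disk later. If the flash fails before `flush()` runs, the dirty data is lost, which is exactly the risk at issue:

```python
class WriteBackCache:
    """Write-back: writes are acknowledged at flash speed; the disk
    copy is made later by a background flush."""
    def __init__(self):
        self.flash = {}
        self.disk = {}
        self.dirty = set()

    def write(self, key, value):
        self.flash[key] = value     # acknowledged immediately, flash speed
        self.dirty.add(key)         # disk copy is now stale or missing

    def flush(self):
        """Background task: destage dirty blocks to the hard disk area.
        Until this runs, a flash failure loses the dirty data."""
        for key in list(self.dirty):
            self.disk[key] = self.flash[key]
        self.dirty.clear()
```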
Similarly, in automated tiering technology we have seen only a few storage systems write to the flash tier first. As with write-back caching, the application benefits from flash performance, but flash-first tiering likewise requires extra redundancy to ensure that data can survive a flash module or storage system failure. Although it makes performance acceleration more expensive, redundancy is something that can be designed into both caching and tiering solutions.
There is another well-documented concern with flash-first tiering and caching technologies: flash has a finite number of times it can be written to. SLC (Single-Level Cell) is the most durable form of flash but also the most expensive and, as a result, most vendors lead with MLC (Multi-Level Cell) flash to keep prices down. But this brings an increased risk of flash failure, a concern with both tiering and caching technologies if they write to flash first or in parallel with the hard disk.
In addition, the write-through, write-back and flash-first tiering use cases are all susceptible to early flash burnout, since all net new writes go to flash first or in parallel with the hard disk.
These issues of data risk and flash endurance can be overcome by leveraging a small SLC storage area to act as a ‘shock absorber’ for writes. Dell, as we pointed out in our lab report “Mixed All-Flash Array Delivers Safer High Performance”, was the first vendor to take the extra step of writing to a small SLC storage area first, then leveraging automated tiering to move data to an MLC tier as it ages. This technique strikes a better balance of durability, performance and risk aversion. We expect many hybrid storage arrays to implement a similar approach in 2014.
The single biggest difference between tiering and caching technology is that tiering can involve more than two types of storage areas. Caching, at least so far, has essentially been two tiered: flash and hard disk. While there are some caching technologies that will leverage RAM as a third tier, those are rare. Automated tiering can be much more robust; we’ve seen vendors support as many as four tiers of storage. This allows tiering to serve as both a performance enhancer and a cost saver. Automated tiering could, for example, migrate data from SLC flash to MLC flash, then to 15K RPM hard disk and finally to 7.2K RPM high-capacity disk. As the data becomes less active, the media it is stored on becomes less expensive.
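A four-tier demotion policy like the one just described can be sketched as follows. The tier names, the idle threshold, and the "tick" aging model are all illustrative assumptions:

```python
# Data ages down: SLC flash -> MLC flash -> 15K disk -> 7.2K capacity disk.
TIERS = ["slc_flash", "mlc_flash", "disk_15k", "disk_7k"]
DEMOTE_AFTER = 10   # illustrative: idle aging passes before dropping one tier

class MultiTierStore:
    def __init__(self):
        self.tiers = {name: {} for name in TIERS}
        self.idle = {}

    def write(self, key, value):
        self.tiers["slc_flash"][key] = value   # new writes land on SLC first
        self.idle[key] = 0

    def tick(self):
        """One aging pass: demote data that has sat idle too long.
        Tiers are scanned bottom-up so a block demotes at most one
        level per pass."""
        for i in range(len(TIERS) - 2, -1, -1):
            name, lower = TIERS[i], TIERS[i + 1]
            for key in list(self.tiers[name]):
                self.idle[key] += 1
                if self.idle[key] >= DEMOTE_AFTER:
                    self.tiers[lower][key] = self.tiers[name].pop(key)
                    self.idle[key] = 0     # restart aging on the cheaper tier
```

Each demotion lands the data on progressively cheaper media, which is how tiering converts idle data into cost savings.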
Storage System Specific
Automated tiering is typically storage system-specific, meaning that the tiering software is designed to work with a particular storage system and all the storage hardware needs to be purchased from a single vendor. Even so, it should be considered a “must-have” feature for any organization that is ready for a storage system refresh or upgrade. An exception to this rule is the emerging software defined storage (SDS) market. A few of these solutions will allow for tiering to occur across different vendors’ hardware but it still must be controlled by a single storage management solution.
Caching, on the other hand, tends to be more universal and can be applied to almost any solution. There are caching appliances that run in-band on the storage network as well as on the physical servers themselves. In fact, our research shows that the most common form of flash implementation is installing flash in a specific server that has a performance problem. The most obvious complement to that server flash hardware is server-side caching software.
Tiering and caching are two heavily used terms in the flash market. Both attempt to accomplish a similar goal, the automated use of a flash investment, but they take different approaches. Caching is temporary in nature and typically can make better use of scarce flash resources. Tiering is more permanent but requires a higher-capacity investment in flash to be used effectively. In my next column I will discuss how to decide which of the two technologies is best for your data center.