Easy, Effective and Efficient Server SSD Caching

SSDs (solid state drives) are becoming more mainstream, and ‘placement’ technologies like caching are getting increased attention as users strive to get the most out of their solid state investments. Caching primary disk storage with NAND flash can improve application performance if the right data can be kept in flash, but to do this caching algorithms must predict which data will be needed by the application. VeloBit’s HyperCache software leverages the ‘content popularity’ of data blocks to improve caching performance while increasing effective cache capacity.

HyperCache is a caching software solution that installs on the application server, in Windows, Linux, VMware or Hyper-V environments, using any available block-based SSD and RAM. It creates a caching layer that’s completely non-disruptive, requiring no changes to the primary storage that’s behind it or to the applications that are accessing it.

VeloBit has developed a highly effective caching algorithm that increases the chances of the right data being available in the cache when needed (a “cache hit”). It also leverages a data reduction process to increase the effective cache capacity. Together, these technologies commonly produce a 10x reduction in storage latency and a 20x performance increase over primary storage.

According to VeloBit the HyperCache software is easy to implement, typically taking less than 60 seconds. Storage Switzerland will test these claims in an upcoming test drive. The wizard-driven process automatically identifies primary storage and available SSDs, allowing the user to specify the amount of RAM to allocate to the cache, to set the write cache depth (can be set to “0” to disable write caching) and to configure sequential I/O filtering if desired.


As a block-based cache, it should be more effective at identifying the patterns used to determine cache worthiness than file-based methods typically are. A block-based cache should also be more efficient with large files, because it doesn’t have to move an entire file into cache when only a portion of that file is needed. And it doesn’t ignore very large files altogether, as some file-based caching does.

How it Works

In general, caching strives to leverage the fact that the majority of data accesses involve a minority of the data sets that support critical applications. And most of those accesses occur in a relatively narrow time window, i.e., on data that was read or written within the prior few minutes. Using knowledge about that data, the most ‘cache worthy’ blocks can be moved into SSD and kept there. To do this, caching algorithms make assumptions in order to predict which data will be needed by the application. What they base those assumptions on is key to how accurate their predictions will be.
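
To see why this works, here is a quick, self-contained illustration (not VeloBit code; the Zipf-like distribution is an assumption) of a skewed access pattern, where a small ‘hot set’ of blocks absorbs most of the I/O:

```python
import random
from collections import Counter

# Illustrative only: simulate a skewed access pattern to show why caching
# a small fraction of blocks can serve most I/O requests.
random.seed(42)
NUM_BLOCKS = 10_000
NUM_ACCESSES = 100_000

# Zipf-like weights: block i is accessed with probability proportional to 1/(i+1)
weights = [1.0 / (i + 1) for i in range(NUM_BLOCKS)]
accesses = random.choices(range(NUM_BLOCKS), weights=weights, k=NUM_ACCESSES)

counts = Counter(accesses)
hot_set = [blk for blk, _ in counts.most_common(NUM_BLOCKS // 100)]  # hottest 1%
hits = sum(counts[blk] for blk in hot_set)
print(f"Hottest 1% of blocks served {hits / NUM_ACCESSES:.0%} of all accesses")
```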

Recency and Frequency Assumptions

Historically, caching algorithms have used the “recency assumption”, the idea behind least-recently-used (LRU) eviction. Essentially, it assumes that the data most recently accessed will be accessed again soon, before a ‘new’ segment or block is called for. In a similar fashion, the number of times a data block has been accessed in the past is commonly used as a predictor of its demand in the future.

Using this “frequency assumption”, the caching software tracks which blocks have been accessed the most over a given period of time. While relatively accurate, the frequency and recency assumptions provide a workable foundation for the caching process. But they’re ‘brute force’ techniques that aren’t very sophisticated, using little intelligence about the data itself or the application to improve the overall hit rate.
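
As a rough sketch of these two classic assumptions (generic textbook policies, not VeloBit’s implementation), the recency assumption corresponds to LRU eviction and the frequency assumption to least-frequently-used (LFU) eviction:

```python
from collections import OrderedDict, Counter

class LRUCache:
    """Recency assumption: evict the block accessed longest ago."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block address -> data

    def access(self, addr, fetch):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)    # mark as most recently used
            return self.blocks[addr]         # cache hit
        data = fetch(addr)                   # cache miss: read primary storage
        self.blocks[addr] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data

class LFUCache:
    """Frequency assumption: evict the block accessed the fewest times."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}                     # block address -> data
        self.counts = Counter()              # block address -> access count

    def access(self, addr, fetch):
        self.counts[addr] += 1
        if addr in self.blocks:
            return self.blocks[addr]         # cache hit
        if len(self.blocks) >= self.capacity:
            victim = min(self.blocks, key=self.counts.__getitem__)
            del self.blocks[victim]          # evict least frequently used
        self.blocks[addr] = fetch(addr)      # cache miss: read primary storage
        return self.blocks[addr]
```

Note that both policies key on the block’s address, which is exactly the limitation the next section addresses.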

Content Popularity Assumption

VeloBit uses another assumption based on “content popularity” to enhance the effectiveness of its caching algorithms. Each data block is scanned to create a signature, similar to that generated by a deduplication process. But this signature represents the content of the data block and can be compared with other blocks’ signatures to determine how close a match they are. Deduplication, on the other hand, only determines whether blocks are identical.

As it turns out, blocks with similar content are often used together, making block similarity a strong predictor of a block’s probable use, or its “popularity”. So blocks that share a large percentage of their content with recently used blocks can be identified as more popular and marked for caching. In addition to improving overall cache hits, knowing a block’s popularity enables it to be prefetched, further improving cache efficiency.

For example, consider full clones in a 100-desktop VDI environment: 100 copies of the operating system are stored in different locations on disk. Since recency and frequency algorithms use the address of a chunk of data, those algorithms would only see each OS copy being accessed once when booted. So each OS copy would not be considered ‘hot’, even though the same operating system is being booted over and over again.

In contrast, a content popularity algorithm recognizes similar blocks across the OS copies regardless of their location on disk. Where a frequency algorithm sees a given address accessed only once, the content popularity algorithm sees the same content accessed 100 times. As a result, all 100 copies of the OS appear ‘hot’ and enjoy high cache hit rates.
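
VeloBit hasn’t published its actual algorithm, but a toy sketch can show the general shape of a content popularity approach. Here a per-block content signature (a simplified MinHash, purely an assumption on my part) lets popularity be tracked by what a block contains rather than where it lives:

```python
import hashlib
from collections import Counter

# Illustrative sketch only; chunk size and signature scheme are assumptions.
def signature(block: bytes, chunk: int = 512, keep: int = 8) -> frozenset:
    """A crude similarity signature: the smallest `keep` hashes of the
    block's fixed-size chunks. Blocks with similar content share chunks,
    and therefore share signature elements."""
    hashes = [hashlib.sha1(block[i:i + chunk]).digest()
              for i in range(0, len(block), chunk)]
    return frozenset(sorted(hashes)[:keep])

popularity = Counter()       # signature element -> times seen

def record_access(block: bytes) -> float:
    """Update popularity counts and return a 'hotness' score for the block."""
    sig = signature(block)
    score = sum(popularity[h] for h in sig) / max(len(sig), 1)
    popularity.update(sig)   # every similar block boosts these counts
    return score

# 100 VDI clones of the same OS image stored at different disk addresses
# still produce overlapping signatures, so each boot raises the score of all.
```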

Compression

By understanding the contents of blocks, VeloBit can compare blocks to determine their differences. This process, called “similarity detection”, can identify the portions of blocks that are redundant, including those which may appear unique but are only differently aligned. It then stores only the true differences between blocks, which can greatly reduce the amount of actual data that gets written to disk and, more importantly, to the cache area.

While this process does reduce data duplication in cache, it’s not deduplication, which can only eliminate identical blocks. VeloBit’s content-based data reduction is much more efficient than deduplication, since it only needs to find similarities in data blocks, not exact matches. The result is much lower computational overhead, which enables the process to run in real time at line speed.

This compression process enables the cache to eliminate more redundancy and greatly reduce data volumes, increasing the effective capacity of the cache. This increased cache capacity helps improve read hits and reduces the write load, because data is stored in this compressed format. It does consume some CPU cycles, but the ability to reduce write I/O is well worth the ‘investment’.
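
As a toy illustration of the idea (not VeloBit’s actual format, and ignoring the alignment handling described above), a similar reference block can be stored once and a new block reduced to just the byte ranges where it differs:

```python
# Similarity-based reduction sketch: store a new block as a reference to its
# closest cached match plus the byte runs that differ.

def delta_encode(new: bytes, ref: bytes) -> list:
    """Return (offset, changed_bytes) runs where `new` differs from `ref`.
    Assumes equal-length blocks for simplicity."""
    runs, start = [], None
    for i in range(len(new)):
        if new[i] != ref[i]:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, new[start:i]))
            start = None
    if start is not None:
        runs.append((start, new[start:]))
    return runs

def delta_decode(ref: bytes, runs: list) -> bytes:
    """Rebuild the original block from the reference plus the stored runs."""
    out = bytearray(ref)
    for offset, changed in runs:
        out[offset:offset + len(changed)] = changed
    return bytes(out)

ref = b"A" * 4096
new = ref[:100] + b"XYZ" + ref[103:]
runs = delta_encode(new, ref)
assert delta_decode(ref, runs) == new
stored = sum(len(c) for _, c in runs)   # 3 bytes cached instead of 4096
```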

SSD Optimization

Many legacy caching processes were developed for disk drives, which can overwrite data in place. NAND flash, by contrast, must be erased a block at a time before it can be rewritten, which generates a significant amount of overhead. VeloBit takes care to structure data writes to correspond more closely with the block-level ‘program and erase’ process used with NAND flash. The result is better flash capacity utilization, lower controller overhead for flash-specific data handling (like garbage collection) and improved NAND endurance.
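
A minimal sketch of this kind of flash-friendly write shaping (with a hypothetical erase-block size; VeloBit’s actual mechanism is not public) might coalesce small writes into full, aligned erase blocks issued sequentially:

```python
ERASE_BLOCK = 256 * 1024            # hypothetical NAND erase-block size

class WriteCoalescer:
    """Buffer small random writes and emit them as full, aligned,
    erase-block-sized sequential writes, so the SSD controller never
    has to read-modify-write (and later garbage-collect) partial blocks."""
    def __init__(self, ssd_write):
        self.ssd_write = ssd_write  # callback: (offset, buffer) -> None
        self.buffer = bytearray()
        self.next_offset = 0        # log-structured: always append

    def write(self, data: bytes):
        self.buffer += data
        while len(self.buffer) >= ERASE_BLOCK:
            # Issue one full, aligned erase block in a single sequential write.
            self.ssd_write(self.next_offset, bytes(self.buffer[:ERASE_BLOCK]))
            self.next_offset += ERASE_BLOCK
            del self.buffer[:ERASE_BLOCK]
```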

Use Case

A large telecom company providing managed network services – and a guaranteed service level – needs to collect network data in real time and perform real-time analytics so it can provision the network and tune bandwidth performance to meet those SLAs. This use case requires analyzing terabytes of data at GB/sec rates.

HyperCache was installed on 16 servers running the company’s analytics application and produced a 20x performance improvement, reducing processing time from 2 minutes to 6 seconds. A big surprise for this customer was the fact that they didn’t have to rewrite any applications to get these kinds of results. VeloBit’s block caching solution installed transparently and scales with their application server instances.

Storage Swiss Take

Caching has emerged as a ‘must have’ for SSD implementations because it’s easy to put in and can provide widespread performance benefits. The effectiveness of a caching solution is determined by how well it keeps the ‘right’ data on SSD and by its ability to expand the effective capacity of the cache. VeloBit does both, making accurate predictions (and assumptions) about which data will be needed by the applications it supports and leveraging data reduction technologies to minimize the volume of data that gets written to cache.

By capturing information about the content of data blocks, VeloBit can take a more intelligent approach to determining cache worthiness, making better assumptions about block ‘popularity’ and increasing the hit rate for a given cache area. It further uses this intelligence to data-reduce blocks chosen for the cache, effectively increasing the cache capacity. The results VeloBit customers are seeing represent a significant improvement over traditional caching technologies and certainly justify close consideration for any environment looking to leverage SSD to improve application response times.

VeloBit is a client of Storage Switzerland

Eric is an Analyst with Storage Switzerland and has over 25 years of experience in high-technology industries. He’s held technical, management and marketing positions in the computer storage, instrumentation, digital imaging and test equipment fields. He has spent the past 15 years in the data storage field, with storage hardware manufacturers and as a national storage integrator, designing and implementing open systems storage solutions for companies in the Western United States. Eric earned degrees in electrical/computer engineering from the University of Colorado and marketing from California State University, Humboldt. He and his wife live in Colorado and have twins in college.
