How to Solve the SSD Endurance Problem

As the storage medium for main memory, DRAM is where the processing work gets done in application servers. It’s generally the fastest mass-produced data storage medium available and can support the nearly unlimited cycle of writes and overwrites that takes place in this demanding environment. But DRAM is also volatile: it can only hold data while it’s powered up, so it’s not a persistent memory. This means DRAM, by itself, is often not suitable as a storage area for business-critical applications or for data sets that can’t be easily replaced.

Since flash is non-volatile, it’s being used as an alternative to DRAM in many compute and storage devices. In a process called “tiering”, software applications are loaded into flash and run entirely out of this high performance medium. In other implementations, flash is used as a read cache, providing a high capacity, high speed storage area to service repeated data requests from applications or users. But NAND flash can only support a finite number of write and erase cycles, which limits its useful life. When SSD endurance is a problem, DRAM is usually the only option, typically implemented with work-arounds to address its persistence issue.
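To make the endurance limit concrete, here is a rough back-of-the-envelope sketch in Python. The function name, the device figures and the write-amplification factor are all illustrative assumptions, not specifications of any real product, and the model assumes ideal wear leveling.

```python
def estimated_flash_lifetime_years(capacity_gb, pe_cycles, host_writes_gb_per_day,
                                   write_amplification=1.0):
    """Rough years until the rated program/erase cycles are used up.

    Assumes perfect wear leveling, so every cell ages at the same rate.
    """
    if host_writes_gb_per_day <= 0 or write_amplification <= 0:
        raise ValueError("write rate and write amplification must be positive")
    # Total data the flash can absorb before reaching its rated cycle count.
    total_writable_gb = capacity_gb * pe_cycles
    # Writes that actually reach the flash, after internal overhead.
    flash_writes_gb_per_day = host_writes_gb_per_day * write_amplification
    return total_writable_gb / flash_writes_gb_per_day / 365.0


# Hypothetical device: 400 GB, rated for 3,000 cycles, used as a write cache
# absorbing 2,000 GB of host writes per day at a write amplification of 2.
years = estimated_flash_lifetime_years(400, 3000, 2000, write_amplification=2.0)
print(f"Estimated lifetime: {years:.2f} years")
```

With these assumed numbers the device reaches its rated cycle count in about 300 days, which is why write-heavy caching workloads are where flash endurance tends to become a concern.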

Checkpointing

Data that resides in DRAM is vulnerable, so designers build in “checkpoints” which regularly copy data out to a persistent medium, like hard drives or NAND flash. But these extra writes impact performance, since they consume processing cycles and the destination storage is always slower than the memory the data is coming from. Not only is the compute or throughput ‘work’ the DRAM was designed to perform delayed, but the copy itself is slow; in the case of hard disk drives, it is orders of magnitude slower than DRAM. The result is a trade-off between protecting the data that’s in DRAM and taking a performance hit.
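The checkpointing trade-off can be illustrated with a minimal Python sketch. The class name `CheckpointedStore`, the JSON file format and the write-count trigger are hypothetical choices made for illustration; real systems checkpoint in many different ways.

```python
import json
import os
import tempfile


class CheckpointedStore:
    """In-memory dict that copies itself to a file every `interval` writes."""

    def __init__(self, path, interval=100):
        self.path = path
        self.interval = interval
        self.data = {}
        self.writes_since_checkpoint = 0
        self.checkpoints_taken = 0

    def put(self, key, value):
        self.data[key] = value
        self.writes_since_checkpoint += 1
        if self.writes_since_checkpoint >= self.interval:
            self.checkpoint()

    def checkpoint(self):
        # Write to a temporary file, force it to stable storage, then swap it
        # into place so a crash never leaves a half-written checkpoint.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as tmp:
            json.dump(self.data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, self.path)
        self.writes_since_checkpoint = 0
        self.checkpoints_taken += 1

    @classmethod
    def restore(cls, path, interval=100):
        # After a failure, only the last completed checkpoint is recoverable.
        store = cls(path, interval)
        if os.path.exists(path):
            with open(path) as f:
                store.data = json.load(f)
        return store
```

A shorter interval loses fewer writes after a failure but spends more time in `checkpoint()`, where the application waits on the slower medium. A longer interval does the opposite. Anything written since the last checkpoint is gone after a power loss.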

Even with these issues, DRAM capacities in compute and storage devices are getting larger, in response to a continual need for performance. Application servers are now being implemented with enough DRAM to hold the most critical data sets or sometimes to run an entire application in memory. Storage systems, which have traditionally used DRAM for storing metadata and for caching, are now using more as well, driven by capabilities like thin provisioning, snapshots and deduplication. These metadata and caching applications typically favor DRAM because of its speed of search and high write performance.

Combining flash and DRAM technology would seem to offer the best of both worlds. Such a combination could provide a high performance, solid state storage solution with the speed and long life of DRAM, plus the persistence of NAND flash. “Hybrid” solutions with both flash and DRAM have come out, but most of these require caching or tiering software either on the device or on the application server. Another alternative is to combine DRAM and flash directly on the DIMM itself, providing data persistence to DRAM without introducing another storage device or another layer of software complexity.

Non-volatile DRAM

Solutions like the ArxCis-NV*, a non-volatile DIMM from Viking Technology, integrate NAND flash with high speed DDR3 DRAM in a DIMM sub-system package to create a non-volatile DRAM module (NVDIMM or NV-DIMM). When a power or system failure occurs, a super-capacitor circuit on the NVDIMM keeps the module powered while it transfers all data from DRAM to flash. This kind of hybrid solution enables users to skip the work-arounds described above and implement DRAM, alone, as a persistent storage area.
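The super-capacitor only has to hold the module up long enough for the DRAM contents to reach flash. As a rough sizing sketch in Python, with hypothetical function names and made-up figures that are not taken from any vendor's datasheet:

```python
def backup_time_seconds(dram_gib, flash_write_mib_per_s):
    """Time to copy the full DRAM contents to on-module flash."""
    if flash_write_mib_per_s <= 0:
        raise ValueError("flash write bandwidth must be positive")
    return dram_gib * 1024 / flash_write_mib_per_s


def holdup_is_sufficient(dram_gib, flash_write_mib_per_s, holdup_seconds):
    """True if the assumed hold-up time covers the whole DRAM-to-flash copy."""
    return backup_time_seconds(dram_gib, flash_write_mib_per_s) <= holdup_seconds


# Hypothetical module: 8 GiB of DRAM, 200 MiB/s sustained flash write speed.
seconds = backup_time_seconds(8, 200)
print(f"Backup takes about {seconds:.1f} s")
print("60 s hold-up is enough:", holdup_is_sufficient(8, 200, 60))
```

Under these assumptions the copy needs about 41 seconds, so the backup power source would have to sustain the module for at least that long. Larger DRAM capacities or slower flash lengthen the window proportionally.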

Infinite write cache

Using an NV-DIMM means that critical applications can have a write cache that’s much larger (up to the capacity of available DIMM slots), without relying on battery backup circuits for data protection. This solution also eliminates the checkpointing process described above and the performance penalties that process entails. Compared with NAND flash, DRAM can provide an almost ‘infinite’ write cache in terms of endurance, since it can handle the continuous write and overwrite activity common in storage devices.

“Suspend/Resume” recovery

Perhaps the most dramatic improvement these hybrid solutions can bring to storage and compute platforms is evident after a system failure. When a failure event occurs, caches must be restored or repopulated with data from the persistent storage area, a process which can be especially disruptive since the data is coming from a slower storage medium. An NVDIMM, for example, performs this copy process using an on-board controller and an internal data bus. No system resources are used, slower networks and server buses are bypassed and the transfer takes place directly from flash to DRAM. The result is a recovery process that can look to the application like a ‘suspend and resume’ cycle, not the shut down – start up – recopy process that traditional DRAM recoveries involve.

This non-volatile DRAM solution can also eliminate the need for the redundancy typically used to assure high availability (HA) in business-critical applications. For storage implementations, this can mean eliminating the second ‘hot spare’ storage device, or another storage system altogether, greatly reducing cost and complexity.

Storage Class Memory

As mentioned earlier, new hybrid storage solutions that use commodity server hardware leverage storage services such as snapshots, thin provisioning and deduplication to achieve their performance and provide overall value. These processes rely on larger memory capacities to store critical metadata and to serve as write caches. But using DRAM in these storage solutions means dealing with its data persistence problems as well.

Flash has been touted as an alternative to DRAM in these larger memory configurations. But flash has its own shortcomings. Flash storage devices are essentially workarounds themselves: they include a significant amount of embedded overhead and complexity in an effort to increase their lifespan. NAND flash is also an order of magnitude slower than DRAM, making it a less than ideal stand-in.

With the persistence issue resolved, DRAM provides a more complete solid state storage solution than flash, one with better performance and essentially no long term endurance issues. For many storage applications NV-DIMM is the first real “Storage Class Memory” and a technology that solves the SSD endurance problem.

* Trademark of Viking Technology

Viking is a client of Storage Switzerland

Eric is an Analyst with Storage Switzerland and has over 25 years of experience in high-technology industries. He’s held technical, management and marketing positions in the computer storage, instrumentation, digital imaging and test equipment fields. He has spent the past 15 years in the data storage field, with storage hardware manufacturers and as a national storage integrator, designing and implementing open systems storage solutions for companies in the Western United States. Eric earned degrees in electrical/computer engineering from the University of Colorado and marketing from California State University, Humboldt. He and his wife live in Colorado and have twins in college.
