Solid State Disk (SSD) storage devices now commonly use flash memory as the medium on which they record data. Flash has driven SSD cost per unit of capacity significantly below that of DRAM-based SSDs, and it is narrowing the lead that DRAM SSDs hold in performance and reliability. The industry is now standardizing on flash SSD products, and users are looking for ways to distinguish one manufacturer from another. The SSD component that most directly drives performance and reliability, yet is often overlooked, is the flash controller. Storage managers must pay attention to flash controllers, as they’re the most significant differentiator of flash SSD systems for the enterprise.
There are two basic types of flash memory: Multi-Level Cell (MLC) and Single-Level Cell (SLC). SLC flash is the technology used in enterprise SSD products. MLC offers greater data density and a lower cost, but has a shorter usable life, slower performance and lower reliability, so it’s relegated mostly to consumer products like memory cards and thumb drives.
Beyond these two types (SLC and MLC), flash memory is essentially the same across manufacturers. Flash controller innovation is one way that flash manufacturers can differentiate their solutions. Flash controllers serve some of the same functions as disk controllers, such as data protection, but are also expected to handle writing and erasing, block provisioning, and wear leveling. A second difference among flash vendors is that some manage flash inside their own hardware while others rely on external resources.
Data protection is the most critical job of the flash controller. To maintain data integrity, flash controllers typically implement up to three levels of data protection: bit-level ECC correction, chip-level RAID, and board or drive-level failover. By comparison, hard drive RAID controllers handle only two levels, parity calculations and drive failover; they do not perform the ECC function.
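To make the chip-level RAID idea concrete, here is a minimal sketch that assumes simple XOR parity across chips. The article does not say which RAID scheme any particular controller uses, so the scheme, the function names and the sample bytes are all illustrative assumptions.

```python
from functools import reduce

def xor_parity(chunks):
    """Return the byte-wise XOR of equally sized chunks (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def rebuild_missing(surviving_chunks, parity):
    """Recover the single missing chunk from the survivors plus the parity chunk."""
    return xor_parity(list(surviving_chunks) + [parity])

# Hypothetical data striped across three flash chips.
chips = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(chips)

# Simulate losing the second chip and rebuilding it from the rest.
recovered = rebuild_missing([chips[0], chips[2]], parity)
```

In this sketch the loss of any one chip can be repaired from the remaining chips and the parity chunk, which is the kind of protection that sits between bit-level ECC and whole-drive failover.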
These data protection functions are a primary differentiator of flash controller design. Compared with traditional hard drives, flash memory is a less robust storage medium, so data protection schemes are essential. A good rule of thumb is that each level of data protection reduces the risk of failure by one order of magnitude. That said, many controllers still implement only one level of data protection, leaving the system vulnerable. The best controllers implement at least two and preferably all three levels of data protection.
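The rule of thumb can be worked through as simple arithmetic. The sketch below assumes a purely hypothetical baseline failure probability; it illustrates the "one order of magnitude per level" heuristic and is not a measured reliability model.

```python
import math

def residual_risk(base_failure_probability, protection_levels):
    """Apply the rule of thumb: each protection level divides risk by ten."""
    return base_failure_probability / (10 ** protection_levels)

# Hypothetical baseline chosen only for illustration.
baseline = 1e-3
one_level = residual_risk(baseline, 1)
three_levels = residual_risk(baseline, 3)
```

Under this heuristic, a controller with all three levels carries roughly one hundredth of the risk of a controller with only one level.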
Writing and erasing is the function that most directly affects performance. When new data is written to flash, old data must be erased ahead of each write to make room for the new data. Write performance of the flash device is therefore a function of the speed with which data can be both erased and written. In addition, flash memory must be erased in blocks, not at the byte level. This means that when a section of data is deleted, the entire block it resides in must be erased, and the bytes NOT marked for deletion must first be copied to another location. This gives the flash controller another operation to manage and another process that must finish before the write is complete. Poorly performing controllers are slow to carry out this multiple-step, back-end erasure, which reduces the overall write speed of the device. In some cases, poorly written write provisioning software can degrade flash write times until they are comparable to those of a disk drive. Well-designed controllers, on the other hand, can improve performance by reserving excess flash capacity and ‘pre-erasing’ a number of flash cells ahead of write operations.
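The copy-then-erase sequence and the benefit of pre-erased spare capacity can be sketched with a toy model. Everything here (the class name, the block and page counts, the counters) is a hypothetical simplification; real controllers track logical-to-physical mappings and often erase in the background.

```python
class FlashSim:
    """Toy model: pages are written individually but erased only per block."""

    def __init__(self, num_blocks=4, pages_per_block=4, spare_blocks=1):
        self.pages_per_block = pages_per_block
        # None marks an erased page that is ready to accept a write.
        self.blocks = [[None] * pages_per_block for _ in range(num_blocks)]
        # Spare blocks are kept pre-erased so a rewrite never waits on an erase.
        self.pre_erased = list(range(num_blocks - spare_blocks, num_blocks))
        self.erase_count = 0
        self.copy_count = 0

    def write_page(self, block, page, data):
        if self.blocks[block][page] is not None:
            raise ValueError("page must be erased before it is written")
        self.blocks[block][page] = data

    def rewrite_page(self, block, page, data):
        """Update one page; return the index of the block now holding the data."""
        target = self.pre_erased.pop(0)
        for i, old in enumerate(self.blocks[block]):
            if i == page:
                self.blocks[target][i] = data      # the new data
            elif old is not None:
                self.blocks[target][i] = old       # copy of still-valid data
                self.copy_count += 1
        # Erase the whole old block and return it to the pre-erased pool.
        self.blocks[block] = [None] * self.pages_per_block
        self.erase_count += 1
        self.pre_erased.append(block)
        return target

flash = FlashSim()
for page, data in enumerate(["a", "b", "c", "d"]):
    flash.write_page(0, page, data)
new_block = flash.rewrite_page(0, 1, "B")
```

Changing a single page in this model costs three page copies and one block erase, which shows why slow back-end handling drags down write speed and why a pool of pre-erased blocks helps.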
Flash block provisioning is a process in the flash controller that deals with bad block management. At times, blocks (physical sections on the flash chips) will go bad or become inaccessible. The flash controller manages the process of replacing these bad blocks with good blocks drawn from extra flash capacity that’s allocated for this purpose, essentially an ‘overhead’ expense of the system. This block replacement process maintains the usable capacity of the system and prevents actual data loss.
Amazingly, some flash controllers don’t do flash block provisioning at all. They make the entire capacity ‘usable’, which leaves it vulnerable to data loss. This is not done in true enterprise flash systems, but it does occur in lower-level systems.
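The block replacement process described above can be sketched as a remapping table backed by a reserve of spare blocks. The class, its method names and the block counts are hypothetical, chosen only to illustrate the idea.

```python
class BadBlockManager:
    """Toy remapper: logical blocks point at physical blocks, with spares in reserve."""

    def __init__(self, usable_blocks, spare_blocks):
        # Logical block i initially maps to physical block i.
        self.mapping = {i: i for i in range(usable_blocks)}
        # Physical blocks beyond the usable capacity are the 'overhead' reserve.
        self.spares = list(range(usable_blocks, usable_blocks + spare_blocks))
        self.retired = set()

    def mark_bad(self, logical_block):
        """Retire the failed physical block and remap to a spare."""
        if not self.spares:
            raise RuntimeError("no spare blocks left; usable capacity would shrink")
        self.retired.add(self.mapping[logical_block])
        self.mapping[logical_block] = self.spares.pop(0)
        return self.mapping[logical_block]

    def usable_capacity(self):
        return len(self.mapping)

manager = BadBlockManager(usable_blocks=8, spare_blocks=2)
replacement = manager.mark_bad(3)
```

Usable capacity stays at eight blocks after the failure because the reserve absorbs it. A controller with no reserve has nothing to fall back on, which corresponds to the error branch in this sketch.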
Wear leveling is another responsibility of the flash controller. Disk drives typically write data to a location and then overwrite that same location when the data changes. Flash memory as a medium has a finite lifecycle, measured in the number of times it can be erased and written (program/erase, or P/E, cycles). For this reason, flash systems spread data writes across all available blocks so that the entire chip set ‘wears’ at a consistent rate. Wear leveling is something that all enterprise-class flash systems do, although it’s left out of lower-level products. The sophistication of the wear leveling algorithms in the flash controller can affect the useful life of a flash memory system.
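One very simple wear leveling policy is to direct each write to the block with the fewest erase cycles so far. The sketch below assumes that policy; the class and method names are hypothetical, and production algorithms are considerably more sophisticated (for example, they also relocate rarely changed data).

```python
class WearLeveler:
    """Toy policy: always write to the block with the lowest erase count."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks

    def pick_block(self):
        # min() returns the lowest-numbered block when counts are tied.
        return min(range(len(self.erase_counts)),
                   key=self.erase_counts.__getitem__)

    def write(self):
        """Record one erase/write cycle on the least-worn block."""
        block = self.pick_block()
        self.erase_counts[block] += 1
        return block

leveler = WearLeveler(num_blocks=4)
for _ in range(10):
    leveler.write()
```

After ten writes, no block in this model has been cycled more than one time beyond any other. Rewriting a single location in place, as a disk drive would, puts all ten cycles on one block.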
Where is Flash controlled?
Keeping track of and managing the execution of each of these processes is done by software routines written into the flash controller. Not surprisingly, running these routines can consume a significant number of CPU cycles and a significant amount of internal memory. Where this processing is done is perhaps the most fundamental difference between flash controller designs. Good, enterprise-level flash systems have dedicated internal CPUs and memory to do this management processing. Systems that don’t must use the host CPU and memory, stealing system resources from other jobs, which affects application performance. This is a common problem with overhead in an IT system: it has to be done somewhere. Flash systems that have their own dedicated processing engines can also tune these subsystems for maximum performance, while systems that rely on host CPUs are typically less efficient.
With flash memory becoming the standard in enterprise SSD systems, users need to look more closely at the flash controller architectures of these products as a way to evaluate them. The flash controller manages a number of functions specific to this technology that are central to data integrity and overall read/write operations. Aside from system reliability, poor controller design can impact throughput, latency and IOPS more than any other system component. Given the importance of performance to SSD systems, flash controller functionality should be a primary focus when comparing different manufacturers’ systems.