There has been a lot of discussion lately about power failures in flash-based solid state storage systems. In various tests run by industry associations, flash-based solid state disk (SSD) devices subjected to a sudden power loss have been found to lose data, corrupt data or even fail altogether. While some of the tests go to extremes (most data centers won’t experience 150 power failures in succession, as is cited), there is reason to be concerned. Consequently, IT planners need to make sure that the flash-based system they select can survive and protect data after an unexpected power loss.
The Cause For Concern
Flash-based media performs two basic operations, and several design concerns put data at risk in the event of a power failure. Flash-based devices typically write data in two 16KB pages simultaneously. In other words, flash devices write data in 32KB chunks. If a 40KB file is written to the flash device, the first 32KB fills one chunk and the flash controller needs to decide what to do with the extra 8KB of data. If the controller writes that 8KB to the flash immediately, it occupies a full 32KB chunk and wastes 24KB of flash capacity, which when multiplied over thousands of writes would be very inefficient.
To address this, most flash devices hold the leftover data in a small volatile RAM buffer until a condition occurs that forces a flush to flash. It is this buffered data that’s at risk, and even though the amount is small, losing it can lead to complete data loss.
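The padding arithmetic in this example can be checked with a short sketch. The function name is hypothetical and the 32KB default simply mirrors the chunk size described above; real devices vary.

```python
def padding_waste_kb(write_kb: int, chunk_kb: int = 32) -> int:
    """KB of flash left unused if the tail of a write is flushed immediately.

    Assumes the device only writes whole chunks of chunk_kb, as in the
    article's example; this is a simplification, not a device model.
    """
    tail_kb = write_kb % chunk_kb          # data left over after full chunks
    return 0 if tail_kb == 0 else chunk_kb - tail_kb


# A 40KB write fills one 32KB chunk and leaves an 8KB tail.
print(padding_waste_kb(40))  # 24
```

A write that is an exact multiple of the chunk size leaves no tail and therefore wastes nothing, which is why controllers prefer to wait for more data.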
The Capacitor Problem
Most of these data loss events should be eliminated by using a capacitor to maintain a charge until data can be flushed to persistent flash storage. But there are several problems with capacitor technology that need to be addressed. A capacitor is the technological equivalent of a circuit-board level battery backup. It provides enough energy, typically for a few milliseconds, for the buffer to be flushed and a graceful shutdown to occur. But if the controller does not know how to monitor for power loss and leverage the capacitor, data may still be lost.
The first problem with capacitor technology is that many SSDs on the market purport to be ‘enterprise class’ or at least ‘server grade’, yet actually use consumer grade flash controllers. These controllers don’t have the intelligence to take advantage of a capacitor in the process described above.
Most consumer grade flash controllers only hold data for a few seconds, simply to determine if more data will be received to fill up the rest of the page and allow for more efficient use of the SSD. If no additional data is received they simply flush the buffer anyway. Then standard garbage collection routines reorganize this data during idle times and return the drive to a better level of efficiency.
In contrast, true “enterprise” flash controllers typically do have the intelligence to use the capacitor as emergency backup power. They can monitor the power state and flush the buffer to the flash when power is lost. As a result, enterprise flash devices with enterprise flash controllers will hold data in the buffer area until they can write a full 32KB chunk.
The problem with this approach is that more data is exposed for a longer period of time, and data integrity becomes dependent on the capacitor circuit’s operation. However, many enterprise flash controllers lack the ability to test the capacitor continuously to make sure it’s up to the task. This is an important omission since, as we will discuss, capacitors have their own reliability issues.
In both consumer SSDs and enterprise SSDs there is cause for concern. If a power failure occurs, even with capacitor protection, data could be lost. In the consumer flash controller case, data in the buffer area won’t be included in the emergency flush to disk. In the enterprise case if power fails and the capacitor does not function correctly, data will also be lost.
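The two buffering behaviors can be compared with a toy model. This is a sketch under stated assumptions, not any vendor's firmware: the class, the helper functions and the two-second hold time are all hypothetical, and the only figures taken from the article are the 32KB chunk and the 40KB write.

```python
from dataclasses import dataclass
from typing import Optional

CHUNK_KB = 32  # two 16KB pages written together, as described in the article


@dataclass
class WriteBuffer:
    """Toy model of the volatile buffer in front of the flash (illustrative)."""
    hold_seconds: Optional[float]  # None means "hold until a full chunk arrives"
    uses_capacitor: bool           # can the controller flush on capacitor power?
    pending_kb: int = 0
    age_s: float = 0.0

    def write(self, kb: int) -> None:
        # Full chunks go straight to flash; only the remainder stays buffered.
        self.pending_kb = (self.pending_kb + kb) % CHUNK_KB
        if self.pending_kb == 0:
            self.age_s = 0.0

    def idle(self, seconds: float) -> None:
        # A timed policy flushes a partial chunk once it has waited long enough.
        if self.pending_kb == 0:
            return
        self.age_s += seconds
        if self.hold_seconds is not None and self.age_s >= self.hold_seconds:
            self.pending_kb = 0
            self.age_s = 0.0

    def power_loss(self, capacitor_ok: bool) -> int:
        """Return the KB of buffered data lost when power drops."""
        saved = self.uses_capacitor and capacitor_ok
        lost = 0 if saved else self.pending_kb
        self.pending_kb = 0
        return lost


def consumer() -> WriteBuffer:
    # Assumed: flushes after about two seconds, ignores the capacitor.
    return WriteBuffer(hold_seconds=2.0, uses_capacitor=False)


def enterprise() -> WriteBuffer:
    # Assumed: waits for a full chunk, relies on the capacitor at power loss.
    return WriteBuffer(hold_seconds=None, uses_capacitor=True)


def lost_kb(buf: WriteBuffer, idle_s: float, capacitor_ok: bool) -> int:
    buf.write(40)        # 32KB reaches flash, 8KB stays in the buffer
    buf.idle(idle_s)
    return buf.power_loss(capacitor_ok)


print(lost_kb(consumer(), 0.5, True))     # 8: power fails before the timed flush
print(lost_kb(consumer(), 5.0, True))     # 0: the timed flush already happened
print(lost_kb(enterprise(), 5.0, True))   # 0: the capacitor covers the flush
print(lost_kb(enterprise(), 5.0, False))  # 8: capacitor failed, buffer is lost
```

In this model the consumer design is exposed only during its short hold window, while the enterprise design is exposed for as long as a partial chunk sits in the buffer and is safe only if the capacitor works.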
The real challenge with a capacitor, and one that we suggest was the cause of data loss in some of the reported tests, is that capacitor technology, despite being around for a long time, has not proven to be 100% reliable. This is especially true of the small capacitors that can fit inside a drive form-factor SSD.
As a result reliability of capacitors is a real concern. The SSD should be able to test the capacitor’s ability to hold a charge and report on this component’s health to the storage system or the user. If the capacitor’s performance has been degraded or has failed completely it would leave the system vulnerable to a power outage.
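One way to picture such a health check is to estimate how long the capacitor can carry the drive and compare that with the time a flush needs. The sketch below uses the standard stored-energy formula for an ideal capacitor, E = ½·C·V², and assumes a constant power draw; every number, name and the safety margin are hypothetical, and a real design would also account for converter efficiency and internal resistance.

```python
def holdup_time_ms(capacitance_f: float, v_start: float, v_min: float,
                   load_w: float) -> float:
    """Milliseconds an ideal capacitor can supply load_w watts.

    Usable energy is the difference between the energy stored at v_start
    and at v_min, the lowest voltage at which the drive still operates.
    """
    usable_joules = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return 1000.0 * usable_joules / load_w


def capacitor_healthy(measured_f: float, v_start: float, v_min: float,
                      load_w: float, flush_ms: float,
                      margin: float = 2.0) -> bool:
    """True if the measured capacitance still covers the flush with margin."""
    return holdup_time_ms(measured_f, v_start, v_min, load_w) >= margin * flush_ms


# Hypothetical drive: 470 uF hold-up capacitor, 12 V rail, 6 V cutoff,
# 3 W draw while flushing, 3 ms needed to empty the buffer.
print(round(holdup_time_ms(470e-6, 12.0, 6.0, 3.0), 2))   # 8.46
print(capacitor_healthy(470e-6, 12.0, 6.0, 3.0, 3.0))     # True
print(capacitor_healthy(235e-6, 12.0, 6.0, 3.0, 3.0))     # False: aged to half
```

A drive that ran a check like this periodically could report a degraded capacitor before a power failure exposes it.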
In this situation, the enterprise drive is actually more vulnerable than a consumer drive, since the enterprise drive won’t flush a partially filled buffer at all until it receives a power failure indication, whereas the consumer SSD flushes this area after a few seconds.
The final cause for concern is around the types of devices that are tested. In most cases these are drive form-factor SSDs or a PCIe SSD installed inside a server without even the most basic of data protection methods, like mirroring or RAID. They certainly aren’t “systems” that are designed specifically around flash media.
Drive form-factor SSDs and PCIe SSDs also have physical space constraints that keep them from using larger, more reliable capacitors or even better, non-volatile RAM technology that’s immune to power loss. Physically larger, these solutions simply can’t fit into the space constraints of a card or drive form factor. Solid state storage systems, on the other hand, are designed specifically to provide availability and data protection through all kinds of power interruption events.
The Solution – Proper Flash System Design
As we will describe below, a properly designed flash-based storage system should be completely immune to repeated power failures, certainly more so than most server class systems. This is where a more vertically integrated system like Skyera’s skyHawk product family has an advantage. In fact, their system is designed with data page buffers which are stored on MRAM, a non-volatile form of RAM, so there is no need for capacitor protection at all.
How Is the SSD Deployed?
The first area to explore is how the SSD is deployed within the flash storage system. Did the vendor choose to use individual drive form-factor SSDs or a bank of flash memory modules laid out similar to the way DRAM is laid out on a server motherboard?
The drive form-factor approach exposes the storage system to many of the same failure concerns that the SSDs mentioned above experienced, unless the SSD itself addresses those concerns. This type of design is essentially creating a “server” with many drive bays. Filling those bays with individual SSDs and then adding storage software does not make it a storage system, even though the system designers may add a flash protection layer and integrate standard data protection techniques like RAID.
Storage systems that use a modular flash approach can interact with all the flash memory as if it were one logical unit, which means these storage systems can control the kind and number of capacitors being used. More importantly they have the ability and physical space to leverage the above mentioned MRAM technology. An SSD system typically has the space to integrate these physically larger devices.
The design type chosen also impacts the viability of the data protection technology that the flash storage system uses. One might assume that even the most basic RAID design should protect against many of the failures cited in the research. The problem is that power loss causes data loss in the RAM buffer of each individual drive, and it hits every drive at the same moment. If an SSD system is built with drive form-factor devices, 100% of the capacitors must be operational on each drive, or a multiple drive failure will occur for which RAID provides no protection.
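A rough way to see why per-drive capacitors undermine RAID is to estimate the chance that two or more drives lose their buffers in the same power event. The sketch below assumes each drive's capacitor fails independently with the same probability, which is a simplification, and treats a single-parity RAID group as tolerating one lost member. The drive count and failure probability are hypothetical, not measured figures.

```python
from math import comb


def p_at_least_k_failures(n_drives: int, p_fail: float, k: int) -> float:
    """Binomial probability that k or more of n_drives capacitors fail.

    Assumes independent, identical per-drive failure probability p_fail
    at the moment of a power loss.
    """
    return sum(
        comb(n_drives, i) * p_fail ** i * (1.0 - p_fail) ** (n_drives - i)
        for i in range(k, n_drives + 1)
    )


# Hypothetical shelf: 24 drives, each with a 1% chance of a bad capacitor.
one_or_more = p_at_least_k_failures(24, 0.01, 1)
two_or_more = p_at_least_k_failures(24, 0.01, 2)
print(f"at least one drive loses its buffer: {one_or_more:.3f}")
print(f"two or more (beyond single parity):   {two_or_more:.3f}")
```

Because every drive sees the power loss simultaneously, the relevant question is not whether one capacitor fails over time but how many are already degraded when the event occurs, and that number grows with the drive count.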
Once again, the vertically integrated approach has an advantage here. First, as described above, the chances of repeated power failures causing data loss or drive corruption are greatly reduced if not eliminated thanks to either MRAM or better capacitor technology (and capacitor monitoring). As a result the chance of a double module failure is extremely low and RAID-protected flash becomes a more viable strategy.
This combination of MRAM for inbound data protection, in-module protection and cross-module protection should lead to a statistical impossibility of data loss from power failure. That said all of these components need to be continuously monitored as does the status of power and the amount of outstanding data still to be written. The combination of the technology and monitoring provides vertically integrated systems an advantage when it comes to surviving power faults.
Mapping Tables – The Capacitor’s Biggest Challenge
The large capacity of shared SSD systems and PCIe SSDs creates a new capacitor challenge: the time it takes to flush mapping tables to the flash tier. Storage systems have to take this challenge into account as part of their designs. Unlike drive form-factor SSDs, shared storage systems and internal PCIe SSDs have very high capacities, all of which must be mapped so that the flash controller knows where on the NAND the data resides when it is requested.
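To get a feel for the scale, the size of such a table and the time needed to write it out can be estimated with simple arithmetic. The sketch assumes one fixed-size entry per flash page; the 4-byte entry, the 1TiB capacity and the 1GiB/s flush bandwidth are hypothetical, and only the 16KB page size comes from the earlier discussion.

```python
def mapping_table_bytes(capacity_bytes: int, page_bytes: int,
                        entry_bytes: int) -> int:
    """Size of a page-granular mapping table: one entry per flash page."""
    return (capacity_bytes // page_bytes) * entry_bytes


def flush_time_ms(table_bytes: int, write_bytes_per_s: float) -> float:
    """Milliseconds needed to write the whole table at a sustained rate."""
    return 1000.0 * table_bytes / write_bytes_per_s


KIB, MIB, GIB, TIB = 2 ** 10, 2 ** 20, 2 ** 30, 2 ** 40

table = mapping_table_bytes(1 * TIB, 16 * KIB, 4)
print(table // MIB)                   # 256 (MiB of mapping data)
print(flush_time_ms(table, 1 * GIB))  # 250.0 (ms to write it out)
```

Under these assumptions the table takes hundreds of milliseconds to flush, far longer than the few milliseconds of hold-up time a small capacitor typically provides, and the gap widens as capacity grows.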
Think of this mapping table as an inode table in UNIX or a file allocation table (FAT) in Windows. If this table is lost during a power outage, it must be rebuilt from scratch, similar to running a disk check such as chkdsk on a hard disk-based system. This process can take hours for many shared solid state storage systems. For real-time environments that count on instantaneous results, that rebuild time is probably unacceptable. In our next article, “The Unknown Risk of SSD Mapping Tables”, Storage Switzerland will discuss the importance of this table and how to properly manage it.
Flash storage’s susceptibility to data loss from repeated, unplanned power outages is real and something that should be considered when evaluating flash technology to solve performance issues. High performance should not come at the cost of potential data loss. Flash based systems, especially those that are vertically integrated so that hardware design and software capabilities can work together, solve many of these issues and give yet another reason to consider shared over server-based solid state storage.