The Unknown Risk of SSD Mapping Tables

Solid state disks (SSDs) and flash appliances use mapping tables to track where data is stored on the flash device. These tables play a role similar to that of inodes or File Allocation Tables (FAT), and if they become corrupted or lost, they must be rebuilt. While flash is faster than hard disk systems, the rebuild process on a large flash appliance can take hours, which is especially problematic in the performance-sensitive environments where solid state solutions are typically deployed.

What is a Mapping Table Used For?

Mapping tables allow mixed data segments to be joined into a single stream and written sequentially to flash cells. Mapping tables are common in storage systems of all types but are particularly valuable in solid state storage: they enable random data to be written sequentially, increase SSD capacity efficiency, and improve flash write performance by allowing flash blocks to be “filled” in order. Ultimately, the purpose of the mapping table is to provide a path to all the sub-block locations that make up a specific data segment.
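The core idea can be illustrated with a minimal sketch: a table that translates logical block addresses (LBAs) to physical flash locations, so that random logical writes land on sequential physical pages. The class name, page geometry, and structure here are illustrative assumptions, not taken from any real flash translation layer.

```python
class MappingTable:
    """Toy logical-to-physical mapping table for a flash device."""

    def __init__(self, pages_per_block=256):
        self.map = {}                      # LBA -> (block, page)
        self.pages_per_block = pages_per_block
        self.next_block = 0                # next sequential write position
        self.next_page = 0

    def write(self, lba):
        """Record that `lba` was written to the next sequential flash page."""
        location = (self.next_block, self.next_page)
        self.map[lba] = location           # an overwrite simply re-maps the LBA
        self.next_page += 1
        if self.next_page == self.pages_per_block:
            self.next_block += 1
            self.next_page = 0
        return location

    def lookup(self, lba):
        return self.map.get(lba)


table = MappingTable()
for lba in [907, 12, 4411]:    # scattered logical addresses...
    table.write(lba)           # ...land on sequential physical pages
print(table.lookup(12))        # -> (0, 1)
```

Note how the random incoming LBAs (907, 12, 4411) occupy pages (0, 0), (0, 1) and (0, 2): the table is what makes the sequential fill possible.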

The mapping table is typically stored in DRAM on a flash appliance, which allows it to be quickly updated and searched. However, if a power failure occurs, similar to what Storage Switzerland described in its recent article “Power Failure and Flash Storage”, the mapping table may be at risk.

What Happens if the Flash Mapping Table Fails?

As mentioned above, if the mapping table fails or is lost, it must be rebuilt. The good news is that the actual data is not lost; the bad news is that the data cannot be accessed until the rebuild completes. The rebuild process involves scanning the flash device block by block so that data segment locations can be re-mapped, much as a file system must run chkdsk or fsck after a hard drive error. While flash devices can perform this rebuild faster than HDDs, the time lost during the rebuild is more critical in environments counting on flash. Imagine the impact of a real-time analytics system going offline for two hours while the system rebuilds the mapping table.
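A sketch of that rebuild, under the assumption that each flash page stores its owning LBA and a write sequence number in its out-of-band metadata (a common but here hypothetical page format): the device is scanned end to end and the newest copy of each LBA wins.

```python
def rebuild_mapping_table(pages):
    """Rebuild an LBA -> (block, page) table from a full device scan.

    pages: iterable of (block, page, lba, seq) tuples, where `seq` is a
    monotonically increasing write sequence number stored with each page.
    """
    table, newest = {}, {}
    for block, page, lba, seq in pages:
        # An LBA may appear many times (overwrites); keep the latest copy.
        if lba not in newest or seq > newest[lba]:
            newest[lba] = seq
            table[lba] = (block, page)
    return table


# LBA 42 was written twice; the rebuild keeps only its newest location.
scan = [(0, 0, 42, 1), (0, 1, 42, 2), (1, 0, 7, 3)]
print(rebuild_mapping_table(scan))   # -> {42: (0, 1), 7: (1, 0)}
```

The cost is the full scan itself: on a multi-terabyte appliance, iterating every page is what turns into the hours of downtime the article describes.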

Why is the Mapping Table Vulnerable?

Mapping tables are quite large compared to the flash buffer area that was the focus of the power failure article. Since the buffer maintains only a few seconds of data, it is quite small. The mapping table, however, has to store roughly 4-8 bytes of data for every 4,096 bytes (4K) written. A 1TB “starter” flash appliance will need 2GB of DRAM to store the mapping table, a 5TB flash appliance will need 10GB, and a 40TB flash system may need as much as 80GB. While the cost of that DRAM may be hidden in the price of a flash storage system, the real challenge is how to protect that much dynamic memory long enough for it to be flushed to non-volatile flash.
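The sizing arithmetic is straightforward to verify: one table entry of about 8 bytes per 4KiB page of capacity. A quick calculation, using the article's own figures:

```python
def mapping_table_dram_bytes(capacity_bytes, entry_bytes=8, page_bytes=4096):
    """DRAM needed for the mapping table: one entry per flash page."""
    return (capacity_bytes // page_bytes) * entry_bytes


TB = 1024 ** 4
GB = 1024 ** 3

for tb in (1, 5, 40):
    gb = mapping_table_dram_bytes(tb * TB) / GB
    print(f"{tb}TB appliance -> {gb:.0f}GB of DRAM")   # 2, 10 and 80 GB
```

This reproduces the article's figures: 2GB of DRAM per 1TB of flash, scaling linearly to 80GB for a 40TB system.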

If a power failure occurs, the mapping table information in DRAM, which in most cases is the entire mapping table, is lost and must be rebuilt, as described above. Most flash vendors require a UPS with their systems, but the problem is that these UPSes are neither integrated nor standardized. Flash appliances often have no way to poll the UPS to confirm that it will work when needed. Battery technology is unreliable, and a UPS needs constant monitoring to ensure it can do its job when a power failure occurs. This lack of integration means that if there is a failure, the flash appliance may not be able to take precautionary measures to copy the mapping table to nonvolatile flash.

Another option is to use super capacitors (SuperCaps). As discussed in the Power Failure article, these capacitors maintain power to the DRAM long enough to copy its contents to nonvolatile flash. While this technique works, it is already suspect on lower-capacity (500GB or less) drive-form-factor SSDs, and it is extremely suspect on high-capacity flash appliances built from widely available flash modules. Given the size of these mapping tables, the time required to keep the DRAM powered while a dump to flash occurs could be many minutes, which is something most capacitors cannot sustain.

Nonvolatile DRAM (NVRAM) is another option, but again, with the mapping table sizes required by flash appliances, a large nonvolatile DRAM storage area may be cost prohibitive. The latest NVRAM modules are also physically quite large, so there may be a physical limitation to their use as well as a practical cost limitation.

The Solution? Journaled Mapping Tables

The answer can be found in advancements in file system technology designed specifically for flash-only environments. Companies like Skyera are creating flash storage appliances that leverage a journaling method similar to that of modern file systems. When a flash system is built on this type of technology, the mapping tables are incrementally synchronized from DRAM to flash during lulls in system I/O. Only net-new additions and changes to the table are sent to the flash storage area. This means the updates are small, fast and frequent, so regular I/O is not impacted and only small sections of the mapping table are ever at risk. With a journaled approach, lookups and updates to the mapping table still occur at DRAM speeds, since DRAM remains the primary storage area.
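In rough terms, the approach can be sketched as a dirty-entry tracker: the DRAM copy stays authoritative for lookups, and only entries changed since the last sync are appended to an on-flash journal. This is a simplified model under assumed names, not Skyera's actual implementation.

```python
class JournaledTable:
    """Mapping table with incremental journal sync (illustrative sketch)."""

    def __init__(self):
        self.table = {}     # authoritative copy, lives in DRAM
        self.dirty = set()  # LBAs changed since the last flush
        self.journal = []   # stand-in for an append-only journal region on flash

    def update(self, lba, location):
        """Normal-path update: DRAM-speed, just marks the entry dirty."""
        self.table[lba] = location
        self.dirty.add(lba)

    def flush(self):
        """Called during an I/O lull: persist only net-new changes."""
        batch = {lba: self.table[lba] for lba in self.dirty}
        if batch:
            self.journal.append(batch)   # small, fast, frequent write
        self.dirty.clear()


jt = JournaledTable()
jt.update(42, (0, 0))
jt.flush()                   # journal now holds one small batch, not the whole table
```

The point of the design is that each flush is proportional to recent change activity, not to total table size, so a power loss can only catch a small, recently modified window of the table unpersisted.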

In the event of a power failure, the mapping table is reloaded from flash into DRAM when power is restored. A quick sync then identifies any data written to the flash appliance whose corresponding mapping table entry never made it to flash, and adds those entries to the table. The difference is that the system returns to operation in seconds, not hours.
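The recovery side of such a journal is a simple replay: batches are applied oldest-first so the newest location for each LBA wins, after which only the short post-journal gap needs a targeted scan. Again a hedged sketch with assumed structures; the final gap-closing scan is elided.

```python
def recover(journal):
    """Rebuild the DRAM mapping table by replaying journal batches in order."""
    table = {}
    for batch in journal:    # batches were appended oldest-first
        table.update(batch)  # later entries overwrite earlier ones
    return table


# LBA 42 was re-mapped after the first flush; replay keeps its newest location.
journal = [{42: (0, 0)}, {42: (0, 5), 7: (1, 0)}]
print(recover(journal))      # -> {42: (0, 5), 7: (1, 0)}
```

Because replay touches only the journal rather than every page on the device, the restart cost scales with the journal length, which is what turns an hours-long full scan into a seconds-long recovery.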

The challenge with a journaled approach to protecting mapping table information is that it requires a detailed understanding of the flash components as well as direct manipulation of these tables. Since most flash appliances rely on off-the-shelf SSDs, they typically won’t have that level of integration. Most flash suppliers will also not invest the software development cycles to manage the flash to this level of detail.

Conclusion

Solid state storage systems are full of potential, but they are also full of surprises. This is compounded by a tendency to treat all flash systems the same. As this article explains, the amount of thought given to a flash array's design directly impacts key availability concerns, such as the downtime associated with mapping table rebuilds.

The system needs to protect users from any occurrence in the data center that could put their information or application availability at risk, including a complete power failure compounded by a UPS failure. Companies like Skyera are addressing these concerns by designing completely integrated systems that not only protect against power failure but also insulate users from failures of the UPS itself.

 

Twelve years ago George Crump founded Storage Switzerland with one simple goal; to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS and SAN, Virtualization, Cloud and Enterprise Flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators where he was in charge of technology testing, integration and product selection.
