All-flash array vendors are now releasing systems with 3D TLC SSDs. They claim these systems have reached price parity with mainstream data center hard disk arrays, even without data efficiency. 3D TLC NAND does bring the price per GB of flash storage down considerably, but it carries a higher risk of device failure and data loss. Understanding how a vendor mitigates that risk is critical to vendor selection.
Since their inception, one of the goals of all-flash array vendors has been to create an all-flash array (AFA) priced the same as or better than an equivalent HDD array. When MLC-based AFAs came to market, they reached that goal, but only under certain conditions. The array had to use data efficiency, and the environment had to be databases, virtual desktops or virtual servers. The corresponding HDD array also had to be designed for high performance. 3D TLC-based AFAs compare favorably to mainstream arrays and don’t necessarily require data efficiency to achieve price parity. See our article “What is 3D NAND and why should the Data Center care?” for more information on 3D NAND technology.
Does 3D TLC Durability Matter?
To achieve this new low in price per GB, 3D TLC SSDs write three bits per cell instead of MLC’s two. But the increase in bit density also means that 3D TLC SSDs can’t sustain as many writes before they wear out. While 3D technology provides an improvement in durability, AFA vendors that leverage TLC SSDs must take extra precautions to ensure that the drives are suitable for the enterprise.
It is reasonable for an IT professional to question whether TLC durability should be a concern at all, and whether it matters how the AFA vendor provides that durability. After all, the vendor is going to offer a warranty, and increasingly those warranties are tied to years of service rather than the number of writes.
Understanding 3D TLC durability, and how the vendor delivers enterprise-class durability, does matter. Even if the manufacturer replaces a failed SSD within hours of failure, it takes time for the administrator to identify the failed drive, swap it out and send it back to the vendor. Even if the manufacturer performs the replacement for the customer, coordinating and escorting vendor staff through the data center also takes time.
Another reason the IT professional needs to know how the AFA vendor provides 3D TLC durability is that the steps a vendor takes may also impact the performance and cost of the AFA.
MLC as a Shock Absorber
Most data is extremely active when initially created, and updates are frequent shortly after creation. Within a relatively short period of time, however, the data becomes passive; from that point forward it is typically read, not written to or updated. The first method for leveraging 3D TLC SSDs exploits this pattern by combining them with a small MLC SSD tier. The MLC tier acts as a shock absorber for the TLC tier, which is only written to once data has become passive. The TLC tier is essentially a read-only tier. We expect most hybrid array vendors to adopt this technique as they introduce AFAs.
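The shock-absorber idea can be sketched in a few lines of code. This is a purely illustrative model, not any vendor’s actual tiering engine: new and updated blocks always land in the MLC tier, and a background pass demotes blocks that have gone a set time without a write down to TLC, so TLC cells absorb almost no rewrites. The class name, the 60-second threshold and the dictionary-backed “tiers” are all invented for the sketch.

```python
import time

DEMOTE_AFTER = 60.0  # seconds without a write before demotion (illustrative)

class TieredStore:
    """Toy two-tier store: MLC absorbs writes, TLC holds passive data."""

    def __init__(self):
        self.mlc = {}  # block_id -> (data, last_write_time): the write tier
        self.tlc = {}  # block_id -> data: the effectively read-only tier

    def write(self, block_id, data, now=None):
        now = time.time() if now is None else now
        # Updates always hit MLC first; a block being rewritten is pulled
        # back out of TLC so only one copy is authoritative.
        self.tlc.pop(block_id, None)
        self.mlc[block_id] = (data, now)

    def read(self, block_id):
        if block_id in self.mlc:
            return self.mlc[block_id][0]
        return self.tlc[block_id]

    def demote_passive(self, now=None):
        # Background pass: move blocks that have gone quiet down to TLC.
        # Each demotion costs TLC exactly one write.
        now = time.time() if now is None else now
        for block_id in list(self.mlc):
            data, last_write = self.mlc[block_id]
            if now - last_write >= DEMOTE_AFTER:
                self.tlc[block_id] = data
                del self.mlc[block_id]
```

In this sketch a block that keeps getting updated never leaves MLC, which is the whole point: the TLC tier only ever sees one write per block per demotion, no matter how many times the host rewrote it while it was hot.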
Over-provisioned 3D TLC SSD
One of the causes of wear on an SSD has little to do with how much data the host writes to the drive and more to do with how the SSD controller organizes data. An SSD is constantly rewriting data internally. The closer the drive gets to full capacity, the harder it becomes to update existing data: valid data has to be read and moved somewhere else on the drive before space can be reclaimed. Each of these internal moves adds writes beyond what the host requested, a phenomenon known as write amplification.
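The link between free space and internal data movement can be seen in a toy simulation. This is an illustrative model with made-up parameters, not how any production SSD controller actually works: host overwrites invalidate the old copy of a page, and when erased blocks run low, a greedy garbage collector erases the block with the fewest valid pages, relocating those pages first. The relocations are the extra writes the paragraph describes.

```python
import random

def simulate_write_amp(physical_blocks, pages_per_block, logical_pages,
                       n_writes, seed=0):
    """Toy log-structured flash model. Returns the write amplification
    factor: physical page writes divided by host page writes."""
    rng = random.Random(seed)
    valid = [set() for _ in range(physical_blocks)]  # logical ids per block
    where = {}                                       # logical id -> block
    erased = set(range(physical_blocks))
    open_blk, open_used = erased.pop(), 0
    physical_writes = 0

    def put(lp):
        nonlocal open_blk, open_used, physical_writes
        if open_used == pages_per_block:             # open block is full
            open_blk, open_used = erased.pop(), 0    # reserve keeps this safe
        old = where.get(lp)
        if old is not None:
            valid[old].discard(lp)                   # invalidate the old copy
        valid[open_blk].add(lp)
        where[lp] = open_blk
        open_used += 1
        physical_writes += 1

    for _ in range(n_writes):
        while len(erased) < 2:                       # replenish the reserve
            victim = min((b for b in range(physical_blocks)
                          if b != open_blk and b not in erased),
                         key=lambda b: len(valid[b]))
            for lp in list(valid[victim]):
                put(lp)                              # relocate valid pages
            erased.add(victim)                       # then erase the victim
        put(rng.randrange(logical_pages))            # the actual host write

    return physical_writes / n_writes

# Same physical capacity (64 blocks x 32 pages), two fill levels:
wa_nearly_full = simulate_write_amp(64, 32, int(64 * 32 * 0.90), 20000)
wa_more_spare = simulate_write_amp(64, 32, int(64 * 32 * 0.70), 20000)
print(f"90% full: WA {wa_nearly_full:.2f}, 70% full: WA {wa_more_spare:.2f}")
```

Under uniform random overwrites the drive that is 90% full produces a noticeably higher write amplification factor than the one kept 70% full, because garbage collection keeps picking victims that still hold many valid pages.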
Another technique for increasing SSD durability is over-provisioning. Over-provisioning sets aside SSD capacity and hides it from the user and the operating system. For example, a drive purchased as a 500GB SSD may have 650GB of actual capacity, but the operating system only sees 500GB. Over-provisioning increases the number of cells across which the SSD can spread writes, thereby increasing the drive’s effective durability. The technique also lets the drive use the over-provisioned area to reduce the number of data moves that must occur when updating data, reducing write amplification. The higher the set-aside, the higher the effective durability of the drive.
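The arithmetic behind the 500GB/650GB example above is straightforward; the conventional over-provisioning ratio is expressed against the user-visible capacity, not the raw capacity:

```python
# Illustrative arithmetic for the example in the text: a drive sold as
# 500GB that actually contains 650GB of raw NAND.
raw_gb = 650
usable_gb = 500
hidden_gb = raw_gb - usable_gb      # capacity reserved for the controller
op_ratio = hidden_gb / usable_gb    # conventional over-provisioning ratio
print(f"{hidden_gb}GB hidden -> {op_ratio:.0%} over-provisioning")
# prints "150GB hidden -> 30% over-provisioning"
```

A vendor quoting “30% over-provisioning” in this convention is therefore hiding nearly a quarter of the raw NAND from the operating system.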
Over-provisioning is particularly attractive with 3D TLC NAND because of its low cost: hiding 150GB of TLC per drive is less expensive than hiding 150GB of MLC. AFA vendors can set aside a higher percentage of the drive and still deliver a less expensive system than one that uses MLC as a buffer. We expect most all-flash vendors to adopt the extra over-provisioning technique because they lack the capability to move data between tiers.
Tiering vs. Over-provisioning
Picking a winner between the two techniques is difficult at this point. Part of the cost equation depends on the size of the data set. In most environments the active data set stays roughly constant in size, while the inactive data set grows quite large. If that holds, then over time the tiering method may be less expensive, since those vendors won’t need to over-provision the larger TLC tier as heavily as the TLC-only vendors will.
While MLC should have a performance advantage over 3D TLC, the write performance differential between the two is relatively small. Except in high-write environments, the TLC-only systems should actually have a performance advantage because they avoid the overhead of analyzing and moving data between tiers. The extra over-provisioning allotment should also allow them to receive writes faster.
For a data center buying an AFA today, tiering seems like the safer choice. It uses proven MLC technology to receive inbound data and treats TLC as a mostly read-only tier. The problem is that few vendors provide this capability today. Most AFA vendors will move from an MLC-only AFA configuration to a TLC-only configuration rather than invest in making their storage software intelligent enough to move data between media types. In the long run this decision is short-sighted. There will always be a “next” storage technology, whether it is non-volatile RAM acting as the shock absorber or quad-level cell (QLC) NAND flash acting as the cost-efficient storage area for persistent data. Automatically managing these technologies lets data centers adopt them sooner and solve next-generation problems faster.