Removing latency is critical to delivering fully on the performance promise of flash. No matter how fast flash technology becomes, the latency of its connection to the CPU remains a key stumbling block to achieving the greatest level of flash performance. Since the introduction of flash as an enterprise storage offering, the goal of flash suppliers has been to eliminate as much of that latency as possible.
Step 1 – Server Side Flash – Removal of the Network
The first step that many flash vendors take to reduce infrastructure latency is to remove the storage network itself. While we contend that the storage network can be enhanced to resolve this latency issue, many vendors instead chose to eliminate it altogether. As a result, server-side flash became a popular implementation method. When combined with caching software or when used as a virtual memory swap pool, server-side flash storage became an easy first step to flash performance.
Most of these early implementations were based on drive form factor technology, also known as solid-state disks (SSDs), installed into existing hard drive expansion bays in the server. But these devices had to deal with the latency of the storage protocol stack within the operating system and the SCSI interface, both of which were designed for hard disk systems, not flash.
Step 2 – PCIe – Removal of Stack Latency
The storage stack is the layer of software and hardware interfaces that allows the CPU to communicate with storage devices. It can include the operating system's SCSI drivers and the hardware interface. In high-performance environments, the time needed to navigate the storage stack adds noticeable latency. The solution for many vendors was to introduce PCIe-based flash. There are two types of PCIe boards commonly available. The first is essentially an SSD on a PCIe board. It has the advantage of not taking up a drive bay, and it is fully compatible with standard SCSI drivers. That means these boards tend to work out of the box with no special software; you can even boot from them. But they still incur the latency of going through the normal storage protocols and the SCSI bus.
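The per-I/O cost of that software stack is easy to observe. The following sketch times repeated reads of a single cached block; because the block is served from memory, the measurement mostly reflects software-stack overhead rather than media speed. This is an illustrative microbenchmark of our own, not a tool from any vendor discussed here, and the absolute numbers will vary by system.

```python
import os
import statistics
import tempfile
import time

def measure_read_latency(path, block_size=4096, iterations=1000):
    """Time repeated reads of one block. For a cached file this mostly
    measures per-I/O software-stack overhead, the cost flash vendors
    are trying to strip away."""
    samples = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)
            start = time.perf_counter()
            os.read(fd, block_size)
            samples.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return statistics.median(samples)

# Create a small scratch file and measure median per-read latency.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(os.urandom(4096))
    scratch_path = scratch.name

median_seconds = measure_read_latency(scratch_path)
print(f"median per-read latency: {median_seconds * 1e6:.1f} us")
os.unlink(scratch_path)
```

Even with no physical device access, each read costs on the order of microseconds of system-call and driver time, which is why vendors began looking for ways around the stack.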
The second type is what we call a native PCIe SSD. This card does not use the operating system's storage protocol stack. Instead, it uses a driver tuned specifically for flash-based storage. While this provides a dramatic reduction in latency, it requires special software drivers, and in most cases you cannot boot from these cards. A second device, often a traditional SSD, is needed to boot the server. Adding an SSD to an already expensive PCIe flash solution makes these solutions harder to cost justify, but for environments that need to cut latency, it is an investment worth making.
For the last few years it seemed that native PCIe offered the greatest latency reduction. The flash was directly accessible by the CPU on the PCIe bus, and the CPU did not have to go through the storage protocol stack to get data. But this is not a perfect solution. PCIe is a shared bus, and its bandwidth has to be shared with a variety of other interface cards used for network and device I/O. It was also not a bus designed specifically for storage or data. Hence, memory channel storage was introduced to overcome the latency issues inherent in a shared PCIe bus.
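The economics behind this progression can be sketched with simple arithmetic. Using assumed, illustrative numbers (not measurements from any product), a fixed stack-plus-interconnect overhead per I/O consumes a growing share of total latency as the flash media itself gets faster:

```python
# Illustrative, assumed figures only: a fixed per-I/O software and
# interconnect overhead matters more as the media gets faster.
def stack_share(media_latency_us, overhead_us):
    """Fraction of total I/O latency spent outside the flash media."""
    return overhead_us / (media_latency_us + overhead_us)

OVERHEAD_US = 20.0  # assumed fixed per-I/O stack + bus cost
for media_us in (200.0, 50.0, 10.0):
    share = stack_share(media_us, OVERHEAD_US)
    print(f"media {media_us:>5.0f} us -> overhead is {share:.0%} of each I/O")
```

With these assumed numbers, overhead grows from under a tenth of each I/O to roughly two thirds as media latency falls, which is the core argument for moving flash onto a faster, more direct interconnect.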
Step 3 – Memory Channel Storage – Elimination of The Bus
The next step in reducing latency to the flash device is now becoming a reality. Vendors like SanDisk are creating flash memory boards in the form of DIMMs that install where you would normally install DRAM. As we discussed in our recent webinar, “The Final Frontier For High Performance Flash”, there are several types of DIMM-based solutions, and each has its specific use case.
The first type is the SSD DIMM. These are flash storage modules that resemble DRAM DIMMs and are installed directly into the DIMM memory slot. However, unlike DRAM, they connect to the standard storage bus, typically SATA. They have the same latency issues as drive form factor SSDs, but they can prove to be a very cost-effective performance alternative when near-zero latency is not required and drive slots are scarce.
The second type is the NVDIMM, a DRAM module that leverages flash and a small capacitor to protect the DRAM's contents from power failure. These devices could be used for high-performance write cache applications, to protect flash appliances, or to give servers a suspend/resume capability similar to laptops.
The third and newest type is Memory Channel Storage. This is flash storage similar to the SSD DIMM, but it interfaces with the CPU via the memory channel instead of an interconnect like SATA or PCIe. This means that server OEMs will need to update their ROM BIOS to support the technology, but that is not difficult to do, and several already have prototype servers working. The advantage is that the memory channel has the fastest and most dedicated access to the CPU of any interconnect. It opens the possibility for flash to be used not only as near-zero-latency, high-performance storage but also as very affordable, high-capacity, albeit slower, DRAM.
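The "affordable, slower DRAM" idea can be illustrated with a quick cost sketch. The $/GB figures below are assumptions of ours for illustration only, not vendor pricing: pairing a small DRAM tier with a large memory-channel flash tier yields DRAM-like capacity at a fraction of all-DRAM cost.

```python
# Illustrative sketch with assumed $/GB figures (not vendor pricing).
def blended_cost_per_gb(dram_gb, dram_usd_per_gb, flash_gb, flash_usd_per_gb):
    """Average $/GB of a two-tier memory pool: DRAM plus DIMM flash."""
    total_usd = dram_gb * dram_usd_per_gb + flash_gb * flash_usd_per_gb
    return total_usd / (dram_gb + flash_gb)

# 64 GB of DRAM at an assumed $8/GB plus 448 GB of flash at an assumed $1/GB.
blended = blended_cost_per_gb(64, 8.0, 448, 1.0)
all_dram_usd_per_gb = 8.0
print(f"blended: ${blended:.2f}/GB vs all-DRAM: ${all_dram_usd_per_gb:.2f}/GB")
```

Under these assumptions, a 512 GB blended pool costs well under a quarter of an all-DRAM configuration per gigabyte, at the price of higher latency on the flash-resident portion.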
Please see the links below for more details, and be sure to register for our webinar, “The Final Frontier For High Performance Flash”. In that webinar, we go into the technology in detail and explain how it compares to the alternatives. By registering, you will receive an exclusive copy of our white paper, “Reducing Storage Latency by putting Flash on the Memory Channel”.
ChalkTalk: Removing Flash’s Final Latency Roadblock
Briefing Note: How to Make Flash Accessible on the Memory Bus