It is easy to get caught up in the maximum performance potential of solid-state drives (SSDs), especially as we move into the era of the non-volatile memory express (NVMe) access protocol. In reality, however, applications and service level agreements (SLAs) cannot be built around the peak performance that is theoretically possible. They must be built around a level of performance that is consistent and acceptable, so that the end user experience is predictable. Consistency matters for the high-performance workloads traditionally served by SSDs. Looking beyond that more niche use case, predictable performance also matters as enterprises run a larger number of workloads, including their core, run-the-business workloads, on SSDs.
Cameron Crandall, Senior Technology Manager with SSD and flash memory provider Kingston Technology, joined George Crump, Founder and Lead Analyst of Storage Switzerland, for a Lightboard Video to discuss why the SSD architecture matters when it comes to achieving predictable performance.
A first step toward consistent input/output (I/O) operations per second (IOPS) and latency (response time) is hardware optimization. For example, a larger dynamic random access memory (DRAM) cache on the SSD allows a greater number of incoming writes to be buffered, so the drive can absorb bursts and service those writes with more consistent performance.
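To make the buffering effect concrete, here is a minimal sketch, assuming a toy model of the write path: bursty host writes land in a fixed-size buffer (standing in for the DRAM cache) while the NAND drains it at a steady rate. The buffer sizes, burst pattern, and drain rate are illustrative assumptions, not measurements of any particular drive.

```python
"""Toy model: a larger write buffer absorbs bursts that would otherwise stall the host."""
import random

def simulate(buffer_capacity, ticks=10_000, drain_per_tick=5, seed=42):
    """Count ticks on which the host must stall because the buffer is full
    (a stand-in for the latency spikes the host would observe)."""
    rng = random.Random(seed)
    occupancy = 0
    stalls = 0
    for _ in range(ticks):
        arrivals = rng.randint(2, 4)      # steady trickle of small writes
        if rng.random() < 0.05:           # occasional large burst
            arrivals += 25
        if occupancy + arrivals > buffer_capacity:
            stalls += 1                   # buffer full: host has to wait
            arrivals = buffer_capacity - occupancy
        occupancy += arrivals
        occupancy = max(0, occupancy - drain_per_tick)  # NAND programs drain the buffer
    return stalls

if __name__ == "__main__":
    for capacity in (32, 256):            # "small" vs. "large" cache, in write units
        print(f"buffer capacity {capacity:>3}: {simulate(capacity)} stall ticks")
```

With these toy numbers, the smaller buffer should report many more stall ticks than the larger one, which is the smoothing effect a bigger DRAM cache provides.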
Arguably, the more challenging task beyond hardware optimization is optimizing the storage software stack. As SSDs become more commonly deployed, the storage software no longer has the hardware latency to hide behind that it once did. This became evident with the shift from hard disk drives (HDDs) to SSDs, and it is becoming an increasingly prominent bottleneck with the shift from serial-attached SCSI (SAS) and serial advanced technology attachment (SATA) SSDs to NVMe SSDs. NVMe vastly increases command counts, queues and queue depths, and it utilizes the fast peripheral component interconnect express (PCIe) interconnect, facilitating ultra-low latency and much higher I/O counts, as the rough calculation below illustrates.

Additionally, customized firmware that optimizes background operations such as garbage collection becomes more important than ever, not only to fully realize these potential performance levels but also to deliver performance consistency. These background operations consume compute cycles, and because they do not run continuously, they can cause unpredictable and inconsistent performance swings when they do kick off. At the same time, enterprise IT infrastructure, especially systems serving mission-critical workloads such as databases and online transaction processing (OLTP), typically runs around the clock, leaving the drives little or no idle time in which to perform these functions. Consequently, how the storage firmware schedules and executes them can make or break overall quality of service.
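To put rough numbers on why NVMe's deeper and more numerous queues matter, the sketch below applies Little's Law (outstanding I/Os equal throughput times average latency) to an assumed 100-microsecond device latency. The figures are hypothetical and the results are only upper bounds that presume the drive can sustain that many concurrent commands; for context, the NVMe specification allows up to 64K I/O queues with up to 64K commands each, versus a single 32-command queue for SATA's AHCI interface.

```python
# Back-of-the-envelope sketch using Little's Law: IOPS is bounded by
# (outstanding I/Os) / (average latency). All numbers are illustrative.

def max_iops(outstanding_ios: int, avg_latency_seconds: float) -> float:
    """Upper bound on IOPS for a given number of outstanding I/Os."""
    return outstanding_ios / avg_latency_seconds

AVG_LATENCY = 100e-6  # assume ~100 microseconds per I/O (hypothetical)

for queue_depth in (1, 32, 1024):
    print(f"queue depth {queue_depth:>4}: up to {max_iops(queue_depth, AVG_LATENCY):>12,.0f} IOPS")
```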
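The consistency problem itself can be sketched just as simply. In the toy model below (hypothetical timings, not a description of any vendor's firmware), most I/Os complete in about 100 microseconds, but a small fraction get caught behind a garbage-collection pass and stall for several milliseconds. The median latency should barely move while the 99.9th-percentile latency balloons, which is precisely the kind of variation that careful firmware scheduling of background work is meant to contain.

```python
# Toy latency model: rare background-GC stalls inflate tail latency while
# the median stays healthy. All timing figures are illustrative assumptions.
import random
import statistics

def simulate_latencies(n_ios=100_000, gc_probability=0.002, seed=7):
    """Return simulated per-I/O latencies in microseconds."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(n_ios):
        service = rng.gauss(100, 10)                 # typical service time
        if rng.random() < gc_probability:            # caught behind GC
            service += rng.uniform(2_000, 10_000)    # 2-10 ms stall
        latencies.append(max(service, 0.0))
    return latencies

def percentile(values, pct):
    ordered = sorted(values)
    return ordered[min(int(len(ordered) * pct / 100), len(ordered) - 1)]

if __name__ == "__main__":
    lat = simulate_latencies()
    print(f"median: {statistics.median(lat):8.0f} us")
    print(f"p99   : {percentile(lat, 99):8.0f} us")
    print(f"p99.9 : {percentile(lat, 99.9):8.0f} us")
```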