Not Achieving the SSD Performance You Expected? Latency Outliers May Be the Issue

Delivering new levels of application performance at massive scale is a requirement for modern enterprises and cloud service providers. These organizations frequently employ hyperscale architectures and solid-state drives (SSDs) to meet modern applications’ aggressive performance requirements. However, performance degradation still occurs because inconsistent flash read latencies clash with the distributed architecture of modern, cloud-native applications.

SSDs are capable of delivering very low latency, by and large in the realm of one millisecond or less. At times, though, SSD latency can spike to three or four milliseconds (or more). For example, internal tasks such as garbage collection might occur at an inopportune time, when the application is attempting to read from the flash chip. Also, an input/output (I/O) request might be assigned to a flash chip that is already working on another I/O operation, in which case the request is queued behind it.

Latency outliers are not a problem for traditional applications, which were architected to confine work to a single system. As a result, their performance depends on the average latency across that system’s storage drive’s chips.

Cloud applications, on the other hand, are architected to distribute individual jobs across multiple systems. They do so to enable multiple users to access the application at once, to facilitate faster application processing, and to increase utilization of system resources. Hundreds or thousands of systems might work concurrently on the same job. For example, a big data analytics workload might use the MapReduce programming model, which conducts functions such as data filtering and sorting as well as summary operations in parallel, across a cluster of multiple systems.
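To make the MapReduce pattern above concrete, here is a minimal, hypothetical sketch: each chunk of data is filtered and sorted in parallel (the "map" step), and a summary operation combines the results (the "reduce" step). The chunk data, function names, and use of a thread pool are illustrative assumptions, not part of any real MapReduce framework.

```python
from concurrent.futures import ThreadPoolExecutor

def map_chunk(chunk):
    # Per-chunk work: filter out invalid (negative) readings, then sort.
    return sorted(x for x in chunk if x >= 0)

def reduce_results(mapped):
    # Summary operation: combine all per-chunk results into one total.
    return sum(sum(m) for m in mapped)

# Hypothetical dataset split into chunks, processed concurrently.
chunks = [[3, -1, 2], [5, 4, -2], [1, 0, 6]]
with ThreadPoolExecutor() as pool:
    mapped = list(pool.map(map_chunk, chunks))

print(reduce_results(mapped))  # total of the valid readings
```

In a real cluster, each chunk would live on a different system; the key point is that the job as a whole is only done when every chunk has been processed.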

Because of their distributed and parallel nature, cloud applications’ performance is gated by the latency of the slowest SSD chip request. For the job to be completed, the application must wait for any outlier chip in the SSD that encountered a latency bottleneck while processing its particular component of the job. This problem is exacerbated as more workloads are consolidated on the cloud infrastructure (which can create new I/O queue management issues), and as SSD densities continue to increase (concentrating more chips within each SSD). Consistent, or “deterministic,” performance at massive scale becomes a requirement.
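The math behind this "slowest shard wins" effect can be sketched with a short simulation. The numbers below are illustrative assumptions, not measurements: suppose each flash read usually takes ~0.1 ms, but with a 1% chance hits a ~4 ms outlier (garbage collection, queued I/O). A job split across N parallel reads finishes only when the slowest read finishes, so the odds that at least one read hits an outlier climb rapidly with N.

```python
import random

def shard_latency_ms(p_outlier=0.01, base_ms=0.1, outlier_ms=4.0):
    # Hypothetical model: most reads are fast, but occasionally the
    # chip is busy and the read stalls behind an internal task.
    return outlier_ms if random.random() < p_outlier else base_ms

def job_latency_ms(num_shards):
    # A distributed job completes only when its slowest shard does.
    return max(shard_latency_ms() for _ in range(num_shards))

random.seed(42)
trials = 2000
for n in (1, 100, 1000):
    slow = sum(job_latency_ms(n) > 1.0 for _ in range(trials)) / trials
    print(f"{n:5d} parallel reads -> {slow:.0%} of jobs hit an outlier")
```

With these assumed numbers, a single read rarely stalls, but a job fanned out across a thousand chips almost always does (the no-outlier probability is 0.99^1000, well under 0.01%), which is why tail latency, not average latency, governs performance at scale.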

Toshiba Memory recently joined Storage Switzerland for a Lightboard discussion of how its KumoScale architecture addresses the flash latency outlier problem by buffering read and write I/O operations across SSDs.


Senior Analyst Krista Macomber produces analyst commentary and contributes to a range of client deliverables, including white papers, webinars and videos, for Storage Switzerland. She has a decade of experience covering all things storage, data center and cloud infrastructure, including: technology and vendor portfolio developments; customer buying behavior trends; and vendor ecosystems, go-to-market positioning, and business models. Her previous experience includes leading the IT infrastructure practice of analyst firm Technology Business Research, and leading market intelligence initiatives for media company TechTarget.
