The typical performance metrics used to categorize flash performance are throughput and IOPS. The most important metric, however, may actually be latency. Reducing latency has become a top concern for customers and flash vendors alike. That concern has led to solutions that move flash performance into the server and onto faster buses like PCIe and now, as we will discuss in our webinar, the memory bus itself. The purpose of this article is to clarify the difference between latency and IOPS.
What are IOPS, Bandwidth and Transactions?
Performance is most often represented in storage products by two statistics, throughput and IOPS. Throughput is a measure of the amount of data that can be pushed through a common point in the data path in a given time. A similar term is “bandwidth”, which represents the ability to support a given level of throughput. Both are typically expressed as bits per second in networking terminology and as bytes per second in storage terminology.
IOPS, or Input/Output operations per second, as we discuss in greater detail in our article “What Are IOPS?”, is an aggregate metric: a measure of the total number of storage transactions being processed through a system or a single storage port every second.
“Transactions” are requests for data made by servers, plus the process of finding and accessing the blocks of data from storage systems to fulfill those requests. IOPS are typically used to describe performance in use cases that involve smaller data objects, like metadata about financial transactions or web traffic logs. For larger data objects, such as images or video files which need to be streamed to users or applications in the shortest possible timeframe, throughput is the metric most often used.
What is Latency?
In the context of this article, latency is a measure of the time required for a sub-system or a component in that sub-system to process a single storage transaction or data request. It’s akin to the propagation delay of a signal through a discrete component and is typically a function of hardware. For storage subsystems, latency refers to how long it takes for a single data request to be received and the right data found and accessed from the storage media. In a disk drive, read latency is the time required for the controller to find the proper data blocks and place the heads over those blocks (including the time needed to spin the disk platters) to begin the transfer process.
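The disk-drive latency described above can be approximated with a simple back-of-envelope calculation: average seek time plus average rotational latency (half a revolution, on average, for the target sector to rotate under the heads). The seek time figure below is an illustrative assumption, not a measurement of any particular drive.

```python
# Rough model of mechanical disk read latency: seek time + rotational delay.
# The 8.5 ms average seek time is an illustrative assumption.

def avg_rotational_latency_ms(rpm: float) -> float:
    # One revolution takes 60/rpm seconds; on average the heads wait
    # half a revolution for the target sector to come around.
    return (60.0 / rpm) / 2 * 1000

def disk_read_latency_ms(avg_seek_ms: float, rpm: float) -> float:
    return avg_seek_ms + avg_rotational_latency_ms(rpm)

print(round(avg_rotational_latency_ms(7200), 2))   # ~4.17 ms for a 7200 RPM drive
print(round(disk_read_latency_ms(8.5, 7200), 2))   # ~12.67 ms total
```

Even before any data is transferred, a 7200 RPM drive spends several milliseconds per request on purely mechanical work, which is the latency that flash eliminates.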
In a flash device, read latency includes the time to traverse whatever connectivity sits between the host and the device (Fibre Channel, iSCSI, SCSI, the PCIe bus and now the memory bus). Once that traversal is done, latency also includes the time within the flash subsystem to find the required data blocks and prepare to transfer data. For write operations on a flash device in a “steady-state” condition, latency can also include the time consumed by the flash controller on overhead activities such as block erase, copy and ‘garbage collection’ in preparation for accepting new data. This is why flash write latency is typically greater than read latency.
How Are IOPS Different?
IOPS for a given subsystem, like a flash device, is an aggregate of the transactions that it processes in a given second. So latency can directly affect IOPS. However, focusing on IOPS alone as a performance metric can be misleading. While it does describe the number of data transactions a system can sustain each second, it doesn’t specify the amount of data each transaction delivers, which can vary widely with block size.
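The block-size caveat is easy to quantify: throughput is simply IOPS multiplied by the size of each transfer, so the same IOPS number can represent wildly different amounts of data moved. A minimal sketch, using illustrative block sizes and an assumed 100,000 IOPS figure:

```python
# Same IOPS, very different throughput: IOPS alone doesn't say how much
# data each transaction delivers. All numbers here are illustrative.

def throughput_mb_per_s(iops: float, block_size_bytes: int) -> float:
    return iops * block_size_bytes / 1_000_000

for block_size in (4_096, 65_536, 1_048_576):  # 4 KB, 64 KB, 1 MB
    mb_s = throughput_mb_per_s(100_000, block_size)
    print(f"{block_size:>9} B blocks -> {mb_s:>9.1f} MB/s")
```

At 4 KB blocks, 100,000 IOPS moves about 410 MB/s; at 1 MB blocks, the same IOPS figure would imply over 100 GB/s. Quoting IOPS without the block size says very little.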
In order to maintain IOPS performance, storage systems need enough pending requests for data to be available (in a queue) so that the latencies of a few individual requests don’t adversely impact the overall IOPS performance of the system. When queue depths are low, latency becomes a larger determinant of storage performance, and in flash based storage systems, low (or no) queue depth is a common occurrence.
While adequate transaction queues are required to realize SSD performance, large queue depths can actually mask latency in the short run. Essentially, the system can process a large number of transactions per second from the queue (high IOPS) but still take a relatively long time to complete those transactions (long latency). So, in the short-term IOPS may look good, even though the system’s ability to improve performance in the long-term is poor. For these reasons, latency is often a more important metric than IOPS for measuring SSD system performance.
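The way queue depth masks latency follows directly from Little’s Law: sustained IOPS equals the number of outstanding requests divided by the per-request latency. A minimal sketch, with illustrative queue depths and latencies:

```python
# Little's Law: IOPS = outstanding requests / per-request latency.
# Queue depths and latencies below are illustrative assumptions.

def sustained_iops(queue_depth: int, latency_s: float) -> float:
    return queue_depth / latency_s

# A deep queue posts high IOPS even though each request takes a full 1 ms:
print(sustained_iops(64, 0.001))    # 64 outstanding, 1 ms each -> 64,000 IOPS
# At queue depth 1 the same device collapses to 1/latency:
print(sustained_iops(1, 0.001))     # -> 1,000 IOPS
# Only a genuine latency reduction helps the low-queue-depth case:
print(sustained_iops(1, 0.0001))    # 100 µs latency -> 10,000 IOPS
```

This is why a benchmark run at queue depth 64 can advertise impressive IOPS while an application issuing one request at a time sees only the raw latency of the device.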
Performance in storage devices can be a difficult thing to accurately compare, since the way it is measured can vary with the application or the environment. In some use cases, moving large amounts of data quickly (high throughput) constitutes good performance; in others it’s supporting a high number of relatively short processes (high IOPS). To further confuse the issue, factors like block size, queue depth and degree of parallelism can increase throughput and IOPS in an environment without benefitting overall application performance.
A more consistent performance metric is latency. As a primary factor in both IOPS and throughput calculations, its impact on storage performance is fundamental. This means that reducing latency will universally improve performance, making latency the first metric to look at when evaluating storage performance.
- New Report – Reducing Storage Latency by putting Flash on the Memory Bus (storageswiss.com)