What is Latency? And How is it Different from IOPS?

The typical performance metrics used to categorize flash performance are throughput and IOPS, but the most important metric may actually be latency. Reducing latency has become a top priority for customers and flash vendors alike. That has led to solutions that move flash performance into the server and onto faster buses like PCIe and, as we will discuss in our webinar, the memory bus itself. The purpose of this article is to clarify the difference between latency and IOPS.

What are IOPS, Bandwidth and Transactions?

Performance in storage products is most often represented by two statistics: throughput and IOPS. Throughput is a measure of the amount of data that can be pushed through a common point in the data path in a given time. A related term is “bandwidth”, which represents the capacity to support a given level of throughput. Both are typically expressed as bits per second in networking terminology and as bytes per second in storage terminology.
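That units difference trips people up when comparing a network spec sheet to a storage one. A minimal sketch of the conversion (the function name and the decimal/SI units are my assumptions, not from the article):

```python
# Hypothetical helper: convert a network "bandwidth" figure (bits/s)
# into the storage convention (bytes/s). Assumes decimal (SI) units.

def bits_to_bytes_per_sec(bits_per_sec: float) -> float:
    """Convert a bits-per-second rate to bytes per second."""
    return bits_per_sec / 8

# A 10 Gb/s network link, expressed the way a storage spec sheet would:
link_gbps = 10
print(bits_to_bytes_per_sec(link_gbps * 1e9) / 1e9)  # → 1.25 (GB/s)
```

So a “10 Gb” link tops out at roughly 1.25 GB/s, before protocol overhead.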

IOPS, or input/output operations per second, as we discuss in greater detail in our article “What Are IOPS?”, is an aggregate metric: a measure of the total number of storage transactions processed through a system or a single storage port each second.

“Transactions” are the requests for data made by servers, together with the process of finding and accessing the blocks of data from storage systems to fulfill those requests. IOPS is typically used to describe performance in use cases that involve smaller data objects, like metadata about financial transactions or web traffic logs. For larger data objects, such as images or video files that need to be streamed to users or applications in the shortest possible timeframe, throughput is the metric most often used.

What is Latency?

In the context of this article, latency is a measure of the time required for a sub-system, or a component in that sub-system, to process a single storage transaction or data request. It’s akin to the propagation delay of a signal through a discrete component and is typically a function of hardware. For storage subsystems, latency refers to how long it takes for a single data request to be received and for the correct data to be found and accessed on the storage media. In a disk drive, read latency is the time required for the controller to find the proper data blocks and place the heads over those blocks (including the time needed to spin the disk platters) before the transfer can begin.
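The rotational part of that disk latency is simple arithmetic: on average the platter must spin half a revolution, so the figure depends only on RPM. A quick illustrative calculation (the function name and RPM values are mine, not from the article):

```python
# Illustrative arithmetic: average rotational latency for a disk drive
# is half a revolution, so it is determined entirely by spindle speed.

def avg_rotational_latency_ms(rpm: int) -> float:
    """Average time (ms) for the platter to rotate half a revolution."""
    seconds_per_rev = 60.0 / rpm
    return (seconds_per_rev / 2) * 1000

for rpm in (7200, 15000):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
# A 7200 RPM drive averages ~4.17 ms; 15000 RPM averages 2.0 ms,
# and that is before any head seek time is added.
```

Even the fastest spinning disk carries milliseconds of mechanical latency, which is exactly what flash eliminates.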

In a flash device, read latency includes the time to navigate the various connectivity options (Fibre Channel, iSCSI, SCSI, the PCIe bus and now the memory bus). Once that navigation is done, latency also includes the time within the flash subsystem to find the required data blocks and prepare to transfer data. For write operations on a flash device in a “steady-state” condition, latency can also include the time consumed by the flash controller on overhead activities such as block erase, copy and “garbage collection” in preparation for accepting new data. This is why flash write latency is typically greater than read latency.

How Are IOPS Different?

IOPS for a given subsystem, like a flash device, is an aggregate of the transactions that it processes in a given second. So latency can directly affect IOPS. However, focusing on IOPS alone as a performance metric can be misleading. While it does describe the number of data transactions a system can sustain each second, it doesn’t specify the amount of data each transaction delivers, which can vary widely with block size.
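The block-size ambiguity is easy to make concrete: the same IOPS figure can represent wildly different amounts of data moved. A sketch with illustrative numbers (the function name and values are assumptions for the example):

```python
# Sketch of why IOPS alone is ambiguous: the same IOPS rate moves very
# different amounts of data depending on the block size of each I/O.

def throughput_mb_s(iops: int, block_size_kb: int) -> float:
    """Data moved per second (MB/s) for a given IOPS rate and block size."""
    return iops * block_size_kb / 1024

iops = 100_000
for block_kb in (4, 64, 256):
    print(f"{block_kb}K blocks -> {throughput_mb_s(iops, block_kb):.1f} MB/s")
# 100K IOPS at 4K blocks is ~390 MB/s; the same 100K IOPS at 256K
# blocks is ~25,000 MB/s -- a 64x difference hidden behind one number.
```

This is why a spec-sheet IOPS number is meaningless without the block size used to produce it.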

In order to maintain IOPS performance, storage systems need enough pending requests for data to be available (in a queue) so that the latencies of a few individual requests don’t adversely impact the overall IOPS performance of the system. When queue depths are low, latency becomes a larger determinant of storage performance, and in flash-based storage systems, low (or no) queue depth is a common occurrence.

While adequate transaction queues are required to realize SSD performance, large queue depths can actually mask latency in the short run. Essentially, the system can process a large number of transactions per second from the queue (high IOPS) but still take a relatively long time to complete those transactions (long latency). So, in the short-term IOPS may look good, even though the system’s ability to improve performance in the long-term is poor. For these reasons, latency is often a more important metric than IOPS for measuring SSD system performance.
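The queue-depth/latency/IOPS relationship described above follows Little’s Law: sustained IOPS ≈ outstanding I/Os ÷ average latency. A minimal sketch with made-up numbers showing how a deep queue can mask high latency:

```python
# Little's Law applied to storage: IOPS = outstanding I/Os / avg latency.
# The values below are illustrative, not measurements from any device.

def iops_from_queue(queue_depth: int, avg_latency_ms: float) -> float:
    """Sustained IOPS for a given number of outstanding I/Os and latency."""
    return queue_depth / (avg_latency_ms / 1000)

# Two systems with identical IOPS but very different response times:
print(iops_from_queue(queue_depth=32, avg_latency_ms=3.2))  # → 10000.0
print(iops_from_queue(queue_depth=1,  avg_latency_ms=0.1))  # → 10000.0
```

Both systems report 10K IOPS, but the first makes every application wait 3.2 ms per request while the second responds in 100 µs, which is precisely why the IOPS number alone can mislead.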


Performance in storage devices can be difficult to compare accurately, since the way it is measured can vary with the application or the environment. In some use cases, moving large amounts of data quickly (high throughput) constitutes good performance; in others, it’s supporting a high number of relatively short operations (high IOPS). To further confuse the issue, factors like block size, queue depth and degree of parallelism can increase throughput and IOPS in an environment without benefitting overall application performance.

A more consistent performance metric is latency. As a primary factor in both IOPS and throughput calculations, its impact on storage performance is fundamental. This means that reducing latency will universally improve performance, making latency the first metric to look at when evaluating storage performance.


George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.

7 comments on “What is Latency? And How is it Different from IOPS?”
  1. We completely agree that latency is a very critical performance metric here at GreenBytes. The IOPS performance metric should always include the latency numbers for a “complete” picture of performance. There’s more I’ve written on this topic here http://bit.ly/1cGZfjR.

  2. Great article, George! The only thing I would add is that people need to be wary of bogus latency metrics. When you see a storage system that only reports “minimum latency” on a spec sheet, ask for the nominal latency numbers. Maximum latency is also important to know (though realize that in the real world it can occur even less frequently than the minimum number). Also often ignored is how latency numbers vary with queue depth. For multi-tenant applications like VDI or virtual server environments, the nominal latency seen from the individual VMs accessing the same device at once can be dramatically different from a single test run against a single device with a queue depth of one. Data management functions can amortize cost at higher queue depths, which can actually reduce the impact of latency in multi-tenant environments.

  3. […] IOPS’ figure with the following formula using average seek time and average latency (See “What is Latency?”) from the individual drive […]

  4. […] as we describe in our article, “What is Latency?”, is potentially more critical than IOPS when trying to improve application response time. The […]

  5. Storage Guy says:

    Finding out average (or nominal) latency doesn’t do that much for you if you already know the IOPS and outstanding I/O (or queue depth, if a single queue): simply calculate Average Latency = OIO/IOPS. If you want to really understand the behavior, you need to look at measures of the latency distribution, like max latency, the 99.999th-percentile I/O latency, the 99.99th, etc. This will give you a better indicator of the maximum response time. It’s also probably wise to look at things like the standard deviation of the distribution to determine how consistent the performance will be from one I/O to the next.

  6. ericmb says:

    Ultimately, if your business’s apps are doing OK, you are OK. Different applications are sensitive to latency in different ways. Hence why latency has been my ‘go to’ metric for performance “issues” over the years rather than bandwidth etc.

    However, with the advent of SSDs the bottleneck will now typically shift away from the storage layer to network/bandwidth. I have seen 20Gb network pipes maxing out where I work and/or ESX hosts maxing out before storage. Finally.

    Couldn’t agree more with the article.

