In a recent article, “What is I/O and why should you care”, we discussed what input and output (I/O) operations are and how a storage system’s ability to support these I/Os is a finite resource, one that’s arguably as important as its physical capacity. When a storage system is hit with requests for data, it can run out of this ‘I/O capacity’ long before it’s actually full of data. For this reason, knowing the performance specs of a storage system is critical to maximizing this resource.
Storage performance comprises transfer rate and I/Os per second, or “IOPS”. Transfer rates are determined by sequential reads and writes, or how fast data can be transferred from contiguous storage locations on disk. Transfer rate is measured in MB/s and is mostly associated with larger files and reference data that’s not changing. IOPS are expressed as an integer and refer to the maximum number of reads and writes per second to non-contiguous storage locations. These operations are dominated by seek time, the time it takes for a disk drive to position its read/write heads over the correct location on disk. IOPS are associated with smaller files and more continuous changes, and characterize the workloads most typical in enterprise data applications.
As an illustration, let’s look at two ways a storage system can handle 7.5GB of data. The first is an application that requires reading ten 750MB files, which may take 100 seconds, meaning the transfer rate is 75MB/s. Treating each file read as a single operation, this consumes only 10 I/Os. The second application requires reading ten thousand 750KB files, the same amount of data, but consumes 10,000 I/Os. Given that a typical disk drive provides fewer than 200 IOPS, and that each of those small reads adds positioning time on top of the transfer itself, the reads from the second application probably won’t get done in the same 100 seconds that the first application did. This is an example of how different ‘workloads’ can require significantly different performance while using the same capacity of storage.
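The arithmetic behind this comparison can be sketched in a few lines of Python. This is a rough model, not a benchmark: it assumes one I/O per file (as the example does), takes the 200 IOPS figure as a best case for a single drive, treats 1MB as 1,000KB, and simply adds positioning time to transfer time. The function and constant names are made up for illustration.

```python
# Figures taken from the example in the text; the model itself is an assumption.
TRANSFER_RATE_MB_S = 75.0   # sequential transfer rate from the example
DRIVE_IOPS = 200.0          # best case for a typical disk drive, per the text


def read_time_seconds(file_count, file_size_mb):
    """Estimate read time as positioning time (one I/O per file) plus transfer time."""
    positioning_s = file_count / DRIVE_IOPS
    transfer_s = file_count * file_size_mb / TRANSFER_RATE_MB_S
    return positioning_s + transfer_s


large = read_time_seconds(10, 750.0)       # ten 750MB files
small = read_time_seconds(10_000, 0.75)    # ten thousand 750KB files

print(f"large files: {large:.2f} s")   # large files: 100.05 s
print(f"small files: {small:.2f} s")   # small files: 150.00 s
```

Under these assumptions the small-file workload takes about half again as long to move the same 7.5GB, and a drive delivering fewer than 200 IOPS would fare worse.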
In addition to the file type and size, the way a software application uses a file determines the workload it generates. How an application changes files and how often, how it uses cache versus disk reads and writes, and the type of processing it’s doing all affect the workload it creates. For example, a video operation that copies a large file to cache, runs a filter on the entire file and then stores that file creates a more sequential workload than a database that reads and writes 1KB fields continuously to support an e-commerce application.
The point here is that for a storage system to be useful, it must provide data fast enough to keep applications from waiting. If the application is processing large image files, for example, the storage system usually must have reasonably good transfer rates. But applications typically deal with smaller files, which require more random IOPS. Also, because the storage system is supporting multiple servers and applications, the requirements of these servers (or VMs) are aggregated to produce an even larger, more random IOPS requirement. In most enterprise environments, storage systems will typically start to see a performance drop-off (usually in random IOPS) before they run out of capacity. This makes understanding the concept of workloads important to managing storage systems.
How to calculate IOPS
On the storage side, an IOPS spec may not be available for the storage system you’re using or for systems you might like to evaluate. If you don’t find this information, or if the data you do find looks a little too good to be true, you can do some calculations on your own. This formula is most often used to size a new array appropriately, and not in place of a ‘bake off’, but it will give you a decent ballpark IOPS figure.
You can calculate a ‘raw IOPS’ figure with the following formula using average seek time and average latency (See “What is Latency?”) from the individual drive specs:
IOPS = 1 / (Avg Seek Time + Avg Latency), with both times expressed in seconds.
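As a quick check of the formula, here is a minimal Python sketch. Drive spec sheets usually quote these times in milliseconds, so the function converts to seconds first. It assumes that "average latency" means average rotational latency, taken as the time for half a revolution. The 3.5 ms seek time is a hypothetical value for a 15K RPM drive, not a figure from any particular spec sheet.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: the time for half a revolution."""
    return 0.5 * 60_000.0 / rpm


def drive_iops(avg_seek_ms, avg_latency_ms):
    """Raw IOPS = 1 / (avg seek time + avg latency), with times converted to seconds."""
    service_time_s = (avg_seek_ms + avg_latency_ms) / 1000.0
    return 1.0 / service_time_s


# Hypothetical 15K RPM drive with a 3.5 ms average seek time.
latency_ms = rotational_latency_ms(15_000)
print(latency_ms)                          # 2.0
print(round(drive_iops(3.5, latency_ms)))  # 182
```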
Alternatively, a rule of thumb to use is 180 IOPS for a 15K RPM drive, 120 IOPS for a 10K RPM drive, 80 IOPS for a 7200 RPM drive and 40 IOPS for a 5400 RPM drive. To get the raw IOPS for a storage array, simply multiply the individual disk IOPS by the number of spindles.
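The multiplication can also be turned around to size an array for a target workload. The sketch below uses the 15K and 10K RPM rule-of-thumb figures from the text. The 5,000 IOPS target and the function names are hypothetical, and the result is a raw figure that ignores RAID overhead, controller limits and cache.

```python
import math


def array_raw_iops(per_drive_iops, spindles):
    """Raw array IOPS: per-drive IOPS multiplied by the number of spindles."""
    return per_drive_iops * spindles


def spindles_needed(target_iops, per_drive_iops):
    """Smallest whole number of drives whose raw IOPS meets the target."""
    return math.ceil(target_iops / per_drive_iops)


TARGET_IOPS = 5_000  # hypothetical workload requirement

print(spindles_needed(TARGET_IOPS, 180))  # 28 drives at 15K RPM
print(spindles_needed(TARGET_IOPS, 120))  # 42 drives at 10K RPM
print(array_raw_iops(180, 28))            # 5040
```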
Apples to apples
When comparing IOPS requirements with existing storage capabilities, or between storage systems, care must be taken to ensure the systems are of similar configuration, both in terms of hardware and the testing parameters used. It’s also very important to confirm that the performance numbers you do get represent reads and writes to and from the disk array and not from cache. Most storage systems will list performance in a number of different ways: in terms of read and write operations, or as a combination of reads and writes with an assumed distribution percentage. Obviously, this distribution should be comparable between systems and appropriate for the assumed workloads.
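To see why the assumed read/write distribution matters, consider a simple blending model. It is an assumption for illustration, not a vendor method: if a system is rated separately for reads and for writes, and the same resources service both, the average time per operation is the mix-weighted sum of the two per-operation times. The ratings used here (10,000 read IOPS and 5,000 write IOPS) are hypothetical.

```python
def blended_iops(read_iops, write_iops, read_fraction):
    """Combined IOPS for a read/write mix, weighting the per-operation service times."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    avg_time_per_op_s = read_fraction / read_iops + write_fraction / write_iops
    return 1.0 / avg_time_per_op_s


# Hypothetical ratings: 10,000 read IOPS and 5,000 write IOPS.
for read_fraction in (0.5, 0.7, 0.9):
    combined = blended_iops(10_000, 5_000, read_fraction)
    print(f"{read_fraction:.0%} reads: {combined:.0f} IOPS")
```

In this model the same hypothetical system would be quoted at roughly 6,700 IOPS for a 50/50 mix and roughly 9,100 IOPS for a 90/10 mix, which is why two published figures are only comparable when the mix behind them is the same.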
The ability to get data into and out of a storage array is finite for most systems and as important as its physical capacity. If the system can’t provide timely access to any additional data, it’s full for all intents and purposes. This ‘I/O capacity’ is defined in terms of transfer rate and IOPS, with IOPS being the spec associated with the most common storage use cases. Knowing the IOPS that a system can provide and matching that with the demand for IOPS created by application workloads can be instrumental in maximizing performance and utilization of this IT asset.