Conventional Storage QoS falls short on VMware and Virtualized Workloads

Posted on May 4, 2015 by George Crump

An increasing number of storage systems are coming to market with Quality of Service (QoS) functionality that allows an administrator to guarantee and, in some cases, limit the storage performance that VMware or other virtualized workloads will experience. This is an important feature as the number and variety of workloads in the data center continue to increase. QoS should allow mission-critical workloads to be intermixed with less critical workloads without fear of a noisy neighbor stealing all available storage performance. The problem is that most storage QoS implementations are not tuned for virtualization and can actually make assuring performance for mission-critical VMs more complicated than it would be without QoS.

What is Storage QoS?

The goal of any QoS feature is to assure that a given virtual machine or application has access to a certain set of resources regardless of what else is contending for those resources. QoS is available for CPUs, network bandwidth, and storage performance. Storage QoS, typically expressed in guaranteed IOPS (Input/Output Operations per Second), makes sure that a mission-critical application always receives at least a specified number of IOPS.

Storage QoS can be expressed as both minimums and maximums. For example, a high-priority, mission-critical workload may be guaranteed 75k IOPS. That does not mean that it is always consuming those IOPS or has them reserved; it means that when it needs 75k IOPS, the necessary resources are taken away from other workloads to make sure that it gets them. Some storage systems also allow the setting of a maximum. This makes sure that a less critical server never receives more than a given number of IOPS; for example, a domain controller may be limited to 500 IOPS. Cloud providers most commonly use this maximum setting to make sure that a customer who purchased “bronze” service never experiences “platinum” service.

QoS Enables Safe Shared Storage

The benefits of enterprise shared storage, consolidating onto fewer storage systems and potentially just one, are numerous. The most obvious gain is simpler storage and data protection administration; it is simply easier to manage and protect one of something instead of dozens. There are also very real cost savings from consolidation, especially in the modern data center. A consolidated storage system can deliver better returns on flash investments and derive greater efficiency from features like deduplication and compression. Shared storage has also enabled some of the most powerful features of virtualization, like workload migration, thin provisioning, and cloning. The Achilles’ heel of shared storage is making sure that each workload gets the resources necessary for the performance level it demands. Storage QoS is the storage system’s answer to this problem. For traditional bare-metal application deployments, where one LUN or volume hosts a single workload, storage QoS tends to work well. In virtualized environments, however, traditional storage QoS implementations fall short.

The Virtualization vs. Storage QoS challenge

The single biggest challenge to using storage QoS effectively for virtualized workloads is a lack of granularity. Most storage QoS implementations set their minimum and maximum throttles at the LUN (Logical Unit Number) or volume level, while VMware and other virtualization platforms typically place dozens, if not hundreds, of virtual machines (VMs) on each LUN or volume. A LUN-level setting therefore applies to all of those VMs collectively; it cannot guarantee performance to one VM or limit another when both share the same LUN.
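Why LUN-level controls break down can be shown with a toy model. The sketch below assumes, purely for illustration, that once a LUN hits its IOPS limit the array services the VMs on it in proportion to how much each is asking for. Real arrays schedule differently, but any policy that only knows about the LUN cannot tell a critical VM from a noisy one. The function name, VM names, and numbers are hypothetical.

```python
from typing import Dict


def lun_level_share(lun_limit: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Split one LUN-wide IOPS limit among the VMs on that LUN.

    Assumption (illustrative only): when total demand exceeds the limit,
    each VM receives a share proportional to its own demand.
    """
    total = sum(demands.values())
    if total <= lun_limit:
        return dict(demands)  # under the limit, every VM gets what it asks for
    return {vm: lun_limit * d / total for vm, d in demands.items()}


if __name__ == "__main__":
    # One datastore LUN, capped at 20,000 IOPS, shared by two VMs.
    demands = {"sql-prod": 8_000, "backup-proxy": 72_000}
    print(lun_level_share(20_000, demands))
    # {'sql-prod': 2000.0, 'backup-proxy': 18000.0}
```

In this model the critical SQL VM receives only a quarter of the 8,000 IOPS it needs, and no LUN-level minimum or maximum can correct that, because both VMs sit behind the same control.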

The second challenge is using IOPS as the measure for QoS. Although IOPS is part of the standard storage performance vocabulary, it can mean different things to different operating systems, depending largely on the block size of the file system in use. Flash vendors’ IOPS figures are the best example of the problem: the same vendor can report dramatically different IOPS depending on the block size used.
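The arithmetic behind this is simple: throughput equals IOPS multiplied by I/O size. The short Python sketch below (the function name is hypothetical) shows how far apart two identical IOPS figures can be once block size is taken into account.

```python
def throughput_mib_s(iops: int, block_kib: int) -> float:
    """Data moved per second, in MiB/s, for a given IOPS rate and I/O size in KiB."""
    return iops * block_kib / 1024


if __name__ == "__main__":
    # The same "75k IOPS" claim at three different block sizes.
    for block_kib in (4, 8, 64):
        rate = throughput_mib_s(75_000, block_kib)
        print(f"75,000 IOPS at {block_kib:>2} KiB = {rate:8.1f} MiB/s")
```

The 64 KiB line works out to 16 times the data of the 4 KiB line, even though both would be advertised as 75,000 IOPS.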

The third challenge is monitoring storage performance resource consumption at the per-VM level. New applications are being added all the time, and those applications draw on the available pool of storage performance resources. The storage and/or virtualization administrators need a way to measure current resource utilization, both to adjust existing VMs and to make room for new ones. At the same time, this storage tuning can’t be a task that requires an all-day effort. The monitoring data then needs to be presented visually, showing the key performance parameters per VM or per physical host.
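As a rough illustration of per-VM monitoring, the sketch below rolls a hypothetical trace of completed I/Os, each tagged with the VM that issued it, up into the per-VM figures an administrator would want to see: IOPS, throughput, and average latency. The `IoRecord` format and function names are assumptions; how a given array or hypervisor attributes I/O to a VM is product specific.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class IoRecord:
    vm: str            # VM that issued the I/O (assumed to be known)
    size_bytes: int    # size of the I/O
    latency_ms: float  # time from submission to completion


def per_vm_summary(records: Iterable[IoRecord], window_s: float) -> Dict[str, Dict[str, float]]:
    """Aggregate I/Os observed over `window_s` seconds into per-VM metrics."""
    count: Dict[str, int] = defaultdict(int)
    nbytes: Dict[str, int] = defaultdict(int)
    lat_total: Dict[str, float] = defaultdict(float)
    for r in records:
        count[r.vm] += 1
        nbytes[r.vm] += r.size_bytes
        lat_total[r.vm] += r.latency_ms
    return {
        vm: {
            "iops": count[vm] / window_s,
            "mib_s": nbytes[vm] / window_s / 2**20,
            "avg_latency_ms": lat_total[vm] / count[vm],
        }
        for vm in count
    }


if __name__ == "__main__":
    trace = [
        IoRecord("sql-prod", 8192, 0.5),
        IoRecord("sql-prod", 8192, 1.5),
        IoRecord("web01", 65536, 4.0),
    ]
    for vm, stats in per_vm_summary(trace, window_s=1.0).items():
        print(vm, stats)
```

A real tool would feed summaries like these into charts per VM and per host rather than printing them.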

What to Look for in Storage QoS for Virtualized Workloads

The number one requirement for effective storage QoS for virtualized workloads is a granular understanding of each VM. VM-level QoS allows the storage or virtualization administrator to set individual QoS parameters, expressed in IOPS, for each VM and therefore for each specific application. The QoS function can then use flash resources intelligently while still assuring that mission-critical applications get the performance they require.
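One common way to enforce a per-VM maximum is a token bucket per VM. The sketch below is a generic Python illustration of that technique, not a description of how any particular array implements QoS. The `TokenBucket` class, VM names, and rates are hypothetical, and time is passed in explicitly so the behavior is deterministic. Enforcing per-VM minimums would additionally require a scheduler that favors VMs running below their guarantee.

```python
from typing import Dict


class TokenBucket:
    """Admit at most `rate_iops` I/Os per second, with bursts up to `burst`."""

    def __init__(self, rate_iops: float, burst: float, now: float = 0.0):
        self.rate = rate_iops
        self.capacity = burst
        self.tokens = burst  # start full
        self.last = now

    def try_admit(self, now: float, ios: int = 1) -> bool:
        # Refill tokens for the time elapsed since the last call, up to the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= ios:
            self.tokens -= ios
            return True
        return False  # over this VM's limit: queue or delay the I/O


# One bucket per VM rather than one per LUN, so each VM has its own ceiling.
limits: Dict[str, TokenBucket] = {
    "dc01": TokenBucket(rate_iops=500, burst=500),
    "sql-prod": TokenBucket(rate_iops=75_000, burst=75_000),
}

if __name__ == "__main__":
    admitted = sum(limits["dc01"].try_admit(now=0.0) for _ in range(600))
    print(admitted)  # 500: the domain controller cannot exceed its cap
    print(limits["dc01"].try_admit(now=1.0))  # True: one second later the bucket has refilled
```

Because the limit lives on the VM's own bucket, a burst from `dc01` uses up only its own tokens and never those of `sql-prod`, even if both VMs share a datastore.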

The second requirement for storage QoS is to make sure that IOPS is expressed the same way for every VM, since the environment may contain file systems with many different block sizes. IOPS, despite its flaws, is a common way to communicate performance requirements, but the IOPS measurement needs to be normalized across block sizes. To accomplish this, the storage system should convert every I/O to a common denominator, a reference block size. This requires that the storage system be able to “see” into the VMs, understand each block size setting, and then perform the calculations needed to normalize IOPS utilization before applying QoS settings.
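A minimal sketch of the normalization step follows. It assumes an 8 KiB reference size and a round-up rule in which any I/O counts as at least one normalized I/O. Both the reference size and the rounding rule are assumptions chosen for illustration, as are the function names; products that normalize IOPS may choose different values.

```python
import math
from typing import Iterable

REFERENCE_BYTES = 8 * 1024  # assumed common denominator (8 KiB)


def normalized_ios(size_bytes: int, reference: int = REFERENCE_BYTES) -> int:
    """Count one I/O as ceil(size / reference) normalized I/Os, minimum one."""
    return max(1, math.ceil(size_bytes / reference))


def normalized_iops(sizes: Iterable[int], window_s: float) -> float:
    """Normalized IOPS for a set of I/O sizes observed over `window_s` seconds."""
    return sum(normalized_ios(s) for s in sizes) / window_s


if __name__ == "__main__":
    for kib in (4, 8, 64, 1024):
        print(f"{kib:>4} KiB I/O -> {normalized_ios(kib * 1024)} normalized I/O(s)")
```

Under this rule, a VM issuing 1,000 64 KiB writes per second registers 8,000 normalized IOPS, so a QoS setting expressed in normalized IOPS represents a roughly comparable amount of work whether the guest file system uses small or large blocks.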

Finally, and potentially most important, is visibility into how storage resources are being used at both the storage system level and the VM level. Again, performance is more than just the number of IOPS consumed; other factors like latency, throughput, network utilization, and physical server CPU utilization all affect it. The storage QoS solution should be able to report visually on how each of these resources is being used. In addition, the system should provide immediate, visual feedback on any changes made to QoS settings.
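The feedback requirement can be sketched as a before/after comparison across several metrics rather than IOPS alone. The function name, metric names, and sample values below are hypothetical; the point is that a QoS change should be judged by its effect on latency, throughput, and host CPU as well as on IOPS.

```python
from typing import Dict


def change_report(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Percent change for every metric present (and nonzero) in the 'before' snapshot."""
    return {
        m: round(100.0 * (after[m] - before[m]) / before[m], 1)
        for m in before
        if m in after and before[m] != 0
    }


if __name__ == "__main__":
    # Hypothetical per-VM snapshots taken before and after raising a VM's QoS minimum.
    before = {"iops": 4_000, "latency_ms": 6.0, "mib_s": 31.25, "host_cpu_pct": 40.0}
    after = {"iops": 8_000, "latency_ms": 1.5, "mib_s": 62.5, "host_cpu_pct": 46.0}
    for metric, pct in change_report(before, after).items():
        print(f"{metric:>13}: {pct:+.1f}%")
```

In this made-up example the change doubles IOPS and cuts latency by three quarters, but also raises host CPU by 15 percent, which is the kind of side effect that is easy to miss when only IOPS is shown.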

In the end, visualization of QoS settings becomes the critical ingredient in successfully deploying QoS across a large number of VMs. Without visualization, the system administrator is left with a seat-of-the-pants, trial-and-error approach to optimizing application performance. In other words, without a visual representation of the impact of QoS settings and real-time feedback on changes, QoS can send administrators down a knob-twisting rat hole that ends up consuming more time than before.

Conclusion

The goal of QoS is to deliver performance SLAs to individual applications as needed, so that they don’t have to contend with “noisy neighbors.” A good QoS product should also be simple to use and take the guesswork out of how and where to apply settings. Conventional storage lacks fine-grained QoS control and cannot deliver performance guarantees to applications with different service requirements. When evaluating QoS, buyers should look for solutions that combine VM-level granularity with visual guidance.

Sponsored by Tintri

Tintri builds smart storage systems that see, learn, and adapt, enabling IT to focus on virtualized applications instead of managing storage infrastructure. Applications drive the business, and infrastructure exists to support those applications. Tintri believes it is more important for IT teams to focus on the performance, QoS, speed of deployment, and scalability of applications than on managing the storage infrastructure.

About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.


3 comments on “Conventional Storage QoS falls short on VMware and Virtualized Workloads”

Chris McCall says:

May 5, 2015 at 10:04 pm

Disclosure, I work for NexGen Storage.

Great post, it’s absolutely spot on. Conventional (read: LUN-based) QoS does have its challenges, which is why VVOLs is such a valuable feature for customers. Not only do you get VM-level storage workload QoS, you get more granularity, like assigning database logs and tables different levels of storage workload performance, something that was impossible before VVOLs and that helps keep infrastructure costs in check.

Long term, a more important issue may be how VM-level QoS is integrated, as there are two competing approaches: 1) proprietary NFS implementations and 2) VMware-supported VVOL implementations. A proprietary approach gives customers a solution sooner, but waiting for a VMware-supported API means longer-term compatibility. It will be interesting to see how this plays out.
- Scott Ryden (@rydenbrook) says:
  
  May 21, 2015 at 10:20 am
  
  Great dialog. How about a solution that includes both approaches (as Chris describes) as options?
QoS and TGC Storage Groups | Tintri API and UI Ramblings says:

May 7, 2015 at 11:29 am

[…] George Crump’s post, “Conventional Storage QoS short on VMware and Virtualized Workloads“ […]
