QoS (Quality of Service) is a topic that’s becoming more common in storage discussions, partly due to the rise of multi-tenant environments like public clouds. QoS puts management controls on storage resources, especially processing power, so that the most critical applications can maintain their performance. But outside of these use cases, QoS is often thought of as a ‘nice to have’ feature, not a ‘need to have’ function for many companies. My question is “Why?”
Today, with server virtualization extending into all areas of IT and VM densities increasing across the board, even a modest data center infrastructure can look like a multi-tenant environment. With hosts competing for resources, effective storage management is becoming harder, resulting in less efficiency and more administrative overhead. The question for many companies is “Why would you buy a storage system without QoS?”
Consolidation and Sharing
Starting a couple of decades ago, storage consolidation was the order of the day at most companies as data centers embraced storage networking and moved away from DAS. Simplified management and better capacity utilization were among the drivers for this movement. For the past decade it’s been server consolidation, as these same companies have virtualized physical servers and packed more VMs onto fewer hosts.
Consolidation means sharing resources, which means managing resources so that each server or application gets enough to operate as it did before it was added to the shared environment. Shared storage simplified capacity management, but what about processing power?
Historically, shared storage use cases ran out of capacity before they ran out of controller bandwidth, but as storage systems scaled and attached more host servers, controller overhead and disk latency became an issue. Server virtualization brought this problem to a head by combining many more server instances into a single piece of hardware and multiplying the I/O impact of each host on the storage systems supporting those virtualized clusters.
This has led to VMware being, to a large extent, the cause of storage ‘de-consolidation’. In the modern data center it’s not uncommon to see two or three different storage systems supporting the virtual server infrastructure, one for virtual desktops and others supporting traditional bare-metal applications like Oracle and MS-SQL. Ironically, the lack of QoS is the primary reason for the use of multiple systems and why administrators don’t feel confident virtualizing those mission critical applications. It is for this reason that VMware has been so aggressive in rolling out VVols.
Sharing requires Rules
Sharing is a cooperative exercise, just ask your kids. If all parties work together everything’s fine. If one doesn’t cooperate or follow the rules the whole thing falls apart. The problem with traditional shared storage is there aren’t enough rules, so admins have to step in and manage the situation – like parents do with kids who can’t share. The result is high administrative overhead or poor resource management.
You wouldn’t dream of giving a group of 2-year olds a plate of cookies without first setting up some rules for sharing. Why would we assume we can effectively share storage resources among a group of servers or VMs without first setting up some rules?
Servers and VMs need Rules
At the most basic level, many storage systems have the ability to throttle resources, like limiting each child to one cookie. But what if, in addition to cookies, we were feeding the same children lunch? Limiting how much of each food they could have would keep a child from eating ten cookies, but it wouldn’t get them to drink their milk.
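In practice, a limit-only control of this kind amounts to a rate limiter. A minimal sketch of the idea, using a token bucket to cap a single host’s I/O request rate (illustrative only; the class name and parameters here are invented, not any vendor’s implementation):

```python
import time

class TokenBucket:
    """A crude stand-in for per-host I/O throttling: requests are allowed
    only while tokens remain, and tokens refill at a fixed rate."""

    def __init__(self, rate, burst):
        self.rate = rate              # tokens (I/O requests) added per second
        self.capacity = burst         # maximum burst size
        self.tokens = burst           # start with a full bucket
        self.last = time.monotonic()

    def allow(self):
        # Refill tokens based on elapsed time, capped at the burst size.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True               # request admitted
        return False                  # request throttled
```

Like the one-cookie rule, this only stops a host from taking too much; it does nothing to guarantee that any host gets what it actually needs.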
QoS seeks to establish rules so that servers ‘play nice’ and sharing works. But this requires more than simply limiting specific resources, it also means setting minimums so that each host gets a baseline amount of the things they need. This is especially true for virtualized environments, as George Crump discusses in the article “Conventional Storage QoS falls short on VMware and Virtualized Workloads”.
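The floor-plus-ceiling idea can be sketched in a few lines. The allocator below is a simplified illustration (workload names and numbers are invented, and real arrays enforce this continuously in the data path rather than as a one-shot calculation): every workload first receives its guaranteed minimum IOPS, then surplus capacity is shared out without letting any workload exceed its maximum.

```python
def allocate_iops(capacity, workloads):
    """Allocate IOPS with per-workload floors and ceilings.

    workloads: dict of name -> (min_iops, max_iops), integer values.
    Returns a dict of name -> granted IOPS. If capacity is below the
    sum of the minimums, the system is oversubscribed and each workload
    simply gets its minimum."""
    # Step 1: everyone gets their guaranteed minimum.
    grants = {name: mn for name, (mn, mx) in workloads.items()}
    remaining = capacity - sum(grants.values())
    # Step 2: distribute the surplus until caps or capacity are hit.
    while remaining > 0:
        open_ws = [n for n, (mn, mx) in workloads.items() if grants[n] < mx]
        if not open_ws:
            break                     # every workload is at its ceiling
        share = max(1, remaining // len(open_ws))
        for n in open_ws:
            mx = workloads[n][1]
            give = min(share, mx - grants[n], remaining)
            grants[n] += give
            remaining -= give
            if remaining == 0:
                break
    return grants
```

With 10,000 IOPS of capacity and three hypothetical workloads — oracle (min 3,000 / max 8,000), vdi (1,000 / 4,000) and backup (500 / 2,000) — backup is capped at its 2,000 maximum while the surplus flows to the workloads that can still use it, and no workload ever falls below its floor.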
When Not to use QoS
For many environments, especially those with highly dense virtual server infrastructures, QoS is an essential function, but not for every environment. Some companies buy an all-flash array precisely so that it will provide more performance than they will ever need, given their inventory of attached servers and hosts. For them, eliminating performance tuning altogether is part of the appeal of these systems. Similarly, environments that don’t support performance-centric applications, or just aren’t pushing their storage resource limits, may not have much need for QoS either.
Storage QoS also assumes that storage is the cause of the performance problem. If the bottleneck is in the network or the CPU, storage QoS can do little to resolve it. It should be paired with network and CPU QoS, or else care should be taken to eliminate performance bottlenecks in those layers.
QoS technologies are more sophisticated than this example of kids and cookies. With hybrid flash and all-flash storage systems there are many ways to implement QoS, but the concept is the same. Resource management is a fundamental requirement for today’s data center, as most organizations are looking at greater levels of server virtualization. Last year VMware CEO Pat Gelsinger stated “maybe we should have come out with VVols before we did VSAN,” indicating that he understood that maintaining consistent storage performance is key to continued VMware success. With the mandate IT organizations have to assure application performance and pull more value out of storage resources, why indeed would anyone buy a storage system without considering quality of service?