After a rapid move from test to production, virtualization of existing servers seems to slow down in many companies. While it is true that most data centers have adopted a virtualize-first philosophy, getting older, mission-critical workloads virtualized remains a thorny issue. These applications are often at the heart of an organization’s revenue or customer interaction and tend to be unpredictable in the resources they require. This is especially true when it comes to storage and networking.
The Advantages of 100% Virtualization
Data centers that have achieved 100% virtualization are benefiting from the reduced costs and increased flexibility that an abstracted environment delivers. New applications can be instantiated rapidly and workloads can be effectively balanced. One of the big pay-offs of 100% virtualization is the enablement of an always-on data center. The ability to move workloads between servers and even data centers can significantly increase availability. The data protection hooks that virtualization provides allow backup applications to meet stricter Recovery Point and Recovery Time Objectives (RPO/RTO).
The Phases of Virtualization
After an initial test phase, most virtualization projects follow a familiar pattern. First, servers that are not business- or mission-critical are virtualized. This step is uneventful because there is typically plenty of processing power in the virtual cluster and these initial workloads simply don’t move the needle. Then as new applications or workloads are added to the environment they are virtualized (virtualize first). This step also works well as users of these applications have never had a bare metal experience with which to compare performance.
These first two steps are non-linear, often happening in parallel with each other, and are ongoing. The process of server consolidation of non-critical workloads seems to never stop and, of course, new applications are being added all the time. It’s the third step, virtualizing existing critical workloads, where the effort seems to slow down or even stop altogether.
The Problem with Virtualizing Critical Workloads
The problem with virtualizing critical workloads (business or mission) is that these environments are often already in place on bare metal systems. On a dedicated bare metal system there is a finite number of variables that an administrator needs to monitor to make sure that the workload delivers the performance that users expect and demand. Consistency is also essentially assured, since there is little else interacting with the system.
This is, of course, the inverse of the situation in the virtualized environment, where each host server supports potentially dozens of virtual machines (VMs). All of these VMs share CPU, RAM, a limited number of network and storage network adapters, and a storage system. This shared-everything infrastructure makes it almost impossible to guarantee performance to critical applications or to troubleshoot any performance issues that may arise.
Bare metal applications are not the sole concern, however. The challenge with virtualize-first efforts is that some of those workloads will also evolve into business- or mission-critical environments. These increasingly critical workloads will also need to meet user expectations of performance and reliability.
The Requirements for 100% Virtualization
To achieve 100% virtualization, data centers must be able to virtualize these mission-critical applications while continually marching forward with consolidation and virtualize-first efforts. But that strategy will require solving issues related to the shared-everything architecture. For the most part there is enough CPU power available to drive these applications, and most current-generation servers can accommodate plenty of memory. This leaves the storage architecture, both the storage network and the storage system, as the primary roadblock to 100% virtualization.
Requirement 1 – Storage Quality of Service
The first requirement to achieve 100% virtualization is making sure the storage infrastructure can guarantee the performance that each application needs. Storage Quality of Service (QoS) is an increasingly common feature to help with this guarantee. But virtualization adds another wrinkle to the QoS requirement: VM granularity. The storage system needs to not only provide QoS, which is typically set on a per-volume basis, but also be able to set and modify QoS per VM. Per-VM QoS can be done by storage systems that host VMs on a file system like NFS or that fully support initiatives like VMware’s Virtual Volumes (VVols).
Some vendors claim QoS because of their ability to segregate workloads to different storage types or to modify cache allocation. Storage QoS should also be able to limit actual usage of IOPS and bandwidth, again per VM. And it should be able to ensure that enough IOPS and bandwidth are always reserved to specific critical workloads.
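The reservation and limit behavior described above can be illustrated with a small sketch. This is not any vendor's implementation. The names VmPolicy and allocate_iops, the priority-by-list-order rule for spare capacity, and all of the numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class VmPolicy:
    """Hypothetical per-VM QoS policy (names are illustrative only)."""
    name: str
    reserved_iops: int  # minimum IOPS always held back for this VM
    limit_iops: int     # maximum IOPS this VM may consume


def allocate_iops(policies, demand, capacity):
    """Grant IOPS per VM: reservations first, then spare capacity up to limits.

    policies: list of VmPolicy, ordered from most to least critical
    demand:   dict of VM name -> IOPS currently requested
    capacity: total IOPS the storage system can deliver
    """
    if sum(p.reserved_iops for p in policies) > capacity:
        raise ValueError("reservations exceed storage capacity")

    # Pass 1: each VM gets its demand, capped at its reservation.
    granted = {
        p.name: min(demand.get(p.name, 0), p.reserved_iops) for p in policies
    }
    spare = capacity - sum(granted.values())

    # Pass 2: hand out spare IOPS in priority order, never past a VM's limit.
    for p in policies:
        want = min(demand.get(p.name, 0), p.limit_iops) - granted[p.name]
        extra = max(0, min(want, spare))
        granted[p.name] += extra
        spare -= extra
    return granted


policies = [
    VmPolicy("db", reserved_iops=5000, limit_iops=20000),
    VmPolicy("web", reserved_iops=1000, limit_iops=3000),
    VmPolicy("batch", reserved_iops=0, limit_iops=2000),
]
demand = {"db": 9000, "web": 5000, "batch": 4000}
grants = allocate_iops(policies, demand, capacity=12000)
print(grants)
```

In this made-up scenario the critical database VM receives everything it asks for, the web VM is held at its limit, and the batch VM, which has no reservation, gets nothing once the system is saturated.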
Requirement 2 – High Speed Network
Storage QoS assumes that everything between the storage and the VM is not part of the performance problem. There are three key elements involved in that connection: the storage network card, the storage switch and the storage adapter on the storage system. This network should be upgraded to higher speed components, at least 10Gb Ethernet (10GbE) and/or 16Gbps FC. In addition, modern networking technologies have their own QoS capabilities that could be used in combination with the storage QoS described above. At this point, however, it’s a manual process. There is limited integration between these various QoS settings, so they have to be set and monitored individually. For now, upgrading to either 10GbE or 16Gbps FC removes the concern of network latency in many environments.
Requirement 3 – VM Level Analytics
A final requirement is per-VM analytics. While QoS can help set minimum and maximum performance thresholds, insight is needed into the VM to see how often it is bumping into a limit set by QoS. It may be that a given application needs more performance than that limit will allow. This need can be caused by an increase in the performance demand of that particular application or by an increase in the performance demand of surrounding applications. Alternatively, the QoS setting may be reserving too many IOPS for a given application, and those IOPS could be better used on another VM. Analytics allow the storage administrator to treat IOPS as an inventory item to be allocated and reserved as needed. In both cases, performance reporting is needed at both the VM-specific and the cluster-wide level.
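As a rough sketch of the two checks described above, the following computes how often a VM runs up against its QoS limit and how much of its reservation goes unused. The function names, the 95% "near the limit" tolerance and the sample values are all assumptions for illustration, not features of any particular analytics product.

```python
def limit_hit_ratio(samples, limit_iops, tolerance=0.95):
    """Fraction of IOPS samples at or above tolerance * limit_iops.

    A high ratio suggests the VM is being throttled by its QoS limit.
    """
    if not samples:
        return 0.0
    threshold = tolerance * limit_iops
    hits = sum(1 for s in samples if s >= threshold)
    return hits / len(samples)


def unused_reservation(samples, reserved_iops):
    """Reserved IOPS the VM never touched: reservation minus observed peak."""
    peak = max(samples, default=0)
    return max(0, reserved_iops - peak)


# Made-up one-minute IOPS samples for two hypothetical VMs.
web_samples = [1000, 2900, 3000, 1500]
archive_samples = [200, 400, 350]

web_ratio = limit_hit_ratio(web_samples, limit_iops=3000)
archive_spare = unused_reservation(archive_samples, reserved_iops=1000)
print(web_ratio, archive_spare)
```

Here the web VM spends half of its samples at or near its limit, which would argue for raising it, while the archive VM leaves most of its reservation idle, and those IOPS could be returned to the pool.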
These analytics should also advise what the impact of adding more flash to the environment will be. If adding more flash would reduce the occurrence of over-threshold conditions, then the IT manager may decide that this is a wise investment. The analytics allow the administrator to purchase just the amount of flash actually needed instead of making a best guess on the most expensive and important tier of storage.
Finally, these analytics need to both report in real time what the consumption of storage resources is by the virtual cluster, and provide a DVR-like playback of performance consumption over the past weeks or months. This allows an IT professional to troubleshoot a performance problem that was late in being reported. These analytics need to do more than just report the problem; they need to recommend actions based on what the data indicates.
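One way to picture the combination of real-time reporting and DVR-like playback is a time-ordered sample store that can return either the latest reading or any past window. The sketch below is a minimal in-memory illustration. The PerformanceHistory class, its method names and the sample data are hypothetical, and a real system would persist and downsample months of data rather than keep it all in a list.

```python
from bisect import bisect_left, bisect_right


class PerformanceHistory:
    """Hypothetical in-memory store of (timestamp, IOPS) samples for one VM."""

    def __init__(self):
        self._times = []  # timestamps in seconds, kept in ascending order
        self._iops = []   # IOPS reading taken at the matching timestamp

    def record(self, timestamp, iops):
        """Append a sample; samples must arrive in time order."""
        if self._times and timestamp < self._times[-1]:
            raise ValueError("samples must be recorded in time order")
        self._times.append(timestamp)
        self._iops.append(iops)

    def latest(self):
        """Real-time view: the most recent sample, or None if empty."""
        if not self._times:
            return None
        return (self._times[-1], self._iops[-1])

    def replay(self, start, end):
        """Playback view: all samples with start <= timestamp <= end."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._iops[lo:hi]))


history = PerformanceHistory()
for ts, iops in [(0, 100), (60, 250), (120, 900), (180, 300)]:
    history.record(ts, iops)

window = history.replay(60, 120)
current = history.latest()
print(window, current)
```

Replaying the window around a late-reported complaint lets the administrator see the spike at the 120-second mark even though the current reading looks normal.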
100% virtualization is key to realizing the full ROI of the virtualization project, and it also lays the groundwork for initiatives like the always-on data center. These initiatives count on the VM mobility and advanced data protection techniques that virtualization delivers. But accomplishing 100% virtualization requires investment in a storage system and infrastructure that can guarantee critical applications the performance they require. The owners of these applications will not accept a step backwards in performance, consistency or availability just to help IT meet a 100% virtualization goal.
The good news is that the ability to deliver 100% virtualization is here now. The combination of storage QoS, advanced storage networking and VM-level analytics should allow administrators to virtualize critical workloads with full confidence that performance will be the same as, if not better than, in their previous bare metal state, and to do so without breaking IT’s budget.