Despite the economy, server virtualization rollouts continue unabated; in fact, because of the economy there is increased pressure to further optimize the virtual infrastructure by increasing virtual machine density. This increased density puts additional pressure on an already strained storage infrastructure. IT professionals should consider leveraging a fibre channel SAN and using NPIV to optimize server virtualization's storage.
What is the challenge? Server virtualization: The big I/O blender
All virtual machines running on a physical server share the same physical I/O connections. In that context, the hypervisor can be seen as a giant "I/O blender," mashing all disk I/Os together before sending them over the SAN and creating potential bandwidth contention and quality of service issues for applications running in individual virtual machines. Also, the current set of tools used by storage administrators to monitor, troubleshoot and secure the SAN loses application-level visibility, since all I/Os originate from the same physical HBA.
In a non-virtual environment, a typical SAN practice when assigning a storage LUN (Logical Unit Number) to a server is to create a zone that allows only one particular server to access that LUN. This is accomplished by assigning the worldwide name (WWN) of the server's SAN host bus adapter (HBA) to that LUN. Since each HBA has its own unique identifier, or WWN, this allows for secure access to that LUN, enables customizable quality of service (QoS) and lets chargeback software use the WWN to capture server usage.
This best practice is broken by server virtualization. As stated earlier, each zone is assigned to a WWN, but each virtualization host may support 20 or 30 virtual machines. Every virtual machine shares the host's HBA and, as a result, presents the same WWN to the LUN. Without a mechanism to identify individual virtual machines to the SAN, there is no way to track their use of SAN resources or to make sure they don't contend with each other for those resources.
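The loss of per-VM identity can be sketched in a few lines of Python. This is an illustrative model only, not a vendor API; the WWN value and VM names are hypothetical:

```python
# Without NPIV, every VM on a host presents the physical HBA's WWN to the
# fabric, so a zone keyed on WWN cannot tell one VM's I/O from another's.

HOST_HBA_WWN = "10:00:00:00:c9:aa:bb:01"  # hypothetical physical HBA WWN

# A zone is, in effect, the set of initiator WWNs allowed to reach a LUN.
zone_for_lun = {HOST_HBA_WWN}

def initiator_wwn(vm_name: str) -> str:
    """Without NPIV, every VM inherits the physical HBA's WWN."""
    return HOST_HBA_WWN

# All three VMs look identical to the fabric:
vms = ["web01", "db01", "mail01"]
seen = {vm: initiator_wwn(vm) for vm in vms}
assert len(set(seen.values())) == 1  # one WWN for all VMs: no per-VM visibility
```

Because every initiator WWN the fabric sees is the same, per-VM zoning, QoS and chargeback all collapse to the granularity of the physical host.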
Another challenge that server virtualization brings to SAN storage comes from its very popular live migration capability: the ability, with a SAN in place, to move a virtual machine from one virtualization host to another. The problem is that the SAN manager has to remember to include the second host's WWN in the zoning scheme; otherwise, when migrated to the second host, the virtual machine can't see its storage because the SAN will block access from an HBA with an unauthorized WWN.
This becomes unwieldy in a larger multi-host environment. In this environment it is ideal for the virtual machine to be migrated to any available physical host. That means that the storage administrator has to open up the zone to each physical host, at which point the original value of the zone is lost.
Even if the storage administrator worked with the server virtualization administrator to plan which hosts would be viable targets for virtual machine migration, that planning is rendered almost useless when products like VMware's Distributed Resource Scheduler (DRS) are implemented. With these types of solutions, live machine migration is automated based on resource utilization and availability, so there is no way to plan for where the VM will migrate.
NPIV, or N_Port ID Virtualization, is a capability unique to fibre channel SANs. It is an extension to the existing fibre channel standard that restores the best practice of SAN zoning to the virtual environment by allowing the creation of a virtual WWN per virtual machine.
With NPIV in place, you can create a zone in the SAN that only one virtual machine can access, restoring that security between applications even if they are running on the same physical host.
NPIV really pays off in a virtual environment because the virtual WWN follows the virtual machine. This means if you migrate the virtual machine from one host to another, there are no special requirements to make sure the target host has the correct access to the LUN. The virtual machine has that access and as a result the host inherits the ability to access it.
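As a minimal sketch of why zoning survives migration under NPIV: the zone is keyed on the VM's virtual WWPN rather than on any physical HBA. The model below is ours, not a VMware or Brocade API, and the WWPN value, VM name and host names are hypothetical:

```python
# With NPIV, each VM carries its own virtual WWPN, and the zone admits that
# WWPN, not the physical HBA's WWN.

vm_wwpn = {"db01": "28:00:00:1b:32:17:47:01"}  # hypothetical virtual WWPN

# The zone admits the VM itself, regardless of which host runs it.
zone_for_lun = {vm_wwpn["db01"]}

def can_access(vm: str, zone: set) -> bool:
    """Fabric check: is this VM's virtual WWPN a member of the zone?"""
    return vm_wwpn[vm] in zone

# Before migration (VM on host esx-a) and after (VM on host esx-b), the
# answer is the same, because the virtual WWPN travels with the VM:
assert can_access("db01", zone_for_lun)   # running on esx-a
assert can_access("db01", zone_for_lun)   # after live migration to esx-b
```

The design point this illustrates: because the initiator identity belongs to the VM rather than the host, no zoning change is needed when the VM moves.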
This greatly simplifies storage provisioning and zoning in a virtual environment by allowing the storage admin to interact with the lowest level of granularity in storage access. Once in place the storage admin can monitor SAN utilization statistics to track how each virtual machine is using SAN resources. With this level of detail the SAN administrator is better able to balance utilization.
For example, using NPIV in conjunction with a capability like Brocade's Top Talkers service, a SAN admin can easily track which virtual machines are consuming the most resources and then make sure those bandwidth consumers are not all coming from the same virtual host or pointing at the same disk array. In effect, they can distribute the load properly across virtual hosts, physical switches and storage arrays.
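A per-WWPN rollup of the kind a Top Talkers-style report makes possible might look like this. It is a sketch with invented WWPN strings and traffic figures; it does not call any Brocade interface:

```python
# Once each VM has its own WWPN, fabric traffic samples can be grouped by
# initiator WWPN to find the heaviest consumers.
from collections import Counter

# (initiator WWPN, MB transferred) samples, as a fabric tool might report them.
# WWPNs abbreviated; all values are invented for illustration.
samples = [
    ("28:..:01", 500), ("28:..:02", 120), ("28:..:01", 700), ("28:..:03", 90),
]

mb_by_wwpn = Counter()
for wwpn, mb in samples:
    mb_by_wwpn[wwpn] += mb

# The top talker is the WWPN (i.e., the VM) with the most traffic.
top_talker, top_mb = mb_by_wwpn.most_common(1)[0]
# top_talker == "28:..:01" with top_mb == 1200
```

With this rollup in hand, the admin can check whether the top talkers share a virtual host, switch or array and rebalance accordingly.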
What is required for NPIV?
Enabling NPIV in the environment requires several components, the first of which is an NPIV-aware fabric. The switches in the SAN must all support NPIV; again using Brocade as an example, all Brocade FC switches running Fabric OS (FOS) 5.1.0 or later support NPIV.
In addition, the HBAs must support NPIV, and they need to expose an API the VM monitor can use to create and manage the virtual fabric ports; it is relatively common for HBAs to support this today.
Finally, the virtualization software itself must support NPIV and be able to manage the relationship between the virtual NPIV ports and the virtual machines. Most virtualization software also requires the use of a specific type of disk mapping; VMware calls this Raw Device Mapping (RDM).
In the VMware case, by default when a virtual machine is created it is mapped to a virtual disk in a Virtual Machine File System (VMFS). When the operating system inside the virtual machine issues disk access commands to the virtual disk, the virtualization hypervisor translates this to a VMFS file operation. RDMs are an alternative to VMFS. They are special files within a VMFS volume that act as a proxy for a raw device.
RDM gives some of the advantages of a virtual disk in the VMFS file system while keeping some advantages of direct access to physical devices. In addition to being used in a virtual environment with NPIV, RDM might be required for server clustering, for better SAN snapshot control or for other layered applications in the virtual machine. RDMs better enable systems to use hardware features inherent to SAN arrays and the SAN fabric, NPIV being an example.
NPIV is completely transparent to disk arrays, so the storage systems themselves require no special support.
NPIV and QoS
Where NPIV becomes truly valuable is when it is used in conjunction with storage QoS capabilities like those that Brocade provides in an end-to-end configuration and as we detail in our article on using QoS in a Virtual Environment. NPIV support in VMware ESX Server 3.5 enables extending the benefits of Brocade Adaptive Networking services to each individual VM rather than the physical server running the VM.
Based on the benefits NPIV can bring to an enterprise in the virtual world, NPIV and QoS are critical components in extending the ROI of server virtualization. Using NPIV to optimize server virtualization's storage gives an admin yet another layer of control. It allows system administrators to understand more completely what they are delivering to their customers from a virtualization perspective, and it provides much-needed metrics at the I/O layer that largely don't exist in today's virtualized storage environments.
While NFS and iSCSI have gained the attention of those selecting a storage protocol strategy for server virtualization, fibre channel should continue to be a top consideration. The choice of fibre channel storage for server virtualization will yield uncompromised I/O performance, supporting the highest server consolidation ratios. The use of NPIV gives storage administrators VM level I/O visibility for application level I/O performance tuning, troubleshooting and accounting.