Why Storage vMotion hurts and how to stop the pain

VMware’s Storage vMotion is an invaluable tool that enables the migration of a virtual machine’s (VM’s) datastore to another storage system. It is similar in concept to how regular vMotion migrates a VM to another host in a VMware cluster. The motivation to move a VM’s datastore may be to place it on a storage system that delivers better performance or reliability, or, as the VM becomes static, to migrate it to a secondary storage system. While a valuable addition to the VMware toolkit, Storage vMotion creates challenges for the storage infrastructure.

Storage vMotion was an excellent solution for managing an environment under the rigid constraints of the traditional storage model of tiered back-end storage pools. The most typical scenario was reacting to VM performance issues on over-utilized VMFS datastores. While moving the VM improved its performance, it did so at the cost of wasted capacity per spindle.

How Storage vMotion Works

Storage vMotion has been available since VMware ESX 3, where it was a command-line utility used mostly for VMFS upgrades. In version 3.5 it was given the name Storage vMotion, and in version 4 it gained a graphical user interface (GUI).

The first step in a Storage vMotion is copying the VM’s home directory (configuration files, logs, swap, snapshots) to the destination storage device. Second, an initial copy of the VM’s disk files is made on the target storage device. During this copy, VMware uses changed block tracking to keep account of changes being made to the source datastore. Third, Storage vMotion copies the blocks that changed while the target was being seeded. This updating of the target with changed blocks is repeated until the number of outstanding blocks is small enough that Storage vMotion can perform a fast suspend and resume of the VM. During the suspend operation, the VM is pointed to the new datastore and then resumed.
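
Conceptually, the process looks like the following sketch. This is an illustrative Python simulation, not VMware’s code; the dict-based “datastores,” the dirty-block set, and the switchover threshold are hypothetical stand-ins for the real changed block tracking machinery.

```python
# Illustrative simulation of the Storage vMotion copy loop (not VMware code).
# Datastores are modeled as dicts of block_id -> data, and changed block
# tracking as a set of dirty block IDs; all names and thresholds are made up.
import random

SWITCHOVER_THRESHOLD = 4   # few enough dirty blocks for a fast suspend/resume

def simulate_guest_writes(source, dirty, count):
    """Stand-in for the running VM writing to the source datastore mid-copy."""
    for _ in range(count):
        block_id = random.randrange(len(source))
        source[block_id] = f"new-data-{block_id}"
        dirty.add(block_id)

def storage_vmotion(source, target, dirty):
    # Steps 1/2: copy the home directory and seed the target with a full copy
    # of the disk blocks; changed block tracking records new writes in `dirty`.
    for block_id, data in list(source.items()):
        target[block_id] = data
    simulate_guest_writes(source, dirty, count=10)    # writes during the seed

    # Step 3: re-copy changed blocks until the outstanding set is small.
    while len(dirty) > SWITCHOVER_THRESHOLD:
        for block_id in list(dirty):
            target[block_id] = source[block_id]
            dirty.discard(block_id)
        simulate_guest_writes(source, dirty, count=2)  # fewer writes each pass

    # Step 4: fast suspend, copy the final few blocks, point the VM at the
    # new datastore, resume, then delete the old files from the source.
    for block_id in dirty:
        target[block_id] = source[block_id]
    source.clear()

source_ds = {i: f"data-{i}" for i in range(64)}
target_ds = {}
storage_vmotion(source_ds, target_ds, set())
print(len(target_ds), "blocks migrated")
```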

Unlike vMotion, the time this transfer takes can vary considerably. While the migration is transparent to users, it may impact the performance of the application. Once the process is complete, Storage vMotion cleans up and deletes the old files from the source storage system.

Why is Storage vMotion Invoked?

Unless vSphere’s Storage DRS is being used in automatic mode, Storage vMotion is almost always a reactive measure to address a pressing storage issue such as performance or capacity. A typical use case is moving a virtual machine (its VMDK files) to a higher-performance datastore in response to complaints about slow performance. Meeting one VM’s performance needs may require moving other VMs off of the corresponding high-performance datastore so that there is enough capacity for the upgraded VM. Finally, Storage vMotion can also be used to archive dormant VMs to high-capacity, low-cost storage.
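
For administrators who script these reactive moves rather than using the vSphere client, the same operation is exposed through the vSphere API. Below is a minimal sketch using the open-source pyVmomi SDK; the vCenter address, credentials, and the VM and datastore names are placeholders, and error handling and certificate validation are omitted for brevity.

```python
# Hedged sketch of triggering a Storage vMotion with pyVmomi.
# Hostnames, credentials, and object names below are hypothetical.
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect to vCenter (certificate checking disabled only to keep the example short).
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="secret",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    """Return the first inventory object of the given type with this name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

vm = find_by_name(vim.VirtualMachine, "slow-app-vm")           # hypothetical VM
target_ds = find_by_name(vim.Datastore, "flash-datastore-01")  # hypothetical datastore

# A RelocateSpec that names only a datastore performs a Storage vMotion,
# leaving the VM running on its current host.
task = vm.RelocateVM_Task(spec=vim.vm.RelocateSpec(datastore=target_ds))
print("Storage vMotion task started:", task.info.key)

Disconnect(si)
```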

Why Storage vMotion Hurts

Storage vMotion is a capability that most VMware administrators use with extreme discretion, and they try to perform these migrations during maintenance windows. The hesitation, compared with the less impactful migration of a VM between hosts, comes from the nature of the move.

First, Storage vMotion is a physical move of an entire VM’s data set rather than a transfer of memory contents. The raw size of the operation is, of course, much larger than that of a regular vMotion, since all of the VM’s data must be moved instead of just its memory pages.

Second, the size of this move requires significant network I/O to transfer the data, and it also consumes CPU resources. The host server has to manage the initial copy and the changed block copies until the destination storage is up to date, while the storage system’s CPU has to control the sending or receiving of the VM’s data. The network I/O consumption and the CPU consumption on both the host and the storage system impact other VMs.

Third, the I/O profile is a mixture of sequential and random workloads, one of the hardest patterns for a storage system to digest. The first copy of the VM is a rapid sequential move. The update of that VM via changed block tracking is more random. The receiving device has to be able to handle all of it.

Fourth, Storage vMotion is obviously VMware-centric, meaning that it only works in a vSphere environment. Most data centers are not 100% virtualized, and many are using multiple hypervisors.

Fifth, since Storage vMotion is seldom automated, it requires administrator time, something the data center may not be able to afford. It also requires planning and real-time decision making. The administrator has to make a very quick analysis of the situation causing the problem, and then decide which VMs to move and when to move them. The risk of error under this kind of pressure is high.

As Storage Switzerland discussed in its article, “What are VVOLS?”, VMware introduced VVOLS to resolve some of the manual aspects of Storage vMotion and to give storage systems better per-VM granularity for the workloads they support. While VVOLS address some of the manual nature of Storage vMotion, they share many of the limitations mentioned above in terms of network I/O and CPU consumption.

Finally, Storage vMotion almost encourages storage sprawl. Instead of identifying a single storage system that can deliver on all the storage I/O demands, administrators are led to believe that Storage vMotion and VVOLS will allow them to buy storage infrastructure piecemeal.

Stopping The vMotion Pain

It is always easier to manage one component instead of multiple components, and storage is no different. But there is also a legitimate need to store different types of data on different storage media. While all-flash array vendors will claim that their solutions fix that by storing everything on very fast storage, most data centers can’t commit to the cost associated with an all-flash data center.

For most data centers, a consolidated storage solution will be a storage system that has both flash and hard disk drives. Managing the movement of data between these tiers should ideally be done by something like VVOLS or by the storage system’s own data movement intelligence. Preferably the system will leverage both.

A consolidated storage system should provide quality of service (QoS) functionality that can either integrate with VVOLS and Storage vMotion or deliver the capability independently for times when it is hosting non-VMware or non-virtualized workloads. As Storage Switzerland discussed in its article “How Storage Vendors Integrate VVOLS”, there are different ways that vendors can integrate with solutions like VVOLS. Used correctly, they can leverage VVOLS to enhance and simplify their own QoS capabilities.
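
As a rough illustration of what per-volume QoS could look like independent of the hypervisor, the sketch below models policies with IOPS floors, ceilings, and priorities. The policy names, numbers, and classes are hypothetical and are not tied to any particular vendor’s implementation.

```python
# Hypothetical sketch of per-volume QoS policies; names and values are illustrative.
from dataclasses import dataclass

@dataclass
class QosPolicy:
    name: str
    min_iops: int   # performance floor the array tries to guarantee
    max_iops: int   # ceiling that keeps a noisy neighbor in check
    priority: int   # tie-breaker when the system is saturated

POLICIES = {
    "mission-critical": QosPolicy("mission-critical", min_iops=20000, max_iops=100000, priority=1),
    "business":         QosPolicy("business",         min_iops=5000,  max_iops=20000,  priority=2),
    "non-critical":     QosPolicy("non-critical",     min_iops=0,     max_iops=5000,   priority=3),
}

class Volume:
    """A volume that could back a VVOL, a VMFS datastore, or a bare-metal workload."""
    def __init__(self, name):
        self.name = name
        self.qos = None

def assign_policy(volume, policy_name):
    # Attach a QoS policy to the volume; the array enforces it regardless of
    # whether the workload is virtualized.
    volume.qos = POLICIES[policy_name]

oracle_vol = Volume("oracle-data")
assign_policy(oracle_vol, "mission-critical")
print(oracle_vol.name, "->", oracle_vol.qos.max_iops, "IOPS cap")
```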

It is key for the storage system to move data within the storage system itself instead of across storage systems. By default, the storage system should automatically move the most active data to the higher-performance flash tier. In many storage systems this is accomplished via a physical LUN migration between tiers. An advanced software-defined storage system would accomplish this data movement in real time by tagging each and every I/O within the array and adjusting its data placement accordingly. As the I/O utilization of the supported workloads shifts, the system should adjust data positioning so that the heaviest-demand workloads continue to realize the performance they expect.
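
In simplified form, such I/O tagging and real-time placement might work like the sketch below: every I/O increments a per-block heat counter, and a rebalance pass keeps the hottest blocks on flash. The engine, block granularity, and capacities are hypothetical, and a real array would also weigh recency and QoS priority rather than raw counts alone.

```python
# Illustrative sketch of I/O tagging and tier placement inside a hybrid array.
# Block IDs, tier sizes, and the rebalance policy are hypothetical.
from collections import Counter

class TierEngine:
    def __init__(self, flash_capacity_blocks):
        self.heat = Counter()        # per-block access counts (the "tags")
        self.flash_capacity = flash_capacity_blocks
        self.on_flash = set()

    def record_io(self, block_id):
        # Every read or write is tagged as it passes through the array.
        self.heat[block_id] += 1

    def rebalance(self):
        # Keep the hottest blocks on flash; everything else stays on disk.
        hottest = {b for b, _ in self.heat.most_common(self.flash_capacity)}
        promote = hottest - self.on_flash
        demote = self.on_flash - hottest
        self.on_flash = hottest
        # Block moves happen inside the array, not across the network.
        return promote, demote

engine = TierEngine(flash_capacity_blocks=2)
for block in [1, 1, 1, 2, 2, 3]:     # simulated I/O stream
    engine.record_io(block)
promote, demote = engine.rebalance()
print("promote to flash:", promote, "demote to disk:", demote)
```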

Conclusion

This style of automated data movement, combined with an administrator-driven QoS capability, eliminates the need for Storage vMotion and all the pains associated with it. There is no impact on network I/O since all data movement is within the storage system. There is no impact on host CPU resources and only limited impact on storage CPU resources. The time the administrator spends tuning storage performance is also reduced. Essentially, once performance parameters are set, it becomes the responsibility of the storage system to make sure they are met.

Finally, an implementation of a single storage system with intelligent, QoS-driven data movement actually encourages the consolidation of workloads. The more data that can be placed on the system, the less work there is for the storage administrator. Moreover, since the consolidated system has both flash and hard disk drives, it has both the performance needed to meet the demands of mission-critical workloads and the capacity to meet the overall storage requirements of the enterprise.

Sponsored by NexGen Storage

About NexGen Storage

NexGen Storage offers value-driven hybrid flash arrays that let customers prioritize data and application performance based on business value. Unlike other storage solutions that treat all data the same, NexGen’s policy-based Quality of Service enables customers to avoid the high cost of all-flash arrays and the inconsistent performance of SSD-based hybrid arrays. By connecting customers to the business value of data, NexGen’s solutions deliver the predictable application performance end-users require. For more information, visit www.nexgenstorage.com.

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
