VMware is at the heart of many data center infrastructures and will remain there for years, if not decades, to come. Many of these infrastructures, however, still struggle with the most basic data management and data protection functions. All-flash arrays may have alleviated some of the infamous IO blender problem, but many storage challenges remain. Two key challenges, and areas ripe for innovation, are gaining insight into the storage IO demands and behavior of each virtual machine and better predicting and planning for scale.
From Visibility to Insight
Most storage systems that support VMware environments are block-based and, by default, provide no visibility into the IO activity of individual virtual machines (VMs). In 2015, VMware delivered Virtual Volumes (VVols), which increased VM visibility but offered limited ability to respond rapidly to specific IO conditions. VVols is still volume-based and requires creating and managing volumes. An alternative is a file-system-based storage architecture, which, because VMs are essentially collections of files, provides visibility into each VM's IO profile.
Visibility into each VM's IO profile is an improvement over block-based storage, but taking full advantage of this granular view requires more than just placing VMs on an NFS volume. The storage system needs built-in intelligent software that continuously analyzes each VM's IO pattern and capacity consumption rate and uses that history to forecast future use. Armed with this insight, IT can respond quickly to complaints about storage performance and either take corrective action or show that storage is not the source of the bottleneck.
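To make "capacity consumption rate and forecasting" concrete, here is a minimal sketch, assuming the storage system exposes periodic samples of used capacity for a VM (or for the whole system). A plain least-squares line stands in for whatever modeling a real product uses, and the function and field names (`forecast_capacity`, `CapacityForecast`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CapacityForecast:
    growth_gb_per_day: float          # fitted slope of used capacity over time
    days_until_full: Optional[float]  # None when usage is flat or shrinking


def forecast_capacity(days: Sequence[float], used_gb: Sequence[float],
                      capacity_gb: float) -> CapacityForecast:
    """Fit used_gb = intercept + slope * day by ordinary least squares and
    extrapolate to the day the fitted line reaches capacity_gb.
    Hypothetical helper; assumes one sample per point in time."""
    n = len(days)
    if n < 2 or n != len(used_gb):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(used_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_gb))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        # No growth trend, so no projected "full" date.
        return CapacityForecast(slope, None)
    # Day on which the fitted line hits capacity, measured from the latest sample.
    full_day = (capacity_gb - intercept) / slope
    return CapacityForecast(slope, max(0.0, full_day - max(days)))
```

For example, a VM that grew from 100 GB to 140 GB over four days on a 200 GB allocation is growing 10 GB per day and has about six days left. Summing per-VM growth rates gives the same kind of projection for the system as a whole.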
From Insight to Learning
Insight into the IO characteristics of a specific VM enables IT to intervene more quickly and precisely when problems arise. Intelligent infrastructure goes a step further: it learns from the captured analytics and takes corrective action on its own, either to mitigate outages or to meet changing performance demands. Applying machine learning to the data the storage system already collects frees organizations from spending all day manually monitoring and managing storage.
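One simple way to picture a system "learning" from its analytics is a per-VM baseline that adapts with every new sample and flags IO that deviates from it. This is an illustrative assumption, not any product's method: an exponentially weighted mean and variance stands in for machine learning, the latency metric and the `alpha`, `threshold`, `warmup` and `min_std_ms` values are hypothetical, and the corrective action is left to the caller.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LatencyBaseline:
    """Per-VM latency baseline learned online with exponentially weighted
    estimates of the mean and variance (all defaults are assumptions)."""
    alpha: float = 0.1       # weight given to the newest sample
    threshold: float = 3.0   # flag samples this many std devs above the mean
    warmup: int = 10         # samples to observe before flagging anything
    min_std_ms: float = 0.5  # floor so a perfectly steady VM isn't over-flagged
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def update(self, latency_ms: float) -> bool:
        """Score a new sample against the baseline, then fold it in.
        Returns True if the sample is anomalous."""
        anomalous = False
        if self.count >= self.warmup:
            std = max(math.sqrt(self.var), self.min_std_ms)
            anomalous = latency_ms > self.mean + self.threshold * std
        if self.count == 0:
            self.mean = latency_ms
        else:
            # Incremental exponentially weighted mean/variance update; every
            # sample is folded in, so the baseline adapts to a new normal.
            diff = latency_ms - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.count += 1
        return anomalous


def check_vms(baselines: Dict[str, LatencyBaseline],
              samples: Dict[str, float]) -> List[str]:
    """Update each VM's baseline and return the VMs whose latest sample is
    anomalous, i.e. candidates for some (hypothetical) corrective action."""
    flagged = []
    for vm, latency_ms in samples.items():
        baseline = baselines.setdefault(vm, LatencyBaseline())
        if baseline.update(latency_ms):
            flagged.append(vm)
    return flagged
```

Because the baseline is per VM, a latency that is normal for a batch job can still be flagged as abnormal for a latency-sensitive database.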
Dealing with Scale
One reality that almost every VMware administrator, and the infrastructure admins who support them, must deal with is scale. Eventually the current storage system will either run out of capacity or be unable to keep up with storage IO demands. Scaling typically means adding another storage system and migrating workloads to it so that the old one can be retired. Several vendors have introduced scale-out storage or scale-out hyperconverged solutions to address the scaling problem, but these environments tend to start too large, don't scale granularly enough and put extra pressure on the storage network. A more intelligent approach is a system that can scale up by adding storage capacity and then scale out by adding storage systems. The second storage system can start small and have capacity added to it as demand requires.
The typical problem with adding multiple storage systems is managing them and figuring out which VMs IT should migrate to the new system. Storage systems need innovation so that IT can use the analytics described above to forecast when a new storage system will be needed. Innovation is also needed in automating which VMs move to the new system, and how, since in most cases the current system still has years of reliable service left. Performing a migration at scale between systems that are not VM-aware is much more difficult and time-consuming, and will likely impact production applications.
Armed with an intelligent scale-out capability, IT can buy a new storage system that performs much better but initially has significantly less capacity than the current system. The storage software can then use its analytics to automatically move the most suitable candidates, such as the VMs whose performance demands the current system struggles to meet, to the new system. This automated process frees up capacity on the current system while improving the performance of the VMs that need it.
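Which VMs make the best migration candidates depends on the product, so the following is only a sketch under stated assumptions: hypothetical per-VM metrics (`size_gb`, `iops`, `p95_latency_ms`), a hypothetical latency target, and a greedy heuristic that, among VMs missing that target, moves the most IO-dense ones (IOPS per GB) first until the new system's usable capacity is spent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmStats:
    name: str
    size_gb: float
    iops: float            # average IO demand (hypothetical metric)
    p95_latency_ms: float  # latency observed on the current system


def pick_migration_candidates(vms: List[VmStats], new_capacity_gb: float,
                              latency_target_ms: float = 5.0,
                              headroom: float = 0.2) -> List[str]:
    """Greedy selection: keep `headroom` of the new system free, then fill
    the rest with struggling VMs in order of IO density."""
    budget = new_capacity_gb * (1.0 - headroom)
    struggling = [vm for vm in vms
                  if vm.p95_latency_ms > latency_target_ms and vm.size_gb > 0]
    # IO-dense VMs gain the most performance per GB of scarce new capacity.
    struggling.sort(key=lambda vm: vm.iops / vm.size_gb, reverse=True)
    chosen = []
    for vm in struggling:
        if vm.size_gb <= budget:
            chosen.append(vm.name)
            budget -= vm.size_gb
    return chosen
```

With a 1,000 GB new system, a small, busy database VM and a moderately busy web VM would be chosen first, while a large, mostly idle archive VM stays on the current system even if its latency is above target. Greedy selection is not an optimal packing, but it is easy to explain and audit, which matters when the system moves production VMs on its own.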
Conclusion
IT professionals need technology that manages itself so they can focus on tasks that have a more direct, positive impact on the organization. The storage system and infrastructure are an excellent starting point. With the proper intelligence, a VM-aware storage system can deliver valuable telemetry that IT can use to manage storage better. The endgame, though, is for the storage system to learn from this telemetry and take corrective measures automatically, freeing IT to work on higher-level tasks.