Designing Highly Scalable Storage for Dense Virtual Machine Environments

Posted on October 16, 2014 by George Crump

We will discuss the testing methodology and scalability results for 4- and 8-node clustered Hitachi NAS Platform 4000 systems supporting 15,000 VMs in real-world enterprise environments. The test establishes linear performance scalability as nodes are added to the cluster, providing a solid platform for VMware vSphere-based environments.

Highly scalable virtual machine architectures are typically limited to fibre channel storage networks and systems. The problem is that these technologies can get expensive as they scale, both in size and virtual machine density. An NFS-based storage network would provide more cost effective growth and connectivity, but can NFS provide the performance needed to support the above environment?

That question was the subject of a recent test conducted by Hitachi Data Systems and audited by Storage Switzerland. The objective was to see if an NFS-based storage system could deliver the same or better VM density than a more traditional fibre channel storage system. The test had as its back-end the Hitachi NAS Platform (HNAS) 4100 based on the Hitachi Unified Storage (HUS), which is able to scale up and scale out as the compute layer requires.

A Hitachi Storage platform with HNAS is able to present both block and file volumes to connecting hosts. The system is built using Field Programmable Gate Arrays (FPGAs) with bandwidth dedicated to each function, giving HNAS one of its major benefits – the ability to scale. An HNAS cluster has a modular architecture that allows more nodes (up to 8) to be added to distribute the workload as performance and capacity demands increase. This lets customers start with a smaller HNAS system and then add nodes as their environment grows. With Hitachi NAS Virtual Infrastructure Integrator, HNAS has tight integration with VMware vSphere for ease of VM data protection, cloning, management and best practice deployment.

In addition to scalability and performance, the HNAS architecture also provides excellent data protection via various RAID technologies and high performance system- and file-level snapshots. The writable snapshots, also known as clones, can provide a high level of data and operational efficiency since snapshots track just file- or VMDK-level changes and can be made writable. This capability allows for the “cloning” of master virtual machine images and can significantly reduce capacity requirements without increasing costs.

Unlike other scale-out storage systems, nodes in the HNAS cluster can scale up as well, allowing each to reach its full potential and eliminating the “node sprawl” that affects other scale-out architectures. Node sprawl comes from adding too many nodes too quickly in order to meet performance or capacity demands, wasting IT budget and data center floor space.

To demonstrate both the scale-up and scale-out aspects of HNAS technology, the benchmark was run against both 4-node and 8-node configurations. Based on these configurations, the data shows that an HNAS cluster could scale to support 15,000 VMs at 25 operations per second (OPS/SEC) per VM.

Test Value

One of the biggest challenges facing any benchmark test, especially one that is focused on high performance, is trying to ground that test in reality. Millions of operations per second are impressive, but they aren’t something most data centers ever need. But while big performance numbers were a focus of the HDS test, we were able to derive two other important numbers as well.

First was the number of virtual machines and physical hosts that HNAS 4100 could reasonably support. The second was the number of nodes, specifically, how HNAS would maintain performance (OPS/SEC and latency) as more nodes were added to the system. In other words, could we continue to add VMs at the same rate as we added nodes or did we hit a point at which a new storage cluster would have to be created?

What this Test Means to IT Managers

The value of this test to the traditional data center manager is twofold. Even though most would likely start with a much smaller configuration, these results allow IT planners to understand how far the storage system could scale in their environments and what level of VM density would be possible. Second, they show that an NFS-based storage system, when designed correctly, can provide a legitimate storage platform for a dense virtualized server infrastructure. In this report, we will correlate the raw data to the potential impact on the traditional data center so IT managers may understand how it applies to them.

Configuration of Equipment

A key component of any benchmark test is the tool that is used to generate the load. In this case Hitachi chose Vdbench, a software-based testing tool that can generate both file and block data sets. It provides very fine-grained control over a number of factors, including: I/O rate, LUN or file sizes, transfer sizes, thread count, volume count, volume skew, read/write ratios and random or sequential workloads.

Vdbench has two parameters that are manipulated. The first, File System Anchor, controls the number of directories created, the number of sub-directories created within each directory (depth) and the number of files per directory. Up to 32 million files can be created per anchor. A typical VMware environment will have datastores and associated VMs as sub directories while the VMware files (such as VMDKs) reside in those associated VM directories.
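The way an anchor's directory settings translate into a file count can be sketched with a simplified model. This is an assumption made for illustration, not Vdbench syntax or its actual implementation: every directory is assumed to hold the same number of sub-directories, and files are assumed to live only in the lowest-level directories. The function and parameter names are hypothetical.

```python
MAX_FILES_PER_ANCHOR = 32_000_000  # "up to 32 million files" per anchor, as stated above


def files_per_anchor(width: int, depth: int, files_per_dir: int) -> int:
    """Count files in a uniform tree where only the deepest directories hold files.

    width: sub-directories created inside each directory (assumed uniform)
    depth: number of directory levels below the anchor
    files_per_dir: files placed in each lowest-level directory
    """
    leaf_dirs = width ** depth
    total = leaf_dirs * files_per_dir
    if total > MAX_FILES_PER_ANCHOR:
        raise ValueError("layout exceeds the per-anchor file limit")
    return total


# Hypothetical datastore: 16 VM directories one level down, 4 files in each
example_total = files_per_anchor(width=16, depth=1, files_per_dir=4)
print(example_total)
```

In this model a datastore corresponds to the anchor, VM folders to the directories beneath it, and VMDK-style files to the leaf files.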

The second parameter, File System Operation, is used to manipulate the directories and files themselves with commands like directory create/delete, file create/delete, file read/write, file open/close, setattr and getattr.

For the Hitachi Data Systems test, operations like file create, read and write were used. While the test did not actually create a database or a series of virtual machines registered to VMware vCenter, the file creation and random workload that was generated using the above parameters did simulate them against typically deployed VMDK-based files. The I/O pattern and its behavior generated by Vdbench was very similar to the typical I/O pattern used to size VMware workloads.

In general Storage Switzerland has observed that the typical I/O profile in large private clouds is highly parallelized where the modest workloads (25 OPS/SEC on average) of many virtual machines get aggregated into a very high total performance demand. This type of parallelism is ideal for scale-out architectures like HNAS because more of the distributed controller architecture can be utilized.

In addition, while these workloads tend to be read-heavy, they are not overwhelmingly so. A test that is 90% to 100% reads would have little value because the amount of data being written in real environments is not negligible. Virtualized environments are not pure read environments, and the data being read and written tends to be small, highly randomized blocks. Specifically, for VMware workload types most of the I/O operations are disk-intensive (create, write, read, delete, etc.) and the number of protocol metadata operations (setattr, lookup, access) is comparatively small.

Any test that tries to simulate a virtualized environment would need to have these characteristics. This is something the HDS test that Storage Switzerland audited does have.

From a hardware perspective the test solution consisted of an HNAS configured with eight 4100 nodes, a Hitachi Virtual Storage Platform G1000 (VSP G1000), a Hitachi Apresia 15000-64XL-PSR Ethernet switch, two Brocade VDX6730-24 10GbE switches, up to 46 Sun Fire x2200 servers and a Brocade 5320 80-port FC switch. The Sun Fire x2200 servers were used as load generators, each running Vdbench accessing the HNAS 4100 heads via the Hitachi Apresia 15000-64XL-PSR 10GbE Ethernet switch, all in a 10GbE environment. The Brocade VDX6730-24 switches were used for cluster interconnections between the 8 HNAS 4100 nodes.

Testing Methodology

The initial step was to configure four parity groups of eight drives each using RAID-5 (7 data + 1 parity) on VSP G1000 for a total of 32 drives. A parity group is essentially a group of drives with some form of RAID protection assigned, what other vendors might call a “RAID group”. All the LUNs (or in the Hitachi Data Systems vernacular, “LDEVs”) created from the parity group will have the same data protection type applied to them and share the same drives, but each LDEV can be a different size.

The drives were flash module drives (FMDs), a new flash storage device built specifically for the most demanding enterprise-class workloads. The FMD features a custom-designed, rack-optimized form factor and innovative flash memory controller technology from Hitachi, Ltd. These features let the module achieve higher performance, lower cost per bit and greater capacity, compared to conventional drive form-factor solid-state drives (SSDs) on the market today.

The next step was to create sixteen 716GB LDEVs from each parity group. As previously mentioned, “LDEV” is a term to describe a piece of logical RAID storage carved out of a Hitachi disk array. When an LDEV is presented at a port on the array, the hosts see it as a LUN. For the purposes of this document the reader can consider an LDEV to be equal to a LUN.

Each LDEV was assigned to two host ports for multi-pathing and each host had a multi-path driver enabled. The System Drives were auto-assigned to System Drive Groups (SDGs). As stated above, to show the linear scalability of the solution, the tests were done in two configurations, 4-Node and 8-Node.

During the 4-Node tests, only 16 FMDs were used, but the 8-Node tests used all 32 FMDs. For the 8-Node cluster a total of sixteen storage pools were created (using 4 LDEVs each) over the 32 VSP G1000 paths. One file system was created per storage pool with one NFS export coming from each file system. This made a total of 16 file systems and 16 NFS exports from the HNAS 4100 8-Node cluster, which is analogous to 16 datastores in a VMware environment. Utility scripts were then created to drive the individual workloads on 38 of the Sun Fire x2200 servers against the HNAS file systems.
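The 8-Node layout figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the variable names are illustrative, and the capacity values are raw LDEV sizes that ignore any file system or pool overhead.

```python
PARITY_GROUPS = 4
DRIVES_PER_GROUP = 8      # RAID-5, 7 data + 1 parity
LDEVS_PER_GROUP = 16
LDEV_SIZE_GB = 716
LDEVS_PER_POOL = 4

total_drives = PARITY_GROUPS * DRIVES_PER_GROUP     # FMDs behind the 8-node cluster
total_ldevs = PARITY_GROUPS * LDEVS_PER_GROUP       # LDEVs carved from the parity groups
storage_pools = total_ldevs // LDEVS_PER_POOL       # one file system and NFS export per pool
pool_capacity_gb = LDEVS_PER_POOL * LDEV_SIZE_GB    # raw LDEV capacity behind each export

print(f"{total_drives} drives, {total_ldevs} LDEVs, {storage_pools} pools")
print(f"{pool_capacity_gb} GB of LDEV capacity per pool")
```

The 64 LDEVs divide evenly into the sixteen pools described above, which is why the cluster ends up with 16 file systems and 16 NFS exports.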

During the 4-Node tests, at least 380 Vdbench threads (each working on a 20GB file) were used, resulting in a 7600GB dataset across the 4 nodes. Similarly, during the 8-Node tests, at least 760 Vdbench threads (each working on a 20GB file) were used, creating a 15200GB dataset across the 8 nodes. This is analogous to 760 VMs running over 525 OPS/SEC per VM or 15,000 VMs running @ 25 OPS/SEC each. (Please see table below)
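The dataset sizes follow directly from the thread counts. A minimal sketch, using the minimum thread counts quoted above (the test used "at least" this many) and illustrative names:

```python
FILE_SIZE_GB = 20  # each Vdbench thread worked on one 20GB file


def dataset_gb(threads: int) -> int:
    """Total working set when every thread owns one file of FILE_SIZE_GB."""
    return threads * FILE_SIZE_GB


runs = {4: 380, 8: 760}  # node count -> minimum Vdbench thread count
for nodes, threads in runs.items():
    print(f"{nodes}-node: {threads} threads, "
          f"{threads // nodes} per node, {dataset_gb(threads)} GB dataset")
```

Both configurations work out to the same number of threads per node, so the per-node load was held constant while the cluster doubled in size.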

Between each individual workload test run, the NFS exports were unmounted in all the clients and the HNAS file systems were remounted. This ensured that any potential cache pages or cache slots were cleared between each test run. Also, the NFS clients used the Vdbench flag O_DIRECT to ensure that the client memory was not engaged during any of these tests and thus the true HNAS performance was measured.

The HNAS 4100 cluster tuning parameters summary as well as HNAS 4100 cluster configuration used for the sequential and random tests are shown in tables below.

The best practices for Hitachi NAS Platform for NFS with VMware vSphere can be found below:

Hitachi NAS Platform Best Practices Guide for NFS with VMware vSphere

Results of Benchmark – Overview

In the HDS tests, each thread simulates a VM, which means the 760 threads used represent 760 VMs. For each thread HDS achieved 525 ops/sec – for a total of almost 400,000 (760 * 525) OPS/SEC. If we take this total and divide by a typical 25 OPS/SEC per VM demand, the HNAS 4100-based platform could support 16,000 VMs, assuming no overhead.
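The arithmetic behind that estimate is simple enough to reproduce. A minimal sketch using the figures above, with illustrative variable names and no overhead applied:

```python
THREADS = 760          # each thread simulates one VM
OPS_PER_THREAD = 525   # OPS/SEC achieved per thread
OPS_PER_VM = 25        # typical steady-state demand per VM

total_ops = THREADS * OPS_PER_THREAD      # aggregate OPS/SEC
supported_vms = total_ops // OPS_PER_VM   # VMs at 25 OPS/SEC each

print(total_ops)       # 399000
print(supported_vms)   # 15960
```

The result of 15,960 is what rounds to the roughly 16,000 VMs cited above.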

Of course, in the real world every VM does not happily run along at 25 OPS/SEC. Some virtual servers could need consistently more while still others would never need anything close (3-5 OPS/SEC is frequently seen). The most common scenario, of course, is for the VM to need less than 25 OPS/SEC most of the time, then suddenly need significantly more as a particular process kicks off or multiple requests are made. And, in a virtual environment, those spikes will occur randomly from multiple VMs at the same time.

The same holds true for a virtual desktop environment. While most medium/heavy user virtual desktops are sized for around 25 OPS/SEC there are occasions where certain desktops will need more. But unfortunately it’s difficult to create a benchmark that accounts for these situations. IT planners still need to take this raw data and apply it to their data centers.

For example, if there is a database in the environment that needs 50,000 OPS/SEC on a relatively consistent basis, these supported VM numbers should be reduced. This is not a unique problem for HDS but the reality of any system. What is unique is that HDS has the ability to scale the environment so if an unexpected performance demand becomes a regular demand it can be addressed without adding a whole new storage system.
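One rough way to apply that adjustment is to subtract the steady demand of the heavy workload from the measured aggregate before dividing by the per-VM rate. The sketch below assumes the 399,000 OPS/SEC total (760 threads at 525 OPS/SEC) and the 25 OPS/SEC per-VM figure from this report; the function name is hypothetical, and a real plan would also leave headroom for spikes.

```python
def vms_after_reservation(total_ops: int, reserved_ops: int, ops_per_vm: int = 25) -> int:
    """VMs supportable after setting aside OPS/SEC for a steady heavy workload."""
    remaining = total_ops - reserved_ops
    if remaining < 0:
        raise ValueError("reserved demand exceeds measured capacity")
    return remaining // ops_per_vm


baseline = vms_after_reservation(399_000, 0)            # no reservation
with_database = vms_after_reservation(399_000, 50_000)  # 50,000 OPS/SEC database

print(baseline, with_database)
```

Under these assumptions the database displaces about 2,000 of the 25 OPS/SEC VMs.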

The key to managing through the realities of storage performance peaks and valleys is for the storage system to have enough raw performance to support these spikes. IT planners should also leverage VMware’s Quality of Service capabilities per VM and the QoS features in the storage network. But for these QoS functions to work properly the raw performance of the storage system still has to be there, and as these benchmarks will show, HNAS 4100 has that raw performance.

Detailed Results

Based on the 4-Node and 8-Node tests, the data showed that HNAS could support 15,000 VMs @ 25 OPS/SEC per VM. By progressively increasing the node count (from 1 to 8) larger configurations and linear VM growth could be supported.

Fig 1: 8-Node HNAS results with an all-flash VSP G1000 back end

Using a conservative 50% read workload: 401,018 OPS/SEC @ (25 OPS/SEC + 7% overhead) per VM = 15,000 VMs
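The Fig 1 figure can be reproduced if "25 OPS/SEC + 7% overhead" is read as 25 × 1.07 OPS/SEC of capacity consumed per VM. That reading is an assumption made for this sketch; the report does not spell out how the overhead was applied.

```python
MEASURED_OPS = 401_018   # 8-node result at 50% read, from Fig 1
OPS_PER_VM = 25          # steady-state demand per VM
OVERHEAD = 0.07          # 7% overhead, assumed to scale the per-VM demand

effective_ops_per_vm = OPS_PER_VM * (1 + OVERHEAD)   # about 26.75 OPS/SEC
vm_estimate = MEASURED_OPS / effective_ops_per_vm

print(f"{vm_estimate:,.0f} VMs")
```

Under that assumption the estimate comes to just under 15,000 VMs, which matches the rounded figure in Fig 1.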

Additional Capacity

Another observation was that the back-end SAN, by virtue of the I/O profile from HNAS, had the performance capacity to take on additional workloads, over and above the 15,000 VM load on NFS. An example where this could be valuable is in a unified environment where both NFS and FC-based data stores are being used to meet various SLA and cost targets.

About VSP G1000 load: It had dual DKCs, 4 VSD pairs and a 128GB CLPR. The MPs were about 50% busy during the 50/50 test and about 75% busy during the 100% read test.

The Correlation to Enterprise IT

Performance benchmark testing, like the kind performed by HDS, often leads to impressive results, but making those results useful to traditional enterprise IT can be difficult. After all, most enterprise data centers don’t need to support 15,000 virtual machines, but they do face challenges. These benchmark results indicate that HNAS 4100 can be an ideal solution for many of these challenges.

The first step in correlating benchmark testing with real-world IT is to determine if the test has an I/O profile that closely resembles that of the environment. In other words, ‘Did the test use the right size I/O and the right kind of I/O operations?’ The answer here is an emphatic “yes”. HDS, in our opinion, went to great lengths to accurately simulate the I/O pattern of a virtual environment. This is unlike performance benchmarks common in the flash array market where the block size is relatively large and the I/O pattern is often sequential.

The second step is to understand the weakness of the benchmark when compared to the real world; all benchmarks will have a weakness. In this case the HDS calculation of the number of VMs supported was done assuming a steady-state of 25 OPS/SEC per VM. While a reasonable assumption, there is no way of predicting the exact OPS/SEC of each VM until it is in production and its demand peaks and valleys are well understood.

To overcome this weakness we have to examine the benchmark results to make sure there is enough headroom to support these spikes. In this case there is. The IT planner has to determine the aggregate potential OPS/SEC at peak demand and plan accordingly. One key to the benchmark result is that HNAS 4100 certainly has the raw performance to get there. Another important finding in this benchmark is the near-linear scaling of the HNAS architecture. This is critical when it comes to handling the performance load required by the real-world data center since determining what that load may be in advance is not an exact science. The scalability of the system allows the IT planner to purchase what they believe is just enough performance and capacity and then incrementally add more as demand warrants.
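As a rough planning aid, the aggregate peak demand can be turned into a node count. The sketch below assumes perfectly linear scaling from the 8-node, 50% read result of 401,018 OPS/SEC, which is a simplification of the near-linear behavior reported here. The function name and the example VM counts and peak rates are hypothetical.

```python
import math

MEASURED_OPS_8_NODES = 401_018   # 8-node, 50% read result from this report
MAX_NODES = 8                    # HNAS cluster limit cited in this report
OPS_PER_NODE = MEASURED_OPS_8_NODES / 8   # assumes perfectly linear scaling


def nodes_needed(vm_count: int, peak_ops_per_vm: float) -> int:
    """Smallest node count whose projected OPS/SEC covers the aggregate peak."""
    demand = vm_count * peak_ops_per_vm
    nodes = max(1, math.ceil(demand / OPS_PER_NODE))
    if nodes > MAX_NODES:
        raise ValueError("demand exceeds a single 8-node cluster")
    return nodes


# Hypothetical environment: 5,000 VMs sized at steady state, then at a higher peak
print(nodes_needed(5_000, 25))
print(nodes_needed(5_000, 40))
```

Sizing against the peak rate rather than the 25 OPS/SEC average is what provides the headroom discussed above; nodes can then be added as the measured peak grows.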

Finally, there is a never-ending quest to reduce the number of storage systems that IT needs to manage in the data center. The problem is that each workload added often has a unique I/O profile that leads to an accumulation of storage silos based on application or environment. Virtualization has only made this situation worse as separate storage systems are purchased for the server and desktop virtualization infrastructures.

Given these benchmark results, and the fact that the HNAS platform is a unified (file and block) architecture, it’s reasonable to assume that HNAS could support a larger virtual server and virtual desktop environment while meeting the demands of file sharing and collaboration. A key proof point was the finding that even after all the benchmark workload was active there was still additional HNAS performance capability to take on additional workloads.

Conclusion

Enterprise IT no longer has the time to do a full scale “bake-off” of various storage solutions. They need a vehicle to create a short list. Performance benchmarks are a valuable way for IT to classify storage solutions and to begin compiling that short list. By using Vdbench and creating workloads similar to a virtual environment HDS has delivered a benchmark that the IT planner can use when investigating the system. The raw data provided can be extrapolated easily into their environment to determine if HNAS will meet their performance requirements. Based on these results, HNAS should certainly be on the short list of any enterprise looking to increase the density of their virtual environment and reduce or eliminate their storage silos.

Sponsored by Hitachi Data Systems

About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
