All-flash arrays, and almost any array that leverages flash, create a problem for the rest of the storage infrastructure. They expose weaknesses in storage software, internal storage system networking and external connectivity. In the past these architecture components hid behind the latency of hard drives. But because flash has almost zero latency, the weaknesses in the storage infrastructure are front and center. One of the biggest areas of exposure is the storage network, the connection from the storage array to the server.
The Flash Networking Problem
Most data centers still purchase a centralized, shared storage system. That storage system is often put on a dedicated network, either Fibre Channel or IP-based (iSCSI), and connected to physical servers running either a database application or a hypervisor with virtual machines. When an organization moves to flash or all-flash, that move is largely driven by the need to respond to the massive amounts of random IO that both of these environments generate. And for most data centers the move to flash does make a difference, even on existing architectures.
The problem with a move to flash without upgrading the rest of the network is that the enterprise is not optimizing its investment. The flash system could potentially support far more workloads, but the old network limits its ability to do so. With a better network in place, the data center can add more workloads to the flash system and extend its life significantly further than the organization might expect.
Making the Network Flash Ready
Certainly the first step for most environments is to upgrade the bandwidth of the network. For IP environments that means moving to at least 10GbE, or even jumping right to 25GbE, for which several end-to-end (adapter, switch and cable) solutions are available. For Fibre Channel it means moving to switches with 16Gbps connectivity to the hosts and 32Gbps connectivity between switches.
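As a back-of-envelope illustration of why the bandwidth step matters, the sketch below computes the theoretical ceiling on small-block IOPS each link speed can carry. The 4KB block size is an assumption, and real-world throughput is lower once protocol overhead is accounted for:

```python
# Back-of-envelope upper bound on small-block IOPS for a given link speed.
# Block size and line rates are illustrative; protocol overhead is ignored.

def link_iops(link_gbps: float, block_kb: float = 4.0) -> float:
    """Upper bound on IOPS: link bytes/sec divided by bytes per IO."""
    bytes_per_sec = link_gbps * 1e9 / 8        # bits -> bytes
    return bytes_per_sec / (block_kb * 1024)   # IOs per second

for speed in (10, 25, 32):
    print(f"{speed} Gbps -> ~{link_iops(speed):,.0f} IOPS at 4 KB")
```

Even at these optimistic ceilings, a single modern all-flash array can saturate a 10GbE link, which is why the bandwidth upgrade is the natural first move.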
Increasing bandwidth, though, is just the first step in the process. Another is to consider the transport. Both iSCSI and FC today carry the SCSI protocol, which adds a substantial amount of latency. In the hard drive era that overhead did not matter, since it was dwarfed by drive latency. For server-based flash storage a new protocol is emerging: NVMe. NVMe, as we discuss in our article "What is NVMe?", is a PCIe-based protocol optimized for flash that increases the number of queues and the number of commands per queue to better match the capabilities of the drive.
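Little's Law (sustainable throughput = outstanding commands divided by per-command latency) gives a rough feel for why deeper, more numerous queues matter. The latency figure and queue depths below are illustrative assumptions, not measurements:

```python
# Little's Law sketch: sustainable IOPS = queue depth / per-IO latency.
# The latency and queue depths are assumed for illustration only.

def queue_iops(queue_depth: int, latency_s: float) -> float:
    """Maximum IOPS a fixed number of outstanding commands can sustain."""
    return queue_depth / latency_s

lat = 100e-6  # ~100 microsecond flash read latency (assumption)

# A single SCSI queue of depth 32 caps out well below flash capability:
print(f"SCSI, 1 queue x 32:   {queue_iops(32, lat):,.0f} IOPS ceiling")
# NVMe allows up to 64K queues of up to 64K commands each; even a modest
# 8 queues of depth 1024 pushes the ceiling far beyond any single device:
print(f"NVMe, 8 queues x 1024: {queue_iops(8 * 1024, lat):,.0f} IOPS ceiling")
```

The point of the model is simply that with flash-class latencies, the queueing limits of the transport, not the media, become the bottleneck.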
NVMe over Fabrics brings the efficiency of the NVMe protocol to the network. Combined with the high bandwidth options described above, the storage network can generate a response time very similar to that of internal storage.
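To make that "very similar to internal storage" claim concrete, consider a hypothetical latency budget. Every number below is an assumption for illustration, not a benchmark result:

```python
# Hypothetical per-read latency budget in microseconds (all values assumed).
media_read = 90    # flash media read
nvme_fabric = 10   # NVMe over Fabrics transport overhead (assumed)
scsi_stack = 100   # legacy SCSI/iSCSI stack plus network overhead (assumed)

print(media_read + nvme_fabric)  # 100 us: close to internal flash latency
print(media_read + scsi_stack)   # 190 us: the legacy transport dominates
```

Under these assumptions the legacy transport roughly doubles the response time the application sees, while NVMe over Fabrics adds only a small fraction on top of the media itself.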
The Hyperconvergence Flash Problem
Hyperconvergence leverages two trends in the data center. The first is virtualization. The hypervisor allows the virtualization of almost any application. The second is software defined storage. For years storage hardware solutions have been more about the software than the hardware. Most vendors now use very similar hardware. Their secret sauce is the storage services and features that their software provides.
Hyperconvergence combines the trends of virtualization and software defined storage. The storage software is virtualized and runs as a virtual machine in each node in the cluster. Capacity internal to the server is assigned to each storage virtual machine, and then that capacity is aggregated and made available to the other VMs in the cluster. Data is most often striped across these nodes as it is written or read.
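A minimal sketch of the striping idea, with hypothetical node names and a deliberately tiny chunk size; real products use far more sophisticated placement, along with replication or erasure coding:

```python
# Minimal round-robin striping sketch across hyperconverged nodes.
# Node names and the chunk size are hypothetical, for illustration only.

CHUNK = 4  # bytes per chunk; tiny so the example is easy to follow

def stripe(data: bytes, nodes: list) -> dict:
    """Assign consecutive chunks of `data` to `nodes` in round-robin order."""
    placement = {n: [] for n in nodes}
    for i in range(0, len(data), CHUNK):
        node = nodes[(i // CHUNK) % len(nodes)]
        placement[node].append(data[i:i + CHUNK])
    return placement

layout = stripe(b"ABCDEFGHIJKL", ["node1", "node2", "node3"])
print(layout)
# {'node1': [b'ABCD'], 'node2': [b'EFGH'], 'node3': [b'IJKL']}
```

Because every read or write touches chunks on multiple nodes, the striping that makes hyperconvergence scale is also what pushes storage traffic onto the network.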
While many hyperconverged vendors claim simplified networking, the reality is that as the hyperconverged environment scales, network traffic increases and becomes more complex. Hyperconvergence changes the network from a north-south, many-to-one communications path to an east-west, many-to-many communications path. There is not less or simpler networking; there is actually more network IO. In fact, some studies show that as much as 70% of the traffic within a hyperconverged architecture is storage related.
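A rough model of where that east-west traffic comes from, assuming a simple replication scheme; the write rate and replication factor are hypothetical:

```python
# Rough model of east-west storage traffic in a hyperconverged cluster:
# each write is copied to (replication_factor - 1) peer nodes, so network
# load grows with write volume. Numbers are illustrative assumptions.

def east_west_write_traffic(write_mbps: float, replication_factor: int) -> float:
    """Network MB/s generated by replicating writes to peer nodes."""
    return write_mbps * (replication_factor - 1)

# A 500 MB/s write workload with three-way replication puts 1,000 MB/s
# of replication traffic on the east-west network on top of the writes:
print(east_west_write_traffic(500, 3))
```

The workload itself never sees this replication traffic, which is one reason storage can quietly become the majority of east-west bandwidth in these clusters.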
To keep the scale part of hyperconverged architectures working, vendors are now looking at NVMe over Fabrics as the solution. Lowering the latency of storage IO (again, 70% of the traffic) will enable the hyperconverged environment to scale to many more nodes, and to a much higher rate of utilization, than previously thought.
The NVMeOF Problem
The problem for data centers wanting to push all-flash arrays further is that NVMe over Fabrics (in both its IP- and FC-based versions) is not ready for production deployment. It is still in the testing phase, but those tests are going well. IT planners should expect to be able to deploy NVMe over Fabrics solutions by the second half of 2018. Until then they should make sure the network upgrades they perform over the next couple of years are forward compatible with NVMe over Fabrics.
Steps to the Optimized Storage Network
There are many elements of the storage architecture IT needs to explore after an upgrade to a flash-based storage system. The network is potentially the first. Prior to upgrading the network hardware, though, IT planners should take a hard look at the cabling infrastructure. Is it ready for the next generation of network bandwidth? Storage Switzerland took a deep dive on the importance of cabling infrastructure in our article, "The Criticality of Cabling Infrastructure in High Performance Storage Networking".
The second step is to improve the raw speed, and those improvements are available now. With that upgrade in place, IT should also look for the ability to improve the allocation of bandwidth. Look for solutions that provide end-to-end quality of service features, and monitor to make sure mission-critical applications are getting the IO performance they require.
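The bandwidth-allocation idea behind quality of service can be sketched with a token bucket, a common rate-limiting technique. This is a conceptual model only, not how any particular array or switch implements QoS, and the rates are hypothetical:

```python
# Conceptual token-bucket model of storage QoS bandwidth allocation.
# Rates and burst sizes are hypothetical; real arrays and switches
# implement QoS in firmware with far more sophistication.

class TokenBucket:
    """Caps a workload's throughput at `rate_mbps` with a burst allowance."""

    def __init__(self, rate_mbps: float, burst_mb: float):
        self.rate = rate_mbps     # refill rate, MB per second
        self.capacity = burst_mb  # maximum burst, MB
        self.tokens = burst_mb    # start with a full bucket

    def refill(self, elapsed_s: float) -> None:
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed_s)

    def allow(self, io_mb: float) -> bool:
        """Admit the I/O only if enough tokens remain; else defer it."""
        if self.tokens >= io_mb:
            self.tokens -= io_mb
            return True
        return False

bucket = TokenBucket(rate_mbps=100, burst_mb=10)
print(bucket.allow(8))  # True: within the burst allowance
print(bucket.allow(8))  # False: throttled until the bucket refills
bucket.refill(0.1)      # 0.1 s at 100 MB/s restores up to 10 MB of credit
print(bucket.allow(8))  # True again after refilling
```

Assigning each workload its own bucket is one simple way a QoS layer can keep a noisy neighbor from starving a mission-critical application of bandwidth.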
The third step is to plan for NVMe over Fabrics. Make sure the networking equipment the enterprise purchases going forward is NVMe ready. Also, start talking to storage vendors about when they plan to offer NVMe over Fabrics connectivity.
For many data centers the initial move to a flash-based storage system may seem to solve all their performance problems. But without a flash-optimized network, those organizations may add additional storage systems or storage nodes sooner than they need to. A network upgrade will enable the flash array to reach its full potential and maximize ROI.
The place to learn about all-flash arrays is the Flash Memory Summit in Santa Clara, CA, August 8 to 10. It is one of the most educational events of its type. Whether you are developing the next great flash technology or you are a data center manager looking to understand how to best leverage flash, there are tracks for you.