Hyperscale architectures typically sacrifice resource efficiency for performance by using direct-attached storage instead of a shared storage solution. That lost efficiency, though, means the organization is spending money on excess compute, graphics processing units (GPUs) and storage capacity that it doesn’t need. Some organizations are looking to Non-Volatile Memory Express over Fabrics (NVMe-oF) storage networks to help them regain efficiency and make better use of IT spend.
On Ethernet, NVMe-oF comes in two flavors: RDMA and TCP fabrics. NVMe over RDMA rivals the latency of direct-attached storage, while NVMe over TCP requires no new investment in HBAs and NICs. Both can make highly efficient use of the deep, parallel queues and high I/O performance of NVMe drives by exposing those drives over a fabric to many central processing units (CPUs) and GPUs, which in turn improves the utilization of all resources. However, NVMe-oF is only the first step.
While NVMe-oF does eliminate many of the latency and performance concerns of previous networking standards, it solves only half the problem: it doesn’t provide a means to allocate these shared resources on demand. Most hyperscale environments assign each job to a specific node in the cluster. That node does most of the processing the job requires and returns the result.
The hyperscale application administrator needs to manually allocate resources to that job or make sure the job is placed on a node that already has the appropriate resources preallocated. As a result, even though NVMe-oF makes efficiency possible, efficiency, or at least an attempt at it, must come through either manual or static allocation of resources. Also, in most cases the core software is pre-installed on the node itself, so the node can’t, for example, run an Elastic workload one moment and TensorFlow the next.
The lack of simple, automated reconfiguration of node resources and functionality leads, once again, to inefficiency. Even in an NVMe-oF infrastructure, nodes are often hard-set to run a particular workload type. In most environments, nodes are either configured for a worst-case scenario or jobs must wait until the appropriate nodes become available.
NVMe-oF Needs Composability
The missing piece in an architecture designed for optimal hyperscale efficiency is composability. A composable infrastructure can reconfigure itself as needed, either programmatically or through simple administrator interaction. Composability essentially takes all the components of the infrastructure and puts them on a dynamically accessible shelf. These components can be allocated on demand to the most appropriate node within the cluster, so a node can run drastically different workloads from one moment to the next. Through an NVMe-oF connection, the data and the application are attached on the fly to the most appropriate node for the job. Then, once the job is complete, the components are returned to the shelf or allocated to a node that is better suited to the workload’s idle state.
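To make the "dynamic shelf" idea concrete, here is a minimal, hypothetical Python sketch of the bookkeeping behind compose-and-release. The class `ResourceShelf` and its `compose` and `release` methods are illustrative names only; they are not DriveScale's API or any product's API. The sketch assumes resources are tracked as simple identifiers, whereas a real system would perform an NVMe-oF (or iSCSI) attach and detach at each step.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceShelf:
    """Toy model of a composable 'shelf' of disaggregated resources (hypothetical)."""
    free_gpus: set = field(default_factory=set)
    free_drives: set = field(default_factory=set)
    # node name -> {"gpus": set of GPU ids, "drives": set of drive ids}
    assignments: dict = field(default_factory=dict)

    def compose(self, node: str, gpus: int, drives: int) -> dict:
        """Assign the requested number of GPUs and drives to a node, or fail without changes."""
        if node in self.assignments:
            raise RuntimeError(f"{node} is already composed; release it first")
        if gpus > len(self.free_gpus) or drives > len(self.free_drives):
            raise RuntimeError(f"shelf cannot satisfy request for {node}")
        grant = {
            "gpus": {self.free_gpus.pop() for _ in range(gpus)},
            "drives": {self.free_drives.pop() for _ in range(drives)},
        }
        self.assignments[node] = grant
        return grant

    def release(self, node: str) -> None:
        """Return a node's resources to the shelf once its job completes."""
        grant = self.assignments.pop(node)
        self.free_gpus |= grant["gpus"]
        self.free_drives |= grant["drives"]


if __name__ == "__main__":
    shelf = ResourceShelf(
        free_gpus={"gpu0", "gpu1", "gpu2"},
        free_drives={"nvme0", "nvme1", "nvme2", "nvme3"},
    )
    # A GPU-heavy job (e.g. TensorFlow) borrows two GPUs and one drive.
    shelf.compose("node-a", gpus=2, drives=1)
    # Job done: everything goes back on the shelf.
    shelf.release("node-a")
    # The same node is recomposed for a storage-heavy job (e.g. Elastic).
    shelf.compose("node-a", gpus=0, drives=3)
    print(len(shelf.free_gpus), len(shelf.free_drives))
```

The point of the sketch is that the node itself is not sized for a worst case: the same `node-a` is recomposed from the shared pool between jobs, and anything it is not using remains available to other nodes.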
Composability Doesn’t Need NVMe-oF
For many organizations, as long as they have the capabilities that composable infrastructure offers, the latency of a standard Ethernet or iSCSI connection is acceptable. For these organizations, a composable infrastructure that supports more than NVMe-oF is ideal, as it gives them the dynamic flexibility to reconfigure nodes as needed without dramatically changing their network.
Benefits of Composability
The primary benefit of a composable infrastructure is efficiency, which leads to dramatic reductions in hardware spend. Expensive CPUs, and especially GPUs, can be shared across multiple workloads and jobs. Storage capacity can be allocated and reallocated to the most appropriate node for the job at any given point in time. In addition to spending less on hardware, organizations using composable infrastructure should also see shorter overall job completion times. Instead of being locked into a particular node for all jobs, IT can shift workloads to higher-performance nodes when they become available.
In our next blog, Storage Switzerland will provide an analysis of DriveScale’s Composable Infrastructure solution, which enables on-the-fly creation of servers and efficient use of resources over a variety of connectivity options, including NVMe-oF.
In the meantime, watch for our upcoming live 15-minute webinar, “Composing Infrastructure for Elastic, Hadoop, Kafka and Splunk,” on August 29th at 1:00pm ET. Register to receive a copy of Storage Switzerland’s eBook “Is NVMe-oF Enough to Fix the Hyperscale Problem?”