Software Defined Storage (SDS) abstracts commodity storage hardware so that it can be used in a scale-out architecture, but this can create a bottleneck between the CPU and the storage media. In many cases, the latency of hard disk drives, and even of SAS-based flash drives, hides that bottleneck. Thanks to a significant reduction in latency, NVMe flash threatens to expose the overhead of SDS.
The data center is increasingly becoming software defined and running on commodity server hardware. Thanks to multi-core processors, the total CPU power of those commodity servers historically made CPU contention a rare occurrence. However, as the data center becomes more software defined and network throughput increases, contention for those CPU resources grows.
In the past, storage media latencies created wait states that hid CPU contention: the CPU performed the various storage functions while waiting for an IO response from the disk drive. The speed of flash storage, however, especially NVMe flash, removes those wait states, creating a problem for software defined storage and networking.
The now-significant overhead of SDS in an NVMe flash system means that many storage systems realize only 10% of the potential raw performance of the flash media installed inside. The shortfall between raw and actual performance forces many IT architects to deploy mostly internal server storage and only a minimal number of dedicated storage servers.
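The arithmetic behind this can be sketched with purely illustrative latency figures (the numbers below are assumptions for illustration, not measurements): a fixed slice of per-IO software overhead that is negligible next to a disk seek comes to dominate once NVMe media responds in tens of microseconds.

```python
# Illustrative sketch: how faster media exposes a fixed software overhead.
# All latency figures below are assumptions for illustration, not measurements.

def efficiency(media_latency_us, software_overhead_us):
    """Fraction of total per-IO time actually spent on the media."""
    return media_latency_us / (media_latency_us + software_overhead_us)

SOFTWARE_OVERHEAD_US = 90  # assumed fixed per-IO cost of the SDS software stack

for name, media_us in [("HDD", 5000), ("SAS SSD", 200), ("NVMe SSD", 10)]:
    eff = efficiency(media_us, SOFTWARE_OVERHEAD_US)
    print(f"{name:8s}: {eff:5.1%} of potential performance realized")
```

Under these assumed numbers the HDD still realizes over 98% of its potential, while the NVMe SSD realizes roughly 10%, mirroring the shortfall described above.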
The overhead of storage software leads new flash storage startups to deliver either highly optimized storage software or storage systems that leverage FPGAs or custom silicon. Current SDS vendors and storage hardware integrators need a way to create a lower-overhead solution without software redevelopment.
Current SDS vendors work around the overhead problem in one of three ways. First, the vendors can require customers to buy storage servers with significantly more CPU power. Purchasing more CPU power, however, raises the cost and power consumption of the solution. Second, SDS vendors can create more multi-threaded, processor-efficient software, but many software vendors can't afford the time and cost of optimizing existing software to make it more CPU-efficient. Third, vendors can leverage dedicated processors or co-processors. An example is Mellanox's BlueField SmartNIC, a storage network interface card with multiple types of on-board processing.
Introducing Mellanox’s BlueField SmartNIC
The BlueField SmartNIC offloads network, storage, and data processing right on the adapter card. The NIC is a half-height, half-length card with eight PCIe Gen3/4 lanes, two 25Gb/s Ethernet ports, and a BlueField System on a Chip (SoC) with DDR4 DRAM memory. (Mellanox also offers a larger BlueField controller card with two 100Gb/s Ethernet ports and 16 PCIe Gen3/4 lanes, which is used as a flash array or JBOF controller, not as a SmartNIC.)
Mellanox claims the BlueField SmartNIC delivers the best TCP/IP performance in the industry. The embedded ConnectX adapter offloads network and storage traffic with RDMA over Converged Ethernet (RoCE) as well, and BlueField can mix TCP and RoCE traffic simultaneously as workloads require. Installed in a storage server, the BlueField SmartNIC operates as a co-processor, offloading specific storage tasks from the server's main CPU. The SoC runs a standard Linux software stack, which means that any vendor with a Linux-based SDS application can support the NIC without major code changes. An SDS vendor that supports the SmartNIC can load its entire software package (or a slice of it) onto the card, removing the bottlenecks exposed by flash storage IO.
Most applications today aren't SAN-aware; they assume that storage is local and has adequate capacity. Classic examples are databases and distributed file systems that aren't aware of the storage media's proximity. Hence, IT administrators traditionally overprovision storage resources to avoid running into capacity problems. This underutilizes storage resources and drives up server CapEx. SDS lends itself to a distributed, scale-out cloud storage model, but hiding the access latency of the remote SSDs is important. Essentially, all virtualized, cloud-scale storage should look local to applications, with local-like latencies.
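The utilization penalty of per-server overprovisioning can be sketched with a toy model (the per-server demand figures are assumptions for illustration): each server must be sized for its own worst case, while a shared pool only needs to cover aggregate demand.

```python
# Sketch of why per-server overprovisioning wastes capacity compared with a
# pooled, virtualized model. Demand figures are assumptions for illustration.

server_demand_tb = [3, 7, 2, 9, 4, 6]  # assumed peak demand per server, in TB
local_drive_tb = 10                    # each server provisioned for its worst case

local_total = local_drive_tb * len(server_demand_tb)
pooled_total = sum(server_demand_tb)   # a shared pool only needs aggregate demand

print(f"Local provisioning:  {local_total} TB bought for {pooled_total} TB of "
      f"demand ({pooled_total / local_total:.0%} utilization)")
print(f"Pooled provisioning: {pooled_total} TB bought (100% utilization)")
```

With these assumed numbers, local provisioning buys 60 TB to serve 31 TB of demand, roughly half of it idle; pooling the same drives behind a virtualization layer lets capacity track aggregate demand instead.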
As an initiator, the BlueField SmartNIC is ideal for hyper-converged solutions: the host CPU can focus on compute while BlueField's Arm cores handle the storage interface, data transfers, security, load balancing, and network virtualization. Further, the BlueField SmartNIC can virtualize all networked storage resources so they look like direct-attached local storage to applications. Thus, IT administrators can achieve better utilization and efficiency of expensive NVMe drives through NVMe virtualization on the BlueField SmartNICs.
NVMe’s ability to expose bottlenecks caused by SDS overhead is just the beginning; as networks and storage media get faster, the problem gets worse. In the same way that AI and machine learning workloads offload processing to GPU cores, SDS solutions need to offload their processing to dedicated storage cores. Implementing that co-processing on the storage network adapter makes the most sense.
Because BlueField provides a standard Linux operating environment, SDS vendors can adopt it quickly as a platform for their solutions, while the on-board accelerators and Arm cores enable exciting innovations for storage applications.