The modern data center is increasingly microservice- or container-based. These workloads are dynamic and unpredictable. The datasets within these workloads range from thousands of large files to billions of small files. Artificial Intelligence (AI) and Machine Learning (ML) are being used across these workloads to generate the appropriate context for the information they contain. Also, these workloads are driven by more than just Intel CPUs. Offload processors like graphics processing units (GPUs), tensor processing units (TPUs) and Systems on a Chip (SoCs) are appearing throughout the infrastructure.
Can the Storage Infrastructure Keep Pace?
All of the advancements at the compute layer, as well as the dramatically changing nature of workload deployment, put more pressure on the storage infrastructure to meet the demands of these environments. Storage media, both flash-based Non-Volatile Memory Express (NVMe) SSDs and NVMe-based Storage Class Memory (SCM) technology like Intel’s Optane, can reduce storage latency to microseconds and deliver millions of IOPS. Internal and external architectures are also advancing, thanks once again to NVMe communication within the storage system and NVMe over Fabrics (NVMe-oF) for external connection to the computing layer.
The Storage Software Challenge
The bottleneck for the storage infrastructure is the storage software. Most storage software today was written in the Small Computer System Interface (SCSI) era, not the NVMe era, and NVMe is fundamentally different. To extract the full potential from NVMe, organizations need to make sure their software leverages its highly parallel nature. Also, the CPUs used in new storage servers are different from those available when developers first created these solutions. Moore’s law has effectively ended. Performance increases now come from the growing number of cores per CPU more than from gains in per-core performance. Here again, modern software needs multi-threading to take advantage of the processing power spread across those cores.
Modern storage software also needs portability. The best location for it to execute may no longer be on Intel CPUs in a storage server. Multiple companies, including Broadcom and Mellanox, are delivering SoC interface cards suitable for hosting storage software, freeing CPUs to work on other tasks. Storage Switzerland’s analysis on the Mellanox SoC is available in our briefing note “Solving the Software Defined Storage Bottleneck – Mellanox Briefing Note.”
For the modern data center, storage software needs to deliver disaggregated and composable storage architecture. It also needs to avoid inhibiting the low latency and high bandwidth of modern NVMe media.
Introducing EXTEN
EXTEN is based in Austin, TX, and is a pioneer in NVMe over Fabrics solutions. It provides storage software to enable modern NVMe storage architectures to reach their full potential. The EXTEN software runs on a storage target and requires no client CPU resources. The client-free deployment empowers the client’s resources to be predictably available to the application tier. EXTEN claims it imposes less than one microsecond of latency on the infrastructure. It also embraces open management frameworks like the Redfish/Swordfish API.
The EXTEN framework is itself a series of independent microservices for flexible feature layering, deployment, and platform integration. It uses a parallel storage stack ideal for NVMe and NVMe-oF. EXTEN leverages multi-core CPUs by using each core as an independent controller. Not only does leveraging individual cores as controllers increase overall efficiency, but it also provides a quality of service attribute to the connected workloads.
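The idea of treating each core as an independent controller can be sketched in a few lines of Python. This is only an illustration of the partitioning pattern, not EXTEN's implementation: the names `CoreController`, `core_for` and `dispatch` are hypothetical, Python threads are not pinned to physical cores, and `queue.Queue` uses locks internally, so the sketch shows the ownership model rather than a lock-free data path.

```python
import queue
import threading
import zlib


class CoreController(threading.Thread):
    """Hypothetical stand-in for one CPU core acting as a controller."""

    def __init__(self, core_id):
        super().__init__(daemon=True)
        self.core_id = core_id
        self.inbox = queue.Queue()  # private queue: only this worker consumes it
        self.completed = []         # state touched only by this worker

    def run(self):
        while True:
            request = self.inbox.get()
            if request is None:     # sentinel value: shut down
                break
            self.completed.append(request)


def core_for(volume, num_cores):
    # Stable hash, so a given volume always lands on the same controller.
    return zlib.crc32(volume.encode()) % num_cores


def dispatch(requests, num_cores):
    """Route each (volume, operation) pair to its owning controller."""
    controllers = [CoreController(i) for i in range(num_cores)]
    for controller in controllers:
        controller.start()
    for volume, operation in requests:
        controllers[core_for(volume, num_cores)].inbox.put((volume, operation))
    for controller in controllers:
        controller.inbox.put(None)
    for controller in controllers:
        controller.join()
    return {c.core_id: c.completed for c in controllers}
```

Because each volume is owned by exactly one worker, workers never contend for each other's state, and requests for a volume complete in the order they were submitted. That isolation is also what makes a per-workload quality of service attribute plausible: a busy volume can only consume the core it was assigned to.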
EXTEN is also flexible in connectivity. Workloads can connect to EXTEN via a Transmission Control Protocol (TCP) Client, which leverages the existing network infrastructure, or a Remote Direct Memory Access (RDMA) Client, which leverages the deterministic low latency of modern network infrastructure. The EXTEN Targets are NVMe-based storage servers, or the software can be installed on SoC networking cards inside a storage shelf full of NVMe flash.
The EXTEN solution is designed to drive up performance while not adding latency. In test configurations, the company claims 20 microseconds of round trip latency while delivering 60 GB/s of bandwidth and 10 million IOPS. The test data that EXTEN shared with Storage Switzerland consistently shows that the saturation point of the test occurred at network saturation, meaning the software is no longer the bottleneck.
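Some back-of-the-envelope arithmetic helps put those claimed figures in context. The sketch below applies Little's Law (requests in flight = throughput x latency) and converts the bandwidth figure into network links. It rests on assumptions the test data summary does not confirm: that the IOPS and latency figures come from the same run, that GB means 10^9 bytes, and that the fabric uses 100 Gb/s links (a hypothetical value chosen for illustration).

```python
import math


def outstanding_ios(iops, latency_us):
    """Little's Law: average requests in flight = throughput * latency."""
    return iops * latency_us / 1_000_000


def links_needed(bandwidth_gb_per_s, link_gbit_per_s):
    """Whole network links required to carry the given bandwidth."""
    bits_per_s = bandwidth_gb_per_s * 1_000_000_000 * 8
    return math.ceil(bits_per_s / (link_gbit_per_s * 1_000_000_000))


# Claimed figures: 10 million IOPS at 20 microseconds round trip.
in_flight = outstanding_ios(10_000_000, 20)  # 200.0 requests in flight

# 60 GB/s over assumed 100 Gb/s links (480 Gb/s in total).
links = links_needed(60, 100)                # 5 links
```

Under these assumptions, sustaining the claimed numbers requires roughly 200 requests in flight at any moment and close to five fully loaded 100 Gb/s links, which is consistent with the network, rather than the software, being the first resource to saturate.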
As the software reaches these extreme levels of performance, there is no load, from a storage perspective, on the client. Certainly, the client is working harder because it is no longer waiting on storage IO. Volumes that administrators create through the EXTEN console appear to connecting workloads as raw devices. The software also provides maximum NVMe efficiency thanks to its support of a user-mode data path and a lock-free design, which enables the data path to bypass the kernel, further lowering overhead.
EXTEN provides resiliency at both the drive level and the node level. Drive-level resiliency includes RAID 0, 1, 10 and 6 with drive hot plug. For additional data protection and resiliency, EXTEN can cluster multiple storage targets together. Data either replicates between multiple nodes in the cluster, or the customer can choose to create a RAID 6 stripe across cluster nodes.
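The capacity trade-off between these RAID levels can be sketched with a small calculation. It uses the generic textbook definitions of each level, not anything specific to EXTEN, and assumes equal-sized drives, RAID 1 as a single mirror set, and RAID 10 as striped two-way mirrors. The function name `usable_capacity_tb` is hypothetical.

```python
def usable_capacity_tb(level, drives, drive_tb):
    """Usable capacity for a RAID group of equal-sized drives."""
    if level == 0:            # striping only, no redundancy
        data_drives = drives
    elif level == 1:          # every drive holds the same data
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        data_drives = 1
    elif level == 10:         # stripe across two-way mirrored pairs
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even count of 4 or more drives")
        data_drives = drives // 2
    elif level == 6:          # two drives' worth of parity per stripe
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        data_drives = drives - 2
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return data_drives * drive_tb


# Eight 4 TB drives under each layout.
raid0 = usable_capacity_tb(0, 8, 4)    # 32
raid10 = usable_capacity_tb(10, 8, 4)  # 16
raid6 = usable_capacity_tb(6, 8, 4)    # 24
```

RAID 6 gives up two drives of capacity in exchange for surviving any two drive failures. If a RAID 6 stripe across cluster nodes follows the same scheme, the same arithmetic would apply with nodes in place of drives.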
Software is one component of the storage infrastructure that has not seen a significant upgrade in the last few years. NVMe is quickly becoming the dominant storage media interface, and IT planners are set to increase their adoption of NVMe-oF, especially NVMe-oF/TCP. On the computing side, there are now plenty of workloads that can fully tap into the performance potential of these storage infrastructure improvements.
The storage software remains the challenge. Companies like EXTEN are moving quickly to address this challenge by using microservices and highly parallel design to fully exploit the potential of the new infrastructures.