The storage infrastructure for multi-rack scale applications like Hadoop, Spark, Cassandra, and Couchbase is typically built using directly attached flash-based storage instead of a shared flash array. The motivation for using direct-attached storage (DAS) is simple. Media inside a server costs less than media inside a shared array. Direct-attached media also has lower latency and higher bandwidth than a shared array, since the array’s data undergoes storage array processing (RAID, deduplication, compression, snapshots, etc.) and has to traverse a network. DAS is not without its challenges, though; it is used less efficiently and is more vulnerable to failure.
Shared Storage Falls Short
Shared storage has several advantages that should allow it to overcome the price and latency concerns that rack scale application designers have. First, several studies have shown that rack scale architectures (RSA) use less than 30% of the available storage capacity. Capacity utilization is likely to get worse as the density per drive continues to increase. Shared storage should address this by pooling the media resources and allocating them as needed to specific compute nodes. Even if a shared storage array only doubled utilization to 60%, it should be able to overcome much of the cost concern. Reality, though, shows that these systems can’t; there is too much “system” cost wrapped around the media.
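To make the arithmetic concrete, here is a quick back-of-the-envelope sketch in Python. The 30% and 60% figures come from the paragraph above; everything else is purely illustrative:

    # Back-of-the-envelope: raw capacity that must be purchased to deliver
    # one terabyte of actually consumed data at a given utilization level.
    def raw_tb_per_used_tb(utilization):
        return 1.0 / utilization

    for utilization in (0.30, 0.60):
        print(f"{utilization:.0%} utilization -> {raw_tb_per_used_tb(utilization):.2f} TB raw per TB used")

    # 30% utilization -> 3.33 TB raw per TB used
    # 60% utilization -> 1.67 TB raw per TB used
    # Doubling utilization roughly halves the media purchased for the same workload,
    # which is why pooling should offset much of the shared-array cost premium.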
High-speed SAS or NVMe over Fabrics should address the network latency concern. These modern storage protocols should provide almost the same latency as internal storage, and for the most part, they do. High-speed SAS, though, requires extra effort to network, and most flash storage systems don’t yet support an end-to-end NVMe design.
The technology for shared storage to be a viable option in the rack scale environment exists. However, implementing that technology requires innovation. There is also a problem with the storage system’s software. The software adds too much latency, and it becomes the bottleneck, especially in a flash environment. Most flash arrays deliver only 5 to 10% of the raw performance of their flash SSDs, primarily because of the storage system software and the lack of end-to-end NVMe support.
Complexity Falls Short
To address these challenges, several vendors have come to market with solutions that either try to improve storage software efficiency by offloading it to a field-programmable gate array (FPGA) or go all-in with a custom ASIC. Other vendors have tried to distribute the storage software processing across nodes within the rack scale cluster. All of these workarounds introduce complexity and, to some extent, increase costs.
Simple is Better – DriveScale’s Software Composable Infrastructure
What traditional storage software vendors are missing is that most multi-rack scale applications have the same features built into them as typical storage software, which means the software layer causing the bottleneck is largely unnecessary. DriveScale takes a much more straightforward approach to deliver the efficiency of shared storage with the performance of direct-attached storage.
DriveScale creates a shared pool of direct-attached storage that it can assign to, and later remove from, particular servers in the rack scale cluster. The first iteration of the solution used a SAS-to-Ethernet bridge to deliver local-class latencies with the efficiency of shared resources. The software doesn’t try to overreach. It provides the connectivity between the compute nodes and the hard disk pools, then lets the capabilities already built into the application software do the rest.
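As a rough illustration of the composable model described above (a hypothetical sketch, not DriveScale’s actual API), the core idea is simply a rack-wide pool of drives that can be attached to, and later detached from, individual compute nodes:

    # Hypothetical sketch of a composable drive pool: drives are assigned to a
    # compute node on demand and returned to the shared pool when released.
    # Illustrative only; this is not DriveScale's actual interface.
    class DrivePool:
        def __init__(self, drive_ids):
            self.free = set(drive_ids)   # drives not attached to any node
            self.assigned = {}           # drive_id -> node name

        def assign(self, node, count):
            """Attach `count` free drives to `node`; the node sees them as local block devices."""
            if count > len(self.free):
                raise RuntimeError("not enough free drives in the pool")
            drives = [self.free.pop() for _ in range(count)]
            for drive in drives:
                self.assigned[drive] = node
            return drives

        def release(self, drive_id):
            """Detach a drive from its node and return it to the shared pool."""
            node = self.assigned.pop(drive_id)
            self.free.add(drive_id)
            return node

    pool = DrivePool([f"hdd-{i:02d}" for i in range(24)])
    drives = pool.assign("hadoop-datanode-07", 4)   # hypothetical node name
    pool.release(drives[0])                         # capacity returns to the pool, not stranded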
Flash at Rack Scale
A growing number of rack scale architectures need high-performance flash for real-time analysis and processing. The problem is that flash media significantly compounds the inefficiencies of direct-attached storage.
Recently, DriveScale announced its solution to deliver shareable flash at direct-attach latencies by leveraging an end-to-end NVMe over Fabrics (NVMe-oF) architecture. On the storage end, the organization installs its flash media into an Ethernet-attached Bunch Of Flash (eBOF) instead of into its servers. An eBOF is an NVMe-oF-connected drive shelf coming to market from multiple vendors; DriveScale has a solution, and Western Digital is shipping the Ultrastar Serv24-HA flash storage server.
DriveScale utilizes NVMe-oF over RDMA over Converged Ethernet (RoCE v2) or iSCSI to connect the eBOFs to the assigned servers. After the eBOFs are racked, the DriveScale software allocates whole flash drives, or slices carved from individual drives, to compute nodes in the cluster. The nodes see their assigned drives as local block devices.
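From the compute node’s point of view, the attach is a standard NVMe-oF connection. The sketch below shows the equivalent manual step using the stock Linux nvme-cli utility; the target address and NQN are placeholders, and in practice the DriveScale software performs the attach so the namespace simply appears as a local block device:

    # Illustrative only: what attaching a pooled NVMe namespace over RoCE v2
    # looks like from a compute node, using the standard Linux nvme-cli tool.
    # The target address and NQN below are placeholders.
    import subprocess

    TARGET_ADDR = "10.0.0.50"                        # placeholder eBOF data-port IP
    TARGET_NQN = "nqn.2019-01.example:ebof-slice-1"  # placeholder namespace NQN

    # Connect over NVMe/RDMA on the standard NVMe-oF port (4420).
    subprocess.run(
        ["nvme", "connect", "-t", "rdma", "-a", TARGET_ADDR, "-s", "4420", "-n", TARGET_NQN],
        check=True,
    )

    # The new namespace now appears alongside any locally installed NVMe drives.
    subprocess.run(["nvme", "list"], check=True)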
The DriveScale storage pool is available to multiple rack scale application clusters at the same time. IT can move capacity between nodes or between clusters. The result is a massive increase in efficiency without sacrificing performance.
StorageSwiss Take
Sometimes the more straightforward approach is better. Most all-flash arrays competing in the rack scale market were born in the enterprise storage market, which needed robust storage software capabilities. In the rack scale market, those features are either redundant or unnecessary. DriveScale’s approach of creating a simple software interface and providing streamlined connectivity offers a lower-friction way to increase storage efficiency in rack scale environments. The result should be a dramatic decrease in costs and increased flexibility without sacrificing performance.