As analytics environments like Hadoop, Elastic, Kafka, and TensorFlow continue to scale, organizations need a shared infrastructure that can deliver the bandwidth, flexibility, and efficiency these environments require. In a recent Storage Intensity podcast, Tom Lyon, founder and chief scientist of DriveScale, and George Crump, Lead Analyst of Storage Switzerland, sat down to discuss a wide range of subjects, including Non-Volatile Memory Express (NVMe), NVMe over Fabrics (NVMe-oF), and composable infrastructures.
One of the topics we discuss in the podcast is the challenge of designing a storage infrastructure for Hadoop.
Designing a storage infrastructure for Hadoop, Elastic, Kafka, TensorFlow, and other modern workloads means overcoming several conflicting challenges. Organizations need a storage infrastructure that can deliver high bandwidth, multi-rack redundancy, workload flexibility, optimal efficiency, and, of course, low cost. The reality is that most storage solutions can’t meet all these requirements in the same system, forcing the IT planner to compromise.
IT planners most often settle on direct-attached storage (DAS) architectures, which give them high bandwidth, multi-rack redundancy, and low cost. However, they sacrifice workload flexibility and efficiency, which, as the environment scales, ends up costing the organization far more than it initially saved with the DAS infrastructure.
A New Storage Architecture for Hadoop
If organizations are to adopt a new storage architecture for Hadoop, one that is shared so it can overcome the workload flexibility and storage efficiency problems, it must not reintroduce the challenges that led to DAS in the first place.
Overcoming the Parallel IO Challenge
The first challenge that Hadoop and workloads like it present to a storage infrastructure is the high bandwidth requirement of hundreds, potentially thousands, of nodes. Traditional shared storage infrastructures struggle to meet this demand. Ethernet, however, now supports speeds over 100Gbps, with faster speeds on the horizon.
The challenge is the Small Computer System Interface (SCSI) protocol used to traverse this network. The SCSI command count and queue depths are relatively small. As a result, CPUs and the network end up waiting for the storage infrastructure to process the Input/Output (IO) request. NVMe is the answer to this problem. Where SCSI-based SAS has a queue depth of 254, NVMe supports a queue depth of roughly 64,000 commands per queue. That means that if a Hadoop cluster sends 50,000 IO requests to the storage architecture at once, SAS can only accept 254 of the requests; the other 49,746 have to wait. NVMe, on the other hand, can queue all of these IO requests at once.
NVMe has a networked version, NVMe over Fabrics (NVMe-oF). The challenge is that, until recently, NVMe-oF required IT to upgrade the network infrastructure with new switches and new network interface cards that support Remote Direct Memory Access (RDMA). Now, though, organizations can use NVMe over traditional Transmission Control Protocol (TCP) networks with NVMe/TCP, which means the NVMe protocol can run over any standard TCP network. Without RDMA, there is the standard TCP overhead, but the infrastructure still benefits from the massive increase in queue and command depth.
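The queue-depth arithmetic above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it models a single queue, ignores how long each request takes to service, and the function name `queue_burst` is hypothetical. The depths (254 and 64,000) and the burst size (50,000) are the figures quoted in the text.

```python
import math

# Figures quoted in the article; treat them as illustrative, not as a spec.
SAS_QUEUE_DEPTH = 254
NVME_QUEUE_DEPTH = 64_000
BURST = 50_000


def queue_burst(requests: int, queue_depth: int) -> tuple[int, int, int]:
    """Return (accepted_now, waiting, queue_fills) for one burst of IO requests.

    Simplified model (an assumption): a single queue that accepts up to
    `queue_depth` requests at a time; everything else waits outside the queue.
    `queue_fills` is how many times the queue must be filled to take the
    whole burst.
    """
    accepted = min(requests, queue_depth)
    waiting = requests - accepted
    fills = math.ceil(requests / queue_depth)
    return accepted, waiting, fills


for name, depth in (("SAS", SAS_QUEUE_DEPTH), ("NVMe", NVME_QUEUE_DEPTH)):
    accepted, waiting, fills = queue_burst(BURST, depth)
    print(f"{name}: accepted={accepted} waiting={waiting} queue_fills={fills}")
```

Under this model the SAS queue has to be refilled many times to absorb the burst, while the NVMe queue takes it in one pass. Real devices add multiple queues and service times, so the gap in practice depends on the hardware.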
Overcoming the Cost Challenge
If the network can deliver the bandwidth, the next challenge to overcome is cost. Most mainstream shared storage solutions are too feature-rich for the Elastic, Hadoop, Kafka, and TensorFlow environments. Also, many shared storage system features are not appropriate for these massively parallel, eventually consistent environments. For example, how can a storage system without knowledge of one of these applications ever hope to take a valid snapshot? The only place it makes sense for the snapshot to occur is in the application. As a result, these environments count on the application to drive most of the storage feature set. Their developers design them to work with internal disk drives. The more closely the storage solution's volumes resemble internal drives, the better. And the fewer features it has, the better.
The Value of Efficiency
Using standard drives and providing only the features the Elastic, Hadoop, Kafka, and TensorFlow environments need can dramatically reduce cost. If the solution uses NVMe-oF, it can also reduce cost by providing greater efficiency without impacting performance. A shared storage solution provides higher storage resource utilization. It also gives servers the ability to change personality as workload demands change, which eliminates the need for an organization to maintain a separate cluster dedicated to each workload it runs.
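To make the utilization argument concrete, here is a minimal sketch that compares dedicated DAS clusters with a shared pool. Every number in it is hypothetical and chosen only for illustration: three workloads, identical fixed-capacity DAS builds, and a 25% headroom figure for the shared pool. None of these values come from the podcast or from any vendor.

```python
def utilization(used_tb: list[float], provisioned_tb: list[float]) -> float:
    """Fraction of provisioned capacity that actually holds data."""
    return sum(used_tb) / sum(provisioned_tb)


# Hypothetical capacity in use by three workloads, in TB.
used_tb = [120, 40, 200]

# DAS assumption: each workload gets its own cluster with the same fixed build,
# so capacity is bought per cluster whether or not it is used.
das_provisioned_tb = [240, 240, 240]

# Shared-pool assumption: one pool sized to total usage plus 25% headroom.
HEADROOM = 1.25
shared_provisioned_tb = [sum(used_tb) * HEADROOM]

das_util = utilization(used_tb, das_provisioned_tb)
shared_util = utilization(used_tb, shared_provisioned_tb)

print(f"DAS utilization:    {das_util:.0%}")
print(f"Shared utilization: {shared_util:.0%}")
```

The point of the sketch is the shape of the comparison rather than the specific percentages: capacity stranded in one dedicated cluster cannot help another, whereas a pool only needs headroom once.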
The Elastic, Hadoop, Kafka and TensorFlow Storage Architecture Design
The storage architecture for these modern workloads should include an orchestration tool that enables administrators to change the cluster configuration and personality almost instantly. Typically, it will also need an agent, installed on each server, that works with the orchestrator. Today, the network needs to carry some form of NVMe end to end. Finally, the storage chassis needs to be relatively basic, just a shelf of NVMe flash. Some hardware vendors are shipping a chassis that pairs computing power with the flash, which a software vendor can use to house its orchestrator.
The DAS architecture typical in today’s Elastic, Hadoop, Kafka, and TensorFlow environments becomes unwieldy and expensive as these environments scale. Organizations that left shared storage for the compelling price advantage and initial simplicity of the DAS design now need to return to a shared storage environment. The shared storage environment, though, must deliver efficiency and flexibility without sacrificing high bandwidth. Solutions like those from DriveScale can meet both sides of the requirements list, enabling the customer to get the best of both worlds.
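The orchestrator's role described above can be sketched as a small bookkeeping model: a shelf of drives, and operations that attach drives to a server or return them to the pool. This is a hypothetical illustration, not DriveScale's software or API. The class `DrivePool`, its methods `compose` and `release`, and the server and drive names are all invented for the example, and a real orchestrator would also drive the per-server agent and the NVMe fabric.

```python
from dataclasses import dataclass, field


@dataclass
class DrivePool:
    """Hypothetical shelf of NVMe drives tracked by an orchestrator."""

    free: set[str]
    assigned: dict[str, set[str]] = field(default_factory=dict)

    def compose(self, server: str, count: int) -> set[str]:
        """Attach `count` free drives to `server` and return their names."""
        if count > len(self.free):
            raise ValueError(f"only {len(self.free)} drives free, need {count}")
        picked = set(sorted(self.free)[:count])  # deterministic choice
        self.free -= picked
        self.assigned.setdefault(server, set()).update(picked)
        return picked

    def release(self, server: str) -> None:
        """Return all of a server's drives to the free pool."""
        self.free |= self.assigned.pop(server, set())


# Eight drives in one shelf, shared by two workloads.
pool = DrivePool(free={f"nvme{i}" for i in range(8)})
pool.compose("hadoop-node-1", 4)
pool.compose("kafka-node-1", 2)

# Workload demand shifts: the Hadoop node gives up its drives, Kafka grows.
pool.release("hadoop-node-1")
pool.compose("kafka-node-1", 4)

print(sorted(pool.assigned["kafka-node-1"]), "free:", sorted(pool.free))
```

Because the drives live in a shared shelf and not inside a server, reassigning them is a change to this mapping instead of a physical move, which is what lets a cluster change personality quickly.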