Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) workloads often start as skunkworks projects within an organization. Once they move past proof of concept and testing into production, their storage performance and capacity demands increase rapidly. Since most AI/ML architectures consist of dozens, if not hundreds, of servers simultaneously accessing the same unstructured data set, a shared file system is the obvious choice. IT planners often try to support these workloads with legacy NAS systems but quickly find them lacking.
Why Traditional NAS Falls Short
A NAS has two critical components: the software that provides the services and the hardware that delivers them. Traditional NAS hardware is a single- or dual-controller system that routes IO requests to several shelves of hard disk or flash media. All data flows through these controllers, and under AI/ML workloads they quickly become the bottleneck.
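To make the controller bottleneck concrete, here is a minimal sketch that treats the controllers and the media as two stages in series, so the slower stage caps what clients receive. The function name and all bandwidth figures are hypothetical, chosen only to show the shape of the curve, not measured from any product.

```python
def deliverable_gb_s(controller_gb_s: float, shelves: int, shelf_gb_s: float) -> float:
    """Bandwidth clients receive when every IO must pass through the controllers."""
    media_gb_s = shelves * shelf_gb_s  # what the drive shelves could supply on their own
    return min(controller_gb_s, media_gb_s)  # the slower stage caps the whole path

# Hypothetical figures, for illustration only.
CONTROLLER_GB_S = 10.0  # combined bandwidth of a dual-controller head
SHELF_GB_S = 4.0        # bandwidth one shelf of media could deliver

for shelves in (1, 2, 4, 8):
    total = deliverable_gb_s(CONTROLLER_GB_S, shelves, SHELF_GB_S)
    print(f"{shelves} shelves -> {total:.1f} GB/s")
```

With these assumed numbers, throughput stops growing at 10.0 GB/s once there is more than 2.5 shelves' worth of media: additional shelves add capacity but no bandwidth.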
AI/ML data sets typically consist of very large numbers of small files. It is not uncommon for file counts to reach into the high millions or even billions. At these file counts, the software of a typical NAS system can also become a bottleneck: its file system gets bogged down by the metadata required to track every file.
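A rough back-of-the-envelope estimate shows why metadata becomes a problem at these file counts. The sketch below assumes a flat 512 bytes of metadata per file; that figure is a hypothetical round number, and real per-file overhead varies widely between file systems.

```python
def metadata_footprint_gib(file_count: int, bytes_per_file: int) -> float:
    """Rough size of the metadata a file system must track for file_count files."""
    return file_count * bytes_per_file / 2**30  # bytes -> GiB

# ASSUMPTION: 512 bytes of metadata per file is a hypothetical round number.
BYTES_PER_FILE = 512

for files in (10_000_000, 500_000_000, 2_000_000_000):
    size = metadata_footprint_gib(files, BYTES_PER_FILE)
    print(f"{files:>13,} files -> {size:,.1f} GiB of metadata")
```

Even under this modest assumption, a two-billion-file data set carries close to a terabyte of metadata, and every small-file read needs a metadata lookup before any data moves.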
The typical legacy NAS workaround for performance is flash media. A hybrid or all-flash NAS improves performance over hard disk drives, but eventually the flash media is also held back by the controller IO bottleneck and by the software’s inefficient metadata management.
Why Scale-Out NAS Falls Short
To address storage performance concerns, many IT professionals turn to scale-out NAS systems, which connect dozens of storage servers, called nodes, into a single storage construct. The problem is that most scale-out NAS systems are not truly parallel; they have a single set of control nodes that manage IO movement. A request for data must first go through the control nodes, which then route the IO to the storage nodes. Once a storage node receives the request, it sends the data back through the control nodes before it reaches the requesting user or application. In most scale-out storage architectures there is no way to route storage traffic directly to the nodes containing the data; all data must pass through the control nodes. These control nodes create the same kind of bottleneck that the controllers do in scale-up architectures.
Why Legacy Parallel File Systems Fall Short
Most AI/ML storage architectures eventually end up with a parallel file system. These file systems let compute servers communicate directly with the node(s) holding the data those servers need. With a parallel file system, performance scales as nodes are added, so storage can keep pace with a rapidly growing compute infrastructure and data set.
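The difference between routing through control nodes and talking directly to the data's owner can be sketched in a few lines. The placement rule below (hashing a path and chunk index to a node) is a hypothetical stand-in; real parallel file systems use their own layout schemes, often fetched from a metadata service, but the effect is the same: the client knows which node to contact. The bandwidth figures are also assumptions.

```python
import hashlib
from typing import Optional

def owning_node(path: str, chunk_index: int, node_count: int) -> int:
    """HYPOTHETICAL placement rule: hash (path, chunk) to the node that stores the chunk.

    Any client can evaluate this itself, so it can send the request straight to
    that node rather than through a central control node.
    """
    digest = hashlib.sha256(f"{path}:{chunk_index}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count

def aggregate_gb_s(storage_nodes: int, node_gb_s: float, control_gb_s: Optional[float]) -> float:
    """Total bandwidth; control_gb_s=None models a parallel design with no control tier."""
    raw = storage_nodes * node_gb_s  # every storage node serving clients at once
    if control_gb_s is None:
        return raw  # direct client-to-node paths: bandwidth grows with node count
    return min(raw, control_gb_s)  # all traffic funnels through the control nodes

# Hypothetical figures: 2 GB/s per storage node, 20 GB/s for the control tier.
print("chunk 0 lives on node", owning_node("/datasets/train/img_000123.jpg", 0, 16))
for nodes in (4, 16, 64):
    via_control = aggregate_gb_s(nodes, 2.0, 20.0)
    parallel = aggregate_gb_s(nodes, 2.0, None)
    print(f"{nodes:>2} nodes: control-node path {via_control:.0f} GB/s, parallel path {parallel:.0f} GB/s")
```

With these assumed numbers, the control-node column flattens at 20 GB/s while the parallel column keeps growing as nodes are added.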
However, the problem with most parallel file systems is that they were written over a decade ago, for an era of single-core processors and hard disk drive media. While some parallel file systems dedicate certain processes to certain cores, they are not multi-threaded.
A larger problem is the lack of support for flash media, specifically NVMe media. In the past, using a hard disk driver to interface with a flash drive was acceptable, since SAS-based flash behaved very much like a hard disk drive. NVMe, however, connects via a different bus (PCIe) and supports much higher queue depths and command counts, but the driver software must be updated to take advantage of them. Most parallel file systems run on a Linux foundation and don’t fully exploit the performance of NVMe. While these systems will see a performance improvement over traditional SAS flash, they won’t come anywhere close to the per-drive performance potential of the hardware.
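Why queue depth matters can be shown with Little's law: sustained throughput equals the number of requests in flight divided by the latency of each request. The NVMe specification allows up to roughly 64K I/O queues, each up to 64K commands deep, but that only helps if the software stack actually keeps many IOs outstanding. The latency and device ceiling below are hypothetical figures for illustration.

```python
def max_iops(outstanding_ios: int, latency_us: float, device_max_iops: float) -> float:
    """Little's law: sustained IOPS = IOs in flight / per-IO latency, capped by the drive."""
    littles_law_iops = outstanding_ios * 1_000_000 / latency_us  # latency given in microseconds
    return min(littles_law_iops, device_max_iops)

# Hypothetical drive: 100 us per IO, able to sustain up to 1,000,000 IOPS.
LATENCY_US = 100.0
DEVICE_MAX_IOPS = 1_000_000.0

for depth in (1, 32, 256):
    print(f"queue depth {depth:>3}: {max_iops(depth, LATENCY_US, DEVICE_MAX_IOPS):,.0f} IOPS")
```

Per-IO latency is identical in every row; only the number of requests the software keeps in flight changes. A driver stack that issues a handful of requests at a time leaves most of the drive's capability unused.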
Why Direct Attached Storage Falls Short
These challenges eventually lead many organizations to use direct-attached storage for their AI and ML workloads. Direct-attached storage eliminates network overhead, but in most cases it still uses an inferior NVMe driver. It also reintroduces the problems that led organizations to adopt shared storage in the first place. Data is rarely in the right place at the right time, and IT must constantly copy it from one compute server to another. Capacity utilization is also inefficient: most direct-attached AI/ML environments use less than 30% of their storage capacity, which means the premium tier of storage goes largely unused.
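The utilization problem is simple arithmetic across the fleet. This sketch uses a hypothetical four-server fleet with made-up capacities; the point is that free space on one server cannot be used by another, so it is stranded rather than pooled.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Server:
    """One compute server with its own direct-attached flash (hypothetical figures)."""
    name: str
    capacity_tb: float
    used_tb: float

def fleet_utilization(servers: Sequence[Server]) -> float:
    """Fraction of all installed capacity that actually holds data."""
    total = sum(s.capacity_tb for s in servers)
    used = sum(s.used_tb for s in servers)
    return used / total

def stranded_tb(servers: Sequence[Server]) -> float:
    """Free capacity trapped inside individual servers that other servers cannot use."""
    return sum(s.capacity_tb - s.used_tb for s in servers)

# Hypothetical fleet: four GPU servers, 30 TB of local NVMe each.
fleet = [
    Server("gpu-01", 30.0, 6.0),
    Server("gpu-02", 30.0, 9.0),
    Server("gpu-03", 30.0, 4.0),
    Server("gpu-04", 30.0, 11.0),
]
print(f"utilization: {fleet_utilization(fleet):.0%}")
print(f"stranded:    {stranded_tb(fleet):.1f} TB")
```

In this made-up fleet, 90 TB of premium flash sits idle even though any single server that outgrows its 30 TB would still have to copy data elsewhere.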
Time For A New File System
AI and ML are the definition of modern workloads, so it makes sense that the storage architecture supporting them should also be modern. It needs to be parallel, but it also needs to be optimized for modern multi-core storage servers and to have native, built-in support for advanced storage technology like NVMe. The goal of a modern file system should be to extract maximum performance from flash media while also delivering scalability, efficiency and ease of use.
In our next blog, Storage Switzerland discusses how to design a file system that can support AI and ML workloads while also providing the advanced capabilities, like native cloud integration, that modern data centers require.
Sponsored by WekaIO