Storage Switzerland has previously discussed the problems that legacy file systems have when it comes to serving modern workloads such as artificial intelligence (AI) and high-velocity analytics. We have also explored the qualities that a modern file system requires. In this chapter, we will evaluate WekaIO as a solution to the storage I/O challenges that modern workloads create.
The Requirements of AI and High-Velocity Analytics Workloads
AI and high-velocity analytics workloads impose new demands on storage and compute infrastructure alike. They require processing unprecedented volumes of data (on the scale of terabytes and, increasingly, petabytes) to deliver accurate responses to analytics queries and to train the neural networks that fuel AI. This data typically has variable and unpredictable access patterns. These workloads depend on high-end graphics processing units (GPUs) to process that data quickly and accurately, and because this compute infrastructure is expensive, keeping it fully utilized is critical. Most storage infrastructures, however, lack the bandwidth and low latency required to keep GPU clusters saturated with data.
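To see why bandwidth becomes the bottleneck, consider a rough back-of-the-envelope calculation. The figures below are illustrative assumptions, not WekaIO measurements, but they show how quickly per-node bandwidth requirements climb:

```python
# Back-of-the-envelope estimate of the storage bandwidth needed to keep
# a GPU training node busy. All figures are illustrative assumptions.

gpus = 8                          # GPUs per training node (assumed)
samples_per_sec_per_gpu = 2_000   # training throughput per GPU (assumed)
bytes_per_sample = 600_000        # ~600 KB per image/record (assumed)

# Aggregate read bandwidth the storage layer must sustain so the GPUs
# never sit idle waiting on input data.
required_bytes_per_sec = gpus * samples_per_sec_per_gpu * bytes_per_sample
print(f"Required: {required_bytes_per_sec / 1e9:.1f} GB/s per node")
# -> Required: 9.6 GB/s per node
```

Even under these modest assumptions, a single node needs nearly 10 GB/s of sustained reads, well beyond what most traditional network file systems deliver.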
WekaIO Matrix File System
WekaIO dubs this storage bottleneck “I/O starvation.” Its Matrix file storage architecture was designed to be massively scalable and parallel, so that large amounts of randomly accessed data can be fed continuously from a centralized shared pool to multiple GPUs, whether on or off-premises. According to WekaIO, the Matrix architecture can sustain more than 10 gigabytes per second (GB/s) per GPU node, ten times the throughput of traditional network file systems and three times that of a local non-volatile memory express (NVMe) solid-state disk (SSD). It distributes data and metadata across the infrastructure for parallel access, and runs over InfiniBand or 10 Gigabit (and faster) Ethernet to deliver fast, predictable performance without the complexity of copying data between direct-attached storage nodes. Performance scales linearly, which further supports high GPU utilization.
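Because the data and metadata are distributed for parallel access, applications can issue many concurrent reads against the shared namespace rather than staging copies onto local disks first. The sketch below illustrates that access pattern in general terms; the mount point, directory layout, and worker count are hypothetical examples, not WekaIO specifics:

```python
# Minimal sketch of issuing parallel reads against a shared POSIX mount.
# The mount point and file layout are hypothetical examples; a parallel
# file system can service these concurrent reads from many storage nodes.
import os
from concurrent.futures import ThreadPoolExecutor

MOUNT = "/mnt/weka/training-data"  # hypothetical shared mount point

def read_file(path):
    # Each worker issues an independent read of one data file.
    with open(path, "rb") as f:
        return len(f.read())

# Assumes a flat directory of data files.
paths = [os.path.join(MOUNT, name) for name in os.listdir(MOUNT)]
with ThreadPoolExecutor(max_workers=32) as pool:
    total_bytes = sum(pool.map(read_file, paths))
print(f"Read {total_bytes / 1e9:.2f} GB across {len(paths)} files")
```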
The Matrix architecture provides a single global namespace that spans high-performance SSD capacity and lower-cost object storage for long-term retention. According to WekaIO, the namespace can scale to an exabyte of capacity, and any Amazon S3-compatible object store, whether on or off-premises, is supported.
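The phrase “S3-compatible” refers to API compatibility: the same client calls work against AWS S3 or an on-premises object store, with only the endpoint changing. A hedged illustration using the standard boto3 library (the endpoint URL, credentials, and bucket name below are placeholders, not WekaIO specifics):

```python
# Illustration of S3 API compatibility: identical client code can target
# AWS S3 or an on-premises S3-compatible store; only the endpoint differs.
# Endpoint URL, credentials, and bucket name are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.internal",  # on-prem endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# List the objects held in a hypothetical long-term retention bucket.
for obj in s3.list_objects_v2(Bucket="cold-tier").get("Contents", []):
    print(obj["Key"], obj["Size"])
```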
The solution can be deployed on-premises on pre-validated industry-standard servers, as well as in the public cloud on Amazon Web Services EC2 P3 GPU instances. Cloud bursting via a hybrid model is supported to address peak workload periods: users with an on-premises footprint can scale out to cloud GPU clusters on demand, and then migrate their data back on-premises when processing is complete. To do so, users either create a snapshot of the file system running in the cloud environment, or they store a backup copy of the file system in the cloud that can be rehydrated on demand.
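The data flow behind that snapshot-and-rehydrate pattern can be sketched generically. WekaIO’s own tooling manages this natively; the bucket, object keys, and helper functions below are hypothetical illustrations of the pattern, not WekaIO commands:

```python
# Generic sketch of the snapshot / rehydrate data flow described above.
# Bucket, keys, and helper names are hypothetical; WekaIO's tooling
# performs the equivalent steps natively.
import boto3

s3 = boto3.client("s3")

def push_snapshot(archive_path, bucket, key):
    # After the cloud burst completes, persist a snapshot archive of the
    # file system to the object store.
    s3.upload_file(archive_path, bucket, key)

def rehydrate_snapshot(bucket, key, dest_path):
    # Later, pull the snapshot back down (on-premises or in a new cloud
    # cluster) so the file system can be restored from it on demand.
    s3.download_file(bucket, key, dest_path)

push_snapshot("/tmp/fs-snap.tar", "burst-backups", "snaps/fs-snap.tar")
rehydrate_snapshot("burst-backups", "snaps/fs-snap.tar", "/tmp/restore.tar")
```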
Conclusion
As discussed throughout this blog series, AI and high-velocity analytics are among the growing number of modern workloads that demand parallel processing of large volumes of data. These workloads call for a rethink of file storage architectures to deliver the necessary performance and capacity without breaking the budget. For its part, WekaIO’s architecture offers linearly scalable performance, parallel processing, and cloud bursting that can help optimize utilization of expensive on-premises infrastructure and enable agile responses to data-intensive application requirements. Enterprises should consider the Matrix solution as a path to faster, more cost-effective storage infrastructure for analytics and AI workloads.
Sponsored by WekaIO