Deep learning is a machine learning method that uses algorithms to find predictive patterns in seemingly disparate types of data. Use cases include fraud prevention, image classification, speech recognition, and countless others. To deliver a frictionless user experience, these systems must analyze potentially millions of small files very quickly, providing what seems like a real-time answer to the inquiry. The required compute demand is met with a cluster of servers in which GPUs aid the CPUs in inspecting data. The weak link in these designs is the storage architecture.
The Deep Learning Storage Challenge
Designers of deep learning storage architectures face a decision: use storage internal to the GPU servers, or use a shared storage system? Internal storage has very low latency but limits flexibility and scalability, since data can't easily be moved between nodes. It also creates resource inefficiencies: one node might be overloaded while another has plenty of capacity and sits idle. These are very expensive resources to leave underutilized. A shared storage system provides flexibility, scale, and better efficiency, but it introduces network latency.
An NVMe-based storage system addresses the concern over network latency, but these systems are typically block-based. The designer must layer in a file system, typically Gluster or Lustre. These file systems are decades old; they were designed for neither flash nor NVMe, so they introduce latencies of their own. They are also bandwidth-optimized and better suited to large file transfers; deep learning is the opposite, requiring rapid inspection of very small files.
Deep Learning Storage Solution
WekaIO's Matrix is a modern file system designed specifically for flash media, including SATA, SAS, NVMe and NVMe over Fabrics devices. Like the compute layer, Matrix clusters server nodes together so they are managed as a single entity and namespace.
(To learn more about Matrix, see our Briefing Note, “How to Solve the Unstructured Data Paradox.”)
Deep learning requires the inspection of millions of small files, often created by devices or sensors as part of the Internet of Things. Unlike a bandwidth-heavy use case, a deep learning application quickly traverses the file system to find the data relevant to the request, inspects those files and their connections to others, and then generates an answer. It is much more an IOPS workload than a bandwidth workload. Latency, metadata efficiency, and parallel access to data nodes are critical to achieving the desired result. As a distributed, parallel file system, Matrix is ideal for this use case.
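The difference between the two workload profiles can be sketched with a small, generic Python example (not WekaIO code): traversing a dataset of many small files incurs one metadata operation per file, so metadata cost grows with file count rather than with bytes moved.

```python
import os
import tempfile

def write_dataset(root, num_small=1000, small_size=1024):
    """Create many small files, the shape a deep learning dataset often takes."""
    for i in range(num_small):
        with open(os.path.join(root, f"sample_{i}.bin"), "wb") as f:
            f.write(os.urandom(small_size))

def traverse_and_read(root):
    """Walk the tree and read every file, counting metadata ops and data bytes.

    Each small file costs at least one open (a metadata operation) plus one
    read, so metadata work scales with the number of files, not their size.
    """
    metadata_ops = data_bytes = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            metadata_ops += 1  # lookup + open for every file
            with open(os.path.join(dirpath, name), "rb") as f:
                data_bytes += len(f.read())
    return metadata_ops, data_bytes

with tempfile.TemporaryDirectory() as root:
    write_dataset(root)
    ops, total = traverse_and_read(root)
    # 1000 files of 1 KB: 1000 metadata operations for only ~1 MB of data,
    # versus a single open if the same bytes lived in one large file.
    print(ops, total)
```

A bandwidth-optimized file system amortizes that metadata cost over large sequential transfers; a small-file workload gives it nothing to amortize, which is why latency and metadata efficiency dominate here.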
Deep Learning in the Cloud
The cloud seems like an obvious complement to an on-premises deep learning environment. Many organizations want to use the cloud to support peak workload activity and for testing and development. The challenge is getting the data to the cloud, especially deep learning data that often consists of data sets with millions of small files.
Matrix supports multiple public, private, and hybrid cloud use cases. First, with its integrated policy-based tiering, it can use the cloud to store or archive infrequently accessed data on more cost-effective object storage. This allows organizations that work on a subset of their data to deploy less on-premises flash storage; the remaining data resides on a public or private cloud until needed, then is rehydrated to the flash tier for processing.
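The tiering logic described above can be sketched in a few lines. This is a minimal model with hypothetical names (`place`, `rehydrate`, a seven-day age threshold); the note does not describe Matrix's actual policy engine or parameters at this level of detail.

```python
# Hypothetical tier names and policy window, for illustration only.
FLASH, OBJECT = "flash-tier", "object-tier"

def place(file_age_seconds, policy_age_seconds=7 * 24 * 3600):
    """Decide a file's tier: data idle longer than the policy window is
    demoted to cost-effective object storage; hot data stays on flash."""
    return OBJECT if file_age_seconds > policy_age_seconds else FLASH

def rehydrate(current_tier):
    """When demoted data is needed again, it is promoted back to flash."""
    return FLASH

print(place(3600))            # accessed an hour ago
print(place(30 * 24 * 3600))  # idle for a month
```

The point of such a policy is that the flash tier only needs to be sized for the working set, not the full namespace.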
WekaIO's Matrix also supports cloud bursting for peak needs, remote backup, and disaster recovery. A snapshot of the file system is taken instantaneously and uploaded to AWS's S3 service. At runtime, the snapshot is mounted to a cluster of EC2 I3 instances running Matrix software, which serves as a dedicated storage server. This provides complete data center resiliency in the event of a disaster or if the working file system is compromised. For cloud bursting, the process is the same except that the snapshot is mounted on a compute cluster that can be several times larger than the on-premises cluster.
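The snapshot-to-S3 workflow can be modeled as three steps. This is a toy in-memory sketch with hypothetical function names, not WekaIO's implementation; a real deployment would push snapshot objects to S3 with the AWS SDK.

```python
import copy

def snapshot(filesystem):
    """A point-in-time, immutable copy of the file system state."""
    return copy.deepcopy(filesystem)

def upload_to_s3(bucket, key, snap):
    """Stand-in for the S3 upload step (here, a plain dict)."""
    bucket[key] = snap
    return key

def mount_on_cluster(bucket, key, num_instances):
    """Mount the stored snapshot on an EC2 cluster running Matrix.
    For cloud bursting, num_instances may exceed the on-prem cluster size."""
    return {"nodes": num_instances, "namespace": bucket[key]}

on_prem = {"/train/a.bin": b"\x01", "/train/b.bin": b"\x02"}
s3_bucket = {}
key = upload_to_s3(s3_bucket, "matrix-snap-0001", snapshot(on_prem))

# Disaster recovery: the same namespace, served from a cluster in EC2.
dr = mount_on_cluster(s3_bucket, key, num_instances=8)
print(dr["namespace"] == on_prem)  # → True
```

The same three steps cover both cases: for DR the EC2 cluster stands in for the lost one, while for bursting it is simply provisioned larger than the on-premises cluster.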
I3 instances are well-suited for I/O-intensive workloads, equipped with highly efficient NVMe SSD storage that can deliver up to 3.3 million random 4 KB IOPS and up to 16 GB/second of sequential disk throughput. This level of performance makes them a great fit for deep learning workloads. With Matrix acting as a dedicated storage server, the data can then be accessed from a variety of EC2 instance types. For example, Amazon EC2 P3 instances are the next generation of GPU compute instances, providing powerful and scalable GPU-based parallel compute capabilities. Once the data has been processed, it can be pushed to S3 for economical long-term storage or, if desired, sent back to the on-premises Matrix cluster.
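A quick back-of-envelope check on the figures quoted above (assuming 4 KB means 4096 bytes and decimal gigabytes) shows why these numbers matter for small-file workloads: at peak IOPS, purely random 4 KB reads approach the sequential bandwidth ceiling.

```python
# Sanity check on the I3 figures quoted in the text.
iops = 3_300_000        # random 4 KB IOPS
block = 4 * 1024        # 4 KB in bytes (4096-byte convention assumed)
seq_gbps = 16           # GB/s sequential throughput

random_gbps = iops * block / 1e9
print(round(random_gbps, 1))  # ≈ 13.5 GB/s of purely random 4 KB reads
```

In other words, small random I/O on these instances is not an order of magnitude slower than streaming, which is the regime a small-file deep learning workload needs.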
The ability to analyze discrete and seemingly disparate data sets to predict outcomes or make better decisions is something most organizations need, and deep learning is the way to get there. Striking a balance between low-latency internal storage and highly flexible shared storage is a key challenge when designing a deep learning storage architecture.
WekaIO's Matrix promises to give deep learning environments the best of both worlds: internal-storage performance and latency with shared-storage flexibility and data protection. Combined with WekaIO's ability to leverage cloud resources, it presents a very compelling alternative to traditional deep learning storage approaches.