Overcoming NFS as a Machine Learning Inhibitor – Quobyte Briefing Note

In today’s insights economy, much can be gleaned from the hyperscale cloud (Amazon, Google, Azure) providers’ ability to deliver an infrastructure that provides strong degrees of data resiliency and protection, agility, and performance – with minimal intervention from storage administrators. For example, as high-performance computing (HPC) workloads find their way into the enterprise to support big data analytics, machine learning (ML) and artificial intelligence (AI), manually copying data across silos becomes too expensive and cumbersome, hindering the organization’s ability to maximize value from these workloads.

Who is Quobyte?

Quobyte offers a software-based parallel file system that runs on standard x86 servers. The software provides unified access across block, file and object storage protocols, and supports a range of infrastructure platforms, including OpenStack; Docker, Kubernetes and Mesosphere containers; Apache Hadoop and Spark big data analytics; and virtualized and bare metal infrastructure. The architecture uses file system drivers and plugins to deliver storage services directly to the infrastructure platform without requiring changes to the application. The ability to combine various storage media and to serve various storage interfaces in a unified manner is becoming important in the era of analytics and ML, because data enters the organization and must be processed through a wide variety of means. For instance, data may be generated on a Linux-based Internet of Things (IoT) sensor and then be ingested via non-volatile memory express (NVMe).

According to Quobyte, its architecture provides sub-millisecond latency that can scale linearly from clusters of four to thousands of nodes. For example, a throughput-heavy workload like video streaming may be run in parallel across multiple nodes to support application performance via increased network bandwidth. Performance is further accelerated by the fact that metadata operations and data services are parallelized and dedicated to specific central processing unit (CPU) cores.

Another key aspect of Quobyte’s value proposition is streamlined storage administration. To this end, its platform integrates via an API with existing automation tools to support automated, policy-based data distribution and retention from a centralized user interface. For example, Quobyte can be used with hard-disk drives (HDDs) and solid-state drives (SSDs) side by side in one system, so the storage administrator can prescribe that SSDs be used for fast data ingest and that files then be moved to hard drives as they become less frequently accessed. Administrators might also dictate that data cannot be modified for a certain period of time. Policies can be set down to the file level and users can be isolated from one another, which helps to support granular quality of service (QoS) and data privacy in a multi-tenant construct.
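The tiering and retention rules described above can be illustrated with a short Python sketch. This is not Quobyte's policy engine or API; the `FileInfo` and `Policy` types, the field names, and the 30-day and 7-day thresholds are all hypothetical, chosen only to show how an idle-time rule and an immutability window might be evaluated per file.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class FileInfo:
    # Hypothetical per-file metadata; not a Quobyte data structure.
    path: str
    tier: str           # "ssd" or "hdd"
    created_at: float   # seconds since epoch
    last_access: float  # seconds since epoch


@dataclass
class Policy:
    # Hypothetical policy knobs, assumed for illustration only.
    cold_after_days: float  # move to HDD once idle for this many days
    immutable_days: float   # block modification for this many days after creation


def target_tier(f: FileInfo, policy: Policy, now: float) -> str:
    """Return the tier the file should live on: SSD while hot, HDD once cold."""
    idle_days = (now - f.last_access) / SECONDS_PER_DAY
    return "hdd" if idle_days >= policy.cold_after_days else "ssd"


def may_modify(f: FileInfo, policy: Policy, now: float) -> bool:
    """Return True once the file's immutability window has passed."""
    age_days = (now - f.created_at) / SECONDS_PER_DAY
    return age_days >= policy.immutable_days


if __name__ == "__main__":
    policy = Policy(cold_after_days=30, immutable_days=7)
    now = 100 * SECONDS_PER_DAY
    fresh = FileInfo("ingest/a.bin", "ssd", now - 2 * SECONDS_PER_DAY, now - SECONDS_PER_DAY)
    stale = FileInfo("ingest/b.bin", "ssd", now - 90 * SECONDS_PER_DAY, now - 45 * SECONDS_PER_DAY)
    for f in (fresh, stale):
        print(f.path, target_tier(f, policy, now), may_modify(f, policy, now))
```

A real system would evaluate such rules inside the storage layer and move data in the background; the sketch only shows the decision logic.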

Quobyte’s New TensorFlow Plugin

Quobyte has taken the next step in supporting machine learning by unveiling a plugin for TensorFlow, a popular open source ML platform. Typically, ML training workloads are characterized either by smaller files, in which case latency determines performance, or by larger files such as videos, in which case throughput determines performance. The problem in both instances is that every operation must traverse the Linux kernel and the local driver. With Quobyte’s plugin, data buffering goes directly to the Quobyte library, so the kernel, context switching and data copying are all eliminated from the data path. According to Quobyte, it has achieved 30% faster data loading for mixed workloads of mid-sized files on a TCP network. Furthermore, memory bandwidth and CPU cycles are preserved and graphics processing units (GPUs) can be more fully saturated. Users do not need to modify their applications to use Quobyte’s plugin.
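Why small files are latency-bound and large files are throughput-bound can be seen with a back-of-the-envelope model. The Python sketch below assumes files are read one after another and that each read costs a fixed per-file latency plus a transfer time proportional to file size. The function name and all figures (0.5 ms per file, 1 GB/s of bandwidth, the file counts and sizes) are illustrative assumptions, not Quobyte measurements.

```python
def load_time_seconds(num_files, file_size_bytes, per_file_latency_s, bandwidth_bytes_per_s):
    """Split estimated sequential load time into a latency part and a transfer part.

    Simplified model (an assumption): total = n * latency + n * size / bandwidth.
    """
    latency_part = num_files * per_file_latency_s
    transfer_part = num_files * file_size_bytes / bandwidth_bytes_per_s
    return latency_part, transfer_part


if __name__ == "__main__":
    # Many small files: per-file latency dominates the total.
    small = load_time_seconds(1_000_000, 4096, 0.0005, 1e9)
    # A few large files (for example, videos): transfer time dominates.
    large = load_time_seconds(100, 10**9, 0.0005, 1e9)
    print("small files: latency %.1f s, transfer %.1f s" % small)
    print("large files: latency %.2f s, transfer %.1f s" % large)
```

Under this model, shaving per-operation overhead helps the small-file case most, while the large-file case benefits mainly from more bandwidth.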

StorageSwiss Take

Quobyte’s core focus on enabling massively scalable infrastructure to be managed by a small IT support team will grow increasingly valuable as HPC becomes more commonly used by enterprises. These projects typically start small and scale across the organization, necessitating a centralized storage fabric that is elastic and durable and provides consistent levels of fast performance, alongside management simplicity. Bypassing the Network File System (NFS) protocol is a unique approach that enables higher saturation of expensive GPU, storage memory and storage capacity resources. NFS cannot scale without bottlenecks, relies on data transiting the Linux kernel in TensorFlow ML training modules, and does not incorporate end-to-end checksums for data integrity.

Senior Analyst, Krista Macomber produces analyst commentary and contributes to a range of client deliverables including white papers, webinars and videos for Storage Switzerland. She has a decade of experience covering all things storage, data center and cloud infrastructure, including: technology and vendor portfolio developments; customer buying behavior trends; and vendor ecosystems, go-to-market positioning, and business models. Her previous experience includes leading the IT infrastructure practice of analyst firm Technology Business Research, and leading market intelligence initiatives for media company TechTarget.
