One of the challenges facing buyers of software defined storage (SDS) is that it still requires hardware. Some SDS vendors deliver their solutions as dedicated appliances, some offer flexibility by publishing a Hardware Compatibility List (HCL), and others can deploy in virtual machine (VM) environments. In many cases, SDS solutions have very rigid dependencies on media, hardware configuration, and networking, which undercuts the cost and flexibility benefits many buyers expect from SDS. Not only do these dependencies affect up-front cost and flexibility, they also impose substantial costs over the life of the deployment because hardware refresh becomes a major challenge.
Introducing NooBaa
NooBaa, a new SDS solution, took a different architectural approach to improve flexibility over traditional clustered architectures. It can use capacity on almost any server, anywhere, including the option to gracefully share space on existing hosts. Given the strong trend toward the cloud for DR and archiving, NooBaa also integrates with public cloud storage to support hybrid storage models. As a result, the solution does not require the purchase of specific hardware “nodes” for storage, nor does it necessarily require budgeting extra resources in a virtual environment. IT can contribute capacity to the NooBaa cluster from almost any server in the environment (or in the cloud). This flexibility lets administrators self-install NooBaa for evaluation, typically in 15 minutes or less.
NooBaa Architecture
There are three components to the NooBaa architecture.
The first is the NooBaa Access Node, an HTTP server that supports IO through the S3 API, which is now widely accepted as a de facto standard for object storage. The Access Node requires no dedicated hardware; when a system is first set up, the Access Node initially runs in the same VM as the NooBaa Core. Applications read and write data through the endpoint that the Access Node presents. The Access Node also provides deduplication, compression, and encryption of data.
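To make the Access Node's data reduction more concrete, here is a minimal Python sketch of chunk-level deduplication with compression. It assumes fixed-size chunks and SHA-256 content addresses, which may differ from what NooBaa actually does; the `ChunkStore` class is hypothetical, and encryption is left out because the Python standard library has no block cipher.

```python
import hashlib
import zlib


class ChunkStore:
    """Hypothetical content-addressed store: dedups identical chunks, compresses unique ones."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # SHA-256 hex digest -> compressed chunk

    def put(self, data: bytes) -> list[str]:
        """Split data into fixed-size chunks, store unseen ones, return the object's manifest."""
        manifest = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only new content consumes space
                self.chunks[digest] = zlib.compress(chunk)
            manifest.append(digest)
        return manifest

    def get(self, manifest: list[str]) -> bytes:
        """Reassemble an object by decompressing its chunks in manifest order."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in manifest)


store = ChunkStore(chunk_size=4096)
payload = b"A" * 4096 * 3 + b"B" * 4096
manifest = store.put(payload)
print(len(manifest), len(store.chunks))  # 4 2
```

Because identical chunks hash to the same key, the three repeated `A` chunks are stored once, so a four-chunk object occupies only two compressed chunks. Production systems often prefer variable-size (content-defined) chunking so that inserting a few bytes does not shift every later chunk boundary.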
The NooBaa Core provides the fundamental innovation for NooBaa by moving the complete system control plane out of the data path and into a reliable VM environment. The Core optimizes data placement for resilience, performance, and locality. For now, resilience is provided through replication: a minimum of three copies of each unique data chunk are dispersed around the cluster, and administrators can specify additional copies. Performance is optimized through heat-mapping, auto-tiering, and automatic localization of data near application hosts. In short, the Core does the heavy lifting of data management.
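The replication rule can be illustrated with a small placement function. This is not NooBaa's algorithm: the `Node` type, the `site` failure-domain label, and the preference for nodes with more free space are assumptions made for illustration. The sketch enforces the three-copy minimum and spreads copies across sites before putting two copies on the same site.

```python
from dataclasses import dataclass

MIN_COPIES = 3  # minimum replica count stated for NooBaa


@dataclass
class Node:
    name: str
    site: str      # hypothetical failure-domain label (datacenter, cloud region, ...)
    free_gb: int


def place_replicas(nodes: list[Node], copies: int = MIN_COPIES) -> list[str]:
    """Pick distinct nodes for a chunk's replicas, spreading across sites first."""
    copies = max(copies, MIN_COPIES)  # administrators may raise, never lower, the count
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    # Group nodes by site, each group ordered by free space (most free first).
    by_site: dict[str, list[Node]] = {}
    for n in sorted(nodes, key=lambda n: n.free_gb, reverse=True):
        by_site.setdefault(n.site, []).append(n)
    sites = sorted(by_site, key=lambda s: by_site[s][0].free_gb, reverse=True)
    chosen: list[str] = []
    # Round-robin over sites so no site gets a second copy before every site has one.
    while len(chosen) < copies:
        for s in sites:
            if by_site[s] and len(chosen) < copies:
                chosen.append(by_site[s].pop(0).name)
    return chosen


nodes = [Node("a1", "dc1", 500), Node("a2", "dc1", 800),
         Node("b1", "dc2", 300), Node("c1", "cloud", 100)]
print(place_replicas(nodes))  # ['a2', 'b1', 'c1']
```

Spreading across failure domains first means a single site outage leaves at least two copies elsewhere; a real placement engine would also weigh the heat and locality signals the Core tracks.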
The third component is the NooBaa Daemon, a lightweight agent that runs on any Linux or Windows host that contributes storage to the NooBaa cluster. The daemon stores data, responds to requests for data, monitors local resource consumption, and keeps the Core apprised of its health. If a disk or node fails, the NooBaa Core coordinates a distributed self-healing operation that uses peer-to-peer data transfers for efficiency and speed.
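The division of labor in self-healing, where the Core plans and the daemons move the data among themselves, can be sketched as a planning function. This is a hypothetical illustration: the `chunk_map` layout (chunk ID to the set of nodes holding it), the first-fit choice of target nodes, and the rotation of source peers are assumptions, not NooBaa internals.

```python
def plan_healing(chunk_map: dict[str, set[str]], failed: str,
                 live_nodes: list[str], copies: int = 3) -> list[tuple[str, str, str]]:
    """Return (chunk_id, source_node, target_node) peer-to-peer copy jobs.

    The coordinator only produces the plan; each job is a direct transfer
    between two daemons, so healing work is spread across the cluster.
    """
    jobs = []
    for chunk_id in sorted(chunk_map):
        survivors = sorted(chunk_map[chunk_id] - {failed})
        if not survivors:
            continue  # no surviving replica to copy from; cannot heal this chunk
        missing = max(copies - len(survivors), 0)
        # First-fit: any live node that does not already hold the chunk.
        targets = [n for n in live_nodes if n not in survivors][:missing]
        for i, target in enumerate(targets):
            # Rotate sources so surviving peers share the transfer load.
            jobs.append((chunk_id, survivors[i % len(survivors)], target))
    return jobs


chunk_map = {"c1": {"n1", "n2", "n3"},
             "c2": {"n2", "n3", "n4"},
             "c3": {"n1", "n4", "n5"}}
print(plan_healing(chunk_map, failed="n3", live_nodes=["n1", "n2", "n4", "n5"]))
# [('c1', 'n1', 'n4'), ('c2', 'n2', 'n1')]
```

Only chunks that lost a replica on `n3` generate work, and each copy is sourced from a different surviving peer rather than funneled through a central controller.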
Most interesting is the way NooBaa manages resource consumption when the Daemon is deployed on a shared host. NooBaa is designed to behave as a “second-class citizen,” reducing its footprint when the host’s resources dwindle. For example, the daemon could run on a physical server that also acts as a compute node in a Hadoop cluster. When that node suddenly needs to process a job, NooBaa can lower its resource consumption on that node until the job finishes, even evacuating data from the node if appropriate.
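A shared-host policy along these lines might look roughly like the following Python sketch. The `SharedHostGovernor` class, its thresholds, and the linear back-off are invented for illustration; the source does not say how NooBaa measures host load or decides when to evacuate.

```python
from dataclasses import dataclass


@dataclass
class SharedHostGovernor:
    """Hypothetical policy for a storage daemon that yields to the host's own workload."""
    throttle_at: float = 0.5   # host utilization (0..1) above which the daemon backs off
    floor: float = 0.05        # minimum share of its normal rate the daemon keeps
    evacuate_after: int = 3    # consecutive busy samples before requesting evacuation
    busy_streak: int = 0

    def observe(self, host_util: float) -> tuple[str, float]:
        """Return (action, share of normal IO/CPU rate) for one utilization sample."""
        if host_util <= self.throttle_at:
            self.busy_streak = 0
            return "normal", 1.0
        self.busy_streak += 1
        # Scale linearly from full rate at `throttle_at` down to `floor` at 100% utilization.
        frac = (min(host_util, 1.0) - self.throttle_at) / (1.0 - self.throttle_at)
        share = max(self.floor, 1.0 - frac * (1.0 - self.floor))
        if self.busy_streak >= self.evacuate_after:
            # Sustained pressure: ask the Core to re-place this node's chunks elsewhere.
            return "evacuate", self.floor
        return "throttle", share


gov = SharedHostGovernor()
for util in [0.3, 0.75, 0.9, 0.95]:
    print(util, gov.observe(util))
```

With the defaults, a brief spike only throttles the daemon, while three busy samples in a row trigger an evacuation request; since replicas of the node's chunks already live on other nodes, the rebuild can be sourced from those peers rather than from the busy host.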
Run Anywhere
The NooBaa solution can run on-premises, across multiple datacenters, and in the cloud. Unlike other SDS solutions that can run in cloud VMs, NooBaa natively integrates with RESTful cloud storage, including AWS S3 and Azure Blob Storage, as additional capacity options. Customers can use the cloud as a DR destination for replication jobs or run NooBaa entirely in the cloud while still taking advantage of features like deduplication, encryption, and compression.
Is Programmability the New Storage Frontier?
NooBaa’s current release showcases a “serverless computing” capability that follows Amazon’s Lambda conventions. In serverless computing, custom script functions execute in response to external triggers such as a button press in a mobile app, an incoming IoT packet, or simply a new object being added to storage. In NooBaa’s case, JavaScript functions can be executed to extend storage functionality or integrate with custom workflows. Simple examples include extracting metadata from a custom file format or sending custom alert messages when a filter detects unauthorized personally identifiable information (PII).
StorageSwiss Take
Most SDS solutions should be easy to evaluate: download the software, install it as a virtual machine, and start testing. In practice, however, rigid dependencies on hardware and networking can add up-front cost and complexity for customers. Many people have learned the hard way that with SDS there is “no free lunch.” Furthermore, when these solutions move to production, administrators need to plan which resources they will actually need. NooBaa appears to solve these problems thanks to its virtual-controller architecture, which provides far greater flexibility than the existing generation of clustered storage solutions and makes scaling throughout the environment more cost-effective and practical.
We also see value in NooBaa’s emerging support for Lambda-style programmability, which puts to work the compute resources that typically sit underutilized in traditional SDS solutions, reduces data movement, and enables rapid customization for specific use cases.
