The use cases for Network Attached Storage (NAS) have changed dramatically in the 20 years since the first NAS appeared. NAS hardware architectures have also changed. So have the capacities and performance levels that NAS systems are asked to support.
These changes have fostered an evolution from first generation monolithic NAS infrastructure to a more flexible, efficient NAS model. With the advent of virtualization, big data analytics and the massive increases in traditional unstructured data, it’s time for the next generation of scale-out NAS to resolve today’s challenges.
Originally, NAS was developed to replace sprawling file server environments struggling to support user demands to store and share basic office productivity data sets. These were often single-system configurations designed to support a finite amount of capacity and I/O bandwidth. Over time, a growing number of users began using desktops and laptops. As a result, many organizations had to deploy more NAS systems almost as quickly as they had been adding file servers in the past.
At the same time, the datasets themselves began to change. The I/O profile, file size and quantity of data grew. As workstations, servers and NAS systems became more integrated into the workflow of the organization, the speed at which these larger, more complex data sets could be accessed and processed became more important.
Legacy NAS systems were unable to keep pace. These scale-up architectures could add only a finite amount of capacity to a system and had rigid data-to-controller relationships that made resource utilization inefficient. To compensate, IT administrators often had to buy NAS systems that provided more power than needed at the time of acquisition, incurring higher capital expenditures. And as data sets grew, managing large amounts of data in multiple small volumes became an administrative burden. When requirements outgrew the horsepower of the NAS system, customers had two options: replace the entire solution, or continue to deploy additional but separate NAS systems to keep up with demand, exacerbating administrative cost and complexity.
These issues led to the initial wave of scale-out NAS, technology that allowed users to expand the computing power of their NAS as performance requirements grew, thus lowering initial capital expenditures, increasing flexibility, and simplifying data storage administration.
NAS solution providers have developed different methodologies to address scalability and data management concerns. Some vendors have built architectures that stitch together many small volumes across multiple systems to provide a single namespace and the ability to scale the aggregate capacity and performance of the system. Such architectures graft scale-out features onto traditional scale-up models, and resolve only the single-namespace and scalability issues. The same rigid data-to-controller and disk relationships prevent them from delivering the benefits of a true scale-out architecture. Performance for each volume is still limited by the power of the controller and disks to which the volume is tied, so these designs provide very little performance or flexibility improvement over traditional scale-up architectures.
Other scale-out NAS architectures are nodal, or modular. Each node provides compute, memory, network connectivity and capacity that becomes part of a clustered, scale-out NAS. As data is written to the cluster, it is segmented and written across all the nodes in the cluster. This architecture takes advantage of distributed computing resources and, for a given volume, provides both capacity and performance improvements as nodes are added.
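The segment-and-distribute pattern described above can be illustrated with a minimal sketch. This is not any vendor's actual implementation; the chunk size, round-robin placement, and function names are illustrative assumptions, chosen only to show how a single write can be spread across every node in a cluster and later reassembled:

```python
# Illustrative sketch (not a vendor implementation): segmenting a write
# into fixed-size chunks distributed round-robin across cluster nodes.
CHUNK_SIZE = 4  # bytes; tiny for demonstration (real systems use KB/MB chunks)

def stripe_write(data: bytes, num_nodes: int, chunk_size: int = CHUNK_SIZE):
    """Split data into chunks and assign each chunk to a node, round-robin."""
    placement = {node: [] for node in range(num_nodes)}
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        placement[(i // chunk_size) % num_nodes].append(chunk)
    return placement

def stripe_read(placement: dict) -> bytes:
    """Reassemble the original data by interleaving chunks from all nodes."""
    out, idx = [], 0
    while True:
        node, pos = idx % len(placement), idx // len(placement)
        if pos >= len(placement[node]):
            break
        out.append(placement[node][pos])
        idx += 1
    return b"".join(out)

data = b"unstructured-data-example"
layout = stripe_write(data, num_nodes=3)
assert stripe_read(layout) == data  # every node holds part of the volume
```

Because each chunk lands on a different node, a read of one large file engages the compute, cache and network resources of the whole cluster rather than a single controller.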
However, as the system scales, capacity and performance are added in lockstep by default, even when the user requires only one or the other. In some use cases, the inability to decouple capacity and performance growth results in wasteful capital expenditure. For example, in large archival repositories, capacity scalability is typically more important than performance; this use case does not demand linear scalability on both dimensions. Customers are thus forced either to over-provision performance because they need more capacity, or to over-provision capacity because they need more performance.
As the demands for unstructured data increase, features such as thin provisioning, deduplication and compression can lead to sizable capacity savings for customers, thus increasing their return on investment. However, not all first generation scale-out solutions provide these capabilities.
The Time for Next Generation Scale-Out NAS
Legacy scale-out architectures have their drawbacks. Some have rigid expansion parameters. Others maximize resource utilization at the expense of feature flexibility. A shift in NAS architecture design is needed to address these limitations.
Next generation NAS architectures should enable expansion of system performance and/or capacity and incorporate feature sets that improve return on investment. Dell Fluid File System (FluidFS), and its integration with the company’s Compellent and EqualLogic storage platforms, is an example of a next generation architecture that overcomes many of the limitations of legacy NAS systems.
The Requirements for Next Generation Scale-Out NAS
Choice of scale-out or scale-up – The first weakness of legacy scale-out NAS is the tight coupling of storage capacity, storage processing and storage I/O. To address this problem, next generation solutions should separate the storage performance decision from the capacity decision. Independently scaling performance and capacity is accomplished by using discrete NAS processing units that are tightly coupled with the backend storage platform providing block storage services, all managed by a clustered scale-out file system. This enables storage administrators to scale front-end NAS processing power separately from storage capacity, making it easier to architect the solution to the precise workload requirements.
Distributed Processing – In a next generation scale-out architecture, virtualizing storage to take advantage of multiple processing units is important. This allows the user to maximize resource utilization and scale performance. Abstracting a single logical volume allows this architecture to distribute data in any single volume across multiple NAS controllers and the backend storage, allowing parallel processing via multiple storage controllers. As more NAS appliances are added, additional compute, cache and network resources become available to serve I/O needs for existing volumes.
An excellent example of how effective the distributed processing technique can be is found in Dell's recent SPECsfs submission. SPECsfs is an ideal point of comparison because it was designed specifically to evaluate the speed and request-handling capabilities of file-serving devices.
In this test, Dell’s Fluid File System configured with four FS8600 appliances and two Compellent Storage Centers generated a 494,244 SPECsfs OPS score with a 1.85 ms overall response time. Even a minimally configured test system with a single appliance and a single Compellent Storage Center achieved 131,684 SPECsfs OPS with a 1.68 ms overall response time. For comparisons, see the SPEC.org site; be sure to note the cost of each system to get a sense of how realistic the vendors' test configurations were.
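A quick calculation on the two published results above shows how close the four-appliance configuration comes to linear scaling. Note this is a rough comparison, since the two configurations also differ in backend Storage Centers (two vs. one):

```python
# Scaling efficiency derived from the two published SPECsfs results quoted above.
single_appliance_ops = 131_684   # 1 x FS8600 + 1 Compellent Storage Center
four_appliance_ops   = 494_244   # 4 x FS8600 + 2 Compellent Storage Centers

ideal = single_appliance_ops * 4            # perfect linear scaling
efficiency = four_appliance_ops / ideal
print(f"ideal linear: {ideal:,} OPS, achieved: {four_appliance_ops:,} OPS "
      f"({efficiency:.1%} scaling efficiency)")
# ideal linear: 526,736 OPS, achieved: 494,244 OPS (93.8% scaling efficiency)
```

Retaining roughly 94% of ideal linear throughput at four appliances is strong evidence that the distributed volume abstraction avoids a single-controller bottleneck.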
Automated Tiering – Not all datasets in a given scale-out NAS system have the same I/O characteristics or importance to the end user. Instead of adding administrative burden by manually assigning data to specific backend disks or LUNs, or even to the NAS appliances, these decisions can be made automatically by the backend storage platforms. Ideally the platform should use a mix of SSDs, 10/15K SAS drives and high capacity near-line (NL) SAS drives, placing data on the tier that best matches its access profile.
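The placement decision can be sketched as a simple policy that maps access frequency to the drive tiers named above. The thresholds here are invented for illustration; any real array's tiering policy is vendor-specific and typically works on sub-LUN extents, not whole files:

```python
# Illustrative access-frequency tiering policy (thresholds are assumptions).
# Fastest tier first; each entry is (tier name, min accesses/day to qualify).
TIERS = [
    ("SSD",        1000),
    ("10/15K SAS",  100),
    ("NL SAS",        0),   # catch-all for cold data
]

def assign_tier(accesses_per_day: int) -> str:
    """Place data on the fastest tier whose activity threshold it meets."""
    for name, threshold in TIERS:
        if accesses_per_day >= threshold:
            return name
    return TIERS[-1][0]

assert assign_tier(5000) == "SSD"         # hot data -> flash
assert assign_tier(250)  == "10/15K SAS"  # warm data -> performance disk
assert assign_tier(3)    == "NL SAS"      # cold data -> high-capacity disk
```

Run periodically against per-extent access counters, a policy like this demotes aging data to cheap NL SAS and promotes newly hot data to SSD without administrator involvement.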
Choice of block and file – Combining scale-out file technology with existing feature-rich, high performance block storage platforms such as Compellent and EqualLogic allows IT administrators to consolidate their block and file workloads using a single storage solution all managed from a single interface.
Storage Optimization – Features such as thin provisioning, deduplication and compression improve storage efficiency and lead to better overall system utilization. With a distributed architecture, the efficiency of dedupe and compression improves as more NAS appliances are added to the storage system.
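The capacity savings from deduplication come from storing each unique block once and keeping references for the duplicates. A minimal content-hash sketch (illustrative only; production systems deal with reference counting, collision handling, and variable-length chunking quite differently):

```python
# Minimal sketch of block-level deduplication via content hashing.
import hashlib

def dedupe(blocks):
    """Store each unique block once; return the store and per-block references."""
    store, refs = {}, []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # keep only the first copy
        refs.append(digest)               # later copies become pointers
    return store, refs

blocks = [b"alpha", b"beta", b"alpha", b"alpha", b"gamma"]
store, refs = dedupe(blocks)
print(f"{len(blocks)} logical blocks stored as {len(store)} unique blocks")
# The original stream remains fully recoverable from refs + store:
assert [store[d] for d in refs] == blocks
```

This also illustrates why a distributed architecture helps dedupe efficiency: the larger the pool of blocks visible to the hash index, the greater the chance of finding duplicates.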
Data Protection – Most vendors provide at least basic data protection features such as backups, snapshots, replication, anti-virus, etc. Often, however, such features come at a premium price and involve complex licensing. Ideally a next generation system will include all features in the base product, eliminating any additional costs and complexity for implementing such features.
The data center is changing. It’s becoming increasingly flexible and responsive to the needs of the business. But delivering that flexibility requires a flexible storage infrastructure as well. The ability of next generation scale-out NAS to abstract storage processing, storage capacity and network I/O allows IT administrators and managers to expand the environment incrementally, grow it in the specific areas the applications require, make the best use of their resources, and control acquisition and operational expenditures.
Dell is a client of Storage Switzerland