A Hybrid NAS for Unstructured Data

Periodically, a data center will try to consolidate down to a single storage system. In almost every case this proves to be a futile exercise. Some workloads need high performance but have modest capacity requirements, while others have modest performance requirements but need high capacity and longer retention. A storage system that tries to meet all of these demands typically has to overcompensate, providing far too much performance or not enough capacity, either of which leads to a much higher cost.

Nowhere is this gap more apparent than in storing unstructured data, the traditional sweet spot for network attached storage (NAS) systems. Most NAS systems from vendors like EMC, NetApp, and HDS have moved upstream from simple file storage into databases and virtualization. While they can still support the dramatic growth that’s often seen in unstructured data, they do so at a higher cost.

When a company places unstructured data on these premium NAS systems, it is paying for performance that’s typically not needed, because as much as 80% of this data won’t be accessed again once it is 90 days old. IT planners would do well to consider a hybrid solution that can help drive down costs by moving unstructured data from these primary NAS systems to a more cost-effective storage area after its initial creation. In some cases, the data should never be on these systems at all, but instead be created and stored only on the Hybrid NAS system.
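To see how much of a given share actually fits the 90-day pattern described above, a planner could scan it by timestamp. The following is a minimal sketch, not a vendor tool: the function name `find_cold_files` is hypothetical, and it assumes the file system reports usable access times (many mounts use `noatime` or `relatime`, so the sketch falls back on whichever of access or modification time is more recent).

```python
import os
import time
from pathlib import Path

COLD_AFTER_DAYS = 90  # threshold taken from the 90-day figure in the text


def find_cold_files(root, now=None, cold_after_days=COLD_AFTER_DAYS):
    """Return (cold_paths, cold_bytes, total_bytes) for regular files under root.

    A file counts as "cold" when neither its access time nor its
    modification time falls within the last `cold_after_days` days.
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * 86400
    cold_paths, cold_bytes, total_bytes = [], 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_bytes += st.st_size
            if max(st.st_atime, st.st_mtime) < cutoff:
                cold_paths.append(path)
                cold_bytes += st.st_size
    return cold_paths, cold_bytes, total_bytes
```

Dividing `cold_bytes` by `total_bytes` gives the fraction of capacity that is a candidate for the lower-cost tier. Whether it comes out near the 80% cited above will depend on the environment.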

As traditional NAS systems began to focus on higher performance workloads, like databases and virtualization, they added capabilities such as caching and tiering to flash storage. But the effort to meet performance demands drove up the cost of these systems, creating a contradictory problem: unstructured data use cases don’t typically need high performance; they need high capacity. To deliver that capacity, many of these mainstream vendors are providing a scale-out NAS storage option.

The Scale-out NAS Problem

Scale-out NAS attempts to address the capacity demands of unstructured data by creating a NAS composed of storage nodes (servers with storage inside them). These nodes are then clustered to present an aggregated pool of storage to the connecting users and applications. Nodes can be added to this environment without disruption. Most scale-out NAS environments can expand into the petabyte range, so the capacity problem of unstructured data is solved. But IT planners need to consider the cost and complexity of this solution when determining if there might be a better way for their data center.

Many scale-out systems leverage a backend Ethernet network for internode communication. This network can become complex to manage and may limit performance as the number of nodes scales. Even the clustering software itself can become overwhelmed as all the nodes converse with each other to make sure that data is correctly stored and protected.

Scale-out solutions also don’t tend to be as efficient as other options. With a scale-up solution, for example, you keep adding capacity until you have reached the compute and networking limits of that system. Most vendors cap their systems at the capacity that the compute and networking components can support. Scale-out systems tend to be short on per-node storage capacity but have plenty of compute and networking. This means that nodes are added to meet capacity demands long before compute and networking resources are fully utilized. The net result is that in a large scale-out cluster, much of the compute resource in particular is wasted.
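The utilization argument above can be made concrete with some simple arithmetic. The sketch below is illustrative only: the function name and all of the figures in the example (per-node capacity, per-node throughput, and the workload's demands) are hypothetical assumptions, not measurements from any vendor's system.

```python
import math


def scale_out_utilization(capacity_needed_tb, throughput_needed_mbps,
                          node_capacity_tb, node_throughput_mbps):
    """Return (node_count, compute_utilization) for a scale-out cluster.

    The cluster must have enough nodes to satisfy BOTH capacity and
    throughput, so the node count is driven by whichever demand is larger.
    """
    nodes_for_capacity = math.ceil(capacity_needed_tb / node_capacity_tb)
    nodes_for_throughput = math.ceil(throughput_needed_mbps / node_throughput_mbps)
    nodes = max(nodes_for_capacity, nodes_for_throughput)
    # Fraction of the cluster's aggregate throughput the workload really uses.
    utilization = throughput_needed_mbps / (nodes * node_throughput_mbps)
    return nodes, utilization


# Hypothetical capacity-heavy workload: 2,000 TB of data, 5,000 MB/s of demand,
# on nodes that each hold 100 TB and can serve 1,000 MB/s.
nodes, utilization = scale_out_utilization(2000, 5000, 100, 1000)
print(nodes, utilization)  # prints: 20 0.25
```

In this made-up case, capacity forces the purchase of 20 nodes even though 5 would cover the throughput, leaving three quarters of the cluster's compute and networking idle.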

What is needed is a more hybrid approach, one where a scale-up design can be matched with a scalable back end for maximum resource utilization at a minimal cost.

A New Type of Hybrid Storage

When the term “hybrid” is thrown around in NAS storage circles, it is often assumed to refer to a mixture of flash and hard disk drives. But as described above, this combination does little to curtail costs or allow for massive capacity expansion. Instead, a new type of hybrid storage is needed, one that seamlessly integrates hard disk drives with tape libraries. The front end of this system would be disk-based NAS storage, not a flash-based cache. And because it has a tape library back end, the hard disk storage area could leverage a simpler, more cost-effective scale-up design.

This Hybrid NAS would be accessed the same way a more traditional NAS is, via CIFS and NFS. Users would store data on it as they would on any other file system. It would not replace current NAS systems, but instead would free those systems up to focus on the newer workloads where their modern capabilities are more appropriate: databases and virtualized environments. The Hybrid NAS would be for unstructured data.

It is important that the Hybrid NAS system be able to work both alongside existing NAS systems and as standalone storage targeted specifically at unstructured data. There are times when even unstructured data can benefit from the performance of a premium NAS, so the customer needs the flexibility to move data back and forth between these systems. If the Hybrid NAS presents the same CIFS/NFS interface as the premium NAS, then that data movement can be accomplished with a simple copy command.
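Because both systems would appear as ordinary mounted file systems, the copy described above needs nothing beyond standard file operations. Here is a minimal sketch using Python's standard library; the function name and the mount points in the usage note (`/mnt/premium_nas`, `/mnt/hybrid_nas`) are hypothetical placeholders, not paths defined by any product.

```python
import shutil
from pathlib import Path


def copy_between_mounts(src_file, src_root, dst_root):
    """Copy one file from one mounted share to another, keeping its relative path.

    Returns the destination path. Raises OSError if the copied file's size
    does not match the source, as a basic sanity check.
    """
    src_file, src_root, dst_root = Path(src_file), Path(src_root), Path(dst_root)
    dst_file = dst_root / src_file.relative_to(src_root)
    dst_file.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src_file, dst_file)  # copy2 also preserves timestamps where possible
    if dst_file.stat().st_size != src_file.stat().st_size:
        raise OSError(f"size mismatch after copying {src_file} to {dst_file}")
    return dst_file
```

A call such as `copy_between_mounts("/mnt/premium_nas/projects/q3/report.docx", "/mnt/premium_nas", "/mnt/hybrid_nas")` would place the file at the same relative path on the other share. Swapping the two roots moves data in the opposite direction.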

At the same time, there is a much larger set of data that will never need to take advantage of the performance of a premium NAS. Data created by user productivity applications (MS Office) and data that’s designed to be streamed (MP3/MP4) are two examples. This data typically has a limited number of simultaneous users and is often accessed over a slow WiFi or broadband connection, both of which become a bottleneck to premium NAS performance.

Conclusion

Unstructured data comes in two different forms, with different storage requirements. There is a small percentage that needs high performance, potentially even flash-based storage, and a much larger percentage that requires cost-effective capacity. The premium NAS systems do an excellent job of meeting the performance demand of the first type, but there is room for improvement in the way the second type of unstructured data is stored. This data is better served by a new class of NAS, a Hybrid NAS that leverages simple scale-up disk and tape.

Sponsored by Crossroads


George Crump is the Chief Marketing Officer of StorONE. Prior to StorONE, George spent almost 14 years as the founder and lead analyst at Storage Switzerland, which StorONE acquired in March of 2020. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Prior to founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators where he was in charge of technology testing, integration, and product selection.
