Scale out NAS has proven the old adage that in IT we never solve any problems; we just move them around. Scale out systems were built for a lot of reasons, and one of the chief reasons was performance; a single NAS system could only store so many files in a certain amount of time. It could also only hold so many disks, and that limited number of disks had a limited number of I/O operations they could perform. Expanding from one node to many nodes was meant to solve both these problems, but in many cases it just moved the performance problem to other areas of the architecture.
The dream is a global filesystem that can grow as large as we need and operate as fast as we need simply by buying additional nodes or disks. The dream also includes putting the right data on the right media at the right time. Frequently accessed files, or files that need high performance, end up on flash. Less frequently accessed files end up on magnetic disk of various types. Rarely accessed files can be moved somewhere very inexpensive, be it cold storage in an object store, the cloud, or tape. Object storage, the cloud, and tape can also serve as a data protection layer for the higher performance tiers by ensuring that all versions of all files we want to protect are stored multiple times in multiple places.
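As a rough illustration of "the right data on the right media," a placement policy can be sketched in a few lines of Python. This is a minimal sketch, not any vendor's implementation: the `FileInfo` type, the `choose_tier` function, and the 7-day and 90-day thresholds are all hypothetical and would need to be tuned to a real workload.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical idle-time thresholds; real values depend on the workload.
FLASH_MAX_IDLE = timedelta(days=7)
DISK_MAX_IDLE = timedelta(days=90)


@dataclass
class FileInfo:
    path: str
    last_access: datetime
    needs_high_performance: bool = False


def choose_tier(f: FileInfo, now: datetime) -> str:
    """Pick a storage tier from how recently the file was accessed."""
    idle = now - f.last_access
    # Hot files, or files flagged as performance-sensitive, go to flash.
    if f.needs_high_performance or idle <= FLASH_MAX_IDLE:
        return "flash"
    # Less frequently accessed files go to magnetic disk.
    if idle <= DISK_MAX_IDLE:
        return "disk"
    # Rarely accessed files go to the inexpensive tier:
    # an object store, the cloud, or tape.
    return "archive"
```

A real system would also weigh file size, migration cost, and protection copies; the point here is only that the policy is driven by access patterns rather than by where a file was first written.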
The reality from many vendors is quite different, starting with the fact that most storage products are built with two tiers at most. Many are built with only one tier, either using flash only as a cache or offering flash-only storage. Most scale out storage products ignore the possibility of a cloud tier or a tape tier, yet the economics and long-term storage capabilities of both make them quite compelling for those who need to keep data for a long time, especially if that data is rarely accessed. Scale out NAS 2.0 should incorporate all known tiers of storage: flash, magnetic disk, and an object storage, cloud, or tape tier.
The other way in which the reality of global filesystems and scale out NAS differs from the dream is that some filesystems appear to have limitations that create a mismatch between the number of processing cores and the number of disk drives. To get more performance you must add more nodes, even though the current nodes could easily handle more disk capacity. You might think that this is a limitation of the CPU rather than of the filesystem, but the evidence suggests otherwise. Some global filesystem storage vendors support much more disk capacity per node than their competitors, and therefore require far fewer nodes for the same performance. That suggests the limitation is in the software, not the hardware. Modern software has to be multi-threaded so that it can take full advantage of all the available cores.
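The mismatch is easy to see with some back-of-the-envelope arithmetic. The sketch below is purely illustrative: the `nodes_required` function and every number passed to it are hypothetical, not measurements from any product. It computes how many nodes a cluster needs to satisfy a performance target and a capacity target, and shows which of the two actually drives the node count.

```python
import math


def nodes_required(target_iops, target_capacity_tb,
                   iops_per_node, capacity_per_node_tb):
    """Return (total, nodes_for_performance, nodes_for_capacity)."""
    nodes_for_perf = math.ceil(target_iops / iops_per_node)
    nodes_for_capacity = math.ceil(target_capacity_tb / capacity_per_node_tb)
    # The cluster must satisfy both targets, so the larger count wins.
    return max(nodes_for_perf, nodes_for_capacity), nodes_for_perf, nodes_for_capacity


# Hypothetical system A: each node delivers 20,000 IOPS and holds 250 TB.
system_a = nodes_required(200_000, 500, 20_000, 250)

# Hypothetical system B: same hardware capacity per node, but software
# that extracts 50,000 IOPS from each node.
system_b = nodes_required(200_000, 500, 50_000, 250)

print(system_a)  # (10, 10, 2)
print(system_b)  # (4, 4, 2)
```

In both cases two nodes would be enough for capacity alone; the remaining nodes are bought only for performance. When you ask a vendor how many nodes and disks your workload needs, it is worth asking which of the two targets is setting the node count.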
Another thing to think about is what a given scale out NAS system is optimized for. Some systems are optimized for large numbers of small files, whereas others are optimized for smaller numbers of large files. The former is of interest to someone who will use the scale out NAS product to let tens of thousands of users store their word processing documents. The latter is of more interest to someone who will have hundreds of users downloading, editing, and transcribing video streams.
You should therefore understand the workloads your company plans to run, and make sure the NAS system you are looking at can handle the number of simultaneous workloads you expect to have. Again, different global filesystems are designed to handle different kinds of workloads. So if you are considering the purchase of a large scale out NAS system, it would behoove you to talk to multiple vendors about your potential workloads.
Another thing to look into is whether your global filesystem truly is a global filesystem or just an aggregation of smaller filesystems. This matters when different members of your organization start collaborating on large sets of data. If two users are technically storing their data on two separate filesystems that are aggregated into a centralized filesystem, the performance they experience might differ depending on how they access their data. However, if it truly is a scale out global filesystem, then everyone’s performance should be the same regardless of where their data is.
Know your workloads. Understand that not all scale out NAS systems are built the same, and make sure you discuss your potential workloads with any vendors you are considering buying from. Ask them how you would need to architect their product to meet that workload, including the number of nodes and the number of disks. Also, depending on your workload, do not ignore less expensive tiers of storage such as tape or cloud. If you have a large working set of data and a much larger set of inactive data, the use of such tiers may save you a significant amount of money.
Sponsored by Quantum