Filers can start out so fast and end up so slow. How is that possible? A lot of companies buy a new filer, run a few performance tests against it, and are happy with what they see. The problem is that performance tests of a greenfield environment aren't representative; what matters is how your filer performs once you fill it up with files. And once performance drops below the threshold you consider acceptable, how do you figure out what caused the problem and fix it?
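You can see the effect for yourself by pre-filling a filer before benchmarking it. Here is a minimal sketch in Python, assuming a hypothetical NFS mount at /mnt/filer/benchmark; the file count and file size are illustrative, and a real test would also defeat client-side caching (for example, by remounting between the fill and the measurement):

```python
# Hypothetical sketch: measure metadata latency on an NFS mount after
# pre-filling it with small files. MOUNT, FILE_COUNT, and FILE_SIZE are
# illustrative values, not recommendations.
import os
import time

MOUNT = "/mnt/filer/benchmark"   # assumed NFS mount point
FILE_COUNT = 100_000             # tune toward your production file counts
FILE_SIZE = 4096                 # 4 KiB files stress metadata, not bandwidth

def prefill(path: str, count: int) -> None:
    """Create many small files so later tests run against a 'full' filer."""
    os.makedirs(path, exist_ok=True)
    payload = b"\0" * FILE_SIZE
    for i in range(count):
        with open(os.path.join(path, f"f{i:08d}"), "wb") as f:
            f.write(payload)

def stat_latency(path: str, samples: int = 1000) -> float:
    """Average os.stat() latency in milliseconds across existing files."""
    start = time.perf_counter()
    for i in range(samples):
        os.stat(os.path.join(path, f"f{i:08d}"))
    return (time.perf_counter() - start) / samples * 1000

if __name__ == "__main__":
    prefill(MOUNT, FILE_COUNT)
    # NOTE: remount or drop caches here in a real test, or the client-side
    # attribute cache will mask the filer's actual metadata latency.
    print(f"avg stat latency after pre-fill: {stat_latency(MOUNT):.3f} ms")
```

Run the same measurement against the empty mount first and the gap between the two numbers is your answer.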
Last month we did a webinar where my colleague George Crump discussed these questions with John Gentry from Virtual Instruments. It turns out there are a number of things that can create performance challenges for your filer, and they talked through each of them.
One is the bottleneck created by metadata and the overhead of writing metadata for each file. They also discussed the idea of an NFS or SMB client going rogue. A shared filer only works when everybody plays along. If every client uses just a small slice of the filer's available throughput and IOPS, everything is fine. But as soon as one client consumes a significant portion of the filer's capacity, all bets are off. In a multitenant environment, the same problem shows up as the noisy neighbor issue. The question is how you identify a rogue client or noisy neighbor; one rough approach is sketched below.
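There is no one-size-fits-all answer, but the arithmetic is simple once you have per-client operation counts. The sketch below assumes a hypothetical CSV of (client, op_count) rows exported from a packet capture or from your filer's per-client statistics; the file name and the 25% threshold are illustrative:

```python
# Hypothetical sketch: flag NFS/SMB clients consuming an outsized share of
# operations. Assumes a two-column CSV of (client_ip, op_count); where that
# data comes from depends on your filer and tooling.
import csv
from collections import Counter

THRESHOLD = 0.25  # flag any client issuing >25% of total ops (assumption)

def find_noisy_clients(path: str) -> list[tuple[str, float]]:
    """Return (client, share-of-total-ops) for clients above THRESHOLD."""
    ops = Counter()
    with open(path, newline="") as f:
        for client, count in csv.reader(f):
            ops[client] += int(count)
    total = sum(ops.values())
    return [(c, n / total) for c, n in ops.most_common() if n / total > THRESHOLD]

for client, share in find_noisy_clients("per_client_ops.csv"):
    print(f"{client} is issuing {share:.0%} of all ops -- possible rogue client")
```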
Sometimes the performance problems you attribute to your filer don't come from the filer at all. They can come from the VM it runs on and the hypervisor beneath it. If the VM is starved for resources, it can look exactly like an NFS performance problem.
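On a Linux guest, one quick signal is CPU steal time: cycles the hypervisor handed to other guests instead of your VM. Here is a minimal sketch, assuming the filer (or the client you are testing from) runs on Linux; the 5% rule of thumb is an assumption, not a hard limit:

```python
# Hypothetical sketch: check whether the VM hosting a virtual filer is being
# starved by the hypervisor. Reads Linux /proc/stat and reports CPU "steal"
# time over a short sampling interval.
import time

def cpu_times() -> list[int]:
    """Return the aggregate 'cpu' counters from /proc/stat."""
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

def steal_percent(interval: float = 5.0) -> float:
    before = cpu_times()
    time.sleep(interval)
    after = cpu_times()
    delta = [a - b for a, b in zip(after, before)]
    # Index 7 (0-based) of the counters is steal time; see proc(5).
    return 100.0 * delta[7] / sum(delta)

if __name__ == "__main__":
    pct = steal_percent()
    print(f"CPU steal over sample window: {pct:.1f}%")
    if pct > 5.0:  # common rule of thumb, not a hard limit
        print("High steal: the hypervisor, not the filer, may be the bottleneck")
```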
Of course, sometimes the problem really is the filer. Perhaps the disk drives can't keep up with the IOPS you're generating. Perhaps the way you've laid out the volume isn't optimal for performance. Perhaps the filer's processor is busy doing things other than serving NFS and SMB requests, such as running backups via NDMP. And finally, perhaps you're hitting a bottleneck created by a single node of a cluster.
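For the disk-drive case, a back-of-envelope calculation often settles the question. The sketch below uses commonly cited rules of thumb for per-drive IOPS and RAID write penalties; the numbers are assumptions, not measurements of any particular product:

```python
# Hypothetical sketch: a back-of-envelope check of whether a disk layout can
# keep up with the offered IOPS. Write penalties and per-drive IOPS figures
# are common rules of thumb, not vendor specifications.
RAID_WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid5": 4, "raid6": 6}

def usable_iops(drives: int, iops_per_drive: int, raid: str,
                write_fraction: float) -> float:
    """Effective front-end IOPS for a given read/write mix."""
    raw = drives * iops_per_drive
    penalty = RAID_WRITE_PENALTY[raid]
    # Each front-end write costs `penalty` back-end operations.
    return raw / ((1 - write_fraction) + write_fraction * penalty)

# Example: 24 x 10K SAS drives (~140 IOPS each, assumed), RAID-6, 30% writes.
print(f"{usable_iops(24, 140, 'raid6', 0.30):,.0f} usable IOPS")
```

If the number that comes out is close to (or below) what your clients are actually generating, the drives and layout are the place to start, before blaming the network or the clients.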
George and John discussed all of these issues at length, as well as how to determine which of these problems is causing the performance issue. If you missed the webinar, you can watch it on demand any time.