Optimizing storage performance is almost an art. One of the earliest papers I wrote for Storage Switzerland was “Visualizing SSD Readiness”, which explained how to determine whether your application could benefit from implementing solid state disk (SSD), and which of your application's files should be placed on it. Remember that in 2009 no one could imagine putting an entire application on SSD, let alone an entire data center! Now, thanks to all-flash arrays, we can. But does that mean we can abandon performance management as a discipline?
Will Performance Management Go the Way of Capacity Management?
For years we taught IT professionals the importance of managing their storage capacity, and how to do it. They had to; storage was simply too expensive and technologically challenging to let it grow endlessly. Then we entered an era in which hard disk capacity had become inexpensive enough that cost containment was no longer as strong a motivating factor. Advancements in technology like scale-out storage and automated tiering addressed many of the management issues, since much of this data could be stored in a single repository, something some vendors now call a “data lake”.
Performance management is facing the same challenge today. There are plenty of tools and appliances available that will test all aspects of your storage infrastructure, and even the applications within those infrastructures, to determine whether the various components are running at optimal levels. But are these tools and appliances needed anymore? After all, an all-flash array armed with a high-performance 10Gbps or 16Gbps network should be able to deliver more performance than almost any application or user could demand. Why not just put everything on an all-flash array with a high-performance network, in essence creating a “performance lake”? It seems that once a data center makes that move, performance management and tuning become almost irrelevant.
The truth is that for some environments, especially smaller ones, performance management, just like capacity management, may indeed be optional, at least for a while. But any data center of size, or any organization that intends to grow, may want to keep those performance management tools available.
All-Flash Is Not An Infrastructure
The number one reason to keep using performance management tools, and to continue building a skill set around performance tuning, is that an all-flash array is not an infrastructure; it is a component of one. Even in a software-defined, server-side flash environment, the flash is simply a component of a more complex system. In both cases there are network interconnections to worry about, and they can dramatically impact storage performance.
In other words, even in an all-flash environment, an application may suffer a performance problem caused by an improperly configured network or a malfunctioning network component. An increasingly common issue in high-speed storage networks, for example, is the impact of light loss in optical connections. The faster the network, the less tolerant it is of this optical degradation. As a result, a network that never exhibited any problems in the past may suddenly cause intermittent performance problems that, without analysis tools, can be almost impossible to identify and fix.
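To make the light-loss point concrete, here is a minimal sketch of the kind of check an analysis tool performs: comparing each optical port's measured receive power against a minimum link budget. The port names, threshold, and readings are all hypothetical; real figures would come from the switch or HBA's transceiver diagnostics.

```python
# Hypothetical sketch: flag optical links whose receive power has degraded
# below an assumed minimum. All names and numbers here are illustrative.

MIN_RX_POWER_DBM = -7.0  # assumed sensitivity floor for a high-speed optic

def find_degraded_links(rx_power_dbm):
    """Return ports whose measured Rx power (dBm) is below the minimum budget."""
    return [port for port, dbm in rx_power_dbm.items() if dbm < MIN_RX_POWER_DBM]

readings = {
    "fc1/1": -3.2,   # healthy link
    "fc1/2": -7.9,   # dirty connector or marginal patch cable
    "fc1/3": -6.8,   # close to the floor; worth watching
}

print(find_degraded_links(readings))  # → ['fc1/2']
```

A link like fc1/3 illustrates why intermittent problems are so hard to catch by hand: it passes the check today, but a fraction of a dB more loss pushes it over the edge.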
In addition, the storage software that makes all-flash functional, affordable and enterprise-ready requires computing power to drive it. Evidence of this can be seen when an all-flash array vendor releases a new array driven by faster Intel processors, and performance takes a significant leap forward.
Understanding how much compute resource is being consumed, and how much more the flash technology could benefit from additional compute power, is vital to maximizing all-flash performance. This may be especially true, and harder to isolate, in software-defined, server-side flash systems that aggregate flash storage internal to the nodes of a virtual cluster.
What happens when a performance-demanding application executes a section of code that triggers a spike in compute and storage I/O demand, causing every other VM in the environment to grind to a halt? As we discuss in our article “How Do I Know My Virtual Environment is Ready For SSD?”, because of the level of abstraction, without a performance analysis tool it will be difficult to know what is happening, let alone how to fix the problem and make sure it does not happen again.
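The "noisy neighbor" scenario above can be sketched in a few lines. This is a simplified illustration, not any vendor's method: it flags any VM whose I/O rate far exceeds the cluster median, the kind of first-pass triage a performance analysis tool automates. The VM names and IOPS figures are invented; a real tool would pull them from the hypervisor's performance counters.

```python
import statistics

# Hypothetical sketch: identify a VM whose storage I/O demand dwarfs the
# rest of the cluster. All VM names and IOPS values are illustrative.

def find_noisy_neighbors(vm_iops, factor=5.0):
    """Return VMs whose IOPS exceed `factor` times the cluster median."""
    median = statistics.median(vm_iops.values())
    return [vm for vm, iops in vm_iops.items() if iops > factor * median]

sample = {
    "vm-app1": 800,
    "vm-app2": 900,
    "vm-db1": 12000,  # the spike: a query storm hammering shared flash
    "vm-web1": 750,
    "vm-web2": 850,
}

print(find_noisy_neighbors(sample))  # → ['vm-db1']
```

A median-based baseline is used deliberately: a single extreme outlier drags a mean-and-standard-deviation test toward itself, while the median stays anchored to the cluster's normal behavior.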
Applications Can Make Flash Look Bad
Another part of the infrastructure is a layer deeper than the virtual machine: the application itself. As we discuss in our article “How Do I Know My SQL Server Environment is Ready for SSD?”, poor application code is very hard to identify, especially for the storage team, since they are not database experts. In fact, poor application code can waste an investment in flash, or at least keep it from reaching its full potential. Therefore, it is critical that the storage team be armed with tools that can not only monitor and manage performance within the parts of the infrastructure they control, but also extend into those they don’t, such as database environments.
Not All All-Flash Arrays Are Created Equal
Finally, as we discussed in our recent ChalkTalk video, “Perfecting The Flash SSD Evaluation Process”, not all flash arrays are created equal: some sacrifice performance for features, and others features for performance. Finding the balance that best fits your data center is critical to making the right flash selection for the long term. The problem is that the basic benchmarking tools IT planners have counted on are hopelessly flawed; they were designed in a non-virtualized, hard-disk-based era. We cover the weaknesses of traditional tools in our ChalkTalk video “Performance Management is Broken”.
IT used to be judged on its ability to deliver application uptime, and in large part it still is today. The challenge is that the definition of “uptime” has expanded to include acceptable application performance, no matter what the external conditions may be. This means performance not only needs to be managed, it needs to be analyzed, so that future performance pressures can be predicted.
Because of the increasing number of moving parts in the data center, combined with the abstraction that virtualization brings, simply throwing high-performance flash at the problem will not be enough to meet this new definition of uptime. Performance management, tuning and forecasting have to become the bedrock of any data center skill set.