All-Flash Arrays vs. Performance Management

Optimizing storage performance is almost an art. One of the earliest papers I wrote for Storage Switzerland was “Visualizing SSD Readiness”, which explained how to determine whether your application could benefit from solid state disk (SSD) and which of its files belonged on that SSD. Remember that in 2009 no one could imagine putting an entire application on SSD, let alone an entire data center. Now, thanks to all-flash arrays, we can. But does that mean we can abandon performance management as a discipline?

Will Performance Management Go the Way of Capacity Management?

For years we taught IT professionals the importance of managing their storage capacity, and how to do it. They had to; storage was simply too expensive and technologically challenging to let it grow unchecked. Then we went through an era in which hard disk capacity became inexpensive enough that cost containment was no longer as strong a motivator. Advancements such as scale-out storage and automated tiering addressed many of the management issues, since much of this data could be stored in a single repository, something some vendors now call a “data lake”.

Performance management faces the same challenge today. There are plenty of tools and appliances available that will test every aspect of your storage infrastructure, and even the applications within it, to determine whether the various components are running at optimal levels. But are these tools and appliances needed anymore? After all, an all-flash array backed by a high-performance 10Gbps Ethernet or 16Gbps Fibre Channel network should be able to deliver more performance than almost any application or user could demand. Why not just put everything on an all-flash array with a high-performance network, in essence creating a “performance lake”? It seems that once a data center makes that move, performance management and tuning become almost irrelevant.

The truth is that for some environments, especially smaller ones, performance management, just like capacity management, may indeed be optional, at least for a while. But any data center of size, or any organization that intends to grow, may want to keep those performance management tools available.

All-Flash Is Not An Infrastructure

The number one reason to keep using performance management tools, and to keep building a skill set around performance tuning, is that an all-flash array is not an infrastructure; it is a component of one. Even in a software-defined, server-side flash environment, the flash is simply one piece of a more complex system. In both cases there are network interconnects to worry about, and they can dramatically impact storage performance.

In other words, even in an all-flash environment, an application may suffer a performance problem caused by an improperly configured network or a malfunctioning network component. An increasingly common issue in high-speed storage networks, for example, is light loss in optical connections. The faster the network, the less tolerant it is of this optical degradation. As a result, a network that never exhibited problems in the past may suddenly cause intermittent performance issues that, without analysis tools, are almost impossible to identify and fix.
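To make this concrete, the sketch below shows one way an analysis tool might surface that kind of degradation. It is a minimal illustration, not the method of any particular product: it assumes you can export per-port receive power (in dBm) and CRC error counters from your switches, and the field names and thresholds are invented for the example.

```python
# Minimal sketch: flag optical links that may be degrading.
# Per-port samples are assumed to be exported from switch telemetry;
# field names and thresholds here are illustrative, not from any vendor.

RX_POWER_FLOOR_DBM = -11.0   # example low-light threshold for an optical transceiver
CRC_GROWTH_LIMIT = 0         # any growth in CRC errors between samples is suspect

def degraded_ports(previous, current):
    """Compare two telemetry snapshots and return ports worth investigating."""
    suspects = []
    for port, now in current.items():
        before = previous.get(port, {})
        new_errors = now["crc_errors"] - before.get("crc_errors", 0)
        low_light = now["rx_power_dbm"] < RX_POWER_FLOOR_DBM
        if low_light or new_errors > CRC_GROWTH_LIMIT:
            suspects.append((port, now["rx_power_dbm"], new_errors))
    return suspects

# Example snapshots taken a few minutes apart (hypothetical values).
earlier = {"fc1/7": {"rx_power_dbm": -6.2, "crc_errors": 14}}
later   = {"fc1/7": {"rx_power_dbm": -11.8, "crc_errors": 41}}

for port, power, new_errors in degraded_ports(earlier, later):
    print(f"{port}: rx power {power} dBm, {new_errors} new CRC errors")
```

Run continuously against real telemetry, a check like this turns an “intermittent, impossible to reproduce” slowdown into a specific port to inspect.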

In addition, the storage software that makes all-flash functional, affordable and enterprise ready requires computing power to drive it. Evidence of this can be seen when an all-flash array vendor releases a new array driven by faster Intel processors and performance takes a significant leap forward.

Understanding how much compute resource is being consumed, and how much more the flash could deliver with additional compute power, is vital to maximizing all-flash performance. This is especially true, and harder to isolate, in software-defined, server-side flash systems that aggregate flash storage internal to the nodes of a virtual cluster.
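As a rough illustration of why that matters, the sketch below correlates two time series most monitoring stacks can provide: controller (or node) CPU utilization and I/O latency. A strong positive correlation suggests the flash is waiting on compute rather than media. The sample values are hypothetical, and this is just a plain correlation check, not any vendor’s method.

```python
# Minimal sketch: check whether I/O latency tracks controller CPU utilization.
# Sample values are hypothetical; in practice they would come from your
# monitoring system at matching timestamps.

from statistics import mean

cpu_util_pct = [35, 42, 55, 63, 71, 84, 90, 95]           # controller CPU, %
latency_ms   = [0.4, 0.4, 0.5, 0.6, 0.9, 1.6, 2.8, 4.1]   # average I/O latency

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(cpu_util_pct, latency_ms)
print(f"CPU vs. latency correlation: {r:.2f}")
if r > 0.8:
    print("Latency rises with CPU: the system may be compute-bound, not media-bound.")
```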

What happens when a performance-demanding application executes a section of code that triggers a spike in compute and storage I/O, causing every other VM in the environment to grind to a halt? As we discuss in our article “How Do I Know My Virtual Environment is Ready For SSD?”, the layers of abstraction mean that, without a performance analysis tool, it will be difficult to know what is happening, let alone how to fix the problem and make sure it does not happen again.
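A performance analysis tool cuts through that abstraction by looking at per-VM I/O over time rather than at the datastore as a whole. The sketch below, with made-up VM names and counters, flags any VM whose IOPS jump far above its own recent baseline at the same moment overall latency climbs, which is the classic noisy-neighbor signature.

```python
# Minimal sketch: spot a "noisy neighbor" VM whose I/O spike coincides with
# a latency spike for everyone else. VM names and numbers are illustrative.

from statistics import mean

# Per-VM IOPS samples (one value per interval) plus datastore-wide latency.
vm_iops = {
    "vm-web01": [400, 420, 410, 430, 415],
    "vm-sql01": [900, 950, 920, 4800, 5100],   # sudden spike
    "vm-app02": [300, 310, 305, 295, 300],
}
datastore_latency_ms = [0.6, 0.6, 0.7, 5.2, 6.0]

SPIKE_FACTOR = 3.0  # flag a VM when its latest IOPS exceed 3x its own baseline

latency_spiked = datastore_latency_ms[-1] > 3 * mean(datastore_latency_ms[:-2])

for vm, samples in vm_iops.items():
    baseline = mean(samples[:-2])
    if latency_spiked and samples[-1] > SPIKE_FACTOR * baseline:
        print(f"{vm}: IOPS jumped from ~{baseline:.0f} to {samples[-1]}, "
              f"coinciding with the datastore latency spike")
```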

Applications Can Make Flash Look Bad

Another part of the infrastructure sits a layer deeper than the virtual machine: the application itself. As we discuss in our article “How Do I Know My SQL Server Environment is Ready for SSD?”, poor application code is very hard to identify, especially for the storage team, since they are not database experts. Poor application code can waste an investment in flash, or at least keep it from reaching its full potential. It is therefore critical that the storage team be armed with tools that can not only monitor and manage performance within the parts of the infrastructure they control, but also extend into the parts they don’t – like database environments.
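Even a simple ranking of queries by the I/O they generate gives the storage team a foothold in that conversation. The sketch below assumes the DBA can export per-query statistics (execution counts and logical reads, for example) to a CSV file; the column names and sample rows are hypothetical stand-ins for that export. It simply ranks queries by total read volume so the heaviest I/O consumers stand out before anyone blames the array.

```python
# Minimal sketch: rank queries by the I/O they drive against storage.
# Assumes the DBA exports per-query stats to CSV; the columns and sample
# rows below are hypothetical stand-ins for that export.

import csv
import io

sample_export = io.StringIO("""query_id,executions,logical_reads_per_exec
q_orders_report,120,850000
q_lookup_customer,50000,12
q_nightly_etl,4,9000000
""")

rows = list(csv.DictReader(sample_export))
for row in rows:
    row["total_reads"] = int(row["executions"]) * int(row["logical_reads_per_exec"])

# Highest total read volume first: these queries shape what the array sees.
for row in sorted(rows, key=lambda r: r["total_reads"], reverse=True):
    print(f'{row["query_id"]}: {row["total_reads"]:,} logical reads '
          f'({row["executions"]} executions)')
```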

Not All All-Flash Arrays Are Created Equal

Finally, as we discussed in our recent ChalkTalk video, “Perfecting The Flash SSD Evaluation Process”, not all flash arrays are created equal: some sacrifice performance for features, others features for performance. Finding the balance that best fits your data center is critical to making the right flash selection for the long term. The problem is that the basic benchmarking tools IT planners have counted on are hopelessly flawed; they were designed in a non-virtualized, hard disk era. We cover the weaknesses of traditional tools in our ChalkTalk video “Performance Management is Broken”.
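One way to avoid that trap is to derive the benchmark from production rather than from a canned profile. The sketch below, using invented trace records, boils an I/O trace down to the handful of parameters a workload model typically starts from: read/write mix and block size mix. It is only a starting point under those assumptions, not a complete methodology.

```python
# Minimal sketch: boil a production I/O trace down to a benchmark profile.
# The trace records here are invented; a real trace would come from array,
# switch, or host instrumentation.

from collections import Counter

trace = [
    {"op": "read",  "size_kb": 8},
    {"op": "read",  "size_kb": 8},
    {"op": "write", "size_kb": 64},
    {"op": "read",  "size_kb": 8},
    {"op": "write", "size_kb": 64},
    {"op": "read",  "size_kb": 32},
]

reads = sum(1 for rec in trace if rec["op"] == "read")
read_pct = 100 * reads / len(trace)
block_mix = Counter(rec["size_kb"] for rec in trace)

print(f"Read/write mix: {read_pct:.0f}% read / {100 - read_pct:.0f}% write")
print("Block size mix:", {f"{k}KB": f"{100 * v / len(trace):.0f}%"
                          for k, v in block_mix.most_common()})
# A fuller profile would also capture queue depth, random vs. sequential
# access, and data reducibility (dedupe/compression ratio), since those can
# swing results between arrays as much as raw IOPS numbers do.
```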

Conclusion

IT used to be judged on its ability to deliver application uptime, and in large part it still is today. The challenge is that the definition of “uptime” has expanded to include acceptable application performance, no matter what the external conditions may be. This means performance not only needs to be managed, it needs to be analyzed so that future performance pressures can be predicted.

Because of the increasing number of moving parts in the data center, combined with the abstraction that virtualization brings, simply throwing high-performance flash at the problem will not be enough to meet this new definition of uptime. Performance management, tuning and forecasting have to become the bedrock of any data center skill set.


Twelve years ago George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and a heavily sought-after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of technologies such as RAID, NAS, SAN, virtualization, cloud and enterprise flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.

2 comments on “All-Flash Arrays vs. Performance Management”
  1. Vikram says:

    George, very well-written blog. I agree with your point that performance management, tuning and forecasting have to become the bedrock of any data center skill set.

    Virtual Instruments solutions are helping enterprises leverage I/O performance management and make accurate, confident decisions based on data that nobody else has – the data that matters most. Virtual Instruments has delivered infrastructure performance optimization across hundreds of SAN environments, including many that have or are in the process of adopting flash storage.

    http://virtualinstruments.com/blog/flash-readiness-requires-wisdom-and-insight/

  2. Jim Bahn says:

    George … you’re spot on as usual. And to support your point, I’ll use one small data point from a Load DynamiX customer who used our performance validation appliance to do a bake-off of competing flash storage vendors. When deduplication was factored in, one test resulted in approximately 150K IOPS for one vendor and approximately 250K for the other. But that delta completely disappeared, and almost reversed, as the percentage of deduplicable data climbed closer to 100%. You really have to test using workloads that resemble your production applications. That test enabled the customer to select the solution that offered better performance for THEIR workload.
