Overcoming The Risk of Mixing Storage Workloads

Silos of storage are increasing at an alarming rate within the storage environment, with each silo dedicated to a specific task or workload. Why? The answer is simple: risk mitigation. The data center can’t afford to have applications experience unpredictable drops in performance or inconsistent response times. The simplest way to assure this predictability is to dedicate a storage system to each type of workload. The problem, of course, is that dedicating storage systems to workloads is expensive, in both financial and human resources.

Workload Proliferation

One of the reasons that storage silos are on the rise is that the number and variety of workloads is increasing. Most data centers have critical applications that require high storage transaction rates and highly random I/O, but they also have traditional home directory and file sharing demands. Now they’re also generating rich media content for customer outreach, internal training and video surveillance, data that tends to be large in size and sequentially read. And an increasing number of organizations need to capture sensor data from the “Internet of Things”, which tends to arrive as billions of small files that are also sequentially read.

The challenge of workload proliferation is being compounded by virtualization initiatives. Server virtualization takes all of the above workloads and potentially places them on the same physical host. Now a single host can issue small random I/O, large sequential I/O and small sequential I/O requests all at the same time, over the same network connection.

Adding to this ‘workload mashup’ are desktop virtualization initiatives. While managing desktops is not new to the IT organization, hosting all of their data on data center storage is. Desktop I/O patterns are truly unique, ranging from high intensity read I/O at morning login, to moderate write I/O throughout the day, to high intensity write I/O when users log out. Combined with software updates and virus scans, this pattern creates an I/O problem that the data center has not seen before.

The Risks of Workload Consolidation

From a financial and human investment perspective, the ideal solution would be to put all of this data onto one storage array. The challenge in doing so is making sure these applications get the storage performance they need, when they need it.

But there are two primary roadblocks to assuring this application performance. The first is that most storage systems were not designed for all of these different types of data. The high transaction, random I/O applications need high performance and low latency, often provided by flash storage today. But flash needs to be used intelligently to ensure ROI, since it is expensive to use for data that does not need high performance. Rich media content, for example, is typically streamed, and users are most often connected to their home directories via a much slower network like WiFi.
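
As a rough illustration of what using flash intelligently can mean in practice, the sketch below, with hypothetical workload attributes and thresholds, sends only small, random, latency-sensitive workloads to flash and leaves large sequential streams on disk:

```python
# A rough sketch of a flash placement heuristic. The attribute names and
# thresholds below are assumptions for illustration, not any vendor's logic.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    io_size_kb: int        # typical I/O size
    random_pct: float      # fraction of I/O that is random (0.0 to 1.0)
    latency_sla_ms: float  # latency the application actually needs

def belongs_on_flash(w: Workload) -> bool:
    """Send a workload to flash only when it is latency sensitive and
    dominated by small, random I/O; large sequential streams are served
    well enough by spinning disk."""
    small_random = w.io_size_kb <= 16 and w.random_pct >= 0.7
    latency_sensitive = w.latency_sla_ms <= 5.0
    return small_random and latency_sensitive

workloads = [
    Workload("OLTP database",    io_size_kb=8,   random_pct=0.9, latency_sla_ms=2.0),
    Workload("Rich media",       io_size_kb=256, random_pct=0.1, latency_sla_ms=50.0),
    Workload("Home directories", io_size_kb=64,  random_pct=0.5, latency_sla_ms=30.0),
]

for w in workloads:
    tier = "flash" if belongs_on_flash(w) else "disk"
    print(f"{w.name}: {tier}")
```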

The second concern is the ‘noisy neighbor’ issue that Storage Switzerland documented in a recent article “What Is A Noisy Neighbor?”. This is the term used to describe what happens when a single application has a peak in I/O demand that consumes the available storage resources. Essentially, all the other applications are “starved” out, often leading to user dissatisfaction and customer abandonment.
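
To make the noisy neighbor effect concrete, here is a small back-of-the-envelope sketch, with hypothetical numbers, of an array that simply shares its IOPS in proportion to demand; one application’s spike shrinks every other application’s share:

```python
# Hypothetical illustration of the noisy neighbor effect: three applications
# share a 50,000 IOPS array with no QoS, and capacity is handed out in
# proportion to whatever each application asks for.

ARRAY_IOPS = 50_000

demand = {
    "OLTP database": 20_000,
    "File services":  5_000,
    "Analytics job": 10_000,
}

def share_without_qos(demand, capacity):
    """With no minimums, a burst from one workload shrinks everyone's share."""
    total = sum(demand.values())
    scale = min(1.0, capacity / total)
    return {app: int(iops * scale) for app, iops in demand.items()}

print("Normal load:   ", share_without_qos(demand, ARRAY_IOPS))

# The analytics job spikes to 200,000 IOPS -- a classic noisy neighbor.
demand["Analytics job"] = 200_000
print("During a spike:", share_without_qos(demand, ARRAY_IOPS))
```

Even though the database and file services never changed their demand, their share of the array collapses during the spike.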

Working Around The Risks of Workload Consolidation

The risks of workload consolidation eventually lead to the creation of storage silos. Most data centers start off with a single storage system, and as they scale and workloads are added, attempts are made to make that single system keep up. A typical first step is creating dedicated LUNs for certain applications. While effective at isolating disk spindles, dedicated LUNs do not isolate a workload from the storage controller or the storage network interfaces, which are still shared with the other workloads on the same array. The noisy neighbor can still consume those resources and degrade overall system performance.

Another solution could be server-side SSD with caching. This offloads the read workload from the array, assuming the data is in cache. The problem is that some of these workloads are not cache friendly, and others don’t need to be in cache to meet the performance demands of their applications. Also, a server-side cache does not assure that the noisy neighbor won’t consume all of the cache’s resources as well, again putting all the other workloads at risk.
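
The cache friendliness point is easy to demonstrate with a toy model, not a model of any specific caching product: a small LRU cache serves a workload with a hot working set very well, while a long sequential scan that is larger than the cache gets almost no benefit:

```python
# A toy LRU cache contrasting a cache friendly workload (small hot working
# set) with a cache unfriendly one (a long sequential scan). Illustration
# only; not a model of any specific caching product.

from collections import OrderedDict
import random

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = self.misses = 0

    def read(self, block):
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)       # most recently used
        else:
            self.misses += 1
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        return self.hits / (self.hits + self.misses)

CACHE_BLOCKS = 1_000

# Cache friendly: reads concentrated on a working set that fits in cache.
friendly = LRUCache(CACHE_BLOCKS)
for _ in range(50_000):
    friendly.read(random.randint(0, 800))

# Cache unfriendly: one sequential scan far larger than the cache.
unfriendly = LRUCache(CACHE_BLOCKS)
for block in range(50_000):
    unfriendly.read(block)

print(f"Hot working set hit rate: {friendly.hit_rate():.0%}")
print(f"Sequential scan hit rate: {unfriendly.hit_rate():.0%}")
```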

A key challenge with server-side caching is the lack of an end-to-end understanding of current I/O resource utilization. Some caching software can’t make intelligent, per-workload decisions; it can only be assigned to a single workload, or be shared by multiple workloads and carry the same contention risk.

The net result, again, is silos of storage. It’s not uncommon for a large data center to have a storage system dedicated to the virtual desktop infrastructure (VDI) and several dedicated to the virtualized server infrastructure, depending on the performance demands of specific VMs. There are also typically systems dedicated to stand-alone, non-virtualized database applications which, because of their mission critical nature, can’t risk sharing compute or storage resources. Finally, there is storage for user home directories, rich media, archive and backup, where cost per GB is far more important than performance.

Overcoming Workload Consolidation Risks

This lack of consolidation reduces risk but increases cost. The multi-system storage infrastructure described above is expensive to purchase and time consuming to manage, probably requiring multiple storage administrators. As a result, IT planners are on a constant quest to find a single storage system that will meet all of their workload needs, or at least most of them.

The key to overcoming the risks associated with workload consolidation is to leverage all of the available technology, like high performance storage controllers, low latency flash storage and cost effective hard drive storage, and then to couple those technologies with intelligent allocation of resources through performance Quality of Service (QoS).

What is Performance Quality of Service (QoS) for Storage?

Performance QoS, as a concept, is similar to storage or server virtualization. The performance resources, like storage controller, compute, network bandwidth and IOPS, are pooled together and then allocated to workloads as needed. This allows the overall performance of the system to be shared among applications, while ensuring that certain applications have the performance they require when they experience a peak in demand.
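
One way to picture this, purely as an illustrative sketch with made-up policy numbers, is an allocator that satisfies each volume’s guaranteed minimum first and only then shares out the remaining pooled IOPS:

```python
# Sketch of performance QoS allocation: guaranteed minimums are met first,
# then leftover pooled IOPS are shared. The policy numbers are hypothetical.

ARRAY_IOPS = 100_000

policies = {
    "OLTP database":    {"min_iops": 40_000, "demand": 60_000},
    "VDI desktops":     {"min_iops": 25_000, "demand": 30_000},
    "Home directories": {"min_iops":  5_000, "demand": 50_000},  # the noisy neighbor
}

def allocate(policies, capacity):
    """Guaranteed minimums are satisfied first; whatever capacity is left
    over is shared out in proportion to each volume's unmet demand."""
    alloc = {v: min(p["min_iops"], p["demand"]) for v, p in policies.items()}
    spare = capacity - sum(alloc.values())
    unmet = {v: policies[v]["demand"] - alloc[v] for v in policies}
    total_unmet = sum(unmet.values())
    if spare > 0 and total_unmet > 0:
        for v in policies:
            alloc[v] += int(spare * unmet[v] / total_unmet)
    return alloc

for volume, iops in allocate(policies, ARRAY_IOPS).items():
    print(f"{volume}: {iops} IOPS")
```

The noisy neighbor still gets capacity, but only after the critical volumes’ minimums have been met.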

Companies like Fusion-io, with their ioControl Hybrid Storage solution, provide this performance quality of service by leveraging the performance and economics of both PCIe flash and spinning disk. ioControl’s storage software is designed to deliver storage QoS, which guarantees performance to each application and isolates workloads from one another. It accomplishes this by assigning a performance policy to each volume that defines the minimum the volume gets in terms of IOPS, throughput and latency. This QoS intelligence can make sure that certain workloads only use flash when needed, and that their data is evicted from flash first when a more mission critical workload begins to peak. It also makes sure that those mission critical workloads leverage server-side flash when it makes the most sense.
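
The general shape of such per-volume policies can be sketched as follows; the names, values and functions are illustrative only and are not ioControl’s actual interface:

```python
# Illustrative per-volume performance policies and a priority-based flash
# eviction order. The structure and values are assumptions for this sketch,
# not ioControl's actual API or syntax.

volumes = [
    # (name,           min IOPS, min MB/s, max latency ms, priority: 1 = most critical)
    ("sql-production",   30_000,      400,            2.0, 1),
    ("vdi-desktops",     15_000,      200,            5.0, 2),
    ("file-shares",       2_000,       50,           30.0, 3),
    ("media-archive",        500,      100,           50.0, 4),
]

policies = {
    name: {"min_iops": iops, "min_mbps": mbps, "max_latency_ms": lat, "priority": pri}
    for name, iops, mbps, lat, pri in volumes
}

def flash_eviction_order(policies):
    """When flash fills up as a mission critical volume peaks, evict the
    lowest-priority volumes' data first."""
    return sorted(policies, key=lambda v: policies[v]["priority"], reverse=True)

print("Evict from flash in this order:", flash_eviction_order(policies))
# ['media-archive', 'file-shares', 'vdi-desktops', 'sql-production']
```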

Conclusion

Silos of storage are expensive, but the risks associated with unpredictably poor application performance can be a much bigger problem. As a result, most data centers allow the unfortunate accumulation of storage silos. Now, however, there is no longer a need for storage silos, thanks to the capabilities of new storage architectures that leverage PCIe flash, server-side flash, intelligent real-time QoS engines and advances in storage networking technology. With flash storage and QoS, vendors can deliver a single system for all workloads, providing an end-to-end view of resource consumption and allowing performance to be provisioned in an intelligent fashion.

Fusion-io is a client of Storage Switzerland

Twelve years ago George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, virtualization, cloud and enterprise flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.

Comments

Former Storage Architect and Administrator says:

George, another good article. However, I think that most companies are not taking advantage of Infrastructure Performance Management (IPM) platforms, first to understand all of the workloads in the environment and second to manage the various workloads on the SANs in the various silos. Taking this step to manage the storage/SAN environment can help almost all IT shops improve performance: by monitoring, reporting and alerting on the complete initiator-to-target relationship down to the LUN level, system administrators can proactively prevent the issues you described. Misconfiguration is what creates many of these performance problems for other, more critical applications and workloads, and it is often a combination of issues within the SAN infrastructure, not just the storage controller. That is why I would encourage your readers to implement an IPM solution so they have visibility into the complete environment. A common example is a slow drain device across an ISL, which impacts all of the other applications communicating across that ISL. Ultimately this is a design issue, where the storage administrator or solution architect did not properly understand or account for the configuration changes necessary to avoid it.

Many tuning measures can be taken on any major storage array to minimize the need for silos, such as controller partitioning, cache partitioning, disk pooling and tiered disk pools. Not to mention the most overlooked server configuration issue: queue depths, which more often than not are not managed properly, can have a dramatic effect on application performance and, when handled correctly, can greatly mitigate the performance issues that make silos a necessary evil. Other configuration changes can be made on the switches or core directors to improve overall SAN performance. Issues like improper disk allocation, overloaded storage ports and unmanaged queue depths could also be reduced or eliminated if IPM were introduced and used to manage the storage environment.
