The Storage Problems That HCI Creates

Posted on May 31, 2018 by George Crump

Hyperconverged Infrastructures (HCI) run into several storage challenges as they scale. HCI answers every scaling requirement with a single move: adding a node. Each additional node comes with CPU and memory to meet compute demand, flash storage to meet high-performance IO demand and, potentially, hard disk drives to meet capacity demand. The problem is that most requests for scale don’t require all three of these resources in equal parts at the same time. As a result, HCI clusters quickly fall out of balance, with too much of some resources and not enough of others. This imbalance is most visible in the storage infrastructure, where the customer ends up with either too much capacity or too much performance.
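To make the imbalance concrete, here is a minimal sketch of node-based scaling. The node specification and the demand figures are hypothetical numbers chosen for illustration, not measurements from any product. The point is only that the node count is driven by whichever resource is scarcest, and everything else is bought along with it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Resources in one node, or demanded by a workload (illustrative units)."""
    cpu_cores: float
    flash_tb: float
    hdd_tb: float


def nodes_needed(demand: Resources, node: Resources) -> int:
    """Nodes required when the only way to scale is to add whole nodes."""
    return max(
        math.ceil(demand.cpu_cores / node.cpu_cores),
        math.ceil(demand.flash_tb / node.flash_tb),
        math.ceil(demand.hdd_tb / node.hdd_tb),
    )


def stranded(demand: Resources, node: Resources) -> Resources:
    """Resources purchased beyond the demand after scaling to meet all of it."""
    n = nodes_needed(demand, node)
    return Resources(
        cpu_cores=n * node.cpu_cores - demand.cpu_cores,
        flash_tb=n * node.flash_tb - demand.flash_tb,
        hdd_tb=n * node.hdd_tb - demand.hdd_tb,
    )


# Hypothetical node: 32 cores, 4 TB flash, 20 TB hard disk.
node = Resources(cpu_cores=32, flash_tb=4, hdd_tb=20)
# Hypothetical capacity-heavy demand: 64 cores, 8 TB flash, 200 TB hard disk.
demand = Resources(cpu_cores=64, flash_tb=8, hdd_tb=200)

print(nodes_needed(demand, node))  # capacity alone drives the node count
print(stranded(demand, node))      # cores and flash bought but not requested
```

In this made-up case the capacity requirement forces ten nodes even though compute and flash demand would fit in two, so most of the CPU and flash in the cluster sits idle.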

The Single Tier Problem

The storage component of HCI needs to deliver three basic services: performance, resiliency and data protection. Each is vastly different from the others, and serving all of them from a single tier limits how well each service can be delivered.

Rule number one is to protect all data. Primary storage protection must ensure that data is not lost in the event of a media failure, which is often accomplished with erasure coding or RAID. Additionally, a common expectation is for the storage architecture to provide point-in-time and rapid rollback capabilities, often in the form of snapshots. In a single-tier HCI design, either replication or erasure coding protects data across nodes. Both of these approaches consume processing power and capacity on other nodes.

Each node in the cluster is responsible for maintaining master copies of VM data, which limits the performance and capacity efficiency of the flash. This quickly becomes apparent when examining the resiliency factor (RF2 or RF3) model in use. RF2 is the typical architecture: the cluster keeps two copies of the data, so it can sustain one failure before data is at risk, and it consumes 2x the storage capacity. RF3, which keeps three copies and can sustain two simultaneous failures, is widely recommended as a best practice, but at 3x the storage capacity it quickly becomes an immense waste of resources in an HCI cluster. Regardless of the factor chosen, in the end all HCI storage IO must traverse the IP network.
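The capacity cost of a resiliency factor is simple arithmetic. The sketch below assumes the common replication model, in which a factor of n means n full copies of the data spread across nodes. The function names and the 120 TB figure are illustrative only.

```python
def failures_tolerated(rf: int) -> int:
    """With rf full copies, rf - 1 copies can be lost before data is at risk."""
    if rf < 1:
        raise ValueError("resiliency factor must be at least 1")
    return rf - 1


def usable_capacity_tb(raw_tb: float, rf: int) -> float:
    """Usable capacity when every block is stored rf times."""
    if rf < 1:
        raise ValueError("resiliency factor must be at least 1")
    return raw_tb / rf


# Hypothetical cluster with 120 TB of raw flash.
for rf in (2, 3):
    print(rf, failures_tolerated(rf), usable_capacity_tb(120, rf))
```

Under these assumptions the same raw flash yields half its capacity at RF2 and a third at RF3, which is why the higher factor is so expensive when the media being copied is flash.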

Storing data on hard disk drives, again typically spread across the nodes in the HCI cluster, addresses cost-effective retention requirements. The problem is that the amount of hard disk capacity per node is limited because the node also delivers performance via flash devices. Those flash devices need space in the node as well, which limits how much of each media type can be installed. Most HCI clusters pick an arbitrary balance of flash and hard disk capacity or, increasingly, are flash-only, forcing the customer to pay a premium to store rarely accessed data.

The Two-Tier Solution

A solution to the single-tier HCI design is a two-tier design, in which flash media handles storage performance at the compute layer while a dedicated storage node, loaded with either hard disk drives or flash media, handles data protection and capacity requirements. In this design, the HCI’s storage software uses internal flash as a tier or cache for read data, writing new or modified data to both compute tier flash and the dedicated storage node. Virtual machines on a particular node never need to reach beyond that node to service read IO, maximizing performance by eliminating network hops.
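The read and write paths described above can be sketched in a few lines. This is a toy model, not any vendor's implementation: the class names are hypothetical, Python dictionaries stand in for flash and for the data node's media, and the fallback read from the data node on a cache miss is an assumption added so the model is complete.

```python
class DataNode:
    """Dedicated storage node: holds the protected copy of every block."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self.blocks[key] = data

    def read(self, key: str) -> bytes:
        return self.blocks[key]


class ComputeNode:
    """Compute node with local flash used as a read tier or cache."""

    def __init__(self, data_node: DataNode) -> None:
        self.flash: dict[str, bytes] = {}
        self.data_node = data_node
        self.network_reads = 0  # reads that had to leave this node

    def write(self, key: str, data: bytes) -> None:
        # New or modified data goes to local flash and to the data node.
        self.flash[key] = data
        self.data_node.write(key, data)

    def read(self, key: str) -> bytes:
        # Normal case: the read is served from local flash, no network hop.
        if key in self.flash:
            return self.flash[key]
        # Assumed fallback: fetch from the data node and warm the local flash.
        self.network_reads += 1
        data = self.data_node.read(key)
        self.flash[key] = data
        return data


store = DataNode()
host = ComputeNode(store)
host.write("vm1-block0", b"payload")
host.read("vm1-block0")
print(host.network_reads)  # the read was served from local flash
```

The design choice worth noticing is that the compute node holds no redundant copies. Losing its flash costs performance until the cache is rewarmed, but not data, because the data node already has every block.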

The elimination of network hops also optimizes the more expensive flash storage in the nodes because it does not need to account for data protection. The data node handles all media failure protection as well as snapshot storage, which means that compute tier flash capacity is not wasted on data protection. The storage software optimizes network bandwidth by applying both data deduplication and compression to data before sending it over the network to the data node. Given that most data center workloads are typically 80% reads, and that compression and deduplication shrink the new data that is written, the amount of data actually traversing the network to get to the data node is relatively small.
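A back-of-the-envelope estimate shows how small that traffic can be. The sketch below assumes reads are served entirely from local flash, so only writes cross the network, and that deduplication and compression together achieve some data reduction ratio. The 2:1 ratio and the 1,000 MB/s workload are hypothetical inputs, not figures from the article.

```python
def network_write_traffic(total_io_mb_s: float,
                          read_fraction: float,
                          reduction_ratio: float) -> float:
    """Estimate MB/s sent to the data node.

    Assumes reads stay local, so only the write share of the IO is sent,
    and that it shrinks by reduction_ratio (e.g. 2.0 means 2:1) first.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    if reduction_ratio < 1.0:
        raise ValueError("reduction_ratio must be at least 1.0")
    write_io = total_io_mb_s * (1.0 - read_fraction)
    return write_io / reduction_ratio


# Hypothetical workload: 1,000 MB/s of IO, 80% reads, 2:1 data reduction.
print(network_write_traffic(1000, 0.8, 2.0))  # roughly a tenth of total IO
```

With those inputs only about 100 MB/s of the 1,000 MB/s workload reaches the network, which is the effect the paragraph above describes.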

StorageSwiss Take

Most demands for additional compute are combined with demands for increased storage performance, so putting the storage performance in the node with the compute makes sense. Most demands for capacity, however, do not require additional performance. Capacity storage simply holds data in case it is needed again. The two-tier HCI design modifies the original architecture just a bit to better match how storage resources are consumed. The result is an architecture that scales more logically and cost-effectively.

Watch the on demand webinar “Considering Hyperconverged for Your Enterprise? Three Critical Questions to Ask” to learn more about the single and dual tier HCI designs as well as two other HCI areas of concern and how to address them.

About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.

