Implementing a Hyperconverged Infrastructure (HCI) seems easy: buy the basic three-node configuration, plug it into the network and start creating virtual machines. To a large extent, the initial setup and first days of use are just that easy. The simple start-up experience makes HCI evaluations go very smoothly. The problem is that this simplicity tempts IT professionals to be less careful with the design of the architecture than they should be. One area often overlooked is the data protection scheme and how it impacts the environment.
Understanding HCI Data Protection Techniques
Data protection in the HCI context has two responsibilities. The first, of course, is to protect data. The second is to make sure data is available to the other nodes, so that if a virtual machine (VM) migrates to another host, the new node can access that VM’s data. HCI typically offers two types of data protection: replication or erasure coding.
When an HCI environment uses replication as its primary form of data protection, it makes a full copy of the VM on two or more physical hosts. The typical setting is to make two to three copies of a VM, but even four is not uncommon for more critical applications. As data within a VM changes or new data is added, those changes are replicated in near-real time to the other nodes in the cluster. While replication may seem wasteful, requiring as much as 3 to 4X the capacity of the data it protects, it does have some advantages. First, the VM’s data is intact on the host it is running on, which means there is no network activity for reads. Second, replication is a significantly less processor-intensive protection process. Since the CPUs in an HCI environment drive everything, taking some load off them leads to more consistent performance. The drawback, of course, is storage capacity consumption.
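The capacity cost of replication is simple arithmetic, and a minimal sketch makes it concrete. The function name and the 10 TB figure below are illustrative assumptions, not values from any particular HCI product.

```python
def raw_capacity_for_replication(usable_tb: float, copies: int) -> float:
    """Raw capacity consumed when every VM is stored as `copies` full copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return usable_tb * copies


# Hypothetical example: 10 TB of VM data at the copy counts mentioned above.
for copies in (2, 3, 4):
    raw = raw_capacity_for_replication(10.0, copies)
    print(f"{copies} copies of 10 TB of VM data consume {raw:.0f} TB of raw capacity")
```

Every additional copy adds the full size of the protected data, which is why the copy count is worth deciding deliberately rather than leaving at a default.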
An HCI environment using erasure coding is more storage capacity efficient. Erasure coding is somewhat similar to RAID 5, in that it is a parity-based protection algorithm that distributes data and generates parity to enable rebuilds. The main difference is that, while RAID 5 distributes data across disk drives in a self-contained storage shelf, erasure coding distributes data across nodes in the HCI cluster.
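To illustrate the parity idea, here is a minimal sketch that uses a single XOR parity fragment, the same principle RAID 5 relies on. This is a deliberate simplification: production erasure coding commonly uses more general codes that can survive more than one failure, and the function names here are hypothetical. Each list element stands in for the fragment held by one node.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, data_nodes: int) -> List[bytes]:
    """Split data into `data_nodes` equal fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // data_nodes)  # ceiling division
    padded = data.ljust(frag_len * data_nodes, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(data_nodes)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def rebuild(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Recreate at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost fragment")
    restored = list(fragments)
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        restored[missing[0]] = reduce(xor_bytes, survivors)
    return restored


# Three data nodes plus one parity node; the node holding fragment 1 fails.
stripe = encode(b"virtual machine data", data_nodes=3)
degraded = list(stripe)
degraded[1] = None
assert rebuild(degraded) == stripe
```

Note that the rebuild has to read from every surviving node, which hints at why this approach leans so heavily on the network between nodes.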
Similar to how RAID 5 is more capacity efficient than disk mirroring, erasure coding is more efficient than replication. The problem is that erasure coding is much more CPU intensive, and it requires much more network activity since all nodes are involved in every read and write IO. Additionally, that network is typically the same one used for all cluster communications and VM migrations. In other words, with an erasure coding protection scheme, the network interconnecting the nodes becomes very busy and very critical to maintain.
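The efficiency gap can be put in numbers with a short sketch. The 4+1 layout below is an assumed example rather than a recommendation, and the function names are hypothetical.

```python
def replication_efficiency(copies: int) -> float:
    """Fraction of raw capacity that holds unique data with `copies` full copies."""
    return 1.0 / copies


def erasure_coding_efficiency(data_fragments: int, parity_fragments: int) -> float:
    """Fraction of raw capacity that holds unique data in a data+parity layout."""
    return data_fragments / (data_fragments + parity_fragments)


# Assumed comparison: three-way replication versus four data fragments plus one parity.
print(f"3 copies:   {replication_efficiency(3):.0%} of raw capacity is usable")
print(f"4+1 layout: {erasure_coding_efficiency(4, 1):.0%} of raw capacity is usable")
```

The trade is capacity for CPU cycles and network traffic, so the right choice depends on which of those resources the cluster can least afford to spend.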
An Alternative – Enterprise HCI
A solution to the data protection problem may be to borrow some ideas from the traditional dedicated architectures of the past. A traditional architecture with a dedicated compute layer and a dedicated storage layer meant that for storage traffic, each node had direct access, via a dedicated network, to the storage system. That storage system also had dedicated processors that managed the system and provided data services. The problem was that it was a dedicated architecture requiring all data to move across the network.
An Enterprise-HCI solution leverages internal storage, typically flash, to store the most active portion of a virtual machine’s data. It then stores a copy of this data, plus older data, on a centralized storage system. The centralized storage system handles all of the data protection responsibilities. With this design in place, most of a VM’s read activity is served from flash drives located on the node where it runs, eliminating network traffic and better optimizing flash performance.
The internal flash capacity is not wasted on data protection responsibilities, which are handled by the shared storage appliance. The only routine network IO is the VM’s writes, which are sent to both the internal flash in the node and the centralized storage system. Depending on the workload, the design can decrease network traffic by 50% or more.
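How much traffic the design removes depends on the read/write mix and on how often reads are satisfied from local flash. The sketch below is a rough model under stated assumptions: the baseline is a design where every IO crosses the network, IOs are treated as equal in size, and the 70% read mix and 90% local hit rate are illustrative values, not measurements.

```python
def network_io_fraction(read_fraction: float, local_hit_rate: float) -> float:
    """Fraction of IOs that still cross the network.

    Writes always go to the centralized storage system; reads cross the
    network only when the data is not in the node's local flash.
    """
    if not (0.0 <= read_fraction <= 1.0 and 0.0 <= local_hit_rate <= 1.0):
        raise ValueError("fractions must be between 0 and 1")
    writes = 1.0 - read_fraction
    read_misses = read_fraction * (1.0 - local_hit_rate)
    return writes + read_misses


# Assumed workload: 70% reads, 90% of which are served from local flash.
remaining = network_io_fraction(read_fraction=0.7, local_hit_rate=0.9)
print(f"IOs still on the network: {remaining:.0%}")
print(f"Reduction versus all-network IO: {1.0 - remaining:.0%}")
```

Under this model, a write-heavy workload sees far less benefit, which is why the savings are workload dependent.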
When a VM migration is needed, the VM moves to a new node and accesses its data from the shared storage appliance until a local copy of its most active data is built.
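The read, write and migration behavior described above can be sketched as a toy model. The class and method names are hypothetical, the local flash is modeled as an unbounded dictionary with no eviction, and no real product's API is implied.

```python
class SharedStorage:
    """Centralized storage system: holds every block and counts remote reads."""

    def __init__(self) -> None:
        self.blocks = {}
        self.remote_reads = 0

    def write(self, vm: str, block: int, data: bytes) -> None:
        self.blocks[(vm, block)] = data

    def read(self, vm: str, block: int) -> bytes:
        self.remote_reads += 1
        return self.blocks[(vm, block)]


class Node:
    """HCI node that keeps active data in local flash, backed by shared storage."""

    def __init__(self, name: str, shared: SharedStorage) -> None:
        self.name = name
        self.shared = shared
        self.local_flash = {}

    def write(self, vm: str, block: int, data: bytes) -> None:
        # Writes land in local flash and on the centralized system.
        self.local_flash[(vm, block)] = data
        self.shared.write(vm, block, data)

    def read(self, vm: str, block: int) -> bytes:
        key = (vm, block)
        if key not in self.local_flash:
            # Local miss: fetch over the network and keep a local copy.
            self.local_flash[key] = self.shared.read(vm, block)
        return self.local_flash[key]


shared = SharedStorage()
node_a = Node("node-a", shared)
node_b = Node("node-b", shared)

node_a.write("vm1", 0, b"active data")
node_a.read("vm1", 0)   # served from node-a's local flash

# The VM migrates to node-b, which starts with no local copy of its data.
node_b.read("vm1", 0)   # first read comes from shared storage
node_b.read("vm1", 0)   # later reads are local again
print("reads served by shared storage:", shared.remote_reads)
```

Only the first read after the migration touches the shared system in this model; the new node's local copy then builds up as the VM keeps reading.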
StorageSwiss Take
Data protection is just one design consideration to take into account. In our on-demand webinar, “Considering Hyperconverged for Your Enterprise? Three Key Questions to Ask”, we detail these considerations:
- Will the HCI scale to meet the organization’s performance demands?
- What complexities will HCI introduce into the environment?
- Will HCI extend your on-premises data center to the cloud?
Register now and get a copy of our white paper “The Requirements for Enterprise Hyperconverged Infrastructure”.