As we discussed in our recent webinar, “Is Convergence Right for You?”, converged or hyperconverged architectures come in various forms, ranging from a bundle of multiple vendors’ hardware and software, to a software-only, bring-your-own-server solution, to a turnkey, purpose-built converged architecture. The form your converged solution takes will be a key decision point in the selection process. But another decision point should be how that solution manages data placement and data protection.
Most hyperconverged architectures leverage internal server storage to simplify deployment and to keep costs down. But in a virtualized environment, some form of shared access to virtual machine (VM) data is required, both to enable migration and to eliminate the single point of failure that storing all your data on one server would present.
The first method that some hyperconverged architectures use is a shared-nothing design, where all of a virtual machine’s data is stored on a single node, so no other node is required for that VM to run its application. To prevent a single point of failure, these systems replicate each VM’s data to one or more other nodes in the infrastructure. If the primary server, or a drive within that server, fails, the VM can be restarted on one of the nodes that was receiving its replicated data.
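This shared-nothing placement can be sketched in a few lines of Python. This is a hypothetical model for illustration only; `place_vm` and `failover` are invented names, not any vendor’s API:

```python
# Sketch of shared-nothing placement: each VM's data lives whole on one
# primary node and is replicated to one or more target nodes. On a
# primary failure, the VM can only restart on a node holding a copy.

def place_vm(vm_id, nodes, replicas=1):
    """Pick a primary and replica targets for a VM (deterministic toy rule)."""
    primary = nodes[sum(map(ord, vm_id)) % len(nodes)]
    targets = [n for n in nodes if n != primary][:replicas]
    return {"vm": vm_id, "primary": primary, "replicas": targets}

def failover(placement):
    """A failed VM restarts on a replica target, never an arbitrary node."""
    if not placement["replicas"]:
        raise RuntimeError("no replica: single point of failure")
    return placement["replicas"][0]

p = place_vm("vm-42", ["node-a", "node-b", "node-c"], replicas=2)
print(p["primary"], "->", failover(p))
```

Note that the restart options are limited to the replica list, which is the migration constraint discussed below.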
There are pros and cons to this method. On the plus side, all reads come from the primary server without incurring network overhead. There is also no parity calculation and no degraded state on a failure; the data is intact on both the primary server and any target node.
On the downside, keeping two or three replicated copies of the data consumes far more storage capacity, somewhere between 2X and 3X the raw data. When a VM is migrated, it can only move to one of its target nodes, unless the converged software has a re-routing capability that lets a host server access another host’s storage. But that feature is rare and adds complexity. Finally, if a node fails completely, the time and network bandwidth needed to build a full replacement copy can be significant, because complete copies of the failed node’s data must be sent, often from a single source node, to the new node.
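A back-of-envelope calculation illustrates the rebuild cost. The numbers are purely illustrative and assume an ideal, single 10 GbE link with no protocol overhead:

```python
# Back-of-envelope sketch (illustrative, not a benchmark): rebuilding a
# failed shared-nothing node means re-copying every full replica it held,
# so the wire time scales with the node's entire capacity.

def rebuild_hours(node_capacity_tb, link_gbps):
    """Hours to re-replicate a failed node's data over a single link."""
    bits = node_capacity_tb * 1e12 * 8      # TB -> bits
    seconds = bits / (link_gbps * 1e9)      # ideal wire time, no overhead
    return seconds / 3600

# A 10 TB node over one 10 GbE link: roughly 2.2 hours of sustained transfer.
print(round(rebuild_hours(10, 10), 1))
```

Real rebuilds are slower still, since the source node is also serving production I/O while it streams the copies.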
The second method that some hyperconverged systems use still leverages the internal storage on each node, but aggregates that storage into one or more virtual pools. A VM’s data is scattered across all the nodes in the cluster, and parity information is generated to protect against a drive failure. This method is much more space efficient, similar to how RAID 5 is more efficient than mirroring. Also, a VM can be migrated to any node in the cluster, since the virtual volume is shared across all the nodes in that cluster.
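The space-efficiency difference can be seen with a simple model. This is a sketch that assumes plain N-way replication versus single-parity striping, ignoring metadata and spare capacity:

```python
# Sketch of usable capacity under the two protection schemes: N-way
# replication keeps 1/N of raw space usable, while single-parity striping
# across `nodes` keeps (nodes - 1) / nodes usable (the RAID 5 analogy).

def usable_fraction_replication(copies):
    return 1 / copies

def usable_fraction_parity(nodes):
    return (nodes - 1) / nodes

print(usable_fraction_replication(2))  # 0.5  -> 2X raw capacity consumed
print(usable_fraction_parity(4))       # 0.75 -> only ~1.33X consumed
```

The gap widens with cluster size: at four nodes parity already returns half again as much usable capacity as two-way replication.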
The downside is that all writes AND reads now require network activity, though that activity can be reduced by caching data within each primary server. And because each write is broken into much smaller blocks spread across many more nodes, the performance impact should be minimal, if noticeable at all.
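The striping behavior described above can be sketched as follows. This is an illustrative chunking model only; real systems choose their own block sizes and also compute and place parity:

```python
# Sketch of an aggregated pool splitting one VM write into small chunks
# spread round-robin across cluster nodes, so no single node absorbs the
# whole write (parity generation omitted for brevity).

def stripe_write(data, nodes, chunk=4):
    """Distribute fixed-size chunks of a write across all nodes in turn."""
    layout = {n: [] for n in nodes}
    for i in range(0, len(data), chunk):
        layout[nodes[(i // chunk) % len(nodes)]].append(data[i:i + chunk])
    return layout

layout = stripe_write(b"0123456789ABCDEF", ["n1", "n2", "n3"])
print({n: b"".join(parts) for n, parts in layout.items()})
```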
StorageSwiss Take – Which is Better?
The answer, as always, is it depends. Since capacity, or the lack thereof, is often a concern for IT planners, the storage efficiency of aggregated access is hard to beat, especially if the data center is going to require three or more compute nodes. The replication method does have the advantage of rapid read access, and certain solutions will allow you to configure two-node environments, which could be ideal for small businesses and remote offices.
Data placement is not the only part of convergence to understand. To get the complete picture, attend our on-demand webinar “Is Convergence Right for You?”.