Hyper-converged Storage may be the purest form of the software defined storage (SDS) concept, as it runs storage services as a virtual machine within the customer infrastructure. This new take on storage architectures collapses the traditional compute, storage and networking layers in the data center into a single layer. Hyper-converged architectures differ from converged architectures in that they allow the customer to leverage or purchase a mixture of compute and storage hardware. This is essentially true software defined storage as it frees the data center from any specific vendor hardware obligation because the storage intelligence has migrated to the software stack within the hypervisor.
But vendors are only scratching the surface of this technology’s potential. They need to improve these true software defined storage solutions in three key areas: network optimization, storage performance and storage efficiency, so that the full potential of this architecture can be realized.
What is Hyper-Converged Storage?
In Hyper-Converged Storage (HCS) environments the SDS virtual machine (VM) is typically installed on every host. It then aggregates the internal storage within those host servers into a single pool and shares it with every application VM in the infrastructure. These HCS architectures should greatly reduce the cost associated with storage since they can leverage internal, server-class storage, which is much less costly than the storage used in enterprise storage arrays.
In addition to enterprise arrays, HCS should provide a compelling alternative to traditional scale-out storage. Instead of scaling independently of the compute layer, HCS promises near-perfect, linear scaling as the environment grows. When a host is added to support more applications or users, the SDS software is installed on that host. Each time that happens the storage infrastructure gains more processing power and storage capacity.
The next generation of HCS solutions should use the three types of available storage. First, an in-memory tier, consisting of either DRAM or high performance flash, should be created to provide the data services discussed below. These data services will use that in-memory tier to manage and to optimize the storage while improving performance and reducing network load. Next, a second performance tier, consisting typically of standard flash, can be used to store the active and near active data sets. Finally, HCS solutions should create a capacity tier to store data that’s at rest until it becomes active again.
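As a rough sketch of the three-tier placement logic described above, the snippet below assigns a block to the memory, performance or capacity tier based on how recently it was accessed. The tier names follow the article; the specific time thresholds are hypothetical examples, not values from any actual HCS product.

```python
import time

# Hypothetical thresholds for illustration; real products tune these dynamically.
PERFORMANCE_TIER_WINDOW = 60 * 60        # active: accessed within the last hour
CAPACITY_TIER_WINDOW = 60 * 60 * 24 * 7  # near-active: accessed within a week

def choose_tier(last_access_ts, now=None):
    """Place a block in the memory, performance, or capacity tier
    based on how long it has been since the block was accessed."""
    now = now if now is not None else time.time()
    idle = now - last_access_ts
    if idle <= PERFORMANCE_TIER_WINDOW:
        return "memory"       # DRAM / high-performance flash: active data
    if idle <= CAPACITY_TIER_WINDOW:
        return "performance"  # standard flash: active and near-active data
    return "capacity"         # data at rest until it becomes active again
```

In a real system the placement decision would also weigh access frequency and tier occupancy, but recency alone is enough to show the intent of the three tiers.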
These tiers of available storage should enable a next generation HCS architecture to improve the three key areas described below, creating a more scalable and cost-effective converged solution than today’s SDS products.
1. Better Network Optimization
Since HCS architectures use server-side storage, they also need a server-side network to aggregate and share that storage. This means the latency reduction that ‘siloed’ server-side storage promises is not available in many of the current crop of HCS solutions. Networking the storage does, however, simplify the use of HCS in virtualized and clustered environments, and capabilities like live machine migration should work as well as they do with traditional, shared storage. Ideally, IT planners want the best of both worlds: high performance and the ease of use that a shared infrastructure brings.
The next generation HCS solution should optimize its use of this network in two ways to reduce its impact on overall performance. First, the HCS solution should write only the minimal amount of data possible to the shared pool. The first step in this process is to run compression and deduplication so that only unique data is sent across the server network to the shared pool.
While there’s typically a performance concern with deduplication and compression, remember that each HCS VM usually has its own CPU core for that process and only has to analyze data on its host against the shared pool. Most importantly, if the HCS solution can leverage in-memory storage the time it takes to compress data and analyze it for redundancy should have no noticeable impact on performance. This in-memory storage is typically DRAM allocated to the HCS in each host.
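The deduplication step described above can be sketched in a few lines: fingerprint each block and send only blocks whose fingerprint is not already in the shared pool. This is a minimal illustration of the concept, not any vendor's implementation; the block size and fingerprint algorithm are assumptions.

```python
import hashlib

def deduplicate(blocks, shared_pool_hashes):
    """Return only the blocks whose content is not already in the
    shared pool, keyed by a SHA-256 fingerprint of each block."""
    unique = []
    for block in blocks:
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in shared_pool_hashes:
            shared_pool_hashes.add(fingerprint)
            unique.append(block)
    return unique

pool_hashes = set()
writes = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # third write repeats the first
to_send = deduplicate(writes, pool_hashes)
# Only the two unique blocks cross the server network to the shared pool.
```

Note that the host only keeps a set of fingerprints, not the pool's data, which is why the per-host analysis the article describes stays cheap.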
Another advantage to using in-memory storage is that it enables the HCS VM to perform write coalescing, an important complement to compression and deduplication, without impacting performance or putting data at risk. Writes typically occur to the exact same data area over a period of time, meaning a block of information can change dozens of times within a few seconds. All that really needs to be stored is the last modification of that block.
Write coalescing eliminates the transmission and storage of new data that deduplication would not catch because it is unique. That data is actively changing, however, and there is no value in storing every iteration. All that really needs to be sent across the network and stored is the last version of the block, which can be identified once the write activity settles. A side benefit of write coalescing is that it improves read performance from the storage media, since writes are now better organized and less fragmented.
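The idea of keeping only the last version of each block can be shown with a simple sketch: buffer a burst of writes keyed by block address, letting later writes overwrite earlier ones before anything is flushed. This is an illustrative model only; real coalescing buffers also handle flush timing and crash safety.

```python
def coalesce(write_log):
    """Keep only the latest version of each block address from a
    burst of writes; a dict naturally holds one entry per address."""
    latest = {}
    for address, data in write_log:
        latest[address] = data  # a later write replaces the earlier version
    return latest

# Block 100 changes three times within the burst; block 200 changes once.
burst = [(100, b"v1"), (200, b"x1"), (100, b"v2"), (100, b"v3")]
flushed = coalesce(burst)
# Only b"v3" and b"x1" are sent across the network and written out.
```

Four incoming writes collapse into two outgoing ones, which is exactly the network and media savings the article attributes to coalescing.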
The net impact of compression, deduplication and write coalescing should be a 90% or greater reduction in the amount of data transmitted across the server-side network and a 90% better utilization of a flash tier of storage.
2. Improved Storage Performance
In-memory storage should also be leveraged by HCS solutions to improve overall performance. Presently, some HCS solutions support data tiering, but the data has to be manually identified and placed on the specific type of storage. Some HCS solutions can work with server-side caching to provide a read performance boost, but these capabilities are not integrated and are not “aware” of each other.
Instead, the next generation of HCS software should leverage in-memory storage to improve performance. All active data, both read and written, should be stored in the in-memory tier first. Because the first tier of in-memory storage is DRAM, it’s ideal for this highly transient data. Also, if this performance capability is integrated into HCS, it can leverage deduplication, compression and write coalescing to optimize the capacity of the DRAM storage area, essentially making the typically small DRAM area as much as 90% larger.
Of course DRAM is volatile, so the HCS solution should provide the flexibility to choose how much exposure data may have in the event of a server failure. Options should be given to use flash memory inside the server or highly reliable flash memory shared on a storage network. But no matter the type or location of flash, the HCS integration should lead to ideal storage performance.
3. Improved Storage Efficiency
The capacity tier of the HCS architecture is typically hard-drive based, so optimizing the data there is often not considered worth the risk of a performance impact. But if the HCS solution can leverage the work already done optimizing network and storage performance, then carrying these optimizations through to the capacity tier should not impact performance.
This means first leveraging all the hard work done by deduplication, compression and write coalescing to optimize the in-memory storage tier, then carrying it through to the capacity tier. While the capacity tier is less expensive per GB, optimizing it can still greatly reduce cost.
The capacity tier essentially becomes the tier of storage that is accessed when data is not in the flash or memory tier. Thanks to the performance optimizations described above, it also becomes almost 100% reads. Once again the integration of these three capabilities bears fruit: write coalescing down to the capacity tier allows for much better read performance when old data needs to be promoted to the performance tier.
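The read path just described, where a miss in the performance tier falls back to the capacity tier and the block is promoted so future reads are fast, can be sketched as below. The two tiers are modeled as plain dictionaries purely for illustration.

```python
def read_block(address, performance_tier, capacity_tier):
    """Serve a read from the performance tier; on a miss, fall back
    to the capacity tier and promote the block for future reads."""
    if address in performance_tier:
        return performance_tier[address]       # hit: fast path
    data = capacity_tier[address]              # miss: read from capacity
    performance_tier[address] = data           # promote the now-active block
    return data

perf, cap = {}, {7: b"cold-data"}
first = read_block(7, perf, cap)   # miss: served from the capacity tier
second = read_block(7, perf, cap)  # hit: served from the performance tier
```

Because coalescing has already defragmented what lands in the capacity tier, the miss path here reads well-organized data, which is the read-performance benefit the article points to.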
Finally, the capacity tier should be any available tier of storage, not just the capacity drives within the participating server hosts. Ideally the next generation HCS solution should leverage the existing shared storage capacity that is likely already in place in the environment. Re-purposing this storage as the capacity tier and further optimizing its use via compression and deduplication should allow the existing storage resources to serve in this role for a long time to come. Imagine no longer having to buy storage for capacity purposes.
HCS architectures have enormous potential to show data centers the true value of software defined storage. But there are three important areas where these solutions need to improve in order to fully realize that potential: network optimization, better storage performance and improved storage efficiency. Next-generation SDS solutions like Atlantis’s ILIO USX In-Memory Hyper-Converged Storage are fulfilling the full promise of this architecture.
Atlantis Computing is a client of Storage Switzerland