Hyper-converged Storage may be the purest form of the software defined storage (SDS) concept, as it runs storage services as a virtual machine within the customer infrastructure. This new take on storage architectures collapses the traditional compute, storage and networking layers in the data center into a single layer. Hyper-converged architectures differ from converged architectures in that they allow the customer to leverage or purchase a mixture of compute and storage hardware. This is essentially true software defined storage as it frees the data center from any specific vendor hardware obligation because the storage intelligence has migrated to the software stack within the hypervisor.
But vendors are only scratching the surface of this technology’s potential. They need to improve these true software defined storage solutions in three key areas: network optimization, storage performance and storage efficiency, so that the full potential of this architecture can be realized.
What is Hyper-Converged Storage?
In Hyper-Converged Storage (HCS) environments the SDS virtual machine (VM) is typically installed on every host. It then aggregates the internal storage within those host servers into a single pool and shares it with every application VM in the infrastructure. These HCS architectures should greatly reduce the cost associated with storage since they can leverage internal, server-class storage, which is much less costly than the storage used in enterprise storage arrays.
In addition to enterprise arrays, HCS should provide a compelling alternative to traditional scale-out storage. Instead of scaling independently of the compute layer, HCS promises near-perfect, linear scaling as the environment grows. When a host is added to support more applications or users, the SDS software is installed on that host. Each time that happens the storage infrastructure gains more processing power and storage capacity.
The next generation of HCS solutions should use the three types of available storage. First, an in-memory tier, consisting of either DRAM or high performance flash, should be created to provide the data services discussed below. These data services will use that in-memory tier to manage and to optimize the storage while improving performance and reducing network load. Next, a second performance tier, consisting typically of standard flash, can be used to store the active and near active data sets. Finally, HCS solutions should create a capacity tier to store data that’s at rest until it becomes active again.
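As a rough sketch of the three-tier placement logic described above, the snippet below assigns a block to the memory, performance or capacity tier based on how recently it was accessed. The tier names follow the article; the specific time thresholds are hypothetical examples, not values from any actual HCS product.

```python
import time

# Hypothetical thresholds for illustration; real products tune these dynamically.
PERFORMANCE_TIER_WINDOW = 60 * 60        # active: accessed within the last hour
CAPACITY_TIER_WINDOW = 60 * 60 * 24 * 7  # near-active: accessed within a week

def choose_tier(last_access_ts, now=None):
    """Place a block in the memory, performance, or capacity tier
    based on how long it has been since the block was accessed."""
    now = now if now is not None else time.time()
    idle = now - last_access_ts
    if idle <= PERFORMANCE_TIER_WINDOW:
        return "memory"       # DRAM / high-performance flash: active data
    if idle <= CAPACITY_TIER_WINDOW:
        return "performance"  # standard flash: active and near-active data
    return "capacity"         # data at rest until it becomes active again
```

In a real system the placement decision would also weigh access frequency and tier occupancy, but recency alone is enough to show the intent of the three tiers.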
These tiers of available storage should enable a next generation HCS architecture to improve the three key areas described below, creating a more scalable and cost-effective converged solution than today’s SDS products.
1. Better Network Optimization
Since HCS architectures use server-side storage, they also need a server-side network to aggregate and share that storage. This means the latency reduction that ‘siloed’ server-side storage promises is not available in many of the current crop of HCS solutions. Networking the storage does, however, simplify the use of HCS in virtualized and clustered environments, and capabilities like live machine migration should work as well as they do with traditional, shared storage. Ideally, IT planners want the best of both worlds: high performance and the ease of use that a shared infrastructure brings.
The next generation HCS solution should optimize its use of this network in two ways to reduce its impact on overall performance. First, the HCS solution should write only the minimal amount of data possible to the shared pool. The first step in this process is to run compression and deduplication so that only unique data is sent across the server network to the shared pool.
While there’s typically a performance concern with deduplication and compression, remember that each HCS VM usually has its own CPU core for that process and only has to analyze data on its host against the shared pool. Most importantly, if the HCS solution can leverage in-memory storage the time it takes to compress data and analyze it for redundancy should have no noticeable impact on performance. This in-memory storage is typically DRAM allocated to the HCS in each host.
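The deduplication step described above can be sketched in a few lines: fingerprint each block and send only blocks whose fingerprint is not already in the shared pool. This is a minimal illustration of the concept, not any vendor's implementation; the block size and fingerprint algorithm are assumptions.

```python
import hashlib

def deduplicate(blocks, shared_pool_hashes):
    """Return only the blocks whose content is not already in the
    shared pool, keyed by a SHA-256 fingerprint of each block."""
    unique = []
    for block in blocks:
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in shared_pool_hashes:
            shared_pool_hashes.add(fingerprint)
            unique.append(block)
    return unique

pool_hashes = set()
writes = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # third write repeats the first
to_send = deduplicate(writes, pool_hashes)
# Only the two unique blocks cross the server network to the shared pool.
```

Note that the host only keeps a set of fingerprints, not the pool's data, which is why the per-host analysis the article describes stays cheap.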
Another advantage to using in-memory storage is that it enables the HCS VM to perform write coalescing, an important complement to compression and deduplication, without impacting performance or putting data at risk. Writes typically occur to the exact same data area over a period of time, meaning a block of information can change dozens of times within a few seconds. All that really needs to be stored is the last modification of that block.
Write coalescing eliminates the transmission and storage of new data that deduplication would not catch because it is unique. That data is actively changing, however, and there is no value in storing every iteration. All that really needs to be sent across the network and stored is the last version of the block, which can be identified once the write activity settles. A side benefit of write coalescing is that it improves read performance from the storage media, since writes are now better organized and less fragmented.
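The idea of keeping only the last version of each block can be shown with a simple sketch: buffer a burst of writes keyed by block address, letting later writes overwrite earlier ones before anything is flushed. This is an illustrative model only; real coalescing buffers also handle flush timing and crash safety.

```python
def coalesce(write_log):
    """Keep only the latest version of each block address from a
    burst of writes; a dict naturally holds one entry per address."""
    latest = {}
    for address, data in write_log:
        latest[address] = data  # a later write replaces the earlier version
    return latest

# Block 100 changes three times within the burst; block 200 changes once.
burst = [(100, b"v1"), (200, b"x1"), (100, b"v2"), (100, b"v3")]
flushed = coalesce(burst)
# Only b"v3" and b"x1" are sent across the network and written out.
```

Four incoming writes collapse into two outgoing ones, which is exactly the network and media savings the article attributes to coalescing.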
The net impact of compression, deduplication and write coalescing should be a 90% or greater reduction in the amount of data transmitted across the server-side network and a 90% better utilization of a flash tier of storage.
2. Improved Storage Performance
In-memory storage should also be leveraged by HCS solutions to improve overall performance. Presently, some HCS solutions support data tiering, but the data has to be manually identified and placed on the specific type of storage. Some HCS solutions can work with server-side caching to provide a read performance boost, but these capabilities are not integrated and are not “aware” of each other.
Instead, the next generation of HCS software should leverage in-memory storage to improve performance. All active data, both read and written, should be stored in the in-memory tier first. Because the first tier of in-memory storage is DRAM, it’s ideal for this highly transient data. Also, if this performance capability is integrated into HCS, it can leverage deduplication, compression and write coalescing to optimize the capacity of the DRAM storage area, essentially making the typically small DRAM area as much as 90% larger.
Of course DRAM is volatile, so the HCS solution should provide the flexibility to choose how much exposure data may have in the event of a server failure. Options should be given to use flash memory inside the server or highly reliable flash memory shared on a storage network. But no matter the type or location of flash, the HCS integration should lead to ideal storage performance.
3. Improved Storage Efficiency
The capacity tier of the HCS architecture is typically hard-drive based, so optimizing the data there is often not considered worth the risk of a performance impact. But if the HCS solution can leverage the work already done optimizing network and storage performance, then carrying these optimizations through to the capacity tier should not impact performance.
This means first leveraging all the hard work done by deduplication, compression and write coalescing to optimize the in-memory storage tier, then carrying it through to the capacity tier. While the capacity tier is less expensive per GB, optimizing it can still greatly reduce cost.
The capacity tier essentially becomes the tier of storage that is accessed when data is not in the flash or memory tier. Thanks to the performance optimizations described above, it also becomes almost 100% reads. Once again the integration of these three capabilities bears fruit: write coalescing down to the capacity tier allows for much better read performance when old data needs to be promoted to the performance tier.
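The read path just described, where a miss in the performance tier falls back to the capacity tier and the block is promoted so future reads are fast, can be sketched as below. The two tiers are modeled as plain dictionaries purely for illustration.

```python
def read_block(address, performance_tier, capacity_tier):
    """Serve a read from the performance tier; on a miss, fall back
    to the capacity tier and promote the block for future reads."""
    if address in performance_tier:
        return performance_tier[address]       # hit: fast path
    data = capacity_tier[address]              # miss: read from capacity
    performance_tier[address] = data           # promote the now-active block
    return data

perf, cap = {}, {7: b"cold-data"}
first = read_block(7, perf, cap)   # miss: served from the capacity tier
second = read_block(7, perf, cap)  # hit: served from the performance tier
```

Because coalescing has already defragmented what lands in the capacity tier, the miss path here reads well-organized data, which is the read-performance benefit the article points to.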
Finally, the capacity tier should be any available tier of storage, not just the capacity drives within the participating server hosts. Ideally the next generation HCS solution should leverage the existing shared storage capacity that is likely already in place in the environment. Re-purposing this storage as the capacity tier and further optimizing its use via compression and deduplication should allow the existing storage resources to serve in this role for a long time to come. Imagine no longer having to buy storage for capacity purposes.
HCS architectures have enormous potential to show data centers the true value of software defined storage. But there are three important areas where these solutions need to improve in order to fully realize that potential: network optimization, better storage performance and improved storage efficiency. Next-generation SDS solutions like Atlantis’s ILIO USX In-Memory Hyper-Converged Storage are fulfilling the full promise of this architecture.
Atlantis Computing is a client of Storage Switzerland