The Role of Object Storage in HPC Environments?

Posted on November 30, 2016 by George Crump

For most High Performance Computing (HPC) environments, high-performance storage is a critical component of the infrastructure. The storage systems need to feed the compute infrastructure as quickly as possible. With object storage improving its performance and adding flash to its nodes, is it now suitable to do more for the HPC infrastructure than just serve as an archive for old data? This question was posed during our recent webinar “Performance vs. Cost – Solving The HPC Storage Tug-of-War”.

HPC storage infrastructures are often file-based, high-performance NAS systems, frequently built on parallel NFS architectures. Again, the goal is to feed the compute architecture as fast as possible. But in addition to being compute intensive, HPC environments are also capacity intensive, having to store hundreds of terabytes or even dozens of petabytes of information. And, as in traditional data centers, most of this data is not active. Unlike in traditional data centers, however, when users need this HPC data, they need it rapidly.

As we discussed in a recent entry, “High Performance Object vs. High Performance NAS”, flash-based object storage will not outperform a high-performance parallel file system, especially if that file system also uses flash. But this performance reality does not mean that you should relegate object storage exclusively to the role of HPC archive. In fact, it could be an ideal initial ingestion point for HPC data, which is then fed transparently to the high-performance file system.

Object Storage as an HPC Data Lake

Several vendors have made a lot of noise about creating a data lake, a storage system that serves as the ingestion point for all data captured by the organization. Data fed into this lake could include readings from Internet of Things (IoT) devices, log data from internal systems, and database dumps, to name just a few. The lake would then feed all of the compute processes that want to access and analyze this data.

Object storage, assuming the system has the right protocol support, could be an ideal data lake-type solution for HPC. All the data the HPC environment wants to analyze goes to the data lake for initial storage, and the lake then feeds the specific HPC processes that want that data. In some cases, that may mean the object storage system moving data to a higher-performance storage system, against which the HPC process then runs its analysis or simulations.
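To make the ingest-then-stage flow concrete, here is a minimal Python sketch under stated assumptions. The `ObjectStore` class, its `put`/`keys_with_prefix`/`get` methods, the `stage_for_job` function, and the key prefixes are all hypothetical stand-ins: a real deployment would use the object store's own interface and the parallel file system's actual paths. All data lands in the lake first, and only the objects a given job needs are copied to the faster tier.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class ObjectStore:
    """Hypothetical in-memory stand-in for an object store acting as the data lake."""
    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        # Every incoming dataset (IoT readings, logs, database dumps) lands here first.
        self.objects[key] = data

    def keys_with_prefix(self, prefix: str) -> list[str]:
        # Object stores address data by key; a shared prefix groups one job's inputs.
        return sorted(k for k in self.objects if k.startswith(prefix))

    def get(self, key: str) -> bytes:
        return self.objects[key]


def stage_for_job(lake: ObjectStore, prefix: str, scratch: Path) -> list[Path]:
    """Copy only the objects under `prefix` from the lake to the fast file system tier.

    `scratch` is a hypothetical directory standing in for the high-performance
    (e.g. parallel) file system the HPC job runs against.
    """
    staged = []
    for key in lake.keys_with_prefix(prefix):
        target = scratch / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(lake.get(key))
        staged.append(target)
    return staged


if __name__ == "__main__":
    lake = ObjectStore()
    lake.put("sim-42/input/mesh.dat", b"mesh bytes")
    lake.put("sim-42/input/params.json", b"{}")
    lake.put("iot/2016-11-30/sensor.log", b"sensor bytes")  # stays in the lake

    with tempfile.TemporaryDirectory() as d:
        staged = stage_for_job(lake, "sim-42/", Path(d))
        print(len(staged))  # 2: only the simulation's inputs are staged
```

The design choice here mirrors the article's point: the object store holds everything, while the expensive high-performance tier holds only the working set of the job currently running.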

To learn more about the different roles that object storage can play in HPC environments, watch our webinar “Performance vs. Cost – Solving The HPC Storage Tug-of-War”, which is now available on demand.


About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.

