For most High Performance Computing (HPC) environments, high performance storage is a critical component of the infrastructure. The storage systems need to feed the compute infrastructure as quickly as possible. With object storage improving its performance and adding flash to its nodes, is it now suitable to do more for the HPC infrastructure than just serve as an archive for old data? This was a question posed in our recent webinar “Performance vs. Cost – Solving The HPC Storage Tug-of-War”.
HPC storage infrastructures are often file based: high performance NAS, frequently using parallel NFS architectures. Again, the goal is to feed the compute architecture as fast as possible. But in addition to being compute intensive, HPC environments are also capacity intensive, having to store hundreds of terabytes or even dozens of petabytes of information. And, as in traditional data centers, most of this data is not active. However, unlike in traditional data centers, when users need this HPC data, they need it rapidly.
As we discussed in a recent entry, “High Performance Object vs. High Performance NAS”, flash-based object storage will not outperform a high performance, parallel file system, especially if that file system also uses flash. But this performance reality does not mean that you should relegate object storage exclusively to the role of HPC archive. In fact, it could be an ideal initial ingestion point for HPC data, which is then fed transparently to the high performance file system.
Object Storage as an HPC Data Lake
Several vendors have made a lot of noise about creating a data lake, which is a storage system that serves as the ingestion point for all data being captured by the organization. Data fed into this lake could include readings from Internet of Things (IoT) devices, log data from internal systems, and dumps from databases, to name just a few. The lake would then feed all of the compute processes that want to access and analyze this data.
Object storage, assuming that the system has the right protocol support, could be an ideal data lake-type solution for HPC. All the data the HPC environment wants to analyze goes to the data lake for initial storage, and the lake then feeds the specific HPC processes that want that data. In some cases, that may mean the object storage system moves data to a higher performance storage system, against which the HPC process then runs its analysis or simulations.
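To make the ingest-then-stage flow concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: local directories stand in for both the object store and the high performance file system, and the `DataLake` class, its `ingest` and `stage` methods, and the key names are hypothetical, not any vendor's API.

```python
import shutil
import tempfile
from pathlib import Path


class DataLake:
    """Toy stand-in for an object store: keys map to files under a root directory.

    Hypothetical class for illustration only; a real deployment would talk to
    an object storage system through whatever protocol it supports.
    """

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def ingest(self, key: str, payload: bytes) -> None:
        # All incoming data lands in the lake first.
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)

    def keys(self, prefix: str = "") -> list[str]:
        # List the stored keys that start with the given prefix, in sorted order.
        found = []
        for path in self.root.rglob("*"):
            if path.is_file():
                key = path.relative_to(self.root).as_posix()
                if key.startswith(prefix):
                    found.append(key)
        return sorted(found)

    def stage(self, prefix: str, fast_tier: Path) -> list[Path]:
        # Copy only the objects a job needs onto the faster tier.
        staged = []
        for key in self.keys(prefix):
            dest = fast_tier / key
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(self.root / key, dest)
            staged.append(dest)
        return staged


def run_demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        lake = DataLake(Path(tmp) / "lake")
        scratch = Path(tmp) / "scratch"  # stand-in for the parallel file system

        # Everything is ingested into the lake, whatever its source.
        lake.ingest("sensors/run1.csv", b"t,temp\n0,21.5\n")
        lake.ingest("sensors/run2.csv", b"t,temp\n0,22.1\n")
        lake.ingest("logs/app.log", b"started\n")

        # A job that analyzes sensor data gets only that subset staged.
        staged = lake.stage("sensors/", scratch)
        return [p.relative_to(scratch).as_posix() for p in staged]


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the log data stays in the lake while only the sensor files are copied to the fast tier, which mirrors the idea that most data is inactive and only the working set needs the most expensive storage.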
To learn more about the different roles that object storage can play in HPC environments, watch our webinar, “Performance vs. Cost – Solving The HPC Storage Tug-of-War”, which is now available on-demand.