Commercial HPC improves products, makes customers happier and streamlines organizational processes. The infrastructure that supports commercial HPC needs to deliver on those objectives transparently. The storage system, as one component of that infrastructure, needs to eliminate the mundane management tasks often associated with traditional HPC storage systems.
Transparent Data Placement
Traditional storage systems typically manage data placement to lower cost, moving data to cheaper media as it ages. Commercial HPC systems instead place data to improve access to it. The key file attribute for an HPC system is file or data size, not last access date. Commercial HPC data can become active again at almost any time, so the storage system needs to keep it readily accessible regardless of when it was last accessed.
Commercial HPC file systems deal with content (files, data or objects) of many different sizes. The commercial HPC system needs to place that content logically, based on its size. A key concern is how the HPC storage system handles the metadata for that content, as well as access to that metadata. Certain commercial HPC applications generate a large number of metadata operations on the way to a result. Storing metadata on flash media allows rapid access to the corresponding content, which shortens time to results.
Organizations routinely run both sequential and metadata-intensive commercial HPC workloads on the same storage system. For the sequential workloads, the streaming performance of a hard disk system tuned for sequential access usually provides adequate performance; flash is often unnecessary and, of course, expensive. Equipping the HPC storage system with hard disks for these sequential file workloads lets the organization keep costs down while meeting performance expectations.
IT teams rarely have the time to make sure each type of data is stored on the media best suited to it. And how should IT handle a single workload that requires both streaming and metadata-intensive operations? The manual monitoring this would demand overwhelms IT's ability to manage the process properly, so the storage system needs to handle placement automatically.
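To make size-based placement more concrete, here is a minimal Python sketch of a placement policy that never consults access dates. It assumes just two tiers (flash and hard disk) and a hypothetical 64 KiB small-file cutoff; the names `Tier`, `Item`, `place`, `place_all` and `SMALL_FILE_THRESHOLD` are illustrative only and do not describe any particular product's implementation.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FLASH = "flash"
    HDD = "hdd"


# Hypothetical cutoff between "small" and "large" files; a real system
# would tune or adapt this value.
SMALL_FILE_THRESHOLD = 64 * 1024  # bytes


@dataclass
class Item:
    name: str
    size: int  # bytes
    is_metadata: bool = False


def place(item: Item, threshold: int = SMALL_FILE_THRESHOLD) -> Tier:
    """Pick a tier from item type and size only; access dates are ignored."""
    if item.is_metadata:
        return Tier.FLASH  # metadata lookups are latency-bound
    if item.size < threshold:
        return Tier.FLASH  # small-file I/O behaves much like metadata I/O
    return Tier.HDD  # large files stream well from hard disk


def place_all(items: Iterable[Item]) -> dict[str, Tier]:
    """Place every item independently, with no manual monitoring."""
    return {item.name: place(item) for item in items}


if __name__ == "__main__":
    workload = [
        Item("metadata/inode-42", 512, is_metadata=True),
        Item("config.json", 4_096),
        Item("sim_output.h5", 8 * 1024**3),
    ]
    for name, tier in place_all(workload).items():
        print(name, "->", tier.value)
```

Because each item is placed on its own merits, a single workload that mixes streaming reads of large files with heavy metadata activity is split across both tiers without anyone watching it by hand. A production system would also need to migrate a file when it grows past the cutoff, which this sketch leaves out.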
Automatic Protocol Management
Many commercial HPC workloads use a parallel file protocol so that they can interact directly with the storage system components that hold the data those workloads need. It is well worth the organization's effort to fine-tune these applications to take advantage of a parallel protocol. The problem, especially in the commercial market, is that the systems feeding data to the HPC storage system, or applications that need only occasional access, may not support the parallel protocol.
The commercial HPC storage system needs to work seamlessly with both the parallel protocol and traditional file protocols like SMB and NFS, without requiring separate volumes to be set up for each. Seamless protocol access enables the commercial HPC system to ingest data from IoT devices, or log files from legacy systems, via NFS or SMB while modern applications simultaneously analyze that same data via the parallel protocol.
Automatic Operation
To IT, the commercial HPC storage system needs to look like any other storage system. A turnkey implementation, which matters especially in the commercial space, means IT doesn't have to source software and hardware separately and assemble its own solution from scratch. The storage system's specialized HPC capabilities, like those we've discussed in the previous blogs, need to operate transparently so administrators don't require special training.
Conclusion
An increasing number of commercial organizations need HPC to solve modern-day challenges and remain competitive. Those organizations can't settle for stretching traditional storage systems to meet the demands of commercial HPC. IT professionals within these organizations need to look for storage systems designed specifically for the commercial HPC use case.