High-Performance Compute (HPC) and the storage infrastructure that supports it have long been considered a realm unto themselves. Enterprise IT would have nothing to do with them. However, in the past decade, massive I/O bottlenecks have emerged throughout IT infrastructure for two reasons:
- Everything related to silicon evolution has gotten faster: systems, storage, processors, servers, interconnect
- Data that needs to be processed, distributed and shared is much larger. Applications are far more demanding, and collaboration happens at a much larger scale
The combination of these bottlenecks is bringing IT infrastructure to its knees. Companies are either willingly embarking, or being forced to embark, on initiatives that have requirements similar to HPC. It may be time for the enterprise to look at HPC storage solutions not only to solve these HPC-type problems, but also to solve more traditional data center challenges created by high density virtualization and high performance databases.
What Makes HPC Storage Different?
HPC Storage is different from traditional enterprise storage because of the variety of workloads that it has to support. The same system can be tasked to provide high performance sequential I/O by one group of applications, highly responsive random I/O by another group of applications and cost effective, high capacity storage by still another.
HPC is prevalent in industries like Life Sciences, Oil and Gas Exploration, Financial Services (High Frequency Trading) and Academic Research. Each of these industries has unique requirements, but instead of designing a particular storage system for each use case, successful HPC storage vendors have developed storage platforms that meet all the various requirements. These requirements include high performance, unprecedented scale, high reliability, data durability and cost-efficient long-term retention.
Of course, many of these industries have a more traditional part of the business and use more traditional storage to support those efforts. These industries were the first to realize the redundancy of effort and consider leveraging their HPC storage investment for the traditional data center use cases, essentially merging the two realms.
HPC in the Enterprise
The first reason to consider HPC storage systems in the infrastructure is that many enterprises now have HPC-type requirements on the traditional side of the business thanks to initiatives like Big Data, Analytics, Cloud Computing and Video Surveillance. Each of these initiatives requires the ability to ingest large streams of sequential data, often in real time. They also need the capacity to parse this data, and they often need to retain it cost-effectively for an extended period. The storage system that supports these initiatives has to provide excellent sequential performance during ingest, rapid response during data parsing, fast response times during recall, and massive scalability for long-term storage. These are exactly the attributes that some HPC storage systems can deliver, even at the largest scale.
As traditional data centers begin to deploy workloads with HPC attributes, using HPC storage systems seems like an obvious choice. But organizations may not want to stop there. Thoroughly analyzing the capabilities of an HPC storage system will lead the IT planner to the conclusion that these systems can also meet the demands of the more traditional business applications and environments that the data center handles.
HPC Storage for Virtualization
The second reason to consider HPC in the enterprise is that traditional workloads are looking more like HPC workloads. For example, the moment virtualization moved into production and began to scale, it started to show HPC attributes. A virtualized environment has a clustered file system, as do HPC environments. It also has dozens if not hundreds of nodes (virtual hosts), as do HPC environments. Finally, like an HPC environment, each of those nodes generates dozens of uniquely random I/O streams.
The response to this requirement from legacy vendors is a hybrid flash or all-flash based storage system. HPC storage systems have long incorporated memory to enhance the performance of their systems and continue to do so with flash technology. HPC systems, however, don’t count exclusively on flash for their performance, instead leveraging RAM as well as the wide striping of hard disk drives to gain high performance from more cost-effective media. Driving high performance from three media types is ideal for the data center. Many data centers go all-flash or over-buy flash to avoid the performance delta between flash and hard disk when there is a cache miss. High performance on both the flash and HDD tiers reduces this concern because the performance delta is smaller.
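The cache-miss concern can be illustrated with simple arithmetic. What follows is a minimal Python sketch, not a model of any particular product: the function name `effective_latency_ms` and every latency figure are hypothetical placeholders, chosen only to show how average response time depends on the hit ratio and on the size of the gap between the fast and slow tiers.

```python
def effective_latency_ms(hit_ratio, fast_ms, slow_ms):
    """Average latency when a fraction `hit_ratio` of requests is served
    from the fast tier and the remainder falls through to the slow tier."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * fast_ms + (1.0 - hit_ratio) * slow_ms


# Hypothetical figures, for illustration only (not measurements).
FAST_TIER_MS = 0.2      # assumed latency of the fast (flash) tier
WIDE_DELTA_MS = 10.0    # assumed slow tier with a large gap to the fast tier
NARROW_DELTA_MS = 2.0   # assumed slow tier with a small gap to the fast tier

for hit_ratio in (0.99, 0.90, 0.80):
    wide = effective_latency_ms(hit_ratio, FAST_TIER_MS, WIDE_DELTA_MS)
    narrow = effective_latency_ms(hit_ratio, FAST_TIER_MS, NARROW_DELTA_MS)
    print(f"hit ratio {hit_ratio:.0%}: "
          f"wide delta {wide:.2f} ms, narrow delta {narrow:.2f} ms")
```

Under these assumed numbers, the average latency of the wide-delta configuration climbs much faster as the hit ratio drops, which is the pressure that pushes data centers to over-buy flash. A smaller delta makes the average far less sensitive to cache misses.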
HPC Storage for Databases
Databases have also evolved to display HPC attributes. Today in the data center they can be deployed on a finite number of servers in a scale-up fashion, or they can be implemented in a scale-out design using a NoSQL solution. In either case, HPC storage can again meet the storage I/O demands of these environments. In a scale-up database environment, a few servers need to support thousands of users, all simultaneously making requests of the database. In a scale-out design, these users are scattered across potentially hundreds of database nodes. The impact on storage is similar: thousands of simultaneous and uniquely random I/O requests that need to be serviced.
An HPC storage system design that can scale out to meet both the capacity and performance demands of these environments is ideally suited for the new generation of databases.
Data Centers Need Performance and Scalability without Penalty
The common aspect of data center initiatives like video surveillance, big data, high density virtualization, and high performance databases is the need for upfront performance that won’t degrade as the environment scales. Delivering consistent performance at scale requires a scalable architecture that won’t bottleneck under heavy load. IT planners need to look for an HPC storage system that can deliver on this need and should avoid designs where a single controller or node can bottleneck storage I/O performance.
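The single-controller bottleneck can be sketched with a deliberately idealised model. In the Python below, the function name `aggregate_throughput`, its parameters and all throughput figures are hypothetical assumptions introduced for illustration. Real systems rarely scale perfectly linearly, so treat this as a rough picture rather than a sizing tool.

```python
def aggregate_throughput(nodes, per_node_mbps, controller_limit_mbps=None):
    """Idealised deliverable throughput for `nodes` storage nodes.

    If all I/O funnels through one controller, throughput is capped at
    `controller_limit_mbps`. With no shared controller (None), throughput
    is assumed to grow linearly with the node count.
    """
    raw = nodes * per_node_mbps
    if controller_limit_mbps is None:
        return raw
    return min(raw, controller_limit_mbps)


# Hypothetical figures, for illustration only (not measurements).
PER_NODE_MBPS = 500
CONTROLLER_LIMIT_MBPS = 2000

for nodes in (2, 4, 8, 16):
    capped = aggregate_throughput(nodes, PER_NODE_MBPS, CONTROLLER_LIMIT_MBPS)
    scale_out = aggregate_throughput(nodes, PER_NODE_MBPS)
    print(f"{nodes:>2} nodes: single controller {capped} MB/s, "
          f"scale-out {scale_out} MB/s")
```

In this model, adding nodes behind a single controller stops helping once the controller's limit is reached, while the scale-out design keeps growing. That flat line is the behavior the IT planner is being advised to avoid.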
Fine Tuning HPC Storage for the Enterprise
Despite all the capabilities that a standard HPC storage system could bring to a traditional data center, the IT planner who is considering broadening its use needs a system that is finely tuned for the business. For example, the HPC storage system needs to be able to integrate with the storage protocols that are common to the traditional data center. These protocols typically include block (Fibre Channel and iSCSI) and file (CIFS, NFS). The modern data center does not have the luxury of stopping everything to rewrite applications. Support of legacy protocols will be a requirement for some time to come.
Of course these systems should also support “new” storage interfaces like object storage, REST and S3 so the enterprise can modernize and plan accordingly. Supporting both legacy and modern protocols allows the data center to implement a storage system that will meet a multitude of its application requirements.
The HPC storage system also needs to have the enterprise class features that storage administrators have become accustomed to, such as snapshots, replication, cloning and various data efficiency techniques. These capabilities are essential to enabling the data center to provision new applications quickly as well as to meet backup and recovery requirements.
Determining the difference between a traditional data center and a specialized HPC environment is difficult. The traditional data center’s workloads, when looked at from an I/O perspective, appear very similar to those of the HPC environment. Partly this similarity comes from traditional data centers deploying HPC workloads like video surveillance and big data. However, the primary reason for the similarity is that traditional workloads are now clustered, a hallmark of the HPC environment. It makes sense then that the IT planner should include traditional HPC storage vendors in their decision making when trying to solve modern day storage challenges.
Sponsored by DataDirect Networks
About DataDirect Networks
DataDirect Networks (DDN) is the leader in scalable storage originally designed for HPC, Big Data and Cloud Applications and now moving into the enterprise. Whether you need to accelerate your data-intensive applications and workflow or scale in performance and capacity for your requirements, DDN can help you unlock the value of your data by delivering systems with the highest performance, scalability, efficiency and simplicity.