HPC data centers are the harbinger of what the enterprise will look like in the coming years. As such, enterprise data centers should pay close attention to a recent survey of HPC data centers, which was sponsored by DataDirect Networks (DDN). If they did, they would realize flash is not the answer to all their problems. Enterprises will need to learn that there is simply too much data, and flash is still too expensive, to store all of the organization’s data on it.
The HPC to Enterprise Connection
High-performance computing environments used to be the exclusive domain of universities and federally funded labs. But because of analytics, big data and machine learning, enterprises are realizing their environments look increasingly like HPC. Even traditional enterprise workloads like databases and unstructured data, because of their capacity and performance demands, now require storage architectures that share many attributes with HPC storage.
All-Flash is not the Answer
A key finding in the DDN survey was how few of these HPC organizations were all-flash, even on their production workloads. Most used flash to accelerate workloads, but the percentage of data actually stored on flash was very low. Some of the rationale for this was simply the nature of the HPC beast. Respondents felt that their workloads were too big to be stored only on flash. Roughly 84% of the organizations surveyed had more than a petabyte of storage, and more than 30% had more than 10PB!
Enterprise IT may look at those HPC capacities and use them to rationalize why this survey doesn’t apply to the enterprise. But it is important to remember that capacity growth in the enterprise data center is expected to outpace growth in HPC as enterprises bring in big data, Internet of Things (IoT), video surveillance and similar workloads, on top of the growth of their more traditional workloads.
Data Management is the Key
Another reason for the low percentage of data on flash is the widespread adoption of global file systems that have the intelligence to move data between tiers of storage based on use case. In other words, HPC doesn’t need as much flash because it can shift data sets around, either manually or automatically, without disrupting workflow. These intelligent data management systems can knit together various IT storage silos into a homogeneous storage architecture that ranges from internal flash and shared flash to high-capacity NAS, object storage and public cloud storage.
Priority one for enterprises, and for HPC environments, is to implement a data management architecture that goes beyond traditional archive. Data management moves data both up and down storage tiers, whereas archive is generally used only to move data down tiers.
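The difference between data management and archive can be sketched in a few lines of Python. This is only an illustration, not how any particular global file system works: the tier names, the `DataSet` fields and the 7-day and 90-day thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical tiers, fastest first. Names are illustrative only.
TIERS = ["flash", "nas", "object"]


@dataclass
class DataSet:
    name: str
    tier: str
    days_since_access: int


def next_tier(ds, hot_days=7, cold_days=90):
    """Data management: move a data set up or down one tier based on age of last access."""
    i = TIERS.index(ds.tier)
    if ds.days_since_access <= hot_days and i > 0:
        return TIERS[i - 1]  # promote one tier toward flash
    if ds.days_since_access >= cold_days and i < len(TIERS) - 1:
        return TIERS[i + 1]  # demote one tier toward object storage
    return ds.tier  # leave in place


def archive_only(ds, cold_days=90):
    """Traditional archive: data only ever moves down a tier."""
    i = TIERS.index(ds.tier)
    if ds.days_since_access >= cold_days and i < len(TIERS) - 1:
        return TIERS[i + 1]
    return ds.tier
```

A data set sitting on NAS that becomes active again is promoted by `next_tier` but stays where it is under `archive_only`, which is the gap the paragraph above describes.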
Mixed IO a Common Challenge
The survey also highlighted HPC’s struggle with mixed IO performance, with almost 60% of the respondents naming it as their top challenge. Mixed IO means that the storage infrastructure has to deal with large sequential IO (bandwidth) and massively randomized IO (IOPS and latency) at the same time. Again, the global file system and intelligent data management help with this challenge. The file system is needed so that some components of the storage architecture can be tuned for bandwidth while others are tuned for high IOPS and low latency. The customer can then place data on the component that makes the most sense for that workload, or leverage the intelligence layer to place the data automatically.
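As a rough sketch of what automatic placement for mixed IO might look like, the function below routes a workload to a bandwidth-tuned or an IOPS-tuned component based on its IO profile. The pool names, the 1 MiB and 50% cutoffs, and the choice to send small sequential IO to the IOPS pool are all assumptions for illustration. A real intelligence layer would weigh far more signals.

```python
def choose_pool(avg_io_kib, random_fraction, large_io_kib=1024, random_cutoff=0.5):
    """Pick a storage component for a workload from its IO profile.

    avg_io_kib:      average request size in KiB
    random_fraction: share of requests that are random (0.0 to 1.0)
    Pool names and thresholds are hypothetical.
    """
    if random_fraction >= random_cutoff:
        return "iops_pool"  # mostly random IO: tuned for IOPS and latency
    if avg_io_kib >= large_io_kib:
        return "bandwidth_pool"  # large sequential IO: tuned for throughput
    return "iops_pool"  # small sequential IO: assumed latency-sensitive


# Example profiles (made up for the sketch)
workloads = {
    "checkpoint_write": (4096, 0.05),
    "metadata_lookup": (4, 0.95),
}
placement = {name: choose_pool(size, rnd) for name, (size, rnd) in workloads.items()}
```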
Today’s HPC problems typically end up being tomorrow’s enterprise challenges. Enterprises would do well to keep a careful eye on the HPC industry to see how it is addressing its problems. A key takeaway is the need for a global approach to file systems instead of a file system for each application cluster. HPC is also embracing the intelligent placement of data, so that the right data is in the right location at the right time. That intelligence enables HPC organizations to reduce their flash investment and to meet the needs of mixed workloads.
The reason the enterprise needs to pay attention to these initiatives now is that they require time to become part of the IT status quo. One does not implement a global file system and copy all data to it overnight. Instead, it takes a gradual approach that places new workloads on the file system, and slowly transitions old workloads to it. Most HPC environments have been working on their global file system strategy for years. The enterprise needs to get started on developing a data management strategy now, and the first part of that is selecting and implementing a file system that acts as the foundation for the future.