Benchmarks are necessary for understanding the performance characteristics of a particular storage system in a particular environment. The problem is that they are susceptible to manipulation by vendors chasing the best marketing results. The Standard Performance Evaluation Corporation (SPEC) curbs some of this manipulation by enforcing standardized testing and results submission: vendors have to document their test configurations clearly, so unrealistic designs are easily exposed. The benchmark problem gets worse as an increasing number of organizations begin selecting storage systems for artificial intelligence (AI) and machine learning (ML) workloads.
The Proof of Concept Challenge
AI and ML workloads are very difficult to set up in a proof-of-concept testing environment. Part of the problem is that it is hard to predict what the AI/ML project will look like three to five years from now, when it is in full production. Another part is that gathering the hardware and software needed to test a potential new storage system is very expensive. Finally, there is the time involved in configuring, and reconfiguring, the test environment as each storage system candidate arrives.
The organization is stuck at a crossroads. The popular AI/ML benchmarks were designed to test model efficiency, assuming data sits on a local file system; they do not exercise the underlying storage system itself. There are no AI/ML-specific benchmarks built to test storage efficiency, and internally testing every system is a nearly impossible task. Organizations need a blended strategy: intelligently dissect benchmark results to develop a (very) short list of storage candidates to test.
Dissecting Benchmark Data
While SPEC does an admirable job of standardizing test results and providing transparency into the configurations used, organizations still need to interpret the results carefully. Storage vendors often use unrealistic hardware configurations in an attempt to capture a top spot.
Another variable to consider is that many of the vendors submitting results are primarily software companies. Their scores are bounded by whatever hardware configuration they chose for the submission.
In some cases these configurations are valid: the vendor is trying to show that its software is not the limiting factor and can max out the hardware. In other cases the configurations are suspect and should be viewed with skepticism. If a system delivers an unprecedented SPEC SFS score but the configuration used to achieve it costs 10X the organization’s budget, the score doesn’t have much value. Some vendors submit multiple configurations so that customers can see the performance difference at different hardware scales and price bands.
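The budget filter described above can be made concrete. The sketch below ranks candidates by performance per dollar and drops any configuration the organization cannot afford. The system names, scores, and prices are illustrative placeholders, not real SPEC SFS submissions; SPEC publishes configuration details, not prices, so cost figures would have to come from vendor quotes.

```python
# Sketch: shortlist storage candidates by price/performance.
# All systems, scores, and prices below are made up for illustration;
# real costs would come from vendor quotes for the tested configuration.

BUDGET = 500_000  # assumed hardware budget in dollars

candidates = [
    # (system name, benchmark score, price of the tested configuration)
    ("System A", 4200, 1_800_000),  # top score, but several times over budget
    ("System B", 2600,   450_000),
    ("System C", 1900,   300_000),
    ("System D", 2100,   520_000),  # slightly over budget, so excluded
]

def shortlist(candidates, budget, top_n=3):
    """Keep configurations within budget, ranked by score per dollar."""
    affordable = [c for c in candidates if c[2] <= budget]
    ranked = sorted(affordable, key=lambda c: c[1] / c[2], reverse=True)
    return ranked[:top_n]

for name, score, price in shortlist(candidates, BUDGET):
    print(f"{name}: {score} ops/s at ${price:,} "
          f"({score / price * 1000:.2f} ops/s per $1k)")
```

Note how the headline winner (System A) disappears entirely once budget is applied, which is exactly the point of dissecting the published configuration rather than the score alone.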
Another aspect to consider is the nature of the benchmark itself. SPEC SFS includes five different tests that measure performance across different workload types: small-file, large-file, read-intensive, write-intensive, metadata-intensive, or a mixture. Organizations should review these tests to find which maps best to their own workloads.
Ideally, organizations should use the benchmark as an initial guideline to narrow down the field of potential vendors to two or three systems that are brought in-house for on-premises testing.
Test Equal to Your Budget
When it comes time to test, make sure the vendor sends a storage system configuration that is within budget for the proof of concept, and know exactly what that test configuration costs. Simulating the workload to perform the actual test is difficult, especially with AI/ML workloads. The best-case scenario is an application, server, and storage configuration that duplicates production as closely as possible. An alternative is a workload generator that captures real-time IO from production and replays it against the test configurations. A final option is to run standard testing tools on the equipment, tweaked to simulate the workload’s IO pattern. Each successive option sacrifices accuracy.
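The third option, tweaking a standard testing tool, is commonly done with fio. The sketch below assembles a fio command line from a simple workload profile. The fio flags used are standard fio options, but the profile values (a 70/30 random read/write mix at 4 KB blocks) are illustrative assumptions that should be replaced with numbers measured from the actual production workload.

```python
# Sketch: compose a fio command line approximating a measured IO profile.
# The fio options (--rw, --rwmixread, --bs, --iodepth, --numjobs, etc.)
# are standard fio flags; the profile values below are placeholders to be
# replaced with measurements captured from production.

def fio_command(profile):
    """Build a fio argument list from a simple workload profile dict."""
    return [
        "fio",
        "--name=" + profile["name"],
        "--rw=randrw",                          # mixed random read/write
        f"--rwmixread={profile['read_pct']}",   # percent of IO that is reads
        f"--bs={profile['block_size']}",        # IO block size
        f"--iodepth={profile['queue_depth']}",  # outstanding IOs per job
        f"--numjobs={profile['jobs']}",         # parallel worker threads
        f"--size={profile['size']}",            # dataset size per job
        "--time_based",
        f"--runtime={profile['seconds']}",      # test duration in seconds
        "--direct=1",                           # bypass the page cache
        "--group_reporting",                    # aggregate results per group
    ]

# Illustrative profile: replace every value with measured production numbers.
profile = {
    "name": "ml-train-sim",
    "read_pct": 70,
    "block_size": "4k",
    "queue_depth": 32,
    "jobs": 8,
    "size": "10G",
    "seconds": 300,
}

print(" ".join(fio_command(profile)))
```

Generating the command from a profile dict keeps the test repeatable across each candidate system: the same profile is replayed against every box on the short list, so the results stay comparable.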
Leverage the Cloud
An increasing number of modern file systems run as well in the cloud as they do on-premises, which can make the public cloud the ideal test environment. Compute power and storage IO can be “rented” as needed during the test and “torn down” after it completes, so the organization pays for the test environment only while a test is actually in progress. Even if the organization still decides to test on-premises, leveraging the cloud may narrow the short list to a single candidate.
Deciding on the storage platform for the organization’s AI/ML initiatives is not a task to be taken lightly. Selecting the right storage solution lays a foundation for future AI/ML investment and keeps the organization from buying a new system every time an AI/ML project spins up. Testing and evaluating these systems is difficult, but leveraging published benchmarks, scrutinizing them for realism, and then performing limited internal testing can lead the organization to the right choice. If the file system has native cloud functionality, internal testing becomes easier and much less expensive.
Sponsored by WekaIO