Debunking AI and High Velocity Analytics Benchmark Results

Benchmarks are necessary when trying to understand the performance characteristics of a particular storage system in a particular environment. The problem is that they are susceptible to manipulation by vendors seeking the best marketing results. The Standard Performance Evaluation Corporation (SPEC) reduces some of this manipulation by enforcing standardized testing and results submission. Vendors have to clearly document their test configurations so that unrealistic designs are easily exposed. The problem with benchmarks gets worse as an increasing number of organizations begin selecting storage systems for artificial intelligence (AI) and machine learning (ML) workloads.

The Proof of Concept Challenge

AI and ML workloads are very difficult to set up in a proof-of-concept testing environment. Part of the problem is that it is hard to determine what the AI/ML project will look like three to five years from now, when it is in full production. Another part is that gathering the hardware and software needed to test a potential new storage system is very expensive. Finally, there is the time involved in configuring, and reconfiguring, the test environment as each storage system candidate arrives.

The organization is stuck at a crossroads. The popular AI/ML benchmarks were designed to test model efficiency, assuming data resides on a local file system; they do not test the underlying storage system itself. There are no AI/ML-specific benchmarks built to test storage efficiency, and internally testing every system is a nearly impossible task. Organizations need a blended strategy in which they intelligently dissect benchmark results to develop a (very) short list of storage candidates to test.

Dissecting Benchmark Data

While SPEC is doing an amazing job standardizing test results and providing transparency into the configurations used, organizations still need to be careful when they interpret the results. Storage vendors often use unrealistic hardware configurations in an attempt to achieve a top spot.

Another variable to consider is that many of the vendors submitting results are primarily software companies. Their results are only as meaningful as the hardware configuration they chose for the test.

In some cases these configurations are valid, as the vendor is attempting to show that its software is not the limiting factor and can max out the hardware configuration. In other cases the configurations are suspect and should be viewed with some skepticism. If a system delivers an unprecedented SPEC SFS score but the configuration required to achieve it costs 10X the organization’s budget, the score doesn’t have much value. Some vendors submit multiple configurations so that customers can see the performance difference at different hardware scales and price bands.

Another aspect to consider is the nature of the benchmark itself. SPEC SFS includes five different tests that measure performance across many different workload profiles: small file, large file, read intensive, write intensive, metadata intensive, or a mixture. Organizations should review the benchmark workloads to find which maps best to their own.
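The mapping exercise can be made concrete. As a rough sketch, an organization could weight its own workload along the dimensions above and score each benchmark profile against it; the profile names and weights below are hypothetical illustrations, not actual SPEC SFS workload definitions.

```python
# Illustrative sketch: score which benchmark workload profile best matches
# an organization's own IO mix. Profile names and attribute weights are
# hypothetical examples, not real SPEC SFS workload definitions.

PROFILES = {
    "small_file_metadata": {"small_file": 1.0, "metadata": 1.0, "read": 0.5, "write": 0.5},
    "large_file_streaming": {"large_file": 1.0, "write": 0.8, "read": 0.2},
    "read_intensive_mixed": {"small_file": 0.5, "large_file": 0.5, "read": 1.0},
}

def best_match(workload: dict) -> str:
    """Return the profile whose attribute weights overlap most with `workload`."""
    def score(profile: dict) -> float:
        # Dot product of the workload's weights with the profile's weights.
        return sum(profile.get(k, 0.0) * v for k, v in workload.items())
    return max(PROFILES, key=lambda name: score(PROFILES[name]))

# Example: an ML training pipeline dominated by many small-file reads
# and heavy metadata traffic.
ml_training = {"small_file": 1.0, "metadata": 0.9, "read": 0.8}
print(best_match(ml_training))  # → small_file_metadata
```

The point is not the scoring math but the discipline: characterize the production workload first, then pick the benchmark result that actually resembles it.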

Ideally, organizations should use the benchmark as an initial guideline to narrow down the field of potential vendors to two or three systems that are brought in-house for on-premises testing.

Test Equal to Your Budget

When it comes time to test, make sure that during the proof of concept the vendor sends a storage system configuration that is within budget, and know exactly what that test configuration costs. Simulating the workload to perform the actual test is difficult, especially with AI/ML workloads. The best-case scenario is to use an application, server, and storage configuration that duplicates production as closely as possible. An alternative is to use a workload-generator solution that can capture real-time IO from production and play it back on the test configurations. A final option is to use standard testing tools on the equipment, tweaked to simulate the workload’s IO pattern. Each successive option sacrifices accuracy.
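The last option, a standard tool tweaked toward the workload's IO pattern, can be as simple as a scripted micro-benchmark. The sketch below times a burst of small-file writes to approximate a small-file, write-intensive pattern; the file count and sizes are illustrative placeholders, and a real test would use a dedicated tool and production-derived parameters.

```python
# Minimal sketch of a "tweaked" IO test: write many small files, force them
# to storage, and report throughput. Parameters are illustrative only.
import os
import tempfile
import time

def small_file_write_bench(directory: str, file_count: int = 200, size: int = 4096) -> float:
    """Write `file_count` files of `size` bytes each; return MB/s achieved."""
    payload = os.urandom(size)
    start = time.perf_counter()
    for i in range(file_count):
        path = os.path.join(directory, f"bench_{i}.dat")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the write through the OS cache to storage
    elapsed = time.perf_counter() - start
    return (file_count * size) / (1024 * 1024) / elapsed

with tempfile.TemporaryDirectory() as d:
    print(f"{small_file_write_bench(d):.1f} MB/s")
```

Shifting the knobs (file size, read/write mix, concurrency) is what "tweaking to the IO pattern" means in practice; the closer those knobs track production telemetry, the more the result means.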

Leverage the Cloud

An increasing number of modern file systems run as well in the cloud as they do on-premises, which makes the public cloud a potentially ideal test environment. Compute power and storage IO can be “rented” as needed during the test and then “torn down” after the test is complete, so the organization only pays for the test environment while an actual test is in progress. Even if the organization still decides to test on-premises, leveraging the cloud may lower the short list to a single candidate.


Deciding on the storage platform for the organization’s AI/ML initiatives is not a task to be taken lightly. Selecting the right storage solution lays a foundation for future AI/ML investment and keeps the organization from buying a new system as each AI/ML project spins up. Testing and evaluating these systems is difficult but leveraging published benchmarks, dissecting them for reality and then performing limited internal testing can lead the organization to the right choice. If the file system has native cloud functionality, that makes the internal testing easier and much less expensive.

Sponsored by WekaIO


George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.
