The first benchmark results from the MLPerf consortium have been released and Nvidia is a clear winner for inference performance.
For those unaware, inference is the stage at which a trained deep learning model processes incoming data to produce a result, such as a classification or prediction.
MLPerf is a consortium which aims to provide “fair and useful” standardised benchmarks for inference performance. MLPerf can be thought of as doing for inference what SPEC does for benchmarking CPUs and general system performance.
The consortium has released its first benchmarking results, a painstaking effort involving over 30 companies and over 200 engineers and practitioners. MLPerf’s first call for submissions led to over 600 measurements spanning 14 companies and 44 systems.
However, for datacentre inference, only four of the processors are commercially available:
- Intel Xeon Platinum 9282
- Habana Goya
- Google TPUv3
- Nvidia Turing
Nvidia wasted no time in boasting that it beat the three other processors across various neural networks in both server and offline scenarios.
The easiest direct comparisons are possible in the ImageNet ResNet-50 v1.5 offline scenario, where the greatest number of major players and startups submitted results.
In that scenario, Nvidia once again posted the best performance on a per-processor basis with its Titan RTX GPU. Despite the 2x Google Cloud TPU v3-8 submission using eight Intel Skylake processors, it delivered similar performance to the SCAN 3XS DBP T496X2 Fluid, which used four Titan RTX cards (65,431.40 vs 66,250.40 inputs/second).
Ian Buck, GM and VP of Accelerated Computing at NVIDIA, said:
“AI is at a tipping point as it moves swiftly from research to large-scale deployment for real applications.
AI inference is a tremendous computational challenge. Combining the industry’s most advanced programmable accelerator, the CUDA-X suite of AI algorithms and our deep expertise in AI computing, NVIDIA can help datacentres deploy their large and growing body of complex AI models.”
However, it’s worth noting that the Titan RTX doesn’t support ECC memory so – despite its sterling performance – this omission may prevent its use in some datacentres.
Another interesting takeaway when comparing the Cloud TPU results against Nvidia is the performance difference when moving from offline to server scenarios:
- Google Cloud TPU v3 offline: 32,716.00
- Google Cloud TPU v3 server: 16,014.29
- Nvidia SCAN 3XS DBP T496X2 Fluid offline: 66,250.40
- Nvidia SCAN 3XS DBP T496X2 Fluid server: 60,030.57
As you can see, the Cloud TPU system's performance is more than halved (a drop of roughly 51 percent) when used in a server scenario. The SCAN 3XS DBP T496X2 Fluid system's performance drops by only around 9 percent in comparison.
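For readers who want to check the arithmetic, the drop from offline to server can be worked out from the four figures listed above. The following is a minimal sketch only: the function name `relative_drop` and the `results` dictionary are illustrative and not part of any MLPerf tooling, and the numbers are simply copied from the list above.

```python
def relative_drop(offline: float, server: float) -> float:
    """Return the fraction of offline throughput lost in the server scenario."""
    return (offline - server) / offline


# (offline, server) throughput figures as quoted in the article.
results = {
    "Google Cloud TPU v3": (32_716.00, 16_014.29),
    "SCAN 3XS DBP T496X2 Fluid": (66_250.40, 60_030.57),
}

# Fractional drop for each system, keyed by system name.
drops = {name: relative_drop(off, srv) for name, (off, srv) in results.items()}

for name, drop in drops.items():
    print(f"{name}: {drop:.1%} lower in the server scenario")
```

On these figures, the Cloud TPU system loses a little over half of its offline throughput, while the Titan RTX system loses a little under a tenth.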
You can peruse MLPerf’s full benchmark results on the consortium’s website.