16x Tesla V100 Server, Benchmarks and Architecture - ydau
======
ydau
TLDR:

Tesla V100s have I/O pins for at most 6x 25 GB/s NVLink traces. So, systems
with more than 6x GPUs cannot fully connect all GPUs over NVLink. This causes
I/O bottlenecks that significantly diminish the returns of scaling beyond six
GPUs.
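The limit here is just counting: a fully connected mesh of N GPUs needs N-1
lanes per GPU (more if you bond multiple lanes per pair for extra bandwidth),
while the V100 exposes only 6. A quick sketch of that arithmetic (the helper
name is mine, not from the article):

```python
def links_needed_per_gpu(n_gpus, lanes_per_pair=1):
    """Lanes each GPU needs for an all-to-all NVLink mesh."""
    return (n_gpus - 1) * lanes_per_pair

NVLINK_LANES = 6  # lanes exposed by a Tesla V100

for n in (4, 8, 16):
    need = links_needed_per_gpu(n)
    ok = "fits" if need <= NVLINK_LANES else "does NOT fit"
    print(f"{n:2d} GPUs: {need:2d} lanes/GPU -> {ok}")
```

So at 8 or 16 GPUs you are forced into a sparser topology (or extra switching
hardware), which is where the bottleneck the article addresses comes from.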

This article provides an overview of an architecture that works around this
limitation using additional high-bandwidth links. In the benchmarks, multi-GPU
performance scales almost perfectly linearly from 1x GPU to 16x GPUs.

I'm one of the engineers who worked on this project. Happy to answer any
questions!

~~~
core-questions
I think you missed the actual link to the article?

