
Introduction to Distributed Training of Neural Networks - sytelus
https://blog.skymind.ai/distributed-deep-learning-part-1-an-introduction-to-distributed-training-of-neural-networks/
======
rerx
The asynchronous parameter-server approach to distributed training just does
not make sense in a cluster of GPUs: the communication overhead would be
massive. Synchronous training with the ring-allreduce method of Horovod etc.
is the way to go.
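
For intuition, here's a toy single-process simulation of the ring-allreduce
pattern (illustrative code, not Horovod's actual implementation): each
worker's gradient is split into N chunks, a reduce-scatter pass sums each
chunk as it travels around the ring, and an allgather pass circulates the
finished sums back around.

    import numpy as np

    def ring_allreduce(grads):
        """Simulate ring-allreduce: grads is a list of N equal-length 1-D
        arrays, one per 'worker'; returns each worker's copy of the sum."""
        n = len(grads)
        # Each worker splits its gradient into n chunks.
        chunks = [list(np.array_split(g.astype(float), n)) for g in grads]

        # Reduce-scatter: after n-1 steps, worker i holds the fully summed
        # chunk (i + 1) % n.
        for s in range(n - 1):
            # All sends happen "in parallel", so snapshot outgoing chunks.
            sends = [chunks[i][(i - s) % n].copy() for i in range(n)]
            for i in range(n):
                dst, c = (i + 1) % n, (i - s) % n
                chunks[dst][c] = chunks[dst][c] + sends[i]

        # Allgather: circulate the fully summed chunks for n-1 more steps.
        for s in range(n - 1):
            sends = [chunks[i][(i + 1 - s) % n].copy() for i in range(n)]
            for i in range(n):
                dst, c = (i + 1) % n, (i + 1 - s) % n
                chunks[dst][c] = sends[i]

        return [np.concatenate(c) for c in chunks]

    grads = [np.random.randn(8) for _ in range(4)]
    results = ring_allreduce(grads)
    assert all(np.allclose(r, sum(grads)) for r in results)

Each worker transfers 2(N-1) chunks of size |g|/N, so per-worker traffic
stays near 2|g| no matter how many workers join the ring. That's why it
scales where a central parameter server's link saturates.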

~~~
mkolodny
I agree. Here's Horovod for anyone who hasn't heard of it:
https://github.com/uber/horovod
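
A minimal sketch of the usual Horovod-with-PyTorch pattern (the model and
data here are toy stand-ins; launched with e.g. horovodrun -np 4 python
train.py):

    import torch
    import horovod.torch as hvd

    hvd.init()  # one process per GPU
    if torch.cuda.is_available():
        torch.cuda.set_device(hvd.local_rank())

    model = torch.nn.Linear(10, 1)
    if torch.cuda.is_available():
        model.cuda()

    # Common advice: scale the learning rate by the number of workers.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

    # Wrap the optimizer so gradients are averaged across workers via
    # ring-allreduce before each update.
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())

    # Start all workers from identical weights.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)

    for step in range(100):
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        if torch.cuda.is_available():
            x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()  # the allreduce happens here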

~~~
singularity2001
I wish HN had a budget where you could upvote some people/posts 10 times.

