
Can anybody offer a TLDR of how this works (or point me to one)? It seems particularly well-suited for convolutional nets with many layers if I understand correctly, but I am curious as to whether e.g. recurrent nets may receive the same speed-ups from parallelization.



People at Google use multiple replicas to train RNNs and LSTMs to very good effect.

At heart, the most common distributed training mechanism creates multiple "replicas" of the model -- each replica has a full copy. It splits the training data among the replicas, and then at the end of every batch, synchronizes the updates to the model weights between the replicas. (A simple way to think of it is taking the average of the gradients produced at each replica and having every replica apply the average gradient. Equivalently, just reduce the per-replica learning rate, apply all of the gradients, and then propagate the new state back to everyone.)
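
Here's a minimal sketch of that synchronous gradient-averaging scheme, simulated in plain NumPy on a toy linear-regression model. The model, the sharding, and names like num_replicas are illustrative only, not the actual setup described above; a real system would run the replicas on separate devices and all-reduce the gradients.

    # Synchronous data-parallel SGD with gradient averaging (toy simulation).
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: y = X @ w_true + noise
    X = rng.normal(size=(1024, 10))
    w_true = rng.normal(size=10)
    y = X @ w_true + 0.01 * rng.normal(size=1024)

    num_replicas = 4          # illustrative
    learning_rate = 0.1
    w = np.zeros(10)          # every replica holds the same full copy of the weights

    def gradient(w, X_shard, y_shard):
        """Mean-squared-error gradient for one replica's shard of the batch."""
        residual = X_shard @ w - y_shard
        return 2.0 * X_shard.T @ residual / len(y_shard)

    for step in range(100):
        # Split each batch across the replicas.
        batch_idx = rng.choice(len(X), size=256, replace=False)
        shards = np.array_split(batch_idx, num_replicas)

        # Each replica computes its gradient independently (in parallel for real).
        grads = [gradient(w, X[s], y[s]) for s in shards]

        # Synchronize: average the per-replica gradients, apply one update,
        # then broadcast the new weights back to every replica.
        avg_grad = np.mean(grads, axis=0)
        w -= learning_rate * avg_grad

    print("weight error:", np.linalg.norm(w - w_true))

Averaging the gradients and applying one update is the same as summing them with the learning rate divided by the number of replicas, which is the equivalence mentioned above.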


Ah, great, thanks for the explanation.



