
Mesh-TensorFlow: Model Parallelism for Supercomputers (TF Dev Summit ‘19) - legel
https://www.youtube.com/watch?v=HgGyWS40g-g
======
legel
Mesh-TensorFlow
([https://github.com/tensorflow/mesh](https://github.com/tensorflow/mesh))
solves the problem of networks being too large to fit into a single GPU's
memory. I just trained a network with > 1 billion parameters across 4 GPUs.
Besides data set size and compute power, representational capacity (e.g., the
size of embeddings) is a key blocker to learning, so this library opens up new
possibilities for many of us.
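The core idea behind this kind of model parallelism can be illustrated without the library itself. The sketch below is a minimal NumPy illustration (not the Mesh-TensorFlow API): a layer's weight matrix is split along its hidden dimension across simulated devices, each device computes its slice of the output, and the slices are gathered back together. All names and sizes here are made up for the example.

```python
import numpy as np

# Illustrative sketch of model parallelism (NOT the Mesh-TensorFlow API):
# a layer's weight matrix is split along its output ("hidden") dimension
# across 4 simulated devices, each holding only its shard of the parameters.

rng = np.random.default_rng(0)
batch, d_in, d_hidden, n_devices = 8, 16, 32, 4

x = rng.standard_normal((batch, d_in))
w = rng.standard_normal((d_in, d_hidden))  # full matrix, kept only for the check

# Shard w column-wise: each "device" stores d_hidden / n_devices columns,
# so no single device ever holds the full parameter matrix.
shards = np.split(w, n_devices, axis=1)

# Each device computes its slice of the activations independently...
partials = [x @ w_k for w_k in shards]

# ...and the full activation is their concatenation (an all-gather in practice).
y_sharded = np.concatenate(partials, axis=1)

# The sharded computation matches the single-device result.
assert np.allclose(y_sharded, x @ w)
```

In the real library you instead declare named tensor dimensions and a layout mapping them onto a mesh of devices, and Mesh-TensorFlow inserts the necessary communication for you.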

