Is there ever going to be a way to distribute training - I would think this is t...

jessfyi · on Feb 15, 2023

The BigScience team (a working group of researchers that trained the BLOOM-176B LLM last year) released Petals [0][1], which allows distributed inference and fine-tuning of BLOOM, with the option to pick a custom model + private swarm. SWARM [2][3] is a WIP from Yandex and UW that shares some of the same codebase, but is for distributed training.

[0] https://petals.ml/
[1] https://github.com/bigscience-workshop/petals
[2] https://github.com/yandex-research/swarm
[3] https://twitter.com/m_ryabinin/status/1625175933492641814

mrdoops · on Feb 15, 2023

There almost surely will, and Elixir/OTP is going to do it best.

nerdponx · on Feb 15, 2023

Training is distributed already, but over a big cluster of machines in a data center.

I've always wanted there to be something like BOINC/Gridcoin for fitting these giant neural networks.