The BigScience team (a working group of researchers that trained the BLOOM-176B LLM last year) released Petals [0][1] which allows distributed inference and fine-tuning of BLOOM, with the option to pick a custom model + private swarm. SWARM [2][3] is a WIP from yandex and UW that shares some of the same codebase, but is for distributed training.
[0] https://petals.ml/ [1] https://github.com/bigscience-workshop/petals [2] https://github.com/yandex-research/swarm [3] https://twitter.com/m_ryabinin/status/1625175933492641814