


dauertewigkeit on Feb 11, 2023 | on: How to train large models on many GPUs? (2021)

I tried that as well, but maybe I did not use it correctly. I did not see the full sharding that I was hoping for. I only saw results similar to FSDP.

cma on Feb 11, 2023

How about FlexFlow?

https://huggingface.co/transformers/v4.9.2/parallelism.html#...
