Hacker News

The 4090 only has 24 GB and will only be able to fine-tune (and merge, which is more memory-intensive) the 7B model. The RTX 6000 with 48 GB is able to fine-tune the 13B model. The 70B model presumably needs multiple GPUs, e.g. four RTX 6000s. For people starting out, you can also use a free GPU from Google Colab to fine-tune a 7B model. Fine-tuning 70B gets more expensive, and I would suggest trying smaller models first with a high-quality dataset.

Memory usage scales mostly linearly with parameter count, I think.
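A rough back-of-the-envelope sketch of that linear scaling, assuming adapter-style (LoRA) fine-tuning where the frozen fp16/bf16 base weights dominate VRAM. The 2 bytes/param figure and the ~20% overhead factor for adapters, gradients, and activations are my own rough assumptions, not measurements; full fine-tuning with Adam optimizer states would need several times more.

```python
BYTES_PER_PARAM = 2   # fp16/bf16 base weights (assumption)
OVERHEAD = 1.2        # adapters, gradients, activations (rough assumption)

def vram_gb(params_billion: float) -> float:
    """Rough VRAM estimate in GiB for adapter-style fine-tuning."""
    return params_billion * 1e9 * BYTES_PER_PARAM * OVERHEAD / 2**30

for size in (7, 13, 70):
    print(f"{size}B: ~{vram_gb(size):.0f} GB")
```

Under these assumptions a 7B model lands around 16 GB (fits a 24 GB 4090), 13B around 29 GB (fits a 48 GB RTX 6000), and 70B around 156 GB (hence several 48 GB cards), consistent with the figures above.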




Thanks. My plan is to use this research cluster: https://www.ex3.simula.no/resources

I will probably practice fine-tuning on the small model, but I don't really need to use a worse model to save money.



