Hacker News

The 4090 only has 24 GB and will only be able to fine-tune (and merge, which is more memory-intensive) the 7B model. The RTX 6000 with 48 GB is able to fine-tune the 13B model. The 70B model presumably needs multiple GPUs, e.g. four RTX 6000s. For people starting out, you can also use a free GPU from Google Colab to fine-tune a 7B model. Fine-tuning 70B gets more expensive, and I would suggest trying smaller models first with a high-quality dataset.

Memory usage scales mostly linearly with parameter count, I think.
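A rough back-of-the-envelope sketch of that linear scaling, assuming adapter-style (LoRA) fine-tuning where the frozen fp16/bf16 base weights dominate VRAM. The 2 bytes/param figure and the ~20% overhead factor for adapters, gradients, and activations are my own rough assumptions, not measurements; full fine-tuning with Adam optimizer states would need several times more.

```python
BYTES_PER_PARAM = 2   # fp16/bf16 base weights (assumption)
OVERHEAD = 1.2        # adapters, gradients, activations (rough assumption)

def vram_gb(params_billion: float) -> float:
    """Rough VRAM estimate in GiB for adapter-style fine-tuning."""
    return params_billion * 1e9 * BYTES_PER_PARAM * OVERHEAD / 2**30

for size in (7, 13, 70):
    print(f"{size}B: ~{vram_gb(size):.0f} GB")
```

Under these assumptions a 7B model lands around 16 GB (fits a 24 GB 4090), 13B around 29 GB (fits a 48 GB RTX 6000), and 70B around 156 GB (hence several 48 GB cards), consistent with the figures above.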




Thanks. My plan is to use this research cluster: https://www.ex3.simula.no/resources

I will probably practice fine-tuning on the small model, but I don't really need to use a worse model to save money.



