Have you looked into quantization? At 8-bit, a 7B model requires ~7GB of RAM (plus a bit of overhead); at 4-bit, it would require around 3.5GB and fit entirely into the RAM you have. Generation quality does degrade a bit at lower bit widths, but not as much as you might think.
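
For a rough sense of where those numbers come from, here is a back-of-the-envelope sketch in Python (weights only; the KV cache and runtime overhead are ignored, so treat these as lower bounds):

    # Rough weight-memory estimate: parameter count x bytes per weight.
    def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
        return params_billion * 1e9 * (bits_per_weight / 8) / 1e9

    for bits in (16, 8, 4):
        print(f"7B model at {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB")
    # 16-bit: ~14.0 GB, 8-bit: ~7.0 GB, 4-bit: ~3.5 GB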



This is interesting; I've written up how I set it up here: https://christiaanse.ca/posts/running_llm/
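
For illustration only (the linked post may do this differently), a minimal sketch of loading a 4-bit quantized model with the llama-cpp-python bindings; the model path here is a hypothetical example:

    from llama_cpp import Llama

    # Load a hypothetical 4-bit GGUF file; roughly 3.5GB of weights for a 7B model.
    llm = Llama(model_path="models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)

    out = llm("Q: Why quantize a language model? A:", max_tokens=64)
    print(out["choices"][0]["text"])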



