Have you looked into quantization? At 8-bit, a 7B model requires ~7GB of RAM (plus a bit of overhead); at 4-bit, it would require around 3.5GB and fit entirely into the RAM you have. Generation quality does degrade a bit at lower bit widths, but not as much as you might think.
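
For a rough sense of where those numbers come from, here is a back-of-the-envelope sketch in Python (weights only; the KV cache and runtime overhead are ignored, so treat these as lower bounds):

    # Rough weight-memory estimate: parameter count x bytes per weight.
    def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
        return params_billion * 1e9 * (bits_per_weight / 8) / 1e9

    for bits in (16, 8, 4):
        print(f"7B model at {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB")
    # 16-bit: ~14.0 GB, 8-bit: ~7.0 GB, 4-bit: ~3.5 GB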



This is interesting; I've written up how I set it up here: https://christiaanse.ca/posts/running_llm/
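
For illustration only (the linked post may do this differently), a minimal sketch of loading a 4-bit quantized model with the llama-cpp-python bindings; the model path here is a hypothetical example:

    from llama_cpp import Llama

    # Load a hypothetical 4-bit GGUF file; roughly 3.5GB of weights for a 7B model.
    llm = Llama(model_path="models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)

    out = llm("Q: Why quantize a language model? A:", max_tokens=64)
    print(out["choices"][0]["text"])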



