I got the 70b qwen llama distill, I have 24GB of vram. I opened aider and gave a...

mechagodzilla · 2025-06-07T11:40:34 1749296434

Buy a used workstation with 512GB of DDR4 RAM. It will probably cost about $1-1.5k and be able to run a Q4 quant of the full DeepSeek 671B models. I have a similar setup with dual-socket 18-core Xeons (and 768GB of RAM, so it cost about $2k), and I get about 1.5 tokens/sec on those models. Being able to see the full thinking trace on the R1 models is awesome compared to the OpenAI models.
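
For anyone wondering why 512GB is the number, here is a rough back-of-the-envelope sketch of the weight footprint. The 4.5 bits per weight figure is my assumption for a typical Q4-style quant (the real value depends on the exact quant format), and the 32GB overhead for KV cache, OS, and buffers is a placeholder guess, not a measurement.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_ram(n_params: float, bits_per_weight: float,
                ram_gb: float, overhead_gb: float = 32.0) -> bool:
    """True if weights plus an assumed overhead fit in the given RAM.

    overhead_gb is a hypothetical allowance for KV cache, OS, and buffers.
    """
    return quantized_size_gb(n_params, bits_per_weight) + overhead_gb <= ram_gb


if __name__ == "__main__":
    N_PARAMS = 671e9       # full DeepSeek 671B model
    BITS_PER_WEIGHT = 4.5  # assumed average for a Q4-style quant

    size = quantized_size_gb(N_PARAMS, BITS_PER_WEIGHT)
    print(f"Approx. weight size: {size:.1f} GB")
    for ram in (24, 512, 768):
        print(f"{ram} GB: fits = {fits_in_ram(N_PARAMS, BITS_PER_WEIGHT, ram)}")
```

Under those assumptions the weights alone land well above anything a 24GB card can hold, which is why system RAM is the practical route for the full model.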