skirmish 46 days ago | on: GPT-5 is behind schedule

Llama 3.3 70B, once quantized, runs reasonably well on a 24GB GPU (7900 XTX or 4090) plus 64GB of system RAM. Software: https://github.com/ggerganov/llama.cpp
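
For a rough sense of why this setup needs both the GPU and system RAM, here is a back-of-the-envelope sketch in Python. The numbers are my assumptions, not measurements: about 4.5 bits per weight for a 4-bit-class quantization, 80 transformer layers of equal size, and 4GB of VRAM held back for the KV cache and scratch buffers. The helper names are made up for illustration and are not part of llama.cpp.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def layers_on_gpu(weight_gb: float, n_layers: int,
                  vram_gb: float, reserve_gb: float) -> int:
    """How many equally sized layers fit in VRAM after reserving headroom."""
    per_layer_gb = weight_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


N_PARAMS = 70e9        # nominal parameter count of a 70B model
BITS_PER_WEIGHT = 4.5  # assumption: 4-bit-class quantization plus overhead
N_LAYERS = 80          # assumption: layer count, all layers treated as equal
VRAM_GB = 24.0         # 7900 XTX or 4090
RESERVE_GB = 4.0       # assumption: headroom for KV cache and buffers

weights = quantized_weight_gb(N_PARAMS, BITS_PER_WEIGHT)
gpu_layers = layers_on_gpu(weights, N_LAYERS, VRAM_GB, RESERVE_GB)
print(f"weights ~ {weights:.1f} GB, layers on GPU ~ {gpu_layers}/{N_LAYERS}")
```

Under these assumptions the weights come to roughly 39GB, so only about half the layers fit in 24GB of VRAM and the rest sit in system RAM and run on the CPU. In llama.cpp the split is controlled with -ngl / --n-gpu-layers; the right value for a given quantization and context size is something to find by trial.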