tarruda | 5 months ago | on: QuIP#: 2-bit Quantization for LLMs
If this quantization method works with smaller models, it would enable running models of up to 33B parameters with only 12 GB of VRAM. That would be especially important for democratizing access to Mistral's new MoE model.
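For reference, a rough Python sketch of the arithmetic behind that claim (the 2 GB overhead figure for KV cache, activations, and quantization metadata is an assumption, not from the comment):

    # Back-of-the-envelope VRAM estimate for a quantized model.
    def quantized_vram_gb(n_params_billion: float,
                          bits_per_weight: float = 2.0,
                          overhead_gb: float = 2.0) -> float:
        """Quantized weights plus assumed headroom for KV cache,
        activations, and dequantization scales/metadata."""
        # 1e9 params and 1e9 bytes-per-GB cancel, so:
        weight_gb = n_params_billion * bits_per_weight / 8
        return weight_gb + overhead_gb

    # 33B params at 2 bits -> ~8.25 GB of weights, ~10.25 GB total,
    # which fits under the 12 GB mentioned above.
    print(f"{quantized_vram_gb(33):.2f} GB")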
MrNeon | 5 months ago
IIRC, quantizing small models causes a higher relative drop in quality metrics than it does for larger ones.