
If this quantization method also holds up on smaller models, it would enable running models of up to 33B parameters in only 12 GB of VRAM.

Especially important for democratizing access to Mistral's new MoE model.
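
For a rough sense of the numbers: weight memory scales as parameter count times bits per weight, so 33B parameters land near 12 GB at roughly 3 bits per weight, before counting KV cache and activations. A back-of-the-envelope sketch (the bit widths and the 33B figure are assumptions for illustration, not numbers from the linked method):

  # Rough VRAM estimate for quantized model weights only.
  # Ignores KV cache, activations, and framework overhead.
  def weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
      return n_params * bits_per_weight / 8 / 1e9

  for bits in (16, 8, 4, 3, 2):
      print(f"33B @ {bits}-bit: {weight_vram_gb(33e9, bits):.1f} GB")

  # Output:
  # 33B @ 16-bit: 66.0 GB
  # 33B @ 8-bit: 33.0 GB
  # 33B @ 4-bit: 16.5 GB
  # 33B @ 3-bit: 12.4 GB  <- about the 12 GB claim, with no headroom
  # 33B @ 2-bit: 8.2 GB   <- leaves room for KV cache on a 12 GB card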




IIRC, quantizing smaller models causes a larger relative drop in quality metrics (e.g., perplexity) than quantizing large ones.



