tarruda | 5 months ago | on: QuIP#: 2-bit Quantization for LLMs
If this quantization method works with smaller models, it would enable running models of up to 33B parameters with only 12 GB of VRAM. That would be especially important for democratizing access to Mistral's new MoE model.
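For reference, a rough Python sketch of the arithmetic behind that claim (the 2 GB overhead figure for KV cache, activations, and quantization metadata is an assumption, not from the comment):

    # Back-of-the-envelope VRAM estimate for a quantized model.
    def quantized_vram_gb(n_params_billion: float,
                          bits_per_weight: float = 2.0,
                          overhead_gb: float = 2.0) -> float:
        """Quantized weights plus assumed headroom for KV cache,
        activations, and dequantization scales/metadata."""
        # 1e9 params and 1e9 bytes-per-GB cancel, so:
        weight_gb = n_params_billion * bits_per_weight / 8
        return weight_gb + overhead_gb

    # 33B params at 2 bits -> ~8.25 GB of weights, ~10.25 GB total,
    # which fits under the 12 GB mentioned above.
    print(f"{quantized_vram_gb(33):.2f} GB")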
MrNeon | 5 months ago
IIRC, quantizing small models causes a higher relative drop in quality metrics than it does for larger ones.