
Unless something has changed, it needs to keep all 8 expert models loaded at the same time; only 2 are active per token, so during inference it performs like a 2x base model.
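
For what it's worth, here's a minimal sketch of that top-2 routing (toy dimensions and made-up names, nothing to do with the real weights): all 8 experts have to sit in memory, but each token only runs through the 2 the router picks.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_ff, n_experts, top_k = 16, 32, 8, 2

    # All 8 experts' weights have to be resident in memory...
    experts = [(rng.standard_normal((d_model, d_ff)),
                rng.standard_normal((d_ff, d_model))) for _ in range(n_experts)]
    gate = rng.standard_normal((d_model, n_experts))

    def moe_forward(x):
        scores = x @ gate
        top = np.argsort(scores)[-top_k:]               # router picks 2 experts
        w = np.exp(scores[top]) / np.exp(scores[top]).sum()
        out = np.zeros_like(x)
        # ...but only the 2 selected experts do any work for this token,
        # which is why inference cost looks like ~2x a base model.
        for weight, i in zip(w, top):
            w1, w2 = experts[i]
            out += weight * (np.maximum(x @ w1, 0.0) @ w2)
        return out

    print(moe_forward(rng.standard_normal(d_model)).shape)   # (16,)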

Mixtral 8x7B @ 5-bit takes up over 30 GB on my M3 Max. That's over 90 GB for this one at the same quantization. Realistically you probably need a 128 GB machine to run this with good results.




A 4-bit quant of the new one would still be about 70 GB, so yeah. Gonna need a lot more RAM.
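
The back-of-envelope math in both estimates is just total parameters x bits per weight. A rough sketch; the parameter counts are my assumptions (the thread only gives file sizes), not something stated above:

    def quantized_size_gb(total_params, bits_per_weight):
        # Weights-only footprint; real quantized files add some overhead
        # for scales, embeddings kept at higher precision, etc.
        return total_params * bits_per_weight / 8 / 1e9

    # ~46.7B total params for Mixtral 8x7B is an assumption on my part.
    print(quantized_size_gb(46.7e9, 5))   # ~29 GB -> "over 30 GB" once overhead is added
    # Assuming the new model is around ~141B total parameters:
    print(quantized_size_gb(141e9, 5))    # ~88 GB -> "over 90 GB" at 5-bit
    print(quantized_size_gb(141e9, 4))    # ~70 GB at 4-bit, matching the estimate above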



