
I have an M2 Pro Mac mini with 32 GB of memory, and I can run Mixtral 8x7B at Q3, that is, with 3-bit quantization.
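
For reference, here is a minimal sketch of what running a 3-bit quantized Mixtral 8x7B locally can look like, assuming llama-cpp-python and a Q3 GGUF file (the file name and parameters below are illustrative, not a specific recommendation):

  # assumes: pip install llama-cpp-python, plus a Q3 GGUF of Mixtral 8x7B Instruct
  from llama_cpp import Llama

  llm = Llama(
      model_path="mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",  # illustrative file name
      n_gpu_layers=-1,  # offload all layers to the GPU (Apple Metal on an M2)
      n_ctx=4096,       # context window
  )

  out = llm("Q: Why quantize a local LLM? A:", max_tokens=64)
  print(out["choices"][0]["text"])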



IIRC, you've mentioned once before that you've used Private LLM. :) Please try the 4-bit OmniQuant quantized Mixtral 8x7B Instruct model in it. It runs circles around RTN Q3 models in speed and around RTN Q8 models in text generation quality.
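
For context: RTN (round-to-nearest) quantization just scales the weights and rounds each one to the nearest grid point, while OmniQuant learns its quantization parameters instead. A toy NumPy sketch of the RTN baseline (per-channel absmax scaling is an illustrative choice, not any particular app's pipeline):

  import numpy as np

  def rtn_quantize(w, bits=3):
      # signed grid, e.g. [-4, 3] for 3 bits; qmax is the largest positive level
      qmax = 2 ** (bits - 1) - 1
      scale = np.abs(w).max(axis=-1, keepdims=True) / qmax
      q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
      return q, scale

  w = np.random.randn(4, 8).astype(np.float32)
  q, scale = rtn_quantize(w)
  err = np.abs(w - q.astype(np.float32) * scale).max()
  print("max abs reconstruction error:", err)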



