
I have an M2 Pro Mac mini with 32 GB of memory, and I can run Mixtral 8x7B at Q3, that is, with 3-bit quantization.
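
For reference, here is a minimal sketch of what running a 3-bit quantized Mixtral 8x7B locally can look like, assuming llama-cpp-python and a Q3 GGUF file (the file name and parameters below are illustrative, not a specific recommendation):

  # assumes: pip install llama-cpp-python, plus a Q3 GGUF of Mixtral 8x7B Instruct
  from llama_cpp import Llama

  llm = Llama(
      model_path="mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",  # illustrative file name
      n_gpu_layers=-1,  # offload all layers to the GPU (Apple Metal on an M2)
      n_ctx=4096,       # context window
  )

  out = llm("Q: Why quantize a local LLM? A:", max_tokens=64)
  print(out["choices"][0]["text"])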



IIRC, you've mentioned once before that you've used Private LLM. :) Please try the 4-bit OmniQuant quantized Mixtral 8x7B Instruct model in it. It runs circles around RTN Q3 models in speed and around RTN Q8 models in text generation quality.
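
For context: RTN (round-to-nearest) quantization just scales the weights and rounds each one to the nearest grid point, while OmniQuant learns its quantization parameters instead. A toy NumPy sketch of the RTN baseline (per-channel absmax scaling is an illustrative choice, not any particular app's pipeline):

  import numpy as np

  def rtn_quantize(w, bits=3):
      # signed grid, e.g. [-4, 3] for 3 bits; qmax is the largest positive level
      qmax = 2 ** (bits - 1) - 1
      scale = np.abs(w).max(axis=-1, keepdims=True) / qmax
      q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
      return q, scale

  w = np.random.randn(4, 8).astype(np.float32)
  q, scale = rtn_quantize(w)
  err = np.abs(w - q.astype(np.float32) * scale).max()
  print("max abs reconstruction error:", err)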



