
The weights are quantized down to fewer bits to save memory. The quantization loss will result in worse generations.
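For intuition, here's a toy sketch (assuming NumPy; this is not the block-wise Q8_0/Q4_K packing llama.cpp actually uses) of how mapping weights to a small integer grid plus a scale saves bytes but introduces reconstruction error:

    # Toy symmetric quantization: fewer bits per weight, at the cost of error.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.02, size=4096).astype(np.float32)  # fp32: 4 bytes/weight

    def quantize_symmetric(x, bits):
        qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit, 7 for 4-bit
        scale = np.abs(x).max() / qmax        # one scale for the whole block
        q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
        return q, scale                       # stored as int8 here only for the sketch;
                                              # real 4-bit formats pack two values per byte

    for bits in (8, 4):
        q, scale = quantize_symmetric(w, bits)
        w_hat = q.astype(np.float32) * scale  # what the model actually computes with
        err = np.abs(w - w_hat).mean()
        print(f"{bits}-bit: ~{bits / 8:.1f} bytes/weight, mean abs error {err:.2e}")

The lower-bit version uses a quarter of the memory of fp32 but reconstructs the weights less faithfully, which is where the quality loss comes from.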



Ollama serves multiple versions; you can get Q8_0 from it too:

ollama run deepseek-r1:8b-llama-distill-q8_0

The real value of the Unsloth ones is that they were uploaded before R1 appeared on Ollama's model list.


Unsloth also works very diligently to find and fix tokenizer issues and many other problems as soon as they can. I have comparatively little trust in Ollama following up and updating everything in a timely manner. Last I checked, there was little information on when the GGUFs etc. on Ollama were updated, or on which llama.cpp version / git commit they were built with. As such, I believe quality can vary and be significantly lower with the Ollama versions of new models.
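If you want to sanity-check what you were actually served, one option is to inspect the GGUF metadata yourself. A rough sketch, assuming the gguf Python package from llama.cpp's gguf-py (the field-decoding details have shifted between versions, and the file path is just a placeholder):

    # Dump general/tokenizer metadata from a local GGUF to compare against
    # an upstream (e.g. Unsloth) upload. Assumes llama.cpp's gguf-py package.
    from gguf import GGUFReader
    from gguf.constants import GGUFValueType

    reader = GGUFReader("deepseek-r1-8b-q8_0.gguf")  # placeholder path

    for name, field in reader.fields.items():
        if not (name.startswith("general.") or name.startswith("tokenizer.")):
            continue
        # Strings are stored as byte arrays; scalars as single-element arrays.
        first = field.parts[field.data[0]]
        if field.types and field.types[0] == GGUFValueType.STRING:
            value = bytes(first).decode("utf-8", errors="replace")
        elif len(field.data) == 1:
            value = first[0]
        else:
            value = f"[{len(field.data)} values]"
        print(f"{name}: {value}")

Comparing keys like the tokenizer settings and general.* fields between the Ollama blob and a trusted GGUF is a crude but workable way to spot stale or mismatched conversions.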




