
Hi Kye, I tried a version of this model to assess its capabilities.

I'd recommend trying the Llama-based distill (same size, same quantization), which you can find here: https://huggingface.co/bartowski/DeepSeek-R1-Distill-Llama-8...

It should take the same amount of memory as the one you currently have.
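The memory claim follows from simple arithmetic: resident size is dominated by the weights, so two models with the same parameter count and the same quantization occupy roughly the same RAM. A rough back-of-envelope in Python, assuming an 8B parameter count and ~4.85 bits per weight (a typical figure for Q4_K_M; both numbers are illustrative assumptions, not stated in the comment):

```python
def approx_model_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only memory footprint of a quantized model, in GB."""
    return n_params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

# Illustrative: 8B parameters at ~4.85 bits/weight (typical Q4_K_M).
# Any two 8B distills at the same quantization land at the same estimate.
current_gb = approx_model_gb(8e9, 4.85)
llama_distill_gb = approx_model_gb(8e9, 4.85)
print(f"current: {current_gb:.2f} GB, Llama distill: {llama_distill_gb:.2f} GB")
```

This ignores KV cache and runtime overhead, which scale with context length rather than with the choice of distill, so the comparison between the two models still holds.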

In my experience the Llama version performs much better at adhering to the prompt, understanding data in multiple languages, and going in-depth in its responses.
