I am legitimately curious about the parameters this person used for running the model locally to get the results they got, because I am currently experimenting with running models locally myself. You can see me asking similar questions elsewhere in this same thread; the timestamps correlate.
Apparently there is a whole science behind running models. I have seen the instructions that unsloth publishes for their quants; depending on the model, they'll tweak things like temperature, top-k, etc.
The quantization size you choose also makes a difference.
The GPU driver also plays an important role.
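To make those knobs concrete, here is a toy sketch (plain Python, my own illustration, not unsloth's code) of what temperature and top-k do to the next-token distribution before sampling:

```python
import math

def next_token_distribution(logits, temperature=0.6, top_k=2):
    """Naive temperature + top-k filtering over raw logits (ignores ties at the cutoff)."""
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    # Keep only the top_k highest logits; mask the rest out entirely.
    threshold = sorted(scaled, reverse=True)[top_k - 1]
    masked = [s if s >= threshold else float("-inf") for s in scaled]
    # Softmax over the survivors (subtract the max for numerical stability).
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Four hypothetical vocabulary entries; only the top 2 remain sampleable.
probs = next_token_distribution([2.0, 1.0, 0.5, -1.0], temperature=0.6, top_k=2)
```

This is why published settings matter: with top_k=2 the last two tokens get exactly zero probability, and dropping the temperature further concentrates mass on the single best token.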
What was your approach? What software did you use to run the models?
FWIW, while I find it appealing, I also strongly associate it with "vibe coded webapp of dubious quality," so personally I'm not gonna try to replicate it myself.
> [...] _but not necessarily use the right format._
This has also been my experience. But isn't the harness sending the instructions on how to invoke a tool? Maybe those instructions omit the formatting part. What do you think?
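For reference, this is roughly what I mean by the harness sending instructions: an OpenAI-style tool definition goes out with the request, and the model is expected to echo back a call in a matching format (the `get_weather` tool here is purely hypothetical, and the exact schema varies by harness):

```python
import json

# What the harness sends: a declaration of the tool and its parameters.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# What the model is expected to emit: the arguments as a JSON-encoded string.
call = {"name": "get_weather", "arguments": json.dumps({"city": "Berlin"})}
args = json.loads(call["arguments"])
```

A model can know *that* it should call `get_weather` yet still emit the arguments as plain prose or malformed JSON, which would match the "right tool, wrong format" behavior being described.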
Through my Kagi subscription I get access to quite a few models [1] but I tend to rely on Qwen3 (fast) for quick questions and Qwen3 (reasoning) when I want a more structured approach, for example, when I am researching a topic.
I have tried the same approach with Kimi K2.5 and GLM 5, but I keep going back to Qwen3.
I also have access to Perplexity which is quite decent to be honest, but I prefer to keep everything in Kagi.
> [...] it's much easier to fine-tune a "general" model into performing some very specific custom task (like classifying text, or translation, etc)
Is this fine-tuning process similar to training models from scratch? As in, do you need extensive resources? Or can this realistically be done on a consumer-grade GPU?
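A back-of-envelope sketch of why I suspect it might be feasible, assuming a parameter-efficient method like LoRA (my assumption, not something from the comment above): instead of updating a full weight matrix, you train two small low-rank matrices alongside it, which shrinks the number of trainable parameters dramatically:

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for a d_in x d_out weight: full fine-tune vs. a rank-r LoRA adapter."""
    full = d_in * d_out            # update every weight
    lora = rank * (d_in + d_out)   # train only A (d_in x r) and B (r x d_out)
    return full, lora

# One 4096x4096 attention projection with a typical rank-8 adapter (illustrative numbers).
full, lora = lora_param_counts(4096, 4096, rank=8)
# full = 16_777_216 params, lora = 65_536 params: a 256x reduction for this matrix.
```

If that scaling holds across the model's layers, the optimizer state and gradients for the adapters alone are small enough that consumer-grade VRAM starts to look plausible, which is presumably why these methods are popular for exactly the kind of narrow task adaptation quoted above.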