Hacker News | mongrelion's comments

At what temperature did you run it and what was your context limit?

I don't understand why I'm getting downvoted.

I am legitimately curious about the parameters the person used to run the model locally and get the results they got, because I am currently experimenting with running models locally myself. You can see I am asking similar questions to others in this same thread; correlate the timestamps.


Apparently there is a whole science behind running models. I have seen the instructions that Unsloth publishes for their quants, and depending on the model they'll tweak things like the temperature, top-k, etc.

The size of the quantization you choose also makes a difference.

The GPU driver also plays an important role.

What was your approach? What software did you use to run the models?
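For anyone comparing setups: a minimal sketch of how those knobs (temperature, top-k, context size) are usually passed to a local OpenAI-compatible server such as llama.cpp's llama-server. The model name and default values here are placeholders, not anything from this thread.

```python
# Sketch: building a chat-completion payload with explicit sampling knobs
# for a local OpenAI-compatible server. All defaults below are illustrative.

def build_request(prompt: str, temperature: float = 0.7,
                  top_k: int = 40, top_p: float = 0.95,
                  max_tokens: int = 512) -> dict:
    """Assemble a chat-completion payload with the sampling settings spelled out."""
    return {
        "model": "local-model",                # placeholder name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,            # higher = more random output
        "top_k": top_k,                        # keep only the k most likely tokens
        "top_p": top_p,                        # nucleus-sampling cutoff
        "max_tokens": max_tokens,              # generation limit, not context size
    }

payload = build_request("Explain KV cache in one paragraph.", temperature=0.6)
```

The point is that two people "running the same model" can see very different output quality if these values differ, which is why the question matters.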


What front-end framework did you use? I find the UI so visually appealing.

FWIW, while I find it appealing, I also strongly associate it with "vibe coded webapp of dubious quality," so personally I'm not gonna try to replicate it myself.

Thanks. I actually used Google AI Studio for this. Prompted with my color choices and let it do the rest, turned out pretty good.

Which quantization are you running and what context size? 32tok/s for that model on that card sounds pretty good to me!
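As a rough sanity check on numbers like that: batch-1 decoding is typically memory-bandwidth bound, so tokens/s is roughly bandwidth divided by the bytes of weights read per token. The model size and bandwidth figures below are hypothetical, not the poster's actual setup.

```python
# Napkin math, not a benchmark: estimate weight size at a given quantization
# and the bandwidth-bound tokens/s ceiling for batch-1 decode.

def quant_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone (ignores KV cache and overhead)."""
    return n_params * bits_per_weight / 8 / 1e9

def est_tokens_per_s(model_gb: float, bandwidth_gbps: float) -> float:
    """Bandwidth-bound ceiling: every token reads all weights once."""
    return bandwidth_gbps / model_gb

# Hypothetical: a 30B-parameter model at ~4.5 bits/weight on a ~936 GB/s card.
size = quant_size_gb(30e9, 4.5)       # ~16.9 GB of weights
print(est_tokens_per_s(size, 936))    # ceiling in the mid-50s tok/s
```

Real throughput lands below this ceiling, so observed numbers in the tens of tok/s are plausible for dense models of that class.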

It might be that the system prompt sent by Codex is not optimal for that model. Try with opencode and see if your results improve.

By anyone do you mean a well-established business or any entity willing to serve you?

> [...] _but not necessarily use the right format._

This has also been my experience. But isn't the harness sending the instructions on how to invoke a tool? Maybe it is missing the formatting part. What do you think?


Through my Kagi subscription I get access to quite a few models [1] but I tend to rely on Qwen3 (fast) for quick questions and Qwen3 (reasoning) when I want a more structured approach, for example, when I am researching a topic.

I have tried the same approach with Kimi K2.5 and GLM 5, but I keep going back to Qwen3.

I also have access to Perplexity which is quite decent to be honest, but I prefer to keep everything in Kagi.

1: https://help.kagi.com/kagi/ai/assistant.html#available-llms


inferbench is a great idea (similar to Geekbench, etc.), but as of the time of writing it has only 83 submissions, which is underwhelming.

> [...] it's much easier to fine-tune a "general" model into performing some very specific custom task (like classifying text, or translation, etc)

Is this fine-tuning process similar to training models from scratch? As in, do you need extensive resources, or can this be done (realistically) on a consumer-grade GPU?
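Partly answering my own question with napkin math: parameter-efficient methods like LoRA train only small adapter matrices, which is the main reason fine-tuning can fit on a consumer GPU. The layer dimensions, rank, and layer count below are hypothetical, chosen to resemble a 7B-class model.

```python
# Rough arithmetic, not a benchmark: LoRA adds two low-rank factors
# A (r x d_in) and B (d_out x r) per adapted weight matrix, so each
# matrix contributes r * (d_in + d_out) trainable parameters.

def lora_trainable_params(d_in: int, d_out: int, rank: int,
                          n_matrices: int) -> int:
    """Total trainable parameters across all adapted matrices."""
    return n_matrices * rank * (d_in + d_out)

# Hypothetical 7B-class model: 32 layers, adapting the q and v projections
# (4096 x 4096 each) with rank 16 -> 2 adapted matrices per layer.
params = lora_trainable_params(4096, 4096, rank=16, n_matrices=32 * 2)
print(params)            # 8,388,608 trainable parameters
print(params * 2 / 1e6)  # ~16.8 MB of adapter weights in fp16
```

A few million trainable parameters on top of frozen (often quantized) base weights is a very different workload from full pretraining, which is why single-GPU fine-tuning is realistic.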

