Maybe I'm using it wrong, but when I try to use the full precision FP16 model, l...

mdp2021 · 2025-08-15T08:34:57 1755246897

As many repeated here, it's (generally) not for direct use. It is meant to be a good base for fine-tuning and getting something very fast.

(In theory, if you fine-tuned Gemma3:270M over "templating cold calls to leads" it would become better than Qwen and faster.)

wanderingmind · 2025-08-15T10:41:09 1755254469

Why should we start fine tuning gemma when it is so bad. Why not instead focus the fine-tuning efforts on Qwen, when it starts off with much, much better outputs?

mdp2021 · 2025-08-15T12:14:58 1755260098

Speed critical applications, I suppose. Have you compared the speeds?

(I did. I won't give you number (which I cannot remember precisely), but Gemma was much faster. So, it will depend on the application.)