"write me a template to make a cold call to a potential lead",
It gives me absolute rubbish. On the other hand, the Qwen 0.6B Q8 quantized model nails the answer to the same question.
Qwen 0.6B is smaller than Gemma at full precision. Execution is a tad slower, but not by much. I'm not sure why I would pick Gemma over Qwen.
(In theory, if you fine-tuned Gemma3:270M on "templating cold calls to leads", it would become better than Qwen at that task, and faster.)
(I did. I won't give you numbers, which I can't remember precisely, but Gemma was much faster. So it will depend on the application.)