Gemma is amazing with tools for anything that is not crazy complex. I think a lo...

user43928 · 2026-06-14T09:50:53 1781430653

Do you guys actually work with these models?

I have to use GPT 5.4 Mini at work. It benchmarks higher than that Gemma 4 model.

In my experience it's next to useless. It cannot even move 20 existing lines of code from A to B without breaking them half of the time.

If you tell it to look something up in your dependencies, it's a toss-up whether the answer is correct, incorrect, or whether it even performed the search at all.

I find it next to useless, and I'm mostly better off doing the work manually.

It's a night and day difference to even Sonnet, not to mention the SOTA.

zhshshshs · 2026-06-14T14:06:42 1781446002

Counter: I use 5.4 mini all the time for coding. No trouble letting it implement features. Entire new screens, APIs and various components.

It ain’t the best for sure, but if you have trouble letting it move 20 lines, I don’t know what the cause is; that’s not my experience at all. I do make pretty extensive use of guardrails and proper instructions in my AGENTS.md.

I also value super boring code bases with as uniform a shape as possible. I guess that’s also helping out.

sigmoid10 · 2026-06-14T15:15:20 1781450120

>It benchmarks higher than that Gemma 4 model.

Depends on what you look at. Gemma 4 31B without reasoning benchmarks significantly higher than GPT-5.4 without reasoning on Artificial Analysis. Even the new Gemma 4 12B beats it. And while GPT-5.4 with xhigh reasoning beats the reasoning version of Gemma 4 31B, the question is why you would throw such a complicated task that needs so much reasoning at such a small model to begin with. So if you do coding, you probably won't have much success with either model. But for the actually simple tasks that these models were made for, they are extremely capable. E.g. hook it up to the Atlassian MCP and have it do all the stuff that is supplemental to coding in big enterprises.

pixlmint · 2026-06-14T13:37:58 1781444278

Like I said in my original comment, it’s fine for non-coding tasks, meaning I primarily use it to answer questions.

sowbug · 2026-06-15T15:48:47 1781538527

The MoE variant was perfect for speedily generating hundreds of vocabulary mnemonic flash cards for my daughter to study for the SAT. "Ant bait abates our ant problem" and "A droid adroitly fixes things around the house," for example.

We also used z-image to generate accompanying illustrations.

pixlmint · 2026-06-14T20:14:55 1781468095

“Moving lines of code” is a very peculiar eval tbh. I’ve never used Gemma for agentic tasks, but did have it write code, including multi-turn, and I was very positively surprised by how well it performed.

user43928 · 2026-06-14T20:26:18 1781468778

It wasn't so much an eval; I really just wanted a small change moved out to another branch.

GPT 5.4 mini couldn't do it. Not even on the second attempt, where it went from obviously wrong to a subtly wrong copy.

In the end I had to manually copy and paste the 10-20 lines over.

If it can't even do that job, I seriously doubt it's going to be adequate for implementing a plan, like people often seem to suggest it could do, in order to save output tokens of a better model.

pixlmint · 2026-06-15T07:22:30 1781508150

Like I said, I never really used it for agentic work. I had previously evaluated locally runnable models with opencode (such as qwen3-coder), but found that it wasn't really feasible.

Since then I've adopted a different philosophy, and I actually prefer it this way.

I still very much enjoy doing most coding myself, but when I tried using tools like Claude Code, it felt very difficult to return to the codebase after letting Claude make some changes. Maybe that's just because of poor AI-use discipline, I don't know. But with smaller models, that's not even an issue. I can't just let it do all the coding and thinking for me; however, if I can describe a function I want in great detail in plain English, then Gemma can write it for me, and it will most likely work. It's perfect for boilerplate.

I also recently worked with a web framework I'd never worked with before, though I'm deeply familiar with other ones. So I asked it "I know how to do this in Y framework, what's the best-practice approach to doing it in Z framework?" and it was incredibly helpful, even pushing back on some of my 'bad' attempts at solving a problem.

I think GPT 5.4 mini might fall into a similar category, in that it probably performs best when not overwhelmed with too many tools/skills/MCPs, instead being given clearly defined tasks by an orchestrator model. I call those my token burners, as they're super cheap to run and have high tokens/second.

matt-p · 2026-06-14T10:00:28 1781431228

Cursor 2.5 is essentially Kimi and I find it eminently usable.

dominotw · 2026-06-14T13:27:32 1781443652

I use it for tasks like object recognition in my family photos and cooking videos. Seems to be fine.