Hacker News | girvo's comments

Considering the full-fat Qwen3.5-plus is good, but only barely Sonnet 4 good in my testing (though incredibly cheap!), I doubt the quantised versions are somehow as good, if not better, in practice.

I think it depends on work pattern.

Many do not give Sonnet or even Opus free rein, which is where it really pushes ahead of other models.

If you're asking for tightly constrained single functions at a time it really doesn't make a huge difference.

I.e. the more vibe you do the better you need the model especially over long running and large contexts. Claude is head and shoulders above everyone else in that setting.


>I.e. the more vibe you do the better you need the model especially over long running and large contexts

For sure, but the coolest thing about qwen3.5-plus is the 1 mil context length on a $3 coding plan, super neat. But the model isn't really powerful enough to take real advantage of it, I've found. Still super neat though!


When you say Sonnet 4, do you mean literally 4, or 4.6?

It's not as capable as Sonnet 4.6 in my usage over the past couple days, through a few different coding harnesses (including my own for-play one[0], that's been quite fun).

[0] https://github.com/girvo/girvent/


What is the benefit of writing your own harness? I am asking because I need to get better at using AI for programming. I have used Cursor, Gemini CLI, Antigravity quite a bit and have had a lot of difficulties getting them to do what I want. They just tend to "know better."

Purely as an exercise to see how they operate and understand them better. Additionally, I was curious how much better one could make something like qwen3.5-plus, with its 1 mil context window, despite its weaker base behaviour, if I were to give it something very focused on what I want from it.

The Pi framework is probably right up your alley btw! Very extensible


I’m not an expert, but I started with smaller tasks to get a feel for how to phrase things and what I need to include. It’s more manageable to manually fix things it screwed up than to give it free rein.

You may want to look at the AGENTS.md file too so you can include your stock style things if it’s repeatedly screwing up in the same way.


I think it's the same instinct as making your own game engine. You start off either because you want to learn how they work or because you think your game is special and needs its own engine. Usually, it's a combination of both.

> why does healthy food cost so much more than processed food?

It doesn’t.


And yet models get things wrong all the time, too.

That’s what I would expect even if it can have the concept of truth. Like humans.

I absolutely tell a coworker their code is slow and expect them to fix it…

I too tell my boss to promote me and expect him to do so.

> Code now does not need much design.

I’ll bite: why? Genuine question, not a weird gotcha.


That has always been true (not that I’m saying you don’t know that, I’m using your comment as a jumping off point) in this industry. I am a good developer, but I’m a very good teacher and leader, and soft skills are why I’ve had the career I’ve had over the past two decades.

Sometimes I’m scared.

Sometimes I realise that this particular task has been slower than if I’d done it myself when I take into account full wall clock time.

I can’t tell what type of task is going to work ahead of time yet.


Glad it's not just me then, it's been driving me slightly batty.

In my experience, sometimes. Not that often, depends on the task.

The benefit is I can keep some things ticking over while I’m in meetings, to be honest.


The funny thing is that my Apple TV has profiles! Surely the iPad can too.
