Hacker News | ryanackley's comments

I agree completely. I haven't noticed much improvement in coding ability in the last year. I'm using frontier models.

What's been the game changer is tools like Claude Code: automatic agentic tool loops purpose-built for coding. That, rather than any noticeable improvement in raw ability, is what I've seen drive mainstream adoption.


My anecdotal experience is rather different.

I write a lot of C++ and QML code. Codex 5.3, only released in Feb, is the first model I've used that regularly generates code that passes my 25-year expert smell test, and it has turned generative coding from a time sink/nuisance into a tool I can somewhat rely on not to set me back.

Claude still wasn't quite there at the time, but I haven't tried 4.6 yet.

QML is a declarative-first markup language that is a superset of JavaScript syntax. It's niche and doesn't have a huge amount of training data in the corpus. Codex 5.3 is the first model that doesn't badly botch it or default to writing reams of procedural JS embeds (yes, even after steering). Much reduced, too, is the tendency to go overboard spamming everything with clouds of helper functions/methods in both C++ and QML. It knows when to stop, so to speak, and is either trained or able to reason toward a more idiomatic ideal, with far less explicit instruction / AGENTS.md wrangling.
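For anyone unfamiliar with QML, a rough sketch of the contrast (property and id names invented for illustration). Idiomatic QML expresses state through declarative bindings that update automatically; the failure mode is imperative JS stuffed into handlers, re-implementing what a binding would do:

```qml
// Idiomatic: a declarative binding; color tracks the pressed state.
Rectangle {
    color: tapArea.pressed ? "tomato" : "steelblue"
    MouseArea { id: tapArea; anchors.fill: parent }
}
```

```qml
// The failure mode: procedural JS embeds doing the binding's job by hand.
Rectangle {
    id: box
    color: "steelblue"
    MouseArea {
        anchors.fill: parent
        onPressed: box.color = "tomato"
        onReleased: box.color = "steelblue"
    }
}
```

Both fragments behave the same; the second just scatters the state logic across handlers, which is the kind of output the comment above describes older models defaulting to.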

It's a huge difference. It might be the result of very specific optimization, or perhaps simultaneous advancements in the harness play a bigger role, but in my book my neck of the woods (or place on the long tail) only really came online in 2026 as far as LLMs are concerned.


As a Qt C++ and QML developer myself[1], Opus 4.6 thinking is much better than any other model I've tested (Codex 5.3/GPT 5.4/Gemini 3.1 Pro).

[1] https://rubymamistvalove.com/block-editor


Maybe it's n=1, but I disagree. I notice that Sonnet 4.6 follows instructions much better than 4.5 and generates code much closer to our existing production code.

It's just a point release and it isn't a significant upgrade in terms of features or capabilities, but it works... better for me.


Are you using a tool like Claude Code, Codex, or Windsurf? I ask because I've found their ability to pull in relevant context improves tasks in exactly the way you're describing.

My own experience is that with each point release (e.g. 4.5 -> 4.6), some things get better and some get worse in perceived quality at the micro level.


Ok, but the real issue with kids looking up porn is how it warps general expectations around sex. Singling out specific fetishes and taboos that involve consenting adults seems a little bit like misdirected moral panic.

To be more specific, the idea that step-cest warps children's minds is laughable when the larger issue is that 95% of porn portrays women as submissive sex dolls that exist for male pleasure. Don't forget the unrealistic expectations around body and beauty standards.


Yeah I feel like this will funnel everyone's opinion into sounding like it was written by an AI.

Love the idea but the example they give with bears is absolutely hilarious. Calling bears dumb animals is offensive? God help us.


Hah, the idea is to have an example on the site that isn't itself offensive -- we're not going to write something actually offensive down -- but that lets you infer what the real thing would or could be, without us writing something awful. (Maybe we can do it better, though.)

Bears seemed a pretty inoffensive target, plus our backend uses Python with beartype and that library is all about bear jokes.


If they tank the white-collar middle class, there won't be anyone to buy the goods and services their potential AI customers will be trying to sell.

It's like a snake eating its own tail.


> The thing with coding agents is that it seems now that you can eat your cake and have it too. We are all still adapting, but results indicate that given the right prompts and processes harnessing LLMs quality code can be had in the cheap.

It's cheaper but not cheap

If you're building a variation of a CRUD web app, or aggregating data from some data source(s) into a chart or table, you're right. It's like magic. I never thought this type of work was particularly hard or expensive though.

I'm using frontier models, and I've found that if you're working on something that hasn't been done by 100,000 developers before you and published to Stack Overflow and/or open source, the LLM becomes a helpful tool but requires a ton of guidance. Even the tests LLMs write seem biased toward passing rather than stressing the code and finding bugs.


> It's cheaper but not cheap

It's quite cheap if you consider developer time. But it's only as cheap as you can effectively drive the model, otherwise you are just wasting tokens on garbage code.

> LLM becomes a helpful tool but requires a ton of guidance

I think this is always going to be the case. You are driving the agent like you drive a bike, it'll get you there but you need to be mindful of the clueless kid crossing your path.

For some projects I had good results just letting the agent loose. For others I'd have to make the tasks more specific and granular before offloading to the LLM. I see nothing wrong with it.


> I never thought this type of work was particularly hard or expensive though.

Maybe not intrinsically hard, but hard because it's so boring you can't concentrate.

> the LLM becomes a helpful tool but requires a ton of guidance. Even the tests LLMs will write seem biased to pass rather than stress its code and find bugs.

ISTR some have had success by taking responsibility for the tests and only having the LLM work on the main code. But since I only seem to recall it, that was probably a while ago, so who knows if it's still valid.


Yes, AI companies are bleeding money with current pricing. Your AI usage is heavily subsidized by investor dollars.


Better take the opportunity then and have it build good stuff while it lasts. :)


How far can Claude take this beyond a cool demo?

Does it become exponentially harder to add the missing features or can you add everything else you need in another two days? I'm guessing the former but would be interested to see what happens.

Are you going to continue trying? I ask because it's only been two days and you're already on Show HN. It seems like if you waited for it to be more complete, it would have been more impressive.


Not my experience at all. Claude Code is impressive but it needs to be used iteratively for serious development and requires lots of testing.

Need it to one-shot that report against your db using React/TypeScript? Or pump out a web form that submits to your backend? Works every time.

Need it to do something a little more creative? It frequently fails in subtle ways that aren't apparent until later.


Nobody was talking about one-shotting.

But I could see why we would be talking past each other: there are one-shotters and everyone else.


Jira has had free competitors that do at least 75% of what it does since its inception. You could find a dozen on GitHub that actually look good right now.

In spite of this, Jira is bigger than ever.


Sure, and GitHub is filled with these 10-feature competitors. Some even have active communities.

Didn't seem to kill off the big SaaS players or even weaken them.

