I read this and thought, "are we using the same software?" For me, I have turned the corner where I barely hand-edit anything. Most of the tasks I take on are nearly one-shot successful, simply pointing Claude Code at a ticket URL. I feel like I'm barely scratching the surface of what's possible.
I'm not saying this is perfect or unproblematic. Far from it. But I do think that shops that invest in this way of working are going to vastly outproduce ones that don't.
LLMs are the first technology where everyone literally has a different experience. There are so many degrees of freedom in how you prompt. I actually believe that people's expectations and biases tend to correlate with the outcomes they experience. People who approach it with optimism will be more likely to problem-solve the speed bumps that pop up. And the speed bumps are often things that can mostly be addressed systemically, with tooling and configuration.
If all you're doing is reviewing behaviour and tests, then yes, almost 100% of the time, if you're able to document the problem precisely enough, codex 5.3 will get it right.
I had codex 5.3 write flawless svelte 5 code only because I had already written valid svelte 5 code around my code.
The minute I started a new project, asked it to use svelte 5, and let it loose, it not only started writing a weird mixture of svelte 3/4 + svelte 5 code but also straight up ignored tailwind and started writing its own CSS.
I asked it multiple times to update the syntax to svelte 5 but it couldn't figure it out. So I gave up and just accepted it. That's what I think is going to happen more frequently. If the code doesn't matter anymore and it's just the process of evaluating inputs and outputs, then whatever.
However if I need to implement a specific design I will 100% end up spending more time generating than writing it myself.
I'm working in a very mature codebase on product features that are not technically unprecedented, which probably is determining a lot of my experience so far. Very possible that I'm experiencing a sweet spot.
I can totally imagine that in greenfield, the LLM is going to explore huge search spaces. I can see that when observing the reasoning of these same models in non-coding contexts.
That's exactly what I meant. When I've used LLMs on mature code bases they do very well, because the code base was curated by engineers. When you have a greenfield project it's slop central: it's literally whatever the LLM has been trained on and can get to compile and run.
Which is still okay, but only as long as I have access to good and cheap LLMs.
This person is not using Claude Code or Cursor. They refuse to use the tools and have convinced themselves that they are right. Sadly, they won't recognize how wrong they were until they are unemployable.
I was a huge skeptic on this stuff less than a year ago, so I get it. For a couple of years, the hype really was just hype when it came to the actual business utility of AI tools. It's just interesting to me the extent to which people have totally different lived experiences right now.
I do agree that some folks are in for rude awakening, because markets (labor and otherwise) will reveal winning strategies. I'm far from a free market ideologist, but this is a place where the logic seems to apply.
To be totally fair to them... it is quite literally in the last few months that the tools have actually begun to meet the promises that the breathless hypers have been screeching about for years at this point.
But it's also true that it simply is better than the OP is giving it credit for.
If Claude Code or Cursor is actually that good then we're all unemployed anyway. Using the tools won't save any of our jobs.
I say this as someone who does use the tools, they're fine. I have yet to ever have an "it's perfect, no notes" result. If the bar is code that technically works along the happy path then fine, but that's the floor of what I'm willing to put forth or accept in a PR.
> If Claude Code or Cursor is actually that good then we're all unemployed anyway. Using the tools won't save any of our jobs.
There is absolutely reason for concern, but it's not inevitable.
For the foreseeable future, I don't think we can simply Ralph Wiggum-loop real business problems. A lot of human oversight and tuning is required.
Also, I haven't seen anything to suggest that AI is good at strategic business decision-making.
I do think it dramatically changes the job of a software developer, though. We will be more like developers of software assembly lines and strategists.
Every company I have ever worked for has had a deep backlog of tasks and ideas we realistically were never going to get to. These tools put a lot of those tasks in play.
> I have yet to ever have an "it's perfect, no notes" result.
It frequently gets close for me, but usually some follow-up is needed. The ones that are closest to pure one-shot are bug fixes where replication can be captured in a regression test.
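The pattern I'm describing is roughly: capture the reported behaviour in a failing test first, then point the agent at it. A minimal sketch of what that looks like, using a hypothetical parseDuration helper and a Vitest-style runner (all names invented for the example):

    import { describe, expect, it } from "vitest";
    // Hypothetical helper containing the reported bug.
    import { parseDuration } from "../src/duration";

    // Written before the fix: it reproduces the report ("PT0S" comes back as
    // NaN instead of 0) and fails against the current code. The agent's task
    // is then narrowly scoped: make this pass without breaking the rest of
    // the suite.
    describe("parseDuration regression", () => {
      it("parses a zero-length ISO-8601 duration to 0 seconds", () => {
        expect(parseDuration("PT0S")).toBe(0);
      });
    });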
> Every company I have ever worked for has had a deep backlog of tasks and ideas we realistically were never going to get to. These tools put a lot of those tasks in play.
Some of that backlog was never meant to be implemented. “Put it in the backlog” is a common way to deflect conflict over technical design and the backlog often becomes a graveyard of ideas. If I unleashed a brainless agent on our backlog the system would become a Frankenstein of incompatible design choices.
An important part of management is to figure out what actually brings value instead of just letting teams build whatever they want.
That's different from my experience. I've worked many places where there are loads of valuable ideas in the backlog, or bugs that are real but don't have enough impact to prioritize. But the business has limited resources, and there are higher value things on the roadmap.
I'm experiencing the early stages of a reality where much more of this stuff is possible to build. I say early stages, because there's still plenty of friction between what we have now and a true productivity multiplier. But most of that friction is solvable without speculative improvements, like the models themselves getting better.
If I worked someplace where there was nothing of value on the backlog, then I would be worried about my job.
> If Claude Code or Cursor is actually that good then we're all unemployed anyway.
I don't know about that. This PR stunt was a greenfield project, no one really knows how much work went into it, and it targeted a problem (bootstrapping a C compiler) that is actually quite small and relatively trivial to accomplish.
Go ahead and google for small C compilers. They are a dime a dozen, and some don't venture beyond a couple thousand lines of code.
> Who you are: Strong software engineering background with TypeScript in production. Hands-on with AI coding tools (Cursor, Claude Code, Aider, Copilot)
Hilarious take. There's absolutely no advantage to learning to use LLMs now. Even LLM "skills", if you can call them that, that you may have learnt 6 months ago are already irrelevant and obsolete. Do you really think a smart person couldn't get to your level in about an hour? You are not building fundamental skills and experience by using LLM agents now, you're just coasting and possibly even atrophying.
I am one of the ones who reviews code and pushes projects to the finish line for people who use AI like you. I hate it. The code is slop. You don’t realize because you aren’t looking close enough, but we do and it’s annoying
I disagree with the characterization as "slop", if the tools are used well. There's no reason the user has to submit something that looks fundamentally different from what they would handwrite.
You can't simply throw the generated code over the wall to the reviewer. You have to put in the work to understand what's being proposed and why.
Lastly, an extremely important part of this is the improvement cycle.
The tools will absolutely do suboptimal things sometimes, usually pretty similar to a human who isn't an expert in the codebase. Many people just accept what comes out. It's very important to identify the gaps between the first draft, what was submitted for code review, and the mergeable final product and use that information to improve the prompt architecture and automation.
What I see is a tool that takes a lot of investment to pay off, but where the problems for operationalizing it are very tractable, and the opportunity is immense.
I'm worried about many other aspects, but not the basic utility.
Here’s the thing, they say all the same things you just said in this comment. Yet, the code I end up having to work in is still bad. It’s 5x longer than it needs to be and the naming is usually bad so it takes way longer to read than human code. To top it off, very often it doesn’t integrate completely with the other systems and I have to rewrite a portion which takes longer because the code was designed to solve for a different problem.
If you are really truly reviewing every single line in a way that it is the same as if you hand wrote it… just hand write it. There’s no way you’re actually saving time if this is the case. I don’t buy that people are looking at it as deeply as they claim to be.
> If you are really truly reviewing every single line in a way that it is the same as if you hand wrote it… just hand write it.
I think this is only true for people who are already experts in the codebase. If you know it inside-out, sure, you can simply handwrite it. But if not, the code writing is a small portion of the work.
I used to describe it as: this task will take 2 days of code archaeology, but result in a +20/-20 change. Or much longer, if you are brand new to the codebase. This is where the AI systems excel, in my experience.
If the output is +20/-20, then there's a pretty good chance it nailed the existing patterns. If it wrote a bunch more code, then it probably deserves deeper scrutiny.
In my experience, the models are getting better and better at doing the right thing. But maybe this is also because I'm working in a codebase with many example patterns to slot into, and the entire team is investing heavily in the agent instructions, skills, and tooling.
It may also have to do with the domain and language to some extent.
Yes, the code archaeology is the time consuming part. I could use an LLM to do that for me in my co-workers’ generated code, but I don’t want to, because when I have worked with AI I have found it to typically create overly-complex and uncreative solutions. I think there may be some confirmation bias with LLM coders, where they look at the code and think it’s pretty good, so they think it’s basically the same way they would have written it themselves. But you miss a lot of experiences compared to when you’re actually in the code trenches reading, designing, and tinkering with code on your own. Like moving functions around to different modules and it suddenly hits you that there’s actually a conceptual shift you can make that allows you to code it all much simpler, or recalling that stakeholder feedback from last week that, if acted on, could open up a solution pathway that wasn’t viable with the current API design. I have also found that LLMs make assumptions about what parts of the code base can and can’t be changed, and they’re often inaccurate.
> But you miss a lot of experiences compared to when you’re actually in the code trenches reading, designing, and tinkering with code on your own.
Completely agree. Working with this tooling is a fundamentally different practice.
I'm not trying to suggest that agentic coding is superior in every way. I simply believe that in my own experience, the current gains exceed the drawbacks by a large margin for many applications, and that significantly higher gains are within close reach (e.g. weeks).
I spent years in management, and it's not dissimilar to that transition. In my first role as a manager, I found it very difficult to divest myself of the need to have fine-grained knowledge of and control over the team's code. That doesn't scale. I had to learn to set people up for success and manage from a place of uncertainty. I had to learn to think like a risk manager instead of an artisan.
I'll also say that when it comes to solution design, I have found it very helpful to ask the agent to give me options when a solution looks suboptimal. Oftentimes, I can still find great refactor opportunities, and I can have the agent draw up plans for those improvements and delegate them to parallel sessions, where the focus can be safely executing a feature-neutral refactor.
Separately from that, I would note that the business doesn't always need us to be making conceptual shifts. Great business value can be delivered with suboptimal architecture.
It is difficult to swallow, but I think that those of us whose market value is based on our ability to develop systems by manipulating code and getting feedback from the running product will find that businesses believe that machines can do this work more than good enough and at vastly higher scale.
For the foreseeable future, there will be places where hands-on coding is superior, but I see that becoming more the exception than the norm, especially in product engineering.
Your perspective is quite thoughtful, thank you. I do agree that if you are just fixing a bug or updating function internals, +20/-20 is certainly good enough and I wouldn’t oppose AI used there.
I am going to have to agree to disagree overall though, because the second there is something the AI can’t do the maintenance time for a human to learn the context and solve the problem skyrockets (in practice, for me) in a way I find unacceptable. And I may be wrong, but I don’t see LLMs being able to improve to close that gap soon because that would require a fundamental shift away from what the LLMs are under the hood.
This was a really interesting conversation, and I learned a lot from your thoughts and everyone else's on this thread.
As I said up top:
> LLMs are the first technology where everyone literally has a different experience.
I totally believe you when you say that you have not found these tools to be net useful. I suspect our different perceptions probably come from a whole bunch of things that are hard to transmit over a discussion like this. And maybe factors we're not even aware of -- I am benefiting from a lot of investment my company has made into all of the harness around this.
But I do pretty strongly believe that I'm not hallucinating how well it's all working in my specific context.
I feel so weird not being the grumpy one for once.
Can't relate to GP's experience of one-shotting. I need to try a couple of times and really home in on the right plan and constraints.
But I am getting so much done. My todo list used to grow every year. Now it shrinks every month.
And this is not mindless "vibe coding". I insist on what I deploy being quality, and I use every tool I can that can help me achieve that (languages with strong types, TDD with tests that specify system behaviour, E2E tests where possible).
I'm on my 5th draft of an essentially vibe-coded project. Maybe it's because I'm using not-frontier models to do the coding, but I have to take two or three tries to get the shape of a thing just right. Drafting like this is something I do when I code by hand, as well. I have to implement a thing a few times before I begin to understand the domain I'm working in. Once I begin to understand the domain, the separation of concerns follows naturally, and so do the component APIs (and how those APIs hook together).
- like the sister comment says, use the best model available. For me that has been opus but YMMV. Some of my colleagues prefer the OAI models.
- iterate on the plan until it looks solid. This is where you should invest your time.
- Watch the model closely and make sure it writes tests first, checks that they fail, and only then proceeds to implementation
- the model should add pieces one by one, ensuring each step works before proceeding. Commit each step so you can easily retry if you need to. Each addition will involve a new plan that you go back and forth on until you're happy with it. The planning usually gets easier as the project moves along.
- this is sometimes controversial, but use the best language you can target. That can be Rust, Haskell, or Erlang depending on the context. Strong types will make a big difference. They catch silly mistakes models are liable to make (see the sketch below).
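To show what I mean by types catching silly mistakes, here's a tiny TypeScript sketch (every name here is made up for illustration):

    // The union documents the only valid states, so an invented status string
    // becomes a compile-time error instead of a silent runtime bug.
    type JobStatus = "queued" | "running" | "succeeded" | "failed";

    interface Job {
      id: string;
      status: JobStatus;
      retries: number;
    }

    function canRetry(job: Job): boolean {
      switch (job.status) {
        case "failed":
          return job.retries < 3;
        case "queued":
        case "running":
        case "succeeded":
          return false;
        default: {
          // Exhaustiveness check: adding a new status and forgetting to
          // handle it here fails to compile.
          const unreachable: never = job.status;
          return unreachable;
        }
      }
    }

    // canRetry({ id: "a1", status: "errored", retries: 0 });
    // ^ compile error: '"errored"' is not assignable to type 'JobStatus'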
Cursor is great for trying out the different models. If opus is what you like, I have found Claude Code to be better value, and personally I prefer the CLI to the VS Code UI that Cursor builds on. It's not a panacea though. The CLI has its own issues, like occasionally slowing to a crawl. It still gets the work done.
With the AI. I read the whole thing, correct the model where it makes mistakes, and fill the gaps where I find them.
I also always check that it explicitly states my rules (some from the global rules, some from the session up until that moment) so they're followed at implementation time.
In my experience opus is great at understanding what you want and putting it in a plan, and it's also great at sticking to the plan. So just read through the entire thing and make sure it's a plan that you feel confident about.
There will be some trial and error before you notice the kind of things the model gets wrong, and that will guide what you look for in the plan that it spits out.
> Maybe its because I'm using not-frontier models to do the coding
IMO it’s probably that. The difference between where this was a year ago and now is night and day, and not using frontier models is roughly like stepping back in time 6-12 months.
I'm guessing denounce is for bad faith behavior, not just low quality contributions. I think it's actually critical to have a way to represent this in a reputation system. It can be abused, but abuse of denouncement is grounds for denouncement, and being denounced by someone who is denounced by trusted people should carry little weight.
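Roughly the kind of discounting I have in mind, as a pure sketch in TypeScript (names and weighting are mine, not from the project):

    interface Denouncement {
      from: string; // denouncer
      to: string;   // denounced
    }

    // "standing" is a score in [0, 1] the community has already derived for
    // each user, e.g. from endorsements by trusted members. A denouncer who
    // is themselves widely denounced ends up with low standing, so their
    // denouncements carry little weight.
    function denounceScore(
      target: string,
      edges: Denouncement[],
      standing: Map<string, number>,
    ): number {
      return edges
        .filter((d) => d.to === target)
        .reduce((sum, d) => sum + (standing.get(d.from) ?? 0), 0);
    }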
I'm pretty sure this project just does the storage model. It's up to communities that use it to determine the semantics and derive reputation and other higher level concepts from the data.
I don't understand the dismissiveness. I think it's pretty clear. Keeping it going for the current users, but not trying to innovate. It might seem weird in an industry that prizes constant innovation and disruption, but this is a mature thing to do.
Not really. Investors with hundreds of billions of dollars have decided it. The process by which capital has been allocated the way it has isn't some mathematically natural or optimal thing. Our market is far from free.
Saying "investors with hundreds of billions decided it" makes it sound like a few people just chose the outcome, when in reality prices and capital move because millions of consumers, companies, workers, and smaller investors keep making choices every day. Big investors only make money if their decisions match what people actually want; they can't just command success. If they guess wrong, others profit by allocating money better, so having influence isn't the same as having control.
The system isn't mathematically perfect, but that doesn't make it arbitrary. It works through an evolutionary process: bad bets lose money, better ones gain more resources.
Any claim that the outcome is suboptimal only really means something if the claimant can point to a specific alternative that would reliably do better under the same conditions. Otherwise critics are mostly just expressing personal frustration with the outcome.
This is reminding me of the crypto self-custody problem. If you want complete trustlessness, the lengths you have to go to are extreme. How do you really know that the machine using your private key to sign your transactions is absolutely secure?
But it's actually a tremendous amount of friction, because it's the difference between being able to let agents cook for hours at a time or constantly being blocked on human approvals.
And even then, I think it's probably impossible to prevent attacks that combine vectors in clever ways, leading to people incorrectly approving malicious actions.
Yes, because the upside is so high. Exploits are uncommon, at this stage, so until we see companies destroyed or many lives ruined, people will accept the risk.