I find it amusing that people (even here on HN) expect a brand new tool (among the most complex ever made) to perform adequately right off the bat. It will require a period of refinement, just like any other tool or process.
Would you buy a car that's been thrown together by an immature production and testing system with demonstrable and significant flaws, and just keep bringing it back to the dealership for refinements and defect fixes as you discover them? Assuming it doesn't kill you first?
These tools should be locked away in an R&D environment until sufficiently perfected.
MVP means 'ship with solid, tested basic features', not 'ship with bugs and fix them in production'.
People have grown to expect at least adequate performance from products that cost up to $39 a month (plus additional costs) per user. In the past you would have called this a tech demo at best.
this stuff works. it takes effort and learning. it’s not going to magically solve high-complexity tasks (or even low-complexity ones) without investment. having people use it, learn how it works, and improve the systems is the right approach
People, specifically managers and C-levels, are being sold this crap on the idea that it can replace people now, today, as-is. Billions upon billions of dollars are being shoved in indiscriminately; toothbrushes are shipping with "AI" slapped on somehow, that's how insane the hype bubble is.
And here we have many examples from the biggest bullshit pushers in the whole market of their state of the art tool being hilariously useless in trivial cases. These PRs are about as simple as you can get without it being a typo fix, and we're all seeing it actively bullshit and straight up contradict itself many times, just as anyone who's ever used LLMs would tell you happens all the time.
The supposed magic, omnipotent tool that is AI apparently can't even write test scaffolding without a human telling it exactly what it has to do, yet we're supposed to be excited about this crap? If I saw a PR like this at work, I'd be going straight to my manager to have whoever dared push this kind of garbage reprimanded on the spot, except not even interns are this incompetent and annoying to work with.
it’s not magic. it can make meaningful contributions (if you actually invest in learning the tools + best practices for using them)
you’re taking an anecdote and blowing it out of proportion to fit your preformed opinion. yes, when you start with the tool and do literally no work it makes bad PRs. yes, it’s early and experimental. that doesn’t mean it doesn’t work (I have plenty of anecdotes that it does!)
the truth lies in between, and the mob mentality of "it's magic" or "it's complete bullshit" doesn't help. I'd love to come to a thread like this and actually hear about real experiences from smart people using these kinds of tools, but instead we get this bullshit
> ...(if you actually invest in learning the tools + best practices for using them)
So I keep being told, but after diligently trying my damned hardest to make these tools work for ANYTHING other than the most trivial imaginable problems, it has been an abject failure for me and my colleagues. Below is a FAR from comprehensive list of my attempts at having AI tooling do anything useful for me that isn't the most basic boilerplate (and even then, it gets fucked up plenty often too).
- I have tried all of the editors and related tooling. Cursor, Jetbrains' AI Chat, Jetbrains' Junie, Windsurf, Continue, Cline, Aider. If it has ever been hyped here on HN, I've given it a shot because I'd also like to see what these tools can do.
- I have tried every model I reasonably can. Gemini 2.5 Pro with "Deep Research", Gemini Flash, Claude 3.7 Sonnet with extended thinking, GPT o4, GPT 4.5, Grok, That Chinese One That Turned Out To Be Overhyped Too. I'm sure I haven't used the latest and greatest gpt-04.7-blowjobedition-distilled-quant-3.1415, but I'd say I've given a large number of them more than a fair shot.
- I have tried dumb chat modes (which IME still work the best somehow). The APIs rather than the UIs. Agent modes. "Architect" modes. I have given these tools free rein of my CLI to do whatever the fuck they wanted. Web search.
- I have tried giving them the most comprehensive prompts imaginable. The type of prompts that, if you were to just give it to an intern, it'd be a truly miraculous feat of idiocy to fuck it up. I have tried having different AI models generate prompts for other AI models. I have tried compressing my entire codebase with tools like Repomix. I have tried only ever doing a single back-and-forth, as well as extremely deep chat chains hundreds of messages deep. Half the time my lazy "nah that's shit do it again" type of prompts work better than the detailed ones.
- I have tried giving them instructions via JSON, TOML, YAML, Plaintext, Markdown, MDX, HTML, XML. I've tried giving them diagrams, mermaid charts, well commented code, well tested and covered code.
Time after time after time, my experiences are pretty much a 1:1 match with what we're seeing in these PRs we're discussing. Absolute wastes of time and massive failures for anything that involves any complexity whatsoever. I have at this point wasted orders of magnitude more time trying to get AIs to spit out anything usable than if I had just sat down and done things myself. Yes, they save time for some specific tasks. I love that I can give one a big ass JSON blob and tell it to extract the typedef for me, saving me 20 minutes of very tedious work (assuming it doesn't just make random shit up, which still happens ~30% of the time). I love that if there's some unimportant script I need to cook up real quick, I can just ask for it and toss it away when I'm done.
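For what it's worth, the "JSON blob → typedef" chore mentioned above is exactly the kind of mechanical transform I mean. A rough hand-rolled sketch of it in Python (names and scope are hypothetical; it only infers the top level, which is most of the tedium anyway):

```python
import json

def infer_type(value):
    # Map a JSON value to a rough Python type annotation string.
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        inner = {infer_type(v) for v in value} or {"Any"}
        return f"list[{' | '.join(sorted(inner))}]"
    if isinstance(value, dict):
        return "dict"
    return "None"

def typedef_from_json(name, blob):
    # Emit a TypedDict-style class body for the top-level keys of a JSON object.
    # (The emitted code assumes `from typing import TypedDict` at use site.)
    data = json.loads(blob)
    lines = [f"class {name}(TypedDict):"]
    for key, value in data.items():
        lines.append(f"    {key}: {infer_type(value)}")
    return "\n".join(lines)

print(typedef_from_json("Payload", '{"id": 1, "name": "x", "tags": ["a"], "active": true}'))
```

Twenty minutes of squinting at nested blobs, or one prompt; that trade is real, it just doesn't generalize to actual engineering.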
However, what I'm pissed beyond all reason about is that despite me NOT being some sort of luddite who's afraid of change or whatever insult gets thrown around, my experiences with these tools keep getting tossed aside, and I mean by people who have a direct effect on my continued employment and lack of starvation. You're doing it yourself. We are literally looking at a prime example of the problem, from THE BIGGEST PUSHERS of this tool, with many people in this thread and the reddit thread commenting similar things to myself, and it's being thrown to the wayside as an "anecdote getting blown out of proportion".
What the fuck will it take for the AI pushers to finally stop moving the god damn goal posts and trying to spin every single failure presented to us in broad daylight as a "you're le holding it le wrong teehee" type of thing? Do we need to suffer through 20 million more slop PRs that accomplish nothing and STILL REQUIRE HUMAN HANDHOLDING before the sycophants relent a bit?
just wanted to say this was the most relatable take i've read so far, and i've read a lot. Exact same experiences. And you didn't even touch on the MCPs that enable these things to go wild as well. I think our takes are not being taken seriously for two reasons.
First, marketing gaslighting from the FAANGs and hot startups run by grifters who managed to raise and need to keep the bullshit windmill going.
Second, these tools are at their relative best with the boilerplate Next.js code that vibecoders use to make very simple dashboards and the like, and they're the noisy minority on twitter.
There is basically zero financial incentive to admit LLMs are being pushed dangerously beyond their current limits. I'm still figuring out a way to short this, apart from literally shorting the market.
People see that these things generate code and, due to their lack of understanding, automatically assume that this is all software engineering is.
Then we have the current batch of YC execs heavily pushing "vibe coded" startups. The sad reality is that this strategy will probably work, because all they need is the next credulous business guy to buy the vibe coded startup. There's so much money in the AI space that I fully believe you can make billions of dollars this way through acquisition (see OAI buying Windsurf for billions of dollars, likely to devalue Cursor's also-absurd valuation).
I'm not a luddite. I'm a huge fan of companies spending a decent chunk of money on R&D on innovative new projects even when there's a high risk of failure. The current LLM hype is not just an R&D project anymore. This is now being pushed as a full on replacement of human labor when it's clearly not ready. And now we're valuing AI startups at billions of dollars and planning to spend $500B on AI infrastructure so that we can generate more ghibli memes.
At some point this has to stop but I'm afraid by that point the damage will already be done. Even worse, the idiots who led this exercise in massive waste will just hop onto the next hype train.
You're literally on a site where we are anything but armchair critics. We have years of experience. You're using your ad hominems wrong; save them for a football thread and come up with actual arguments next time.
It's more that the AI-first approach can be frustrating for senior devs to have to deal with. This post is an example of that. We're empathising with the code reviewers.
"brand new"? really? we've had these slop bots for years now. what's with all the fanboys pretending it was just released this month or something. it's not brand new. it's old and failing and executives who bet billions are now dismantling their engineering capabilities to try to make something out of those burned billions. claiming it's brand new is just silly.
It is difficult for CEOs and management to understand that the AI tools don't work when their salaries depend on them working, given they have invested billions into it.