> We should remember that the outside world cares about things that work, not about how good they are inside, sadly.
How good something is inside is directly responsible for how well it works. Your customers might not care about the former, but they will care when your cuts to the former impact the latter (and they always do impact it, in the end).
That's the exact opposite of rational. It is, in fact, a formal logical fallacy (ad hominem). His argument can be correct even if he himself is not typically correct.
On the surface, that's quite fair. However, there's one problem: it is much easier to make statements than to verify them, and that asymmetry is part of why the internet has been slowly eroding society.
It's useful, even necessary, to use an author's past writing and arguments to decide whether they deserve any further critical evaluation or can be dismissed. We shouldn't say definitively "they're always wrong, so they're wrong now". However, it's reasonable to say: the author has a demonstrated lack of credibility, so we can probably assume they're wrong here, particularly if they have been wrong in this domain many times before. And if they happen to be correct, it's probably not strongly demonstrated by their work.
> hallucinations are dramatically less of a problem
No they aren't. The models still hallucinate just like they always did. You cannot trust them, ever, to get something right.
> several mass market use cases have emerged, most notably coding
They aren't really useful for coding based upon the above. Since you can't trust them, you have to carefully review everything they make, which in turn destroys any productivity they could've given you.
> rate of progress has increased
I have yet to see any progress. Opus 4.8 that you get today is no more effective than GPT-3.5 was. Much less would I agree that the rate of progress has increased. Only hype has increased, but there has yet to be a drop of substance.
Also, supposed productivity gains are dubious. I personally experience at best no productivity gains when using LLMs to write code, and sometimes it's an active drain on my productivity. There was that one study a year or so ago showing similar results. People are trying to say the productivity gains are there and undeniable, but that is not true. It is very much a subject of controversy whether AI helps productivity.
I can see an argument that the productivity gains are illusory / don’t translate to economic productivity. I’m not denying the possibility.
However, most of the engineers I respect have gone from being skeptics a year ago to convinced today. I don’t personally know any true holdouts any more. If there are studies from more than six months ago that disprove productivity gains, I’m happy to believe that was true of the AIs that were available at the time. But I’m going to need something much more recent before I disbelieve my lyin’ eyes where it pertains to the AIs available today.
I’m going back to being a holdout, but it’s nuanced. My theory of why LLMs don’t lead to the colloquial definition of productivity would be something like: if code was never the bottleneck, then generating code faster doesn’t result in more meaningful output.
Even if you take for granted that AI is as good at writing code as its biggest fans say (and I’ve spent a lot of time generating code; I think it does it pretty well), the question becomes: does this change your daily incentives such that you reach for code as the solution to your problems rather than something else (coordinating with your colleagues, product management, planning and design)?
So from a holistic perspective, I think intentionally limiting your own AI usage is the best approach for maximum long-term productivity.
There is an observational study that was published in March 2026 that followed 4000 teams over 2 years. It shows, in my view, exactly that the productivity gains don't translate into economic value.
If it was published in March 2026, even if the data was collected up to the day the study was published, 7/8ths of it would fail my “within the last six months” test. But I am looking forward to the results of future studies on this topic!
I get wanting to wait for more data. And thinking that LLMs have improved enough that this will change.
My view is that it's not really about how good the models are - it's about how we're using them. Understanding what you've built is an important part of value creation, and LLMs eliminate that.
It's funny, I've noticed the same thing, but did not come to the same conclusion.
I currently don't have work access to Claude Code, but most of my teammates do. Watching from the outside, the cycle seems to look like this:
1. Experience some success, which hooks you into relying on AI.
2. The AI keeps failing at some task, but you don't want to stop. Keep trying over and over again.
3. Run out of tokens and take a break.
Now, sometimes 1 doesn't happen. Sometimes 2 doesn't happen. 3 is a certainty though.
Now, if you told me that the productivity gain from 1 is enough to offset the loss from 2 and 3, I could believe you. But I also wouldn't be surprised if it didn't.
As I work with Claude more and gain a feel for its capabilities, I tend to run into 2 far less often, as I'll decompose my messages more for the current model limitations. The threshold also changes each release.
Just want to point out that such claims have been made, falsely, for every model to come out in the past three years. It's almost certainly not true this time either.
Carpentry is not the same thing as woodworking, to be fair. The latter has the connotation of making furniture, trim, and other such items that people want to look nice. Carpentry does not necessarily have that connotation. It's a kind of "all squares are rectangles, but not all rectangles are squares" situation.
> Now it’s time to get over it, learn the new tools and adapt.
No, thank you. I have used the new tools, determined that they aren't helpful to me, and set them aside as I would with any other bad tool. I don't feel the need to let hype take the steering wheel.
> Do an AI pass, and have humans verify, and vice versa. Let the humans drive the AI.
You can do that, sure. But doing so negates any improvements in speed the LLM brought. And at that point, you may as well just do it yourself to begin with.
When Google showed up on the scene I found I no longer needed to memorize basic syntax and other such things. If I couldn't remember on the fly, I'd just do a quick Google search and move on. This freed space in my mind to instead focus on bigger & better things.
I use GenAI tools a lot when coding, but I do not vibe code. I go through everything it generates, and we iterate. And yes, it doesn't save me a lot of time. But what it does do is free up mental capacity in a similar manner. Instead of syntax, though, it's more complicated patterns. Maybe I don't remember how to stitch something together, but I know it can be done. Instead of spending the time to look it up and then code it, I just tell it to do it for me.
> Maybe I don't remember how to stitch something together, but I know it can be done.
That's how I use the current AI, too. I never ask them to do something without specifying how it should be done. I ask questions first, use /plan to let the model ask me questions, then I let it execute the plan while reviewing the results. More and more often, I get something close enough to what I would have written. In the opposite case, I at least know exactly how to rewrite the result, if needed.
I observe the same effect as you: while it does sometimes speed up the implementation a bit, it's not very noticeable; however, it frees me from having to recall all the obscure little details up front. Instead, I can describe them, have the model implement them, and then recognize them (and refresh my memory) when reviewing. The effect is that it's easier to start a task because I don't need to prepare as much to execute it. It's especially notable on things that I haven't touched for some time. I know, more or less, how my Elixir projects are set up, but after ~2 years of not working on them, getting back into them had been a hassle - with AI, it's no longer that. I think the biggest difference comes from the AI lowering the cost of context switching for me - I used to have huge problems with that, and AI certainly helped a lot.
Yeah, humans reviewing the AI review can only detect the false positives, where the LLM claims something is non-compliant and flags it for review/correction by a human or another agent. Human review can’t find the false negatives (true deficiencies not flagged) unless you do a full audit yourself to find whatever deficiencies the AI missed.
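To make the asymmetry concrete, here's a rough sketch. Everything in it is made up for illustration: the `Item` type, the item names, and the ground-truth labels (which in real life you wouldn't have without doing the full audit). It just shows that a human who only looks at what the AI flagged can sort flags into confirmed vs. false positive, but never even sees the unflagged deficiencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    truly_deficient: bool  # hypothetical ground truth; unknown without a full audit
    ai_flagged: bool       # what the AI reviewer reported


def human_review_of_flags(items):
    """Human looks only at flagged items; returns (confirmed, false_positives)."""
    flagged = [i for i in items if i.ai_flagged]
    confirmed = [i.name for i in flagged if i.truly_deficient]
    false_positives = [i.name for i in flagged if not i.truly_deficient]
    return confirmed, false_positives


def missed_by_flag_review(items):
    """False negatives: deficient but never flagged, so the human never sees them."""
    return [i.name for i in items if i.truly_deficient and not i.ai_flagged]


# Made-up sample covering all four combinations.
ITEMS = [
    Item("doc-a", truly_deficient=True, ai_flagged=True),    # true positive
    Item("doc-b", truly_deficient=False, ai_flagged=True),   # false positive
    Item("doc-c", truly_deficient=True, ai_flagged=False),   # false negative
    Item("doc-d", truly_deficient=False, ai_flagged=False),  # true negative
]

confirmed, false_positives = human_review_of_flags(ITEMS)
missed = missed_by_flag_review(ITEMS)
print("confirmed:", confirmed)
print("false positives caught:", false_positives)
print("missed without a full audit:", missed)
```

Note that `missed_by_flag_review` can only be computed here because the sample carries ground truth; in practice that list is exactly what you'd need the full audit to produce.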
There are a few games I've found that don't work under Proton. Rome 2 Total War crashes frequently, Battletech (the tactics game from a decade or so ago) doesn't always launch and has sound issues, and MechWarrior 5 joystick support doesn't work (the game is fine if you use another input method, but I'm not gonna play a sim game and not use my nice joystick). Games that don't work in Proton are thankfully few and far between, but unfortunately it isn't just games with anticheat rootkits.