Isn't it just arguing that one complex weighted graph was tuned to output tokens that align more closely with what current-day users would define as 'taste'?
I don't think it necessarily says anything about a model itself having 'taste' in some subjective way.
If the fashion changes would the model update with it without retraining? No. So the model doesn't have 'taste' in that sense. It has alignment to current human definitions of taste.
I was already spending a lot of time reviewing other people's code. It makes no difference to me if it's coming from an agent or a person.
I can pick and choose which parts of the problem deserve my attention and which can be done by the LLM with me just keeping an eye on it while I mostly work on something else. I don't have metrics but I feel like I am doing higher leverage work with less friction.
Setting up the systems around the LLM itself is fun too. Hacking on harnesses and trying to improve the UX or the metrics is fun. Playing with different workflow topologies across agents is fun.
Diving deep into context strategies, memory systems, prompting is fun.
Trying to marry ideas from the past with what LLMs enable now is fun.
I don't see how this is soulless or unbearable but granted I'm not at a place that is demanding I maximize throughput. That would suck.
nothing is really fun about trying to hack around a model to make it do what you want, knowing that you're barely patching it until the next disease shows up. it's painful. also, it is probably a temporary step before everything closes up behind closed software and closed APIs (like anthropic is trying to do) and the harness won't be yours; it will be on a remote cloud server. then you will just spend your time reviewing llm-generated code. also, the scale and ambition of an LLM-generated piece of code can be daunting. the most basic pattern now that LLMs can code well is overdoing. docs are longer, blogs are longer, ppts are longer, PRs are longer, while quality naturally drops because more people can produce this kind of content now
but it's not fun for everyone. reviewing code for people vs agents is a massive difference for me. with people i can teach them. with LLMs i can't. it makes me feel helpless. just like i feel helpless using broken non-free apps. i can tolerate the same bugs in FOSS application, because i potentially can fix them even if i don't actually do it. i don't feel helpless, i feel like i have a choice. with LLMs i don't get that choice.
most of the other things you mention feel tedious to me. i like diving into the code, understanding how to solve a problem, figuring out how to make the code structure look elegant and readable. find and comment on a clever short-cut, even if it means that i may not be clever enough to debug it later.
i don't get any of that with LLM generated code. i'd spend more time to clean up and fix the LLM code than i would writing it from scratch. to use LLM code efficiently, i'd have to give up all that. but i don't want to do that.
using an LLM to code feels like gambling. every time i put in a prompt, it's like rolling the dice. am i going to get a useful solution this time? and then i reroll until i get a useful result rather than building up the application one step at a time.
Yeah I can kind of see that and if you don't like them you don't like them. Not all tools are for all people.
I personally draw the line at plugins that try to set up entire workflows and take the human completely out of the loop. Those are next to useless imo for an engineer who knows what they are doing and are exactly how you end up with crappy code/products.
But to give my thoughts to your points I guess I just don't really care that I can't teach LLMs? It doesn't bother me because I do also still teach people, it's not one or the other.
On what you like about coding. I like that too, I still do it where I want to or where it is needed.
I agree with you on what parts are enjoyable but I guess I don't feel that I'm giving them up? I get to pick the problems I work on that way now. The only disagreement there is around 'clever' shortcuts. I get pleasure out of making things debuggable and traceable for humans.
I wish my odds at gambling were as high as they are with LLM generated code lol.
I do run into the whole 'this session was a waste, need to restart', but like once in a blue moon? Not nearly enough to turn me off from using LLMs daily.
On the teaching point again, my learnings around coding standards, architecture -> problem mapping, and how to debug are applied at the system prompt level and around a few key skill files, so when I say "implement ..." or "I'm seeing this behavior, where in the codebase is the most likely root cause? Why?", it does so close to how I would've done it.
I cannot speak for people who are using these things raw in the harnesses provided by the companies, or god forbid in the browser, but you can definitely increase the odds of a good roll enough to be productive by changing the environment around the LLM, and to me that is the opposite of feeling helpless when it comes to LLMs.
I feel enabled to get more done, at my standards, on my time.
And Claude can actually tackle it the same way humans do, here in the real world, where we don't have time to let some nonsense like "mathematically proven to be unsolvable" stand between us and our goals: it can eyeball the code and give a good enough guess.
All I know is that we have a gigantic amount of tech debt we accumulated on the web chasing the next web framework built on top of tons of abstractions, with very disappointing native web APIs that shouldn't be taken seriously, and neither should the W3C, which specified them.
And when an agent is capable of gluing together a web app with some CRUD backend and a very rounded-corners UI that solves nothing for end users, we call it capable. These are not hard problems.
You insist that AI needs to be able to tackle hard problems, but can't say what qualifies as a hard problem. Can you see the problem with that? If you don't know what a hard problem looks like, how do you know the models can't tackle them?
It’s not that it’s not able to tackle hard problems, really. It’s that you have to give it the solution and the patterns to follow, and then monitor it because it will go down weird paths.
If you’ve ever worked directly with a user, you know how vague change requests can be. Try writing some vague prompts like that to the agent and see if it can solve them.
For some, writing down a (good?) spec and handing it to an agent is not very productive, because by then they already have an idea of the solution and can just use the editor to get it done.
Same, loved them, told my team about them, got them to switch off of Cursor, now I'm telling them to swap to Codex.
Anthropic really pissed me off with their harness crap. They're well within their rights but their communication over it was enough to get me to swap. I don't need extra hurdles when there's a perfectly valid alternative right there. They don't have the advantage they think they do.
I think we are inevitably heading to using the cheap Chinese models like Kimi, GLM, and Minimax for the bulk of engineering tasks. Within 3-6 months they will be at Opus 4.6 level.
This was literally my task today, before reading this update: to try out Qwen 9B locally with pi or opencode on my MacBook, which is a bit memory-constrained at 18GB.
MiniMax has its own issues. Server overloads, API errors, and failure to adhere to even the system prompt. It can happily work for hours and get no job done.
I ran OpenCode + GLM-5.1 for three weeks during my vacation. It’s okay. It thinks a lot more to get to a similar result as Claude. So it’s slower. It’s congested during peak hours. It has quirks as the context gets close to full.
But if you’re stuck with no better model, it’s better than local models and no models.
I have to say, OpenCode’s OpenUI has taught me what modern TUIs can be like. Claude’s TUI feels more like it’s been grown than designed. I’m playing around with TUI widgets trying to recreate and improve that experience
> I have to say, OpenCode’s OpenUI has taught me what modern TUIs can be like. Claude’s TUI feels more like it’s been grown than designed.
Claude's TUI is not a TUI. It's the most WTF thing ever: the TUI is actually a GUI. A headless browser shipped with the TUI renders the entire screen in real time, scrolls to the bottom, and converts that to text mode. There are several serious issues and I'll mention two that utterly piss me off...
1. Insane "jumping" around where the text "scrolls back" then scrolls back down to your prompt: at this point, given the crazy hack that TUI is, if you told me the text jumping around in the TUI is because they're simulating mouse clicks on the scrollbar I wouldn't be surprised. If I'm not mistaken we've seen people "fixing" this by patching other programs (tmux?).
2. What you see in the TUI is not the output of the model. That is, to me, the most insane part of it all. They're literally changing characters between their headlessly rendered GUI and the TUI.
> Claude’s TUI feels more like it’s been grown than designed.
"grown" or "hacked" are way too nice words for the monstrosity that Claude's TUI is.
Codex is described as a: "Lightweight coding agent that runs in your terminal". It's 95%+ Rust code. I wonder if the "lightweight" is a stab at the monstrosity that Claude's TUI is.
From this morning: I had a single Go file with like 100 LOC, I asked it to add debug prints, it thought for 5+ minutes, generating ~1M output tokens, and did not actually update my file.
Anthropic will kick and scream as those are often distilled from their latest models and are cutting into their margins. Though it is not like their hands are clean either, it is just a different type of stealing, an approved one :-)
Getting them running is easy (check out LMstudio or ask one for some recommendations). The real question is whether you have the hardware to make them run fast enough to be useful.
This is possibly a hot take but recently I've been having about as much luck with Composer 2 in Cursor as I have with Opus 4.6 in Claude Code.
Opus is obviously the better model, but Cursor's "harness" is doing so much heavy lifting in terms of just magically supplying the broader context the model needs to understand the ramifications of its edits.
One thing I enjoy about the Cursor and Codex Mac apps is the embedded preview window. I know it's not as hardcore as the terminal/tmux but it's hella convenient. But Cursor bugs me with the opacity around what model I'm using. It seems to be deliberately routing requests based on their perceived complexity. What draws you to Codex vs Cursor?
It can and it does, especially combined with skills (context files). It can hit REST APIs with curl just fine. MCP is basically just another standard.
Where it comes in handy has mostly been in distribution honestly. There's something very "open apis web era" about MCP servers where because every company rushed to publish them, you can write a lot of creative integrations a bit more easily.
Not the guy who made it but I immediately wondered if I could use the intermediate steps with some "outline" mode to help me see things in shapes and finally learn to draw a bit.
Your subjective experience is only the tip of the iceberg of your entire brain activity. The conscious part is merely a tool your brain uses to help it achieve its goals; there's no inherent reason to favor it.