The elegant solution rarely happens on the first try. Many times you need to first arrive at a solution, and then keep iterating on it until it's elegant. Akin to "sorry I didn't have time to write a shorter letter".
IME human developers also span a spectrum on this. On one end, you have devs who might meditate half a day on different solutions before writing a line of code. On the other end are devs who run full speed ahead with the first working solution that comes to mind. LLMs in their current form are mostly the latter.
Written language is deep tech itself. There's evidence it even changed our brain morphology. So yes, it deeply affected kids' abilities, for example memorizing long poems or whatever.
It's a skill set just like coding. You can embrace an elevated workflow where you forget about the specific syntax and focus on the architecture and integration. It takes time to intuit exactly what the models are bad at, so you can foresee hallucinations and prevent them from happening in the first place. Yes, you can write 1 line faster than Claude, but what about 10 lines? 100? 1000?
> Yes you can write 1 line faster than Claude, but what about 10 lines? 100? 1000?
Bingo. One quick edit when you already know what needs to be done is trivial; that means nothing. What happens when you have to write a new feature and it will take hundreds of lines of code? Unless you're an elder god of programming, the LLM will lap you easily.
We currently have a tracking code that is deprecated, and stopping it from going out is a one line removal in a switch statement. I have a ticket for it, it’s 1 point, easy.
I’d wager a guess that if I told an LLM “we don’t want the ‘worker-item-click-apply’ event to fire”, I’m going to get a mix of code edits, maybe a new module, and probably a filter at the API call level, because it’s going to be too clever.
> What happens when you have to write a new feature and it will take hundreds of lines of code? Unless you're an elder god of programming, the LLM will lap you easily.
Then I’m going to write hundreds of lines, I’m going to write tests, I’m going to painstakingly compare my work to Figma, and I’m going to do it a lot slower than an LLM. I’m also going to understand the code, inside and out, and when our new engineer hops in to help add a feature or fix a bug, I’ll know where to send them, I’ll be able to explain the code to them, and we will both grow a better understanding of our codebase.
Could an LLM do that, or help? Sure, and I know that it will take a lot of effort and refinement that is just going to be churn.
They're targeting the 90% of code that doesn't really need to be looked at. Software is already so complex and interconnected that it is fully beyond human capabilities, each person only knows a tiny part of the stack. If you create your own full system from scratch, it's not going to be very generally useful.
I wouldn’t say harnesses are simple. They do a lot of things that we aren’t thinking of. I learned that a good harness is as valuable as the model. But obviously the model is what carries the whole thing.
I tried to build my own harness once. The amount of work required is incredible, from how external memory is managed per session to the techniques for saving on the context window. For example, you do not want the LLM to read in whole files; instead you give it the capability to read chunks from offsets. But then you have to decide what should stay in context and what should be pruned.
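A minimal sketch of what that chunked-read tool can look like (the function name, return shape, and 4 KB default are all made up for illustration):

```python
# Hypothetical chunked-read tool: the model asks for a byte range and
# gets back enough metadata to decide whether (and where) to read next.

MAX_CHUNK = 4_096  # bytes per read; arbitrary default, tune per model

def read_chunk(path: str, offset: int = 0, length: int = MAX_CHUNK) -> dict:
    """Return a slice of a file instead of the whole thing."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
        at_eof = f.read(1) == b""  # nothing left after this chunk?
    return {
        "path": path,
        "offset": offset,
        "next_offset": offset + len(data),
        "eof": at_eof,
        "text": data.decode("utf-8", errors="replace"),
    }
```

Each tool result stays small, and the model can keep calling with `next_offset` until it sees `eof`.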
After that you have to start designing the think/plan/generate/evaluate pipeline. A learning moment for me here was to split out the step where the LLM evaluates the work, because the same LLM that did the work should not evaluate itself; that introduces a bias. Then you realize you need subagents too and start wondering how their context will be handled (maybe return a summarized version to the main LLM?).
And then you have to start thinking about integration with MCP servers and how the LLM should invoke things like tools, prompts, and resources from each one. I learned that LLMs, especially the smaller ones, tend to hiccup and return malformed JSON.
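The malformed JSON usually comes wrapped in prose or code fences, or has a trailing comma. A tolerant parser catches most of it (this is just a sketch; re-prompting the model with the parse error is the real fallback):

```python
import json
import re

def parse_tool_call(raw: str):
    """Try to pull one JSON object out of a model reply; None on failure."""
    # Strip markdown fences like ```json ... ```
    raw = re.sub(r"```(?:json)?", "", raw)
    # Grab the outermost {...} span in case there's prose around it
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = raw[start : end + 1]
    # Drop trailing commas before } or ]
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # caller re-prompts the model with the error
```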
At some point I started wondering about throwing it all out and just looking at PydanticAI or LangChain or LangGraph or Microsoft AutoGen to operate everything between the LLM and the MCPs. It's quite difficult to make something like this work well, especially for long-horizon tasks.
I've been running a custom harness in production for months (code gen on cloud boxes), and it's quite simple (<500 LOC). The model can use sed or other nice bash tricks to read efficiently. You really don't need any tool besides bash plus a good system prompt. Subagents are the same as the main agent (they end with a summary). You can just remove tool results (oldest first) to save context; the model can read again if it needs to.
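That pruning really is a few lines. Something like this, assuming chat-API-style message dicts (the chars/4 token estimate is a crude, made-up heuristic):

```python
def estimate_tokens(messages) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def prune(messages, budget: int):
    """Stub out the oldest tool results until the context fits the budget.
    The model can always re-run a tool if it needs the data again."""
    out = [dict(m) for m in messages]  # don't mutate the caller's history
    i = 0
    while estimate_tokens(out) > budget and i < len(out):
        if out[i]["role"] == "tool":
            out[i]["content"] = "[pruned: re-run the tool if needed]"
        i += 1
    return out
```

Dropping oldest-first works because stale tool output is cheap to regenerate, while the recent turns carry the actual task state.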
High-quality memory is the only difficult part. But that can also be solved with high-quality documentation.
Harnesses are simple (kind of? Some certainly aren't, but I'd agree that they can be simple) but they deliver a ton of value. They have a significant ROI.
I agree that good models have more value because a harness can't magically make a bad model good, but there's a lot that would be inordinately difficult without a proper harness.
Keeping models on rails is still important, if not essential. Great models might behave similarly in the same harness, but I suppose the value prop is that they wouldn't behave as well on the same task without a good harness.
There is a reason why Copilot + Opus 4.6 is shit, while Claude Code + Opus 4.6 produces excellent results.
The harness matters A LOT.
The model is the engine; the harness is the driver and chassis. Even the best top-of-the-line engine in a shitty car driven by a bad driver won't win any races.
Slop becomes impossible to maintain, and eventually product velocity slows down. Maybe it's ok for an ultra simple todo app, but for most apps code quality absolutely matters... Users expect snappy UX and all the new bells and whistles.
Why did WhatsApp grow so big while thousands of previous chat apps didn't? Code quality (scalability).
How is this different from previous layers of abstraction? React/JS devs don't have to think about memory management or a million other things that C++ application devs did. Instead that cognitive load is offloaded onto the framework maintainers, and frontend devs can be much more productive.
Obviously React/JS didn't cause a job apocalypse... quite the opposite. It's just another abstraction layer making it possible to build a full application with less text. Prompts are the same pattern again, IMO.
There's too much value in familiar UX. "Don't make the user think" is the golden rule these days. People used to have mental bandwidth for learning new interfaces... But now people expect uniformity
(Not fine-tuning, but interesting nonetheless. If a model can so easily find a more elegant solution, why didn't it pick that in the first place?)