Hacker News | jmcodes's comments

Isn't it just arguing that one complex weighted graph was tuned to output tokens that more align with what current day users would define as 'taste'?

I don't think it necessarily says anything about a model itself having 'taste' in some subjective way.

If the fashion changes would the model update with it without retraining? No. So the model doesn't have 'taste' in that sense. It has alignment to current human definitions of taste.


Do you rewrite the specs when requirements change after they've already been implemented? How do you supersede a spec?

I've been using LLMs daily and I spun up a few spec driven flows once or twice but like the person above I think the code is the source of truth.

Also why wouldn't you use TDD to enforce the 'spec' then?
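
A rough illustration of the idea, as a sketch only: each requirement lives as a test that fails when the code drifts from it. The `slugify` function and its rules are made up for this example and are not taken from any real spec.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase, hyphen-separated, ASCII letters and digits only."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class SlugSpec(unittest.TestCase):
    # Each test name reads as one line of the spec.
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_drops_punctuation(self):
        self.assertEqual(slugify("Specs, superseded!"), "specs-superseded")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")


if __name__ == "__main__":
    unittest.main()
```

When a requirement changes, the test changes first, which is one way to supersede a spec without keeping a separate document in sync.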


I was already spending a lot of time reviewing other people's code. It makes no difference to me if it's coming from an agent or a person.

I can pick and choose which parts of the problem deserve my attention and which can be done by the LLM with me just keeping an eye on it while I mostly work on something else. I don't have metrics but I feel like I am doing higher leverage work with less friction.

Setting up the systems around the LLM itself is fun too. Hacking on harnesses and trying to improve the UX or the metrics is fun. Playing with different workflow topologies across agents is fun. Diving deep into context strategies, memory systems, prompting is fun. Trying to marry ideas from the past with what LLMs enable now is fun.

I don't see how this is soulless or unbearable but granted I'm not at a place that is demanding I maximize throughput. That would suck.


Nothing is really fun about trying to hack around a model to make it do what you want, knowing that you're barely patching it until the next disease shows up. It's painful. Also, it is probably a temporary step before everything closes up behind closed software and closed APIs (like Anthropic is trying to do), and the harness won't be yours; it will be on a remote cloud server. Then you will just spend your time reviewing LLM-generated code. Also, the scale and ambition of an LLM-generated piece of code can be daunting. The most basic pattern, now that LLMs can code well, is overdoing it. Docs are longer, blogs are longer, PPTs are longer, PRs are longer, while quality naturally drops because more people can produce this kind of content now.

- building a reliable enough system out of unreliable parts is pretty fun for me. It's a different kind of puzzle.

- pi-agent or code your own harness, it is not hard.

- Codex for now allows you to use it in any harness, but if they ever stop, open models are catching up.

- Reject low quality PRs the same as before. If management doesn't enable you to do that then that's a different problem.


but it's not fun for everyone. reviewing code for people vs agents is a massive difference for me. with people i can teach them. with LLMs i can't. it makes me feel helpless. just like i feel helpless using broken non-free apps. i can tolerate the same bugs in FOSS application, because i potentially can fix them even if i don't actually do it. i don't feel helpless, i feel like i have a choice. with LLMs i don't get that choice.

most of the other things you mention feel tedious to me. i like diving into the code, understanding how to solve a problem, figuring out how to make the code structure look elegant and readable. find and comment on a clever short-cut, even if it means that i may not be clever enough to debug it later.

i don't get any of that with LLM generated code. i'd spend more time to clean up and fix the LLM code than i would writing it from scratch. to use LLM code efficiently, i'd have to give up all that. but i don't want to do that.

using an LLM to code feels like gambling. every time i put in a prompt, it's like rolling the dice. am i going to get a useful solution this time? and then i reroll until i get a useful result rather than building up the application one step at a time.


Yeah I can kind of see that and if you don't like them you don't like them. Not all tools are for all people.

I personally draw the line at plugins that try to set up entire workflows and take the human completely out of the loop. Those are next to useless imo for an engineer who knows what they are doing and are exactly how you end up with crappy code/products.

But to give my thoughts to your points I guess I just don't really care that I can't teach LLMs? It doesn't bother me because I do also still teach people, it's not one or the other.

On what you like about coding. I like that too, I still do it where I want to or where it is needed.

I agree with you on what parts are enjoyable but I guess I don't feel that I'm giving them up? I get to pick the problems I work on that way now. The only disagreement there is around 'clever' shortcuts. I get pleasure out of making things debuggable and traceable for humans.

I wish my odds at gambling were as high as they are with LLM generated code lol.

I do run into the whole 'this session was a waste, need to restart', but like once in a blue moon? Not nearly enough to turn me off from using LLMs daily.

On the teaching point again: my learnings around coding standards, architecture -> problem mapping, and how to debug are applied at the system prompt level and in a few key skill files, so when I say "implement ..." or "I'm seeing this behavior, where in the codebase is the most likely root cause? Why?", it does so close to how I would've done it.

I cannot speak for people who are using these things raw in the harnesses provided by the companies, or god forbid in the browser, but you can definitely increase the odds of a good roll enough to be productive by changing the environment around the LLM, and to me that is the opposite of feeling helpless when it comes to LLMs.

I feel enabled to get more done, at my standards, on my time.


What would you consider a "hard" problem?

Extrapolating the final position of the goalposts

That's a halting problem, I think.

And Claude can actually tackle it the same way humans do here in the real world, where we don't have time to let some nonsense like "mathematically proven to be unsolvable" stand between us and our goals: it can eyeball the code and give a good enough guess.

It can certainly get in a loop burning up tokens before deciding to exit some time later.

I don't know how to define hard problems.

All I know is that we have accumulated a gigantic amount of tech debt on the web, chasing the next web framework built on tons of abstractions over very disappointing native web APIs that shouldn't be taken seriously, and neither should the W3C, who specified them.

And when an agent is capable of gluing together a web app with some CRUD backend and a very-rounded-corners UI that solves nothing for end users, we call it capable. These are not hard problems.


You insist that AI needs to be able to tackle hard problems, but can't say what qualifies as a hard problem. Can you see the problem with that? If you don't know what a hard problem looks like, how do you know the models can't tackle them?

It’s that it’s not able to tackle hard problems, really. It’s because you have to give it the solution and the patterns to follow, and then monitor it because it will go down weird paths.

If you’ve ever worked directly with a user, you know how vague change requests can be. Try writing some vague prompts like that to the agent and see if it can solve them.

For some, writing down a (good?) spec and handing it to an agent is not very productive, because by then they already have an idea of the solution and can use the editor to get it done.


I don't think they do but you can always use OpenCode or Pi Agent.


very easy to configure claude code to route to GLM as well.


Same. Loved them, told my team about them, got them to switch off of Cursor, and now I'm telling them to swap to Codex.

Anthropic really pissed me off with their harness crap. They're well within their rights but their communication over it was enough to get me to swap. I don't need extra hurdles when there's a perfectly valid alternative right there. They don't have the advantage they think they do.


I think we are inevitably heading to using the cheap Chinese models like Kimi, GLM, and Minimax for the bulk of engineering tasks. Within 3-6 months they will be at Opus 4.6 level.


This was literally my task today, before reading this update: to try out Qwen 9B locally with pi or opencode on my MacBook, which is a bit memory-constrained at 18GB.


Minimax coding plan is $10 a month for roughly 3x the $20 Claude Pro CLI usage allowed. That would be a good place to start. 200k context though.


MiniMax has its own issues. Server overloads, API errors, and failure to adhere to even the system prompt. It can happily work for hours and get no job done.


Just like me :)


Please report back, would be very interested in your findings.


I ran OpenCode + GLM-5.1 for three weeks during my vacation. It’s okay. It thinks a lot more to get to a similar result as Claude. So it’s slower. It’s congested during peak hours. It has quirks as the context gets close to full.

But if you’re stuck with no better model, it’s better than local models and no models.

I have to say, OpenCode’s OpenUI has taught me what modern TUIs can be like. Claude’s TUI feels more like it’s been grown than designed. I’m playing around with TUI widgets trying to recreate and improve that experience


> I have to say, OpenCode’s OpenUI has taught me what modern TUIs can be like. Claude’s TUI feels more like it’s been grown than designed.

Claude's TUI is not a TUI. It's the most WTF thing ever: the TUI is actually a GUI. A headless browser is shipped with the TUI that, in real time, renders the entire screen, scrolls to the bottom, and converts that to text mode. There are several serious issues and I'll mention two that do utterly piss me off...

1. Insane "jumping" around where the text "scrolls back" then scrolls back down to your prompt: at this point, given the crazy hack that TUI is, if you tell me the text jumping around in the TUI is because they're simulating mouse clicks on the scrollbar I wouldn't be surprised. If I'm not mistaken we've seen people "fixing" this by patching other programs (tmux?).

2. What you see in the TUI is not the output of the model. That is, to me, the most insane of it all. They're literally changing characters between their headlessly rendered GUI and the TUI.

> Claude’s TUI feels more like it’s been grown than designed.

"grown" or "hacked" are way too nice words for the monstrosity that Claude's TUI is.

Codex is described as a: "Lightweight coding agent that runs in your terminal". It's 95%+ Rust code. I wonder if the "lightweight" is a stab at the monstrosity that Claude's TUI is.


To be clear, was OpenCode better, in your opinion, than Claude Code?


Better UI, worse model (GLM), probably slightly worse agentic runtime.

In spite of how glitchy Claude feels, it makes decisions fast.


For what it's worth: here's my experience in the first 10 minutes of using Qwen locally to write some code. https://github.com/robertkarl/local-qwen-first-10-minutes it includes some token generation numbers and steps to repro.


how was it? I'm doing this today


I will report back... but I have to recommend this comment on a post about Qwen 3.6 https://news.ycombinator.com/item?id=47843466 by daemonologist

it goes into detail about llama-server args; quants to try; and layer/kv cache splits. I plan to try the techniques there.


Kimi K3 in July-September is the big one.


Kimi 2.6 works roughly like Opus 4.6, when it used to work. Depending on the task, a bit better or a bit worse. And it's MUCH cheaper.


From this morning: I had a single Go file with like 100 LOC. I asked it to add debug prints; it thought for 5+ minutes, generated ~1M output tokens, and did not actually update my file.


Which harness? Did you use OpenRouter?


Anthropic will kick and scream, as those are often distilled from their latest models and are cutting into their margin. Though it is not like their hands are clean either; it is just a different type of stealing, an approved one :-)


How challenging are these to set up and run locally?


Getting them running is easy (check out LMstudio or ask one for some recommendations). The real question is whether you have the hardware to make them run fast enough to be useful.


The min req is probably crazy I assume but I'll take a peek :)


This is possibly a hot take but recently I've been having about as much luck with Composer 2 in Cursor as I have with Opus 4.6 in Claude Code.

Opus is obviously the better model, but Cursor's "harness" is doing so much heavy lifting in terms of just magically supplying the broader context the model needs to understand the ramifications of its edits.


One thing I enjoy about the Cursor and Codex Mac apps is the embedded preview window. I know it's not as hardcore as the terminal/tmux but it's hella convenient. But Cursor bugs me with the opacity around what model I'm using. It seems to be deliberately routing requests based on their perceived complexity. What draws you to Codex vs Cursor?


I don't maintain this anymore but I experimented with this a while back: https://github.com/jx-codes/lootbox

Essentially you give the agent a way to run code that calls MCP servers, then it can use them like any other API.

Nowadays small bash/bun scripts and an MCP gateway proxy get me the same exact thing.

So yeah at some level you do have to build out your own custom functionality.
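
A minimal sketch of that idea, in Python rather than the bash/bun scripts mentioned above: tools sit behind one small gateway object, and agent-written code calls them like any other API. Everything here (`ToolGateway`, `register`, `call`, the `kv.*` tools) is a hypothetical name invented for illustration. It is not lootbox's API or the MCP protocol, and a real gateway would forward calls to MCP servers instead of local functions.

```python
from typing import Any, Callable, Dict


class ToolGateway:
    """Hypothetical stand-in for an MCP gateway proxy: maps tool names to callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        # Fail loudly on unknown tools so agent-written code gets a clear error.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


def build_gateway() -> ToolGateway:
    # Assumption: two toy in-memory tools stand in for real MCP servers.
    store: Dict[str, str] = {}
    gw = ToolGateway()
    gw.register("kv.set", lambda key, value: store.__setitem__(key, value))
    gw.register("kv.get", lambda key: store.get(key))
    return gw


def agent_script(gw: ToolGateway) -> str:
    """The kind of code an agent might write: chain tool calls with ordinary control flow."""
    gw.call("kv.set", key="greeting", value="hello")
    return gw.call("kv.get", key="greeting").upper()


if __name__ == "__main__":
    print(agent_script(build_gateway()))
```

The point of the design is that the agent composes several tool calls in one script instead of making one round trip per call.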


It can and it does, especially combined with skills (context files). It can hit REST APIs with curl just fine. MCP is basically just another standard.

Where it comes in handy has mostly been in distribution honestly. There's something very "open apis web era" about MCP servers where because every company rushed to publish them, you can write a lot of creative integrations a bit more easily.


Not the guy who made it but I immediately wondered if I could use the intermediate steps with some "outline" mode to help me see things in shapes and finally learn to draw a bit.


Our entire existence and experience is nothing _but_ input.

Temperature changes, visual stimulus, auditory stimulus, body cues, random thoughts firing, etc.. Those are all going on all the time.


Random thoughts firing wouldn't be input, they're an internal process to the organism.


It's a process that I don't have conscious control over.

I don't choose to think random thoughts; they appear.

Which is different than thoughts I consciously choose to think and engage with.

From my subjective perspective it is an input into my field of awareness.


Your subjective experience is only the tip of the iceberg of your entire brain activity. The conscious part is merely a tool your brain uses to help it achieve its goals, there's no inherent reason to favor it.

