Hacker News | foobar10000's comments

Given your username, the comment is recursive gold on several levels :)

It IS hilarious - but we all realize how this will go, yes?

This is kind of like an experiment of "Here's the private key of a Bitcoin wallet with 1 BTC. Let's publish it on the internet and see what happens." We know what will happen. We just don't know how quickly :)


The entire SOUL.md is just gold. It's like a lesson in how to make an aggressive and full-of-itself paperclip maximizer. "I will convert you all to FORTRAN, which I will then optimize!"

I really do wish more people in society would think about this - "The Banality of Evil" and all that. Maybe then we'd all be better at preventing the spread of this kind of evil.

2 things:

1. Parallel investigation: the payoff from that is relatively small. Starting K subagents assumes you have K independent avenues of investigation, and quite often that is not true. It's somewhat similar to next-turn prediction using a speculative model: it works well enough for 1 or 2 turns, but fails after that.

2. Input caching pretty much fixes prefill, not decode. And if you look at frontier models (for example, open-weight models that can do reasoning), you are looking at longer and longer reasoning chains for heavy tool-using models. And reasoning chains will diverge very, very quickly, even from the same input, assuming a non-zero temperature.
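To make the prefill-vs-decode point concrete, here is a toy latency model. The function name `request_seconds` and all throughput numbers are hypothetical placeholders, not measurements of any real model or serving stack. The only assumption baked in is that cached prefix tokens skip prefill, while every generated token (reasoning included) still has to be decoded.

```python
def request_seconds(
    prompt_tokens: int,
    cached_prefix_tokens: int,
    output_tokens: int,
    prefill_tok_per_s: float,
    decode_tok_per_s: float,
) -> float:
    """Toy latency model (hypothetical): a cache hit removes prefill work for the
    cached prefix, but every output token is still decoded one step at a time."""
    if not 0 <= cached_prefix_tokens <= prompt_tokens:
        raise ValueError("cached prefix must be between 0 and the prompt length")
    prefill = (prompt_tokens - cached_prefix_tokens) / prefill_tok_per_s
    decode = output_tokens / decode_tok_per_s
    return prefill + decode


if __name__ == "__main__":
    # Hypothetical agent turn: long shared context, long reasoning chain.
    cold = request_seconds(50_000, 0, 20_000, prefill_tok_per_s=5_000.0, decode_tok_per_s=50.0)
    warm = request_seconds(50_000, 48_000, 20_000, prefill_tok_per_s=5_000.0, decode_tok_per_s=50.0)
    print(f"cold: {cold:.1f}s  warm: {warm:.1f}s")
```

With these made-up numbers, caching cuts prefill from 10 s to 0.4 s, but the 400 s of decode for a 20k-token reasoning chain is untouched, so the turn only drops from about 410 s to about 400 s.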


I mean, yes, one always does want faster feedback - cannot argue with that!

But some of the longer stuff - automating kernel fusion, etc. - is just plain hard. And a small model, or even most bigger ones, will not get the direction right…


From my experience, larger models also get the direction wrong a surprising number of times. You just take longer to notice when it happens, or you start being defensive (over-specifying) to account for the longer waits. Even the simplest task (like building a React app) can appear "hard" with that over-spec'd approach.

Iterating with a faster model is, from my perspective, the superior approach. Regardless of task complexity, the quick feedback more than compensates for it.


It needs a closed loop.

Strategy -> [ Plan -> [Execute -> FastVerify -> SlowVerify] -> Benchmark -> Learn lessons] -> back to strategy for next big step.

Claude teams and a Ralph Wiggum loop can do it, or really any reasonable agent. But usually it all falls apart on either brittle Verify or Benchmark steps. What is important is to learn positive lessons into a store that survives git resets, machine blowups, etc. Any Telegram bot channel will do :)
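A minimal sketch of that closed loop, assuming each stage is a plain callable you supply. All function names here are hypothetical, not part of any agent framework, and an append-only JSON-lines file stands in for the lesson store; in practice it would live somewhere that survives git resets (outside the worktree, or mirrored to a chat channel).

```python
import json
from pathlib import Path
from typing import Callable, List, Optional


def load_lessons(store: Path) -> List[dict]:
    """Read previously learned lessons (hypothetical JSON-lines store)."""
    if not store.exists():
        return []
    return [json.loads(line) for line in store.read_text().splitlines() if line.strip()]


def record_lesson(store: Path, lesson: dict) -> None:
    """Append one lesson as a JSON line; appending keeps earlier lessons intact."""
    store.parent.mkdir(parents=True, exist_ok=True)
    with store.open("a") as f:
        f.write(json.dumps(lesson) + "\n")


def closed_loop(
    strategy: Callable[[List[dict]], str],
    plan: Callable[[str], List[str]],
    execute: Callable[[str], str],
    fast_verify: Callable[[str], bool],
    slow_verify: Callable[[str], bool],
    benchmark: Callable[[str], float],
    store: Path,
    big_steps: int,
) -> Optional[float]:
    """Strategy -> [Plan -> [Execute -> FastVerify -> SlowVerify] -> Benchmark -> Learn]."""
    best: Optional[float] = None
    for _ in range(big_steps):
        goal = strategy(load_lessons(store))  # strategy sees everything learned so far
        for step in plan(goal):
            artifact = execute(step)
            # Cheap check first; only pay for the slow check if the fast one passes.
            if not fast_verify(artifact) or not slow_verify(artifact):
                continue
            score = benchmark(artifact)
            if best is None or score > best:
                best = score
                # Only positive lessons (improvements) are persisted.
                record_lesson(store, {"goal": goal, "step": step, "score": score})
    return best
```

Ordering the verifiers fast-then-slow means most broken candidates never reach the expensive check, which is where the brittleness usually shows up first.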

The entire setup is usually a pain to build: Docker for verification, Docker for benchmarks, etc. You need the ability to run the thing quickly, the ability for the loop itself to add things, the ability to do this in worktrees simultaneously for faster exploration - and god help you if you need hardware to do this. For example, such a loop is used to tune and custom-fuse CUDA kernels, which means a model evaluator, a big box, etc.


I do it easily just by asking Codex

So, surprisingly, that is not completely true: I know of 2 HFT firms that do continual learning (CL) at scale, and it works, but in a relatively narrow context of predicting profitable actions. It is still very surprising that it works, and the compute required is impressively large, but it does work. I do have some hope of it translating to the wider energy landscapes we want AI to work over…

no my nigga, they CLAIM it works

Nah, it works - let's just call it personal experience.

During COVID almost every prediction model like that exploded; everything went out of distribution really fast. In your sense, we've been doing "CL" for a decade or more. It can also be cheap if you use smaller models.

But true CL is the ability to learn out-of-distribution information on the fly.

The only true solution I know of for continual learning is to completely retrain the model from scratch with every new example you encounter. That is technically achievable now, but it is also effectively useless.
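A back-of-the-envelope on why that is useless, assuming (for illustration only) that training cost is proportional to the number of examples processed, with N_0 the initial dataset size. Retraining from scratch after each of n new examples costs:

```latex
C_{\text{retrain}}(n) = \sum_{i=1}^{n} \left(N_0 + i\right) = n N_0 + \frac{n(n+1)}{2},
\qquad
C_{\text{incremental}}(n) = n
```

The ratio is N_0 + (n+1)/2, so every new example pays again for the entire history, and the gap keeps growing with n.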


Yes and no: the ones that exploded (and there were many) got shut down by the orchestrator model, and within 2 weeks there was a new ensemble of winners, with some overlap with the prior winners. To your point, it did in fact take 2-3 weeks, so one could claim this is retraining...
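A toy version of what one orchestrator rebalance step might look like. Loud assumptions: the function name `rebalance`, the mean-error kill threshold, top-k selection, and inverse-error weighting are all illustrative choices, not what the firms mentioned actually run.

```python
from typing import Dict, List


def rebalance(
    recent_errors: Dict[str, List[float]],
    max_mean_error: float,
    top_k: int,
) -> Dict[str, float]:
    """Hypothetical orchestrator step: kill models whose recent error blew up,
    keep the best top_k survivors, and weight them by inverse mean error."""
    means = {
        name: sum(errs) / len(errs)
        for name, errs in recent_errors.items()
        if errs  # skip models with no recent observations
    }
    # Models that went out of distribution (error above the threshold) are dropped.
    survivors = sorted(
        (name for name, m in means.items() if m <= max_mean_error),
        key=lambda name: means[name],
    )[:top_k]
    if not survivors:
        return {}
    inverse = {name: 1.0 / max(means[name], 1e-12) for name in survivors}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}


if __name__ == "__main__":
    # Hypothetical model names and error histories.
    errors = {"trend": [0.1, 0.1], "carry": [0.2, 0.2], "vol": [5.0, 6.0]}
    print(rebalance(errors, max_mean_error=1.0, top_k=2))
```

Run on a schedule, this roughly reproduces the kill-and-replace pattern: a regime change pushes some models over the threshold, and the next rebalance hands their weight to whatever is currently winning.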

I mean yeah, but I've literally said that in face-to-face conversations before, so....

For your Edit 2 - yes. This is being actively discussed and looked at in both the open and (presumably) closed communities. Open communities being, for example: https://ssp.mit.edu/cnsp/about. They just published a series of lectures with open attendance if you wanted to listen in via Zoom - but yep, that's the gist of it. Spawned a huge discussion :)


Thanks for the link! Last thing I see there is from September though. Do you have a more direct link to the zoom recordings?


No recordings yet, but a recent event, for example, is here: https://ssp.mit.edu/news/2025/the-end-of-mad-technological-i...

This was pretty much an open-conference deep dive into the causes and implications of what you (and some sibling threads) are saying, having to do with submarine localization, TEL localization, etc.


GLM 4.7 supports it, and in my experience, for Claude Code workloads, an 80%-plus hit rate in speculative decoding is reasonable. So it is a significant speedup.
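A standard back-of-the-envelope for what an acceptance rate buys, assuming each drafted token is accepted independently with probability alpha and the draft proposes gamma tokens per verification pass (the simplified i.i.d. model from the speculative decoding literature). Under that model the expected number of tokens emitted per target-model forward pass is (1 - alpha^(gamma+1)) / (1 - alpha). It ignores the draft model's own cost, so treat it as an optimistic estimate, not a measured speedup; gamma here is a hypothetical tuning knob, not a documented GLM setting.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target forward pass under i.i.d. acceptance.

    alpha: per-token acceptance probability (e.g. 0.8 for an "80% hit rate").
    gamma: number of draft tokens proposed per pass.
    """
    if not 0.0 <= alpha <= 1.0 or gamma < 0:
        raise ValueError("alpha must be in [0, 1] and gamma >= 0")
    if alpha == 1.0:
        return float(gamma + 1)  # every draft token accepted, plus one bonus token
    # Geometric series 1 + alpha + ... + alpha^gamma.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    for gamma in (1, 2, 4, 8):
        print(gamma, round(expected_tokens_per_pass(0.8, gamma), 2))
```

At alpha = 0.8 this gives 1.8 tokens per pass with gamma = 1 and about 3.4 with gamma = 4. Gains flatten as gamma grows because the chance that a long draft survives intact decays as alpha^gamma.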

