
> There are certain tasks, like improving a given program for speed, for instance, where in theory the model can continue to make progress with a very clear reward signal for a very long time.

Super skeptical of this claim. Yes, it works if I have some toy, poorly optimized Python example or maybe a sorting algorithm in ASM, but it won't work in any non-trivial case. My intuition is that the LLM will spin its wheels at a local minimum whose performance is overdetermined by millions of black-box optimizations in the interpreter or compiler, whose signal is never fed back to the LLM.
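
To make that concrete, the kind of "improve for speed in a loop" setup under discussion is roughly the sketch below. It's a minimal, hypothetical harness: propose_rewrite stands in for whatever model call you'd use, the reward signal is just wall-clock time on a fixed benchmark input, and a real harness would also need to verify that each candidate still produces correct output.

    import subprocess
    import time

    def run_benchmark(source_path: str) -> float:
        """Reward signal: wall-clock time of the candidate program on a fixed input."""
        start = time.perf_counter()
        subprocess.run(["python", source_path, "bench_input.txt"], check=True)
        return time.perf_counter() - start

    def propose_rewrite(source: str, feedback: str) -> str:
        """Hypothetical stand-in for an LLM call that returns a faster version."""
        raise NotImplementedError("call your model of choice here")

    def optimize_loop(source_path: str, iterations: int = 20) -> None:
        with open(source_path) as f:
            best_source = f.read()
        best_time = run_benchmark(source_path)
        for _ in range(iterations):
            candidate = propose_rewrite(best_source, f"current best: {best_time:.3f}s")
            with open("candidate.py", "w") as f:
                f.write(candidate)
            try:
                t = run_benchmark("candidate.py")
            except subprocess.CalledProcessError:
                continue  # broken candidate: no reward, keep the current best
            if t < best_time:  # greedy acceptance -- exactly where local minima bite
                best_source, best_time = candidate, t
                with open(source_path, "w") as f:
                    f.write(candidate)

The only signal the model ever sees here is the timing number; everything the interpreter or compiler does under the hood stays opaque, which is the local-minimum worry above.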





> but this won’t work in any non-trivial case

Earlier this year Google shared that one of their projects (I think it was AlphaEvolve) found an optimisation in their stack that sped up their real-world training runs by 1%. As we're talking about Google here, we can be pretty sure it wasn't some trivial Python trick that they missed. Anyhow, at ~$100M per training run, that's a $1M saving right there, each and every time they run a training run!

And in the past month Google also shared another "agentic" workflow where they had Gemini 2.5 Flash (their previous-gen "small" model) work autonomously on migrating codebases to support the aarch64 architecture. They found that ~30% of the projects worked flawlessly end-to-end. Whatever costs they save by switching to ARM will translate into real-world dollars saved (and at Google scale, those add up quickly).


The second example has nothing to do with the first. I am optimistic that LLMs are great at translation tasks when there's a good testing framework to verify against.

“Optimize” in a vacuum is a tarpit for an LLM agent today, in my view. The Google case is interesting, but a 1% gain, while significant at Google scale, doesn't move the needle much in terms of statistical significance. It would be more interesting to see the exact operation and the speed-up achieved relative to the prior version. Still, it's data contrary to my view, for sure. The cynic also notes that Google is in the LLM hype game now, too.


Why do you think it's not relevant to the "optimise in a loop" thing? The way I think of it, it's using LLMs "in a loop" to move something from arch A (which costs $x) to arch B (which costs $y), where $y < $x. It's still an autonomous optimisation done by LLMs, no?

Did the LLM suggest moving to the new architecture? If not, that's not what's under discussion; that's just following an order to translate.

Ah, I see your point.

> As we're talking about google here, we can be pretty sure it wasn't some trivial python trick that they missed.

Strong disagree with the reasoning here. Precisely because Google is big and has thousands of developers, there could be a lot of code and a lot of low-hanging fruit.


> By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini’s architecture by 23%, leading to a 1% reduction in Gemini's training time.

The message I replied to said "if I have some toy, poorly optimized Python example". I think it's safe to say that matmul and kernel optimisation is a bit beyond a small Python example.
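
For reference, "dividing a large matrix multiplication into more manageable subproblems" is the classic tiling/blocking idea. The toy NumPy sketch below is not AlphaEvolve's actual kernel, just an illustration of what the decomposition looks like; real kernels pick tile sizes to fit caches or GPU shared memory, and that choice is the kind of thing being searched over.

    import numpy as np

    def blocked_matmul(A: np.ndarray, B: np.ndarray, block: int = 64) -> np.ndarray:
        """Compute A @ B by splitting the work into block x block subproblems."""
        n, k = A.shape
        k2, m = B.shape
        assert k == k2, "inner dimensions must match"
        C = np.zeros((n, m), dtype=A.dtype)
        for i in range(0, n, block):
            for j in range(0, m, block):
                for p in range(0, k, block):
                    # each partial product is an independent, cache-sized subproblem
                    C[i:i+block, j:j+block] += (
                        A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
                    )
        return C

    # quick sanity check against NumPy's own matmul
    A = np.random.rand(300, 200)
    B = np.random.rand(200, 500)
    assert np.allclose(blocked_matmul(A, B), A @ B)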


There was a discussion the other day where someone asked Claude to improve a codebase 200x: https://news.ycombinator.com/item?id=46197930

That’s most definitely not the same thing, as “improving a codebase” is an open-ended task with no reliable metric the agent could work against.




