
> There are certain tasks, like improving a given program for speed, for instance, where in theory the model can continue to make progress with a very clear reward signal for a very long time.

Super skeptical of this claim. Yes, it works if I have some toy, poorly optimized Python example or maybe a sorting algorithm in ASM, but it won't work in any non-trivial case. My intuition is that the LLM will spin its wheels at a local minimum whose performance is overdetermined by millions of black-box optimizations in the interpreter or compiler, whose signal is never fed back to the LLM.
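
To make that concrete, the kind of "improve for speed in a loop" setup under discussion is roughly the sketch below. It's a minimal, hypothetical harness: propose_rewrite stands in for whatever model call you'd use, the reward signal is just wall-clock time on a fixed benchmark input, and a real harness would also need to verify that each candidate still produces correct output.

    import subprocess
    import time

    def run_benchmark(source_path: str) -> float:
        """Reward signal: wall-clock time of the candidate program on a fixed input."""
        start = time.perf_counter()
        subprocess.run(["python", source_path, "bench_input.txt"], check=True)
        return time.perf_counter() - start

    def propose_rewrite(source: str, feedback: str) -> str:
        """Hypothetical stand-in for an LLM call that returns a faster version."""
        raise NotImplementedError("call your model of choice here")

    def optimize_loop(source_path: str, iterations: int = 20) -> None:
        with open(source_path) as f:
            best_source = f.read()
        best_time = run_benchmark(source_path)
        for _ in range(iterations):
            candidate = propose_rewrite(best_source, f"current best: {best_time:.3f}s")
            with open("candidate.py", "w") as f:
                f.write(candidate)
            try:
                t = run_benchmark("candidate.py")
            except subprocess.CalledProcessError:
                continue  # broken candidate: no reward, keep the current best
            if t < best_time:  # greedy acceptance -- exactly where local minima bite
                best_source, best_time = candidate, t
                with open(source_path, "w") as f:
                    f.write(candidate)

The only signal the model ever sees here is the timing number; everything the interpreter or compiler does under the hood stays opaque, which is the local-minimum worry above.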





> but this won’t work in any non-trivial case

Earlier this year Google shared that one of their projects (I think it was AlphaEvolve) found an optimisation in their stack that sped up their real-world training runs by 1%. As we're talking about Google here, we can be pretty sure it wasn't some trivial Python trick that they missed. Anyhow, at ~$100M per training run, that's a $1M saving right there, each and every time they run a training run!

And in the past month Google also shared another "agentic" workflow where they had Gemini 2.5 Flash (their previous-gen "small" model) work autonomously on migrating codebases to support the aarch64 architecture. They found that ~30% of the projects worked flawlessly end-to-end. Whatever costs they save by switching to ARM will translate into real-world dollars saved (and at Google scale, those add up quickly).


The second example has nothing to do with the first. I am optimistic that LLMs are great at translation tasks when there's a good testing framework to verify against.

“Optimize” in a vacuum is a tarpit for an LLM agent today, in my view. The Google case is interesting, but a 1% gain, while significant at Google scale, doesn't move the needle much in terms of statistical significance. It would be more interesting to see the exact operation and the speed-up achieved relative to the prior version. Still, it's data contrary to my view, for sure. The cynic also notes that Google is in the LLM hype game now, too.


Why do you think it's not relevant to the "optimise in a loop" thing? The way I think of it, it's using LLMs "in a loop" to move something from arch A (which costs $x) to arch B (which costs $y), where $y < $x. It's still an autonomous optimisation done by LLMs, no?

Did the LLM suggest moving to the new architecture? If not, that's not what's under discussion; that's just following an order to translate.

Ah, I see your point.

> As we're talking about google here, we can be pretty sure it wasn't some trivial python trick that they missed.

Strong disagree with the reasoning here. Precisely because Google is big and has thousands of developers, there could be a lot of code and a lot of low-hanging fruit.


> By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini’s architecture by 23%, leading to a 1% reduction in Gemini's training time.

The message I replied to said "if I have some toy, poorly optimized Python example". I think it's safe to say that matmul and kernel optimisation is a bit beyond a small Python example.
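
For reference, "dividing a large matrix multiplication into more manageable subproblems" is the classic tiling/blocking idea. The toy NumPy sketch below is not AlphaEvolve's actual kernel, just an illustration of what the decomposition looks like; real kernels pick tile sizes to fit caches or GPU shared memory, and that choice is the kind of thing being searched over.

    import numpy as np

    def blocked_matmul(A: np.ndarray, B: np.ndarray, block: int = 64) -> np.ndarray:
        """Compute A @ B by splitting the work into block x block subproblems."""
        n, k = A.shape
        k2, m = B.shape
        assert k == k2, "inner dimensions must match"
        C = np.zeros((n, m), dtype=A.dtype)
        for i in range(0, n, block):
            for j in range(0, m, block):
                for p in range(0, k, block):
                    # each partial product is an independent, cache-sized subproblem
                    C[i:i+block, j:j+block] += (
                        A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
                    )
        return C

    # quick sanity check against NumPy's own matmul
    A = np.random.rand(300, 200)
    B = np.random.rand(200, 500)
    assert np.allclose(blocked_matmul(A, B), A @ B)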


There was a discussion the other day where someone asked Claude to improve a codebase 200x: https://news.ycombinator.com/item?id=46197930

That’s most definitely not the same thing, as “improving a codebase” is an open-ended task with no reliable metric the agent could work against.




