
"LLMs tend to regurgitate solutions to solved problems"

People say this, but honestly, it's not really my experience— I've given ChatGPT (and Copilot) genuinely novel coding challenges and they do a very decent job at synthesizing a new thought based on relating it to disparate source examples. Really not that dissimilar to how a human thinks about these things.



There are multiple kinds of novelty. Remixing arbitrary stuff is a strength of LLMs (has been ever since GPT-2, actually... "Write a Shakespearean sonnet but talk like a pirate.")

Many (but not all) coding tasks fall into this category. "Connect to API A using language B and library C, while integrating with D on the backend." Which is really cool!
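
Concretely, that kind of remix task usually bottoms out in glue code along these lines (a rough sketch only; the endpoints, fields, env var, and the choice of `requests` as "library C" are all invented for illustration):

    # Illustrative "remix" glue code: pull records from one hypothetical REST API
    # ("API A") and forward them to a hypothetical backend ("D"). Nothing here is
    # a real service; it's just the shape of the task.
    import os
    import requests

    SOURCE_URL = "https://api.example-a.com/v1/records"   # "API A" (made up)
    BACKEND_URL = "https://backend.example-d.com/ingest"  # "backend D" (made up)

    def sync_records() -> int:
        """Fetch records from the source API and forward them to the backend."""
        resp = requests.get(
            SOURCE_URL,
            headers={"Authorization": f"Bearer {os.environ['SOURCE_TOKEN']}"},
            timeout=30,
        )
        resp.raise_for_status()
        records = resp.json().get("records", [])
        for record in records:
            requests.post(BACKEND_URL, json=record, timeout=30).raise_for_status()
        return len(records)

    if __name__ == "__main__":
        print(f"synced {sync_records()} records")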

But there are other coding tasks that it just can't really do. E.g., I'm building a database with some novel approaches to query optimization, and LLMs are totally lost in that part of the code.


But wouldn't that novel query optimization still be explained somewhere in a paper using concepts derived from an existing body of work? It's going to ultimately boil down to an explanation of the form "it's like how A and B work, but slightly differently and with this extra step C tucked in the middle, similar to how D does it."

And an LLM could very much ingest such a paper and then, I expect, also understand how the concepts mapped to the source code implementing them.


> And an LLM could very much ingest such a paper and then, I expect, also understand how the concepts mapped to the source code implementing them.

LLMs don't learn from manuals describing how things work; LLMs learn from examples. So a thing being described doesn't let the LLM perform that thing: the LLM needs to have seen a lot of examples of that thing being performed in text in order to be able to perform it.

This is fundamental to how LLMs work, and you can't get around it without totally changing how they train.


How certain are you that those challenges are "genuinely novel" and simply not accounted for in the training data?

I'm hardly an expert, but it seems intuitive to me that even if a problem isn't explicitly accounted for in publicly available training data, many underlying partial solutions to similar problems may be, and an LLM amalgamating that data could very well produce something that appears to be "synthesizing a new thought".

Essentially instead of regurgitating an existing solution, it regurgitates everything around said solution with a thin conceptual lattice holding it together.


But isn't that most of programming, anyway?


No, most of programming is at least implicitly coming up with a human-language description of the problem and solution that isn't full of gaps and errors. LLM users often don't give themselves enough credit for how much thought goes into the prompt - likely because those thoughts are easy for humans! But not necessarily for LLMs.

Sort of related to how you need to specify the level of LLM reasoning not just to control cost, but because the non-reasoning model just goes ahead and answers incorrectly, and the reasoning model will "overreason" on simple problems. Being able to estimate the reasoning-intensiveness of a problem before solving it is a big part of human intelligence (and IIRC is common to all great apes). I don't think LLMs are really able to do this, except via case-by-case RLHF whack-a-mole.
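
For instance, with the OpenAI Python SDK the caller has to pick the effort level up front (a minimal sketch; I'm assuming the `reasoning_effort` parameter and an o-series model here, and other providers expose different knobs):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask(question: str, hard: bool) -> str:
        # The caller, not the model, estimates how reasoning-intensive the question is.
        resp = client.chat.completions.create(
            model="o3-mini",                             # assumed reasoning-capable model
            reasoning_effort="high" if hard else "low",  # difficulty guessed in advance
            messages=[{"role": "user", "content": question}],
        )
        return resp.choices[0].message.content

    print(ask("What is 2 + 2?", hard=False))
    print(ask("Prove there are infinitely many primes.", hard=True))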


How do you know they're truly novel given the massive training corpus and the somewhat limited vocabulary of programming languages?


I guess at a certain point you get into the philosophy of what it even means to be novel or to test for novelty, but to give a concrete example: I'm in DevOps, working on build pipelines for ROS containers using Docker Bake and GitHub Actions (including some reusable actions implemented in TypeScript). All of those are areas where ChatGPT has learned a lot, so maybe me combining them isn't really novel at all. But, like... I've given talks at the conference where people discuss how best to package and ship ROS workspaces, and I'm confident that no one out there has secretly already done what I'm doing, with Chat just using their prior work, ingested at some point, as a template for what it suggests I do.

I think rather it has a broad understanding of concepts like build systems and tools, DAGs, dependencies, lockfiles, caching, and so on, and so it can understand my system through the general lens of what makes sense when these concepts are applied to non-ROS systems or on non-GHA DevOps platforms, or with other packaging regimes.

I'd argue that that's novel, but as I said in the GP, the more important thing is that it's also how a human approaches things that to them are novel— by breaking them down, and identifying the mental shortcuts enabled by abstracting over familiar patterns.
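
(For a sense of what the system looks like, here's a hypothetical sketch of the per-distro build step, driven from Python purely for illustration; the real interface is a docker-bake.hcl plus `docker buildx bake`, and every target name, arg, and distro list below is invented, not my actual pipeline:)

    import subprocess

    ROS_DISTROS = ["humble", "jazzy"]  # assumed target distros

    def build(distro: str) -> None:
        # One bake invocation per ROS distro, overriding a build arg in the
        # (hypothetical) docker-bake.hcl for the (hypothetical) ros-workspace target.
        subprocess.run(
            [
                "docker", "buildx", "bake",
                "--file", "docker-bake.hcl",
                "--set", f"*.args.ROS_DISTRO={distro}",
                "ros-workspace",
            ],
            check=True,
        )

    for distro in ROS_DISTROS:
        build(distro)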


I have a little ongoing project where I'm trying to use Claude Code to implement a compiler for the B programming language that is itself written in B. To the best of my knowledge, such a thing does not exist yet - or at least if it does, no amount of searching can find it, so it's unlikely that it is somewhere in the training set. For that matter, the overall amount of B code in existence is too small to be a meaningful training set for it.

And yet it can do it when presented with a language spec. It's not perfect, but it can work around its own failure modes with tooling that it makes for itself. For example, it tends to generate B code that is mostly correct but with occasional problems, so I had it write a B parser in Python and then run that whenever it edits B code, to validate the edits.
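
To give a flavor of that validate-the-edits loop (a toy stand-in: the parser Claude wrote for me is an actual B parser, while the sketch below only checks delimiter balance so the shape of the hook is clear):

    import sys

    # Toy stand-in for the real check: a full B parser would go here.
    PAIRS = {")": "(", "]": "[", "}": "{"}

    def check_balanced(source: str) -> bool:
        """Reject source with mismatched (), [], or {} (ignores strings/comments)."""
        stack = []
        for ch in source:
            if ch in "([{":
                stack.append(ch)
            elif ch in PAIRS:
                if not stack or stack.pop() != PAIRS[ch]:
                    return False
        return not stack

    if __name__ == "__main__":
        with open(sys.argv[1]) as f:
            ok = check_balanced(f.read())
        sys.exit(0 if ok else 1)  # non-zero exit tells the agent to retry its edit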



