If the models are memorizing and regurgitating from their training data, how come every model I've tried this with produces entirely different code?
Presumably this is because "the network only needs to interpolate between them". That's what I want it to do!
I tried the space invaders thing on a 4GB Qwen model today and it managed to produce a grid of aliens that advanced one step... and then dropped off the page entirely.
A transformer does not need to emit a byte-for-byte clone of a training example to benefit from having seen it. It can store a distributed representation of many near-duplicate implementations and then sample a novel linear combination. That still short-circuits algorithm design: the burden of discovering the game loop, collision logic, sprite sheet etc. was ALREADY SOLVED during pre-training.
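To make that concrete, the "already solved" piece is only a few lines of bookkeeping. A rough sketch of the canonical march-and-drop step (the names and numbers here are mine, purely illustrative):

    // One tick of the classic invader movement: slide horizontally until an
    // edge is hit, then reverse direction and drop the whole grid one row.
    function stepAliens(aliens, state, canvasWidth) {
      const hitEdge = aliens.some(a =>
        (state.dir > 0 && a.x + a.width >= canvasWidth) ||
        (state.dir < 0 && a.x <= 0));
      if (hitEdge) {
        state.dir *= -1;                      // reverse horizontal direction...
        aliens.forEach(a => { a.y += 16; });  // ...and drop one row
      } else {
        aliens.forEach(a => { a.x += 8 * state.dir; });
      }
    }

Lose the direction flip, or apply the drop on every tick, and you get exactly the "advanced one step and then dropped off the page" behaviour described above.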
When you temperature-sample the same model twice you also get "different" code; diversity alone is not evidence of new reasoning. What matters is functional novelty under controlled transformations (renamed variables, resized canvas, obfuscated asset file names, etc.). On such metamorphic rewrites, models that appear brilliant on canonical prompts suddenly collapse, a hallmark of shallow pattern matching.
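A metamorphic check can be as mechanical as rewriting the spec before re-running the benchmark. A minimal sketch of the idea (the specific rewrites and the generateAndTest callback are my own illustrative assumptions, not from any paper):

    // Meaning-preserving rewrites of the task spec.
    const rewrites = [
      spec => spec.replaceAll("alien", "invaderSprite"),     // rename entities
      spec => spec.replace(/640x480/g, "800x600"),           // resize the canvas
      spec => spec.replaceAll("sprites.png", "a9f3c2.bin"),  // obfuscate asset names
    ];

    // generateAndTest(spec) is assumed to prompt the model with `spec` and run
    // the generated code against the same functional checks as the original.
    async function metamorphicPassRate(spec, generateAndTest) {
      let passed = 0;
      for (const rewrite of rewrites) {
        if (await generateAndTest(rewrite(spec))) passed++;
      }
      return passed / rewrites.length;
    }

A model that genuinely solved the task should score about the same on the rewrites as on the canonical prompt; a large gap is the collapse described above.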
The paper I mentioned in my previous comment shows SOTA coding LLMs scoring 70%+ on SWE-bench Verified, yet dropping 10–47% when the very same issues are paraphrased or drawn from unseen repos, even though the task semantics are identical. That is classic memorisation, just fuzzier than a CRC match.
As to Qwen: even at 4 bits per weight, a 4B model retains ≈ 2.1 GB of entropy, enough to memorise tens of thousands of full game loops. The reason it garbled the alien movement logic is probably that its limited capacity forced lossy compression, so the behaviour you saw is typical of partially recalled code patterns whose edge cases were truncated during training. That's still interpolation over memorised fragments, just with fewer fragments to blend.
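The arithmetic behind that figure, as a rough upper bound treating the quantised weights as a raw bit budget (the exact parameter count and the ~10 KB-per-clone estimate are my assumptions):

    const params = 4.2e9;         // nominal "4B" model; actual count varies a bit
    const bitsPerWeight = 4;      // 4-bit quantisation
    const bytes = params * bitsPerWeight / 8;
    console.log((bytes / 1e9).toFixed(1) + " GB");  // ≈ 2.1 GB of weight storage

    // A minified Space Invaders clone is on the order of 10 KB of JS, so even a
    // small slice of that budget covers tens of thousands of near-duplicates.
    console.log(Math.floor(bytes * 0.1 / 10e3) + " clones in 10% of the budget");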
And this has actually been shown (https://arxiv.org/abs/2406.15720v1): controlled fact-memorisation studies and extraction attacks up through 70B params show a monotone curve, so each extra order of magnitude adds noticeably more verbatim or near-verbatim recall. A 20B model succeeds where a 4B one fails because the former crossed the "capacity per training token" threshold for that exemplar. Nothing magical there.
Don't get me wrong, I'm not arguing against interpolation per se; generalising between held-out exemplars is precisely what we want. The problem is that most public "just write Space Invaders" demos never verify that the endpoints were truly unseen. Until they do, a perfect clone is compatible with nothing deeper than glorified fuzzy lookup.
This is a great explanation, thanks for putting it together.
It more or less fits my fuzzy mental model of how this stuff works.
I'm completely fine with my test prompt taking advantage of this - the point of "implement space invaders" is to explore how well it can construct a game of that shape based on the examples that it has seen in its training data, especially in comparison to other models.
I'm not trying for a test of ability to produce a unique new game - I want a short prompt that gets it to output some HTML and JavaScript that I can then interact with.