I never understood the point of the pelican on a bicycle exercise:
LLM coding agents don't have any way to see the output.
That means the only thing this test measures is the LLMs' ability to memorise.
Because it exercises thinking about a pelican riding a bike (not common) and then describing that using SVG. It's quite nice imho and seems to scale with the power of the model. I'm sure Simon has some actual reasons, though.
I wouldn't say any LLMs are good at it. But it doesn't really matter; it's not a serious thing. It's the equivalent of "hello world" - or whatever your personal "hello world" is - whenever you get your hands on a new language.
The coordinates and shapes of the elements used to form a pelican.
If you think about how LLMs ingest their data, they have no way to know how to form a pelican in SVG.
I bet their ability to form a pelican results purely from someone having already done it before.
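For anyone wondering what "coordinates and shapes" actually means here, below is a minimal hand-written sketch (not output from any model, and not Simon's test) of the kind of SVG primitives involved. Every number is an explicit coordinate the model has to produce as plain text, without ever seeing the rendered result:

    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 140">
      <!-- bicycle: two wheels plus a crude frame, all explicit coordinates -->
      <circle cx="50" cy="110" r="25" fill="none" stroke="black"/>
      <circle cx="150" cy="110" r="25" fill="none" stroke="black"/>
      <path d="M50 110 L90 70 L150 110 M90 70 L110 110" fill="none" stroke="black"/>
      <!-- pelican: body ellipse, a curved neck, and an oversized beak -->
      <ellipse cx="95" cy="55" rx="22" ry="14" fill="white" stroke="black"/>
      <path d="M110 50 Q120 30 115 20" fill="none" stroke="black"/>
      <path d="M115 20 L145 28 L115 30 Z" fill="orange" stroke="black"/>
    </svg>

Getting even this toy version right means keeping a consistent mental coordinate system across elements, which is the part people find interesting.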
> If you think about how LLMs ingest their data, they have no way to know how to form a pelican in SVG.
It's called generalization, and yes, they do. I bet you could find plenty of examples of it working on something that truly isn't "present in the training data".
It's funny: you're so convinced that it's not possible without direct memorization, but you forgot to account for emergent behaviors (which are frankly all over the place in LLMs; where have you been?).
At any rate, the pelican thing from simonw is clearly just for fun at this point.
Edit: just to show my point, a regular human on a bicycle is way worse with the same model: https://i.imgur.com/flxSJI9.png