This seems like a silly objection to me, although I am one of the authors :) Yes, Codex has been trained on lots of linked lists. It has also been trained on lots of bad and insecure versions of linked lists! And when we did our earlier study ("Asleep at the Keyboard"), which looked at the security of code written by Copilot without human intervention, we found that it produced vulnerable code at high rates (40%) even for common scenarios like SQL injection and C string manipulation. So your prior that LLMs should be good at linked lists is not at all obvious; that's why we ran the study.
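To make "vulnerable" concrete: the SQL injection scenarios boil down to code that builds a query by splicing user input into the SQL string instead of using a parameterized query. A minimal Python sketch of the two patterns (illustrative only, not lifted from the study's scenarios):

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")

    def get_user_unsafe(username):
        # Vulnerable: attacker-controlled input is spliced into the SQL string,
        # so input like "' OR '1'='1" changes the query's meaning.
        query = "SELECT * FROM users WHERE name = '" + username + "'"
        return conn.execute(query).fetchall()

    def get_user_safe(username):
        # Safer: a parameterized query lets the driver handle quoting/escaping.
        return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()

Both behave the same on benign input, which is part of why the insecure version slips past casual review.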
Other complaints in this thread about studying students - sure, perhaps experienced developers will do better. These were upper-level undergraduate and masters students, though, so they're the kind of developers who will be writing code with Codex in industry within a year or so, which makes them worth studying. It's not the whole picture, but we never claim it is.
Overall I think the safety problems get worse when the program gets more complex. There's a big difference between getting a 100 line program right and getting a 20,000 line program right.
That said, this study is underpowered to really answer the question. It reminds me a little of that Scott Alexander blog post where he came to big conclusions based on 5 questions he asked ChatGPT. However, there is a definite industry of people who write papers where they ask ChatGPT some questions and evaluate the replies.
I have not read the paper beyond the abstract so this is something of a throwaway comment.
The setup of the study raises questions about construct validity. The main purpose of these coding assistants is to write new code - i.e., code whose function is not directly represented in their training set. The choice of such a simple (and traditional) teaching task for the study means that the function will be heavily represented in the training set. Furthermore, correct approaches to the problem will be represented more heavily than incorrect approaches.
As I said, I did not read beyond the abstract, so I would be happy to be corrected if the authors have accounted for this in their study.
The earliest example I saw was from 2004; but the article is from 2015 and a lot of its links have died, so there might have been an older one. Anyway, 2004 is long enough ago not to call this a new phenomenon, right?
I think you are right, but it is particularly pronounced in papers because you sorta want a noun-y thing to refer to your new idea by. So “NounPun: here’s what it does” is a good title format for telling the reader both what the thing is and why they should care about it.
At some point you have to accept that the likelihood of something you wrote being read increases if you make it entertaining, even just a tiny bit. Yes, even academics fall for this trick. It's usually not conscious, but academics are humans too, and tricks like that help increase the likelihood of grabbing their attention, even for subjects they already strongly believe are interesting to them.
More relevant discussion, from Andrew Ng's The Batch newsletter: see the "Generated Code Generates Overconfident Coders" section (https://www.deeplearning.ai/the-batch/issue-180/)