This seems like a silly objection to me, although I am one of the authors :) Yes, Codex has been trained on lots of linked lists. It has also been trained on lots of bad and insecure versions of linked lists! And when we did our earlier study ("Asleep at the Keyboard"), which looked at the security of code written by Copilot without human intervention, we found that it produced vulnerable code at high rates (40%) even for common scenarios like SQL injection and C string manipulation. So your prior that LLMs should be good at linked lists is not at all obvious; that's why we ran the study.
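To make "vulnerable" concrete: the SQL injection scenarios boil down to code that builds a query by splicing user input into the SQL string instead of using a parameterized query. A minimal Python sketch of the two patterns (illustrative only, not lifted from the study's scenarios):

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")

    def get_user_unsafe(username):
        # Vulnerable: attacker-controlled input is spliced into the SQL string,
        # so input like "' OR '1'='1" changes the query's meaning.
        query = "SELECT * FROM users WHERE name = '" + username + "'"
        return conn.execute(query).fetchall()

    def get_user_safe(username):
        # Safer: a parameterized query lets the driver handle quoting/escaping.
        return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()

Both behave the same on benign input, which is part of why the insecure version slips past casual review.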
Other complaints in this thread about studying students - sure, perhaps experienced developers will do better. These were upper-level undergraduate and masters students, though, so they're the kind of developers who will be writing code with Codex in industry within a year or so, which makes them worth studying. It's not the whole picture, but we never claim it is.
Overall I think the safety problems get worse when the program gets more complex. There's a big difference between getting a 100 line program right and getting a 20,000 line program right.
That said, this study is underpowered to really answer the question. It reminds me a little of that Scott Alexander blog post where he came to big conclusions based on 5 questions he asked ChatGPT. However, there is a definite industry of people who write papers where they ask ChatGPT some questions and evaluate the replies.
I have not read the paper beyond the abstract so this is something of a throwaway comment.
The setup of the study raises questions about construct validity. The main purpose of these coding assistants is to write new code - i.e., code whose function is not directly represented in their training set. The choice of such a simple (and traditional) teaching task for the study means that the function will be heavily represented in the training set. Furthermore, correct approaches to the problem will be represented more heavily than incorrect approaches.
As I said, I did not read beyond the abstract, so I would be happy to be corrected if the authors have accounted for this in their study.
The earliest example I saw was from 2004; but the article is from 2015 and a lot of its links have died, so there might have been an older one. Anyway, 2004 is long enough ago not to call this a new phenomenon, right?
I think you are right, but it is particularly pronounced in papers because you sorta want a noun-y thing to refer to your new idea by. So “NounPun: here’s what it does” is a good title format for telling the reader both what the thing is and why they should care about it.
At some point you have to accept that the likelihood of something you wrote being read increases if you make it entertaining, even just a tiny bit. Yes, even academics fall for this trick. It's usually not conscious, but academics are humans too, and tricks like that help increase the likelihood of grabbing their attention, even for subjects they already strongly believe are interesting to them.
More relevant discussion, from Andrew Ng's The Batch newsletter: see the "Generated Code Generates Overconfident Coders" section (https://www.deeplearning.ai/the-batch/issue-180/)