Hacker News
Ask HN: Do you document LLM-generated code as such?
12 points by Bostonian 8 months ago | 9 comments
My Python source files are now a mix of functions I wrote and functions written by LLMs. I do review and test the functions written by LLMs.

When code was written by an LLM, do you note that in the code? How specific is the citation? Do you note the date the code was generated, since LLM capabilities can vary over time? Do you list the prompt? (I don't, but I typically ask the LLM to provide a docstring.)
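For readers who do want to record provenance, one lightweight option is to attach it as metadata on the function instead of writing a free-form comment. This is a minimal sketch under stated assumptions: `llm_generated` is a hypothetical decorator you would define yourself (not a standard-library feature or an established convention), and the model name, date, and prompt are placeholders.

```python
import datetime
from typing import Any, Callable, Dict, Optional, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def llm_generated(model: str, generated_on: str,
                  prompt: Optional[str] = None) -> Callable[[F], F]:
    """Hypothetical decorator that records LLM provenance on a function.

    The function's behavior is unchanged; the metadata is stored in an
    `llm_provenance` attribute so tools (or humans) can inspect it later.
    """
    # Fail fast on a malformed date (expects ISO format, e.g. "2024-03-01").
    datetime.date.fromisoformat(generated_on)

    def decorate(func: F) -> F:
        provenance: Dict[str, Optional[str]] = {
            "model": model,
            "generated_on": generated_on,
            "prompt": prompt,
        }
        setattr(func, "llm_provenance", provenance)
        return func

    return decorate


# Placeholder values: substitute the model, date, and prompt you actually used.
@llm_generated(model="example-model", generated_on="2024-03-01",
               prompt="Write a function that collapses runs of whitespace.")
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return " ".join(text.split())
```

Calling `normalize_whitespace("  a \t b\n")` returns `"a b"`, and `normalize_whitespace.llm_provenance["model"]` returns `"example-model"`. Unlike a comment, the metadata can be queried programmatically, and the date is validated when the module is imported.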




No, but I rarely find that LLM-generated code is just pasted in as-is. It gives me some code that speeds me up, but I still tweak the details and change things to integrate it into the bigger picture. It certainly doesn't bring so much value that I'd be tracking timestamps and prompts... it is just quick boilerplate for tedious stuff that I don't want to burn time on.


Yeah, I find that Copilot code rarely survives very long in the form it came in.


I am anticipating that within 2-5 years questions like this will be moot for many projects.

That's because there will be development platforms and tools built around AI-generated code. The LLMs will have integrated code execution and libraries/APIs that they are familiar with. If a programmer or user stays within such a platform, which will be fairly general purpose, the AI will be able to handle 98% of requests without any human help writing the code.

These types of platforms will completely normalize AI-written code. People won't think twice about using it or feel they need to make a note of it.

What you might see, say, five years out is kind of the opposite. For some code bases, if a human writes code that hasn't been checked by an AI, they will have to add a comment with their name and the reason they are not using AI verification. But they will probably choose to have an AI verify the code just to avoid the extra procedure.


No, I use LLMs to help with code all the time and I have zero functions written entirely by an LLM. It’s more about figuring out small sections of code, or helping with design decisions, not about generating fully working functions.

I don’t document things as “I learned this from Stack Overflow” either, and LLMs are very similar.


Actually, I am pretty careful about adding Stack Overflow URLs to comments, not out of any sense of obligation but so I know where to look if I need further insight when debugging or reusing the code later. This habit has paid off more than once.
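A habit like this also makes the citations easy to find again later. A minimal sketch, assuming Python sources with `#` comments: `find_so_citations` is a hypothetical helper, and the regex is a rough heuristic rather than a full URL parser (for example, a `#` inside a string literal will be mistaken for the start of a comment).

```python
import re
from typing import List, Tuple

# Rough heuristic for stackoverflow.com links; stops at whitespace, ')', quotes, '>'.
SO_URL = re.compile(r"https?://(?:www\.)?stackoverflow\.com/[^\s)\"'>]+")


def find_so_citations(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, url) pairs for Stack Overflow links found in comments."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Everything after the first '#' is treated as the comment.
        _, hash_sign, comment = line.partition("#")
        if not hash_sign:
            continue
        for url in SO_URL.findall(comment):
            hits.append((lineno, url))
    return hits
```

Running it over a file's text gives a quick index of every place where a snippet was adapted from an answer, with line numbers to jump to.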

For LLM-generated code, I don't see much value in documenting where it came from, especially since it will be exhaustively tested (and likely rewritten entirely) before it gets checked in, even more so than excerpts snagged from Stack Overflow. Also, a given prompt will rarely return exactly the same code twice, and certainly not years later when I need to look back at it. It will make more sense to write a new query if and when that happens.


I'm involved with a few projects that use the DCO [1]. AFAIK it hasn't been legally tested yet, but given all the open questions around copyright and LLMs, I assume a DCO sign-off on LLM-generated code would be a misrepresentation, and any project that received such commits would need to either rewrite those portions from scratch (as done in other cases of infringing code) or simply remove them wholesale.

Put another way, until the IP implications have been figured out, I assume all LLM-generated code is radioactive for FLOSS projects.

[edit to clarify]: so documenting that a commit you're pushing includes code generated with an LLM will make maintainers' lives dramatically simpler. Please, and thank you!

All the best,

-HG

[1] https://developercertificate.org/
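If a project wanted contributors to declare LLM-generated code in their commits, as suggested above, a commit-message trailer next to the DCO `Signed-off-by:` line is one natural place for it. A minimal sketch under stated assumptions: `LLM-Assisted:` is a hypothetical trailer name (not part of the DCO or of git), and `parse_trailers` is a simplified stand-in for git's own `git interpret-trailers`, which handles many more cases.

```python
from typing import Dict, List

# Hypothetical trailer name; not defined by the DCO or by git.
LLM_TRAILER = "LLM-Assisted"


def parse_trailers(message: str) -> Dict[str, List[str]]:
    """Collect 'Key: value' lines from the final paragraph of a commit message.

    Simplified: git's own `git interpret-trailers` handles many more cases.
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    trailers: Dict[str, List[str]] = {}
    # Treat the first paragraph as the subject, so a one-paragraph message has no trailers.
    if len(paragraphs) < 2:
        return trailers
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Trailer keys are single tokens such as "Signed-off-by".
        if sep and key and " " not in key and value:
            trailers.setdefault(key, []).append(value)
    return trailers


def needs_ip_review(message: str) -> bool:
    """True if the commit declares LLM-generated code via the hypothetical trailer."""
    return LLM_TRAILER in parse_trailers(message)
```

A maintainer could run a check like `needs_ip_review` in CI so that commits declaring LLM assistance are routed to a human before merging.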


What would be the purpose of such citations? If you understand the function and it does what it needs to do, does it matter where it came from (within legal limits, ofc)?

When I read your code, I want to know what the function does and why it does it. LLM generation details would just distract from that.


Usually not. For me, it's an “autonomous search engine on steroids” thanks to its huge dataset. (In other words, it's just another tool you use.)

Before LLMs, you would cobble together disjoint pieces of information via a search engine like Google. Now LLMs do this for you, and they certainly help me get up to speed a lot quicker with libraries or APIs I am not familiar with (e.g., PyGame, Flask, Django). However, you may find that code from the LLM needs some fixing (subtle bugs or redundancies) or could make better use of resources.

The other issue is the bias in the LLM's dataset toward the most widely used technologies and concepts. So you might have a hard time getting an LLM to write Clojure/Racket code, or getting it to do the point-in-triangle test using only the wedge product.

Hence, there is still room, and reason, to use the thing between your ears.
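As an illustration of the kind of request mentioned above, here is a point-in-triangle test that uses only 2D wedge products. It is a minimal sketch, assuming a non-degenerate triangle and treating points on an edge as inside. In 2D, the wedge product u ∧ v = u_x·v_y − u_y·v_x is the signed area of the parallelogram spanned by u and v; the point is inside exactly when the three edge wedges never disagree in sign.

```python
from typing import Tuple

Point = Tuple[float, float]


def sub(p: Point, q: Point) -> Point:
    """Vector from q to p."""
    return (p[0] - q[0], p[1] - q[1])


def wedge(u: Point, v: Point) -> float:
    """2D wedge product u ∧ v: signed area of the parallelogram spanned by u, v."""
    return u[0] * v[1] - u[1] * v[0]


def point_in_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    """True if p is inside or on the boundary of triangle abc (either winding).

    Assumes a non-degenerate triangle; if a, b, c are collinear, points on
    that line outside the segment can be misreported as inside.
    """
    # Each wedge says which side of the directed edge p lies on.
    w1 = wedge(sub(b, a), sub(p, a))
    w2 = wedge(sub(c, b), sub(p, b))
    w3 = wedge(sub(a, c), sub(p, c))
    has_neg = w1 < 0 or w2 < 0 or w3 < 0
    has_pos = w1 > 0 or w2 > 0 or w3 > 0
    # Inside (or on an edge) exactly when the signs never disagree.
    return not (has_neg and has_pos)
```

For the triangle (0, 0), (4, 0), (0, 4), the point (1, 1) is reported inside and (3, 3) outside. Checking for mixed signs, rather than requiring all positive, makes the test work for both clockwise and counterclockwise vertex order.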

You might as well ask: Are you referencing Stack Overflow or the Microsoft Developer Reference (e.g., in your developer notes/comments)?

My answer: usually, yes.


No, I take all of the credit. Job security.



