
Is it clearly plagiarism? I wouldn't say it is that clear-cut, since in a sense the output of an LLM to a prompt you give it could still be seen as something you produced -- albeit with the help of a magical matmul genie.





Yes. It’s clearly plagiarism. Your reply is clearly grasping at the furthest of straws in an attempt to be contrarian and add another “stochastic parrot hehe!” comment to the already overflowing pile. Line up 100 people and the only ones agreeing with you are other wannabe contrarians.

I truly don't understand the tone of your comment.

I'm not grasping at the furthest of straws, I see a distinction between 'verbatim copying someone else's work' and 'verbatim copying the results of a tool that produces text'.


Plagiarism isn’t the copying part, it’s the part where you claim to be the author of something you are not the author of. Hope that helps to clear things up. You can plagiarize content that you are both legally and ethically allowed to copy. It doesn’t matter the least bit. If you claim to be the author of content you didn’t author, and you lack attribution (AI or otherwise), then you’re plagiarizing the content.

> A translation tool like DeepL is presumably trained on a huge amount of 'other people's work'. Is copying its result verbatim into your own work also plagiarism then?

Yes, if you present yourself as its author.


So let's say you are not a native English speaker and write a passage of your paper in your native language, then let DeepL translate this and paste the result into your paper, without a note or citation. Is that plagiarism?

the tool actually produces text… of someone else’s work… that you then copy… verbatim… :)

But the text itself is not someone else's work verbatim.

A translation tool like DeepL is presumably trained on a huge amount of 'other people's work'. Is copying its result verbatim into your own work also plagiarism then?


plagiarism - by definition - is copying someone else’s work.

the easier definition is “did YOU write this?” if answer is no - you plagiarised it and should be punished to the full extent.


'Someone else's work' -- exactly. Not 'the output of some tool'.

I'm not saying what the guy did wasn't wrong or dumb, I'm saying: Plagiarism has a strict definition, and I don't think it can be applied to the case of directly copying the output of an LLM -- because plagiarism refers to copying the work of another author, and I don't think LLMs are generally regarded (so far) as being able to have this kind of 'authorhood'.


plagiarism does NOT refer to copying the work of another author, it refers to you submitting work as yours that you didn’t yourself write.

if I copy an entire article from the Economist, did I plagiarize!? There is no author attribution so we don’t know the author… Many articles in media today are LLM generated (fully or partially), can I copy those if someone sticks their name on them as author?!

bottom line is - you didn’t do the work but copied it from elsewhere, you plagiarized it, period


I'll just link here to another comment I made that sums up my argument quite well, I think:

https://news.ycombinator.com/item?id=42246168


It seems clear to me. The student is claiming that he wrote something that he didn't write.

Definition of plagiarism, by the Cambridge Dictionary:

"the process or practice of using another person's ideas or work and pretending that it is your own"

What I am objecting to is the "another person's" part. An LLM is not a person, it is a tool -- a tool that is trained on other people's work, yes.

If you use a different tool like DeepL, which is also trained on other people's work, to produce text purely from an original prompt you give it (i.e. translate something you wrote yourself), and you put that into your paper... is that then plagiarism as well? If not, what if you use an LLM to do the translation instead, instructing it to act strictly as a 'translation tool'?

It seems to me, the mere act of directly copying the output of an LLM into your own work without a reference cannot be considered plagiarism (in every case), unless LLMs are considered people.

Of course, you can prompt an LLM in a way that copying its output would _definitely_ be plagiarism (i.e., asking it to quote the Declaration of Independence verbatim, and then simply copying that).

So, all I'm saying is: The distinction is not that clear, has nuances, and depends on the context.


By your argument, since an encyclopedia is not a person, I can copy it with impunity. It's a collection of work built on others' ideas and research, but technically a tool to bring it together. I can assure you that virtually any school would consider the direct use of it, without citation, plagiarism.

Let's assume I used an encyclopedia outside of my native tongue. I took the passage verbatim, used a tool to translate it to my native tongue, and passed it off as my own. The translation tool is clearly not a person, and I've even transformed the original work. I might escape detection, but this is still plagiarism.

Do you not agree?

Let's go to how Cambridge University defines it academically:

> Plagiarism is defined as the unacknowledged use of the work of others as if this were your own original work.

> A student may be found guilty of an act of plagiarism irrespective of intent to deceive.

And let's go to their specific citation for the use of AI in research:

> AI does not meet the Cambridge requirements for authorship, given the need for accountability. AI and LLM tools may not be listed as an author on any scholarly work published by Cambridge


> By your argument, since an encyclopedia is not a person, I can copy it with impunity.

I don’t see where they said (or implied) that.

How does “that isn’t plagiarism” imply “I can copy it with impunity”? Copyright infringement is still a thing.

Have you conflated plagiarism with copyright infringement? Neither implies the other. You can plagiarize without committing copyright infringement, and you can violate copyright without plagiarism.


I'm sorry, but this encyclopedia analogy really doesn't say anything at all about the argument I raised. An encyclopedia is the work of individual authors, who compiled the individual facts. It is not a tool that produces text based entirely on the prompt you give it. Using an encyclopedia's entries (translation or not) without citing the source is plagiarism, but that doesn't have any parallel to using an LLM.

(Also, the last quote you included seems to directly support my argument)


The translation software isn't a person. It will necessarily take liberty with the source material, possibly even in a non-deterministic fashion, to translate it. Why would it be any different from a LLM as a tool in our definition of plagiarism?

If I used a Markov chain (arguably a very early predecessor to today's models) trained on relevant data to write the passage, would that be any different? What about an RNN? What would you qualify as the threshold the tool needs to cross for using it to not be plagiarism?


when did he imply that a LLM would be different as a tool than a translator in his definition of plagiarism? are you even understanding his points lmao?

There are nuances to the amount of harm dealt to the authors based on what sources you are stealing from, but they're irrelevant here, as the specific incident we're talking about is whether or not the student is the actual author of the work submitted.

It'd be the same as if I had Google Translate do my German 101 exam. I even typed the word "germuse" with my own two thumbs!


What we are talking about in this sub-thread is exclusively the 'this is clearly plagiarism' part.

If you used Google Translate for your German 101 exam, that would be academic dishonesty/cheating, but not plagiarism.


I'm largely uninterested in the specific name you want to give it and more interested in whether it's worthy of punishment.

Then I think this sub-thread is not the discussion you were looking for.

Words have meanings, if we don’t use them and have a shared understanding of them, dialogue becomes exceptionally difficult.

> What I am objecting to is the "another person's" part.

Fair enough. We disagree about definitions here. To me, plagiarizing is claiming authorship of a work that you did not author. Where that work came from is irrelevant to the question.

> If not, what if you use an LLM to do the translation instead, instructing it to act strictly as a 'translation tool'?

Translation is an entirely different beast, though. A translation is not claiming to be original authorship. It is transparently the opposite of that. Nobody translating a work would claim that they wrote that work.


> Fair enough. We disagree about definitions here. To me, plagiarizing is claiming authorship of a work that you did not author. Where that work came from is irrelevant to the question.

This is exactly what it is ... the post is taking "another person's" waaaay too literally, especially given that we are in the year of our Lord 2024/2025. One of the author's comments above also dismisses the encyclopedia argument on the grounds that encyclopedias are written by people, which can never be factually proven (I can easily ask an LLM to create an encyclopedia and publish it). Who is "another person" on a Wikipedia page?! "A bunch of people" ... and how is an LLM trained? "A bunch of people, a bunch of facts, a bunch of ____"

The crux of this whole "argument" isn't that plagiarism is "another person's work" it is that you are passing work as YOURS that isn't YOURS - it is that simple.


Well, I understand, and I suspect that a lot of people commenting here see the term similarly to you; but there's an official definition regardless of your personal interpretation, and it does include the 'somebody else's work' part.

Why is translation a different beast? It produces text based on a prompt you give it, and it draws from vast amounts of the works of other people to do so. So if a translation tool does not change the 'authorship' of the underlying text (i.e., if it would have been plagiarism to copy the text verbatim before translating it, it would be plagiarism after; and the same for the inverse), then it should also be possible for an LLM to not change the authorship between prompt and output. Which means, copying the output of an LLM verbatim is not necessarily in itself plagiarism.


> but there's an official definition regardless of your personal interpretation, and it does include the 'somebody else's work' part.

No, it doesn't. First of all, dictionaries aren't prescriptive and so all quoting a definition does is clarify what you mean by a word. That can be helpful toward understanding, of course.

That said, the intransitive verb form of the word does not require "somebody else's work" in the sense of that "someone else" being a human.

  > to commit literary theft : present as new and original an idea or product derived from an existing source
-- Merriam-Webster https://www.merriam-webster.com/dictionary/plagiarize

According to this, what it means is taking credit for a work you did not produce. That work did not have to be produced by a human, it merely had to exist.

> Why is translation a different beast?

Because it doesn't produce a new work, it just changes the language that work is expressed in. "Moby Dick" is "Moby Dick" regardless of what language it has been translated to. This is why the translator (human or otherwise) does not become the author of the work. If you were to run someone else's novel through a translator and claimed you wrote that work, you would in every respect be committing plagiarism both by the plain meaning of the word and legally.

> copying the output of an LLM verbatim is not necessarily in itself plagiarism.

Yes, it is. You would be taking credit for something you did not author. You would be doing the same if you took credit for a translation of someone else's work.


Did he write it? Did he write 99% of it? 98%? Less than 5% of it?

Then did he represent it as his own work?



