
> AI generated works are public domain

That's not the status quo. The biggest concern is that AI can sometimes generate code from the training set verbatim, in which case that code has the copyright of the original training set - which might be GPL or proprietary or whatever else.




> The biggest concern is that AI can sometimes generate code from the training set verbatim

Not just verbatim: AI-generated code is arguably a derivative work of code in the training set even if it doesn't generate verbatim copies of training data, in the same way that any code can be a derivative work even if it doesn't contain bitwise-identical lines.

If you take a piece of Python code, and translate it to C code, without a single line being copied verbatim, the resulting C code is still almost certainly a derivative work of the Python code.


Everything you do or know is a derivative of your own training set. True original thoughts without context don’t exist, well at least in any way we can find out as every human is the product of their own training set.

Just out of curiosity, what makes the code you write more original than an LLM's, whose training set is way bigger than yours and likely has more variance in how to achieve the same goal?

I’m not trying to be facetious but rather stoke the philosophical idea of the reality that humans aren’t as special and unique as we think we are.


> True original thoughts without context don’t exist, well at least in any way we can find out as every human is the product of their own training set.

Someone wrote the first murder mystery. It wasn't something we inherited from amoebas.

Someone wrote the first Regency romance. In fact, we know who it was - Georgette Heyer.

Someone invented calculus. We didn't get that passed down from the cave men via oral history or tribal knowledge.

We've dreamed of flying for millennia, but someone invented the first practical airplane - someone specific. Yes, they built on previous knowledge. But their step was still original. It had never been done before.

And so on, for idea after idea and original creation after original creation. They all originated at some time, with someone.


I’m not denying those achievements of those individuals at all. But I am saying that those individuals didn’t invent or discover those ideas out of the blue. There was information gathered up until that point in their lives that allowed them to make those connections and form ideas from there. No idea exists in a vacuum, yes someone needs to be the first to do anything, but all knowledge is passed on.


LLMs can sometimes come up with original things as well. Also, even the people who invented new things like that took some inspiration or knowledge from things that already existed.


This seems absurdly reductionist to me. Wouldn't all work simply be attributable to the first proto-human who grabbed a stick or rock and used it as a tool? Surely even proto-humans communicated knowledge by demonstrations amongst themselves, even if they lacked complicated language.


Humans are just a collection of experiences that are using past data to make future assumptions. So yes, thank you Mr. Caveman because of your ingenuity of using a rock and a stick to create a hammer, we now have a full modern society built off the back of tools. But that seems ridiculous doesn’t it?

So at what point do we stop attributing the previous innovations that led to our current innovations? And why do we stop there? Would Mr. Caveman be the only special human to ever figure that out or could the argument be made that eventually someone else would have figured out how to make tools and therefore attribution is just pointless?

What I am trying to get at is, everything you do or create is because of work that all of humanity has done. So pertaining to copyright, why should one single person claim an idea as solely theirs when that idea was not created in a vacuum?

I should also say, I am not discounting anyone’s work, but rather if the monetary reasons for creating became secondary, would the need for copyright even exist?


If you can dream up a society that doesn't need money, yes we can do away with copyright.


> Just out of curiosity, what makes the code you write more original than an LLM's

Indeed in many ways LLM code is more original than mine, as I limit myself to calling only functions that exist and producing code that will compile, unlike LLMs which have no such limitations. /s


Maybe, but it sounds like you’re operating within pretty restricted walls. Sometimes there are better ways to do things that you haven’t thought of but it requires experimentation.

That would be like assuming every piece of code you’ve ever written compiled perfectly on every run. Do you hold yourself to the same standard? Or do you give yourself some leeway because you walk through it, talk through it, think through it and then come to a solution?


While I would say GPT is creative, the law doesn't have to agree with me.

Also, there's the possibility that all things creative are the human form of peacocks' tails, and if that's the case then the cost and difficulty matters more than the outcome, and any discussion based on capitalist incentives is fundamentally flawed.

https://kitsunesoftware.wordpress.com/2022/10/09/an-end-to-c...

(I wrote this 52 days before ChatGPT came out, the reference to GPT-3 is based on the stuff OpenAI released before the chat interface).


That’s not legally true.

I don’t even think it’s factually true.


Are you sure? If it was established that LLMs were illegal, I think people would know.


This is an interesting thought. How much do we really owe to our teachers? If it wasn't for the school that taught me to read all those years ago I wouldn't have a job!


Yes, reduced to its lowest form, but there are billions of inputs that go into your own training set to produce the outcome you’re at.

Along the way, you’ve experienced trauma which nudged you a direction, you’ve been inspired by other people which left an impression in your mind, you’ve had people teach you 1 skill and another person teach you a different skill, and you combined them, life experiences that changed your outlook etc.

It’s less about the specific teachers and more that every single person and event that has happened to you is your own ‘training set’. But sometimes we can definitely point to a specific event or person who influenced us to where we are at today, and I like to think of that just like model weights in current LLMs.


I was pointing out that the case where the network produces exact copies of copyrighted code (or very very close to exact copies, such as changing a variable name or removing a comment) is clear, and requires no new litigation.

While I also think there is a good chance that what you are claiming (that any output is a derived work of the training set) is right, this is definitely not settled, and will require someone going to court over it. I expect the proceedings over something like this to take a long time, and very likely reach the Supreme Court. So, we won't know if you're right or not for quite a while.

A good argument against the idea that any output is a derived work of (all) data in the training set is exactly that verbatim copies of snippets from the Linux kernel can't be a derived work of snippets from GCC, even if GCC was also in the training set. So it is arguable that someone intending to claim copyright infringement for a piece of code generated by an LLM has to identify which particular work of theirs from the training set it infringes on, it can't be a blanket finding.


Worse: The US Supreme Court will reach a verdict. The European equivalent will also reach a verdict, which may be different. And the Australians. And the British. And the Chinese. And...

It won't be over when the first top-level court decides.


Hopefully it'll be over when the first top-level court of any country too big to ignore says "no", at which point anyone trying to be safe will stop allowing LLM-generated code in their codebase.


By the same logic all code I have ever written is a derivative work of all code I've ever read.


Companies that bar their employees from even looking at GPL code already follow that logic, yeah.


My understanding was the copyright office indicated you can't get copyright for AI generated works:

https://www.reuters.com/legal/ai-created-images-lose-us-copy...

so it may not be "public domain", but if you can't get copyright protection in the US I don't see a difference.


The difference is an AI generated work can infringe someone's copyright. Suppose I have Stable Diffusion draw Spider-man. While I can't get copyright on the drawing, that doesn't make the drawing public domain in the sense we usually use the term "public domain."


I’m sure there are other ways to do this, but simply asking OpenAI to repeat a letter 100 times will eventually get you back to content that looks like its training set.



