There's no part of AI that is being swallowed up by copyright. AI companies can ask for permission if they want to train their models on other people's works. It's not that hard; various image-hosting sites have already added an opt-in/opt-out toggle to their services. Sites might even get away with treating training rights as compensation for free hosting.
The fact of the matter is that AI companies don't want to ask for permission, because people will say no. Or worse, ask for attribution or even payment. There is plenty of copyright-free/public-domain material out there, but what the customers of AI companies want isn't available under those terms.
The code to train an AI is not enough to make a product, and these people have nothing to add themselves, so they take what others made and use that to turn a profit. They could make or pay for their own paintings, their own pictures, their own music, but that would require putting in too much work or paying too much money.
It's very possible that a judge will rule that AI models do not violate copyright. If that is the case, I hope new legislation will correct that oversight very quickly.
> AI companies can ask for permission if they want to train their models on other people's works
Do you ask for permission when you train your mind on copyrighted books? Or observe paintings? Or listen to music? Do you ask for permission when you get new ideas from HN that aren't your own?
Humans are constantly ingesting gobs of "copyrighted" insights that they eventually remix into their own creations without necessarily reimbursing the original source(s) of their creativity.
Time to put the horse back in the barn, cars and trains are here.
> Do you ask for permission when you train your mind on copyrighted books? Or observe paintings? Or listen to music?
Yes, that’s exactly what happens when you buy a book, or pay for a music subscription. If the work is in the public domain, then global permission to observe and copy it has already been granted.
> Do you ask for permission when you get new ideas from HN that aren't your own?
You don’t need to. It’s implicitly assumed, by virtue of publishing in a public forum, that the author is providing permission for people to read their comments and ideas, and remix them as they wish. That permission doesn’t include exact replication, but reading and understanding is assumed, otherwise why did the author publish it?
> Humans are constantly ingesting gobs of "copyrighted" insights that they eventually remix into their own creations without necessarily reimbursing the original source(s) of their creativity.
Correct. Literally everything produced by a human is automatically copyrighted. But the manner in which work is published creates implicit licenses for the public to consume those works. You publish in public, you automatically grant licenses for the public to consume and transform it.
If a human transforms an idea, it automatically becomes a new idea with its own copyright. The same doesn’t apply to AI because they’re not human, and thus the law generally doesn’t recognise them as having the ability to create or transform ideas. If you believe AI can create and transform ideas, then you need to lobby for the law to recognise that ability, but right now, only natural humans have that ability according to the law.
> > Do you ask for permission when you train your mind on copyrighted books? Or observe paintings? Or listen to music?
> Yes, that’s exactly what happens when you buy a book, or pay for a music subscription. If the work is in the public domain, then global permission to observe and copy it has already been granted.
You can buy a book, read it, sell the book, and then write and sell another book based on the ideas contained in the first book (Baker v. Selden). This is the cornerstone of contemporary copyright law. Or read the book on a shelf of a bookstore where the clerk is asleep. Or borrow the book from the library, or obtain it in any other manner where direct compensation of the author is nowhere to be seen.
Copyright is consistently interpreted in alignment with the needs of public learning, both by protecting the authorial incentive as well as protecting the public need for knowledge.
>You don’t need to. It’s implicitly assumed, by virtue of publishing in a public forum, that the author is providing permission for people to read their comments and ideas, and remix them as they wish. That permission doesn’t include exact replication, but reading and understanding is assumed, otherwise why did the author publish it?
Following this logic, isn't training AI on GitHub or DeviantArt 100% fair game then? It's not like OpenAI is infiltrating computers and reading hidden-away data.
> Following this logic, isn't training AI on GitHub or DeviantArt 100% fair game then?
Unlike forum comments, GitHub code generally has an explicit license attached which you'd have to respect - you know, for instance by giving attribution to every MIT-licensed source that was used.
And even then, let's say someone releases a book with all your HN comments: you are definitely entitled to sue them for copyright infringement. Here's some info from the BBS era, which is still relevant today: https://www.templetons.com/brad/copymyths.html
But you can still see the code and learn generally how to write code. Maybe you see a style of unit testing in a library, and decide to incorporate the techniques into your own code. This is not a copyright violation. It can't be, or all creative expression would be dead.
For example, the "clean-room design" method of copying a work exists precisely to avoid potential copyright issues. One team reads the original work and writes a description in such a way that it cannot possibly be infringing, and a second team reads the description and creates the new work. This avoids any chance of someone reading the original work and incorporating potentially infringing aspects into the new work.
And this ruling will prove to be a disaster for music creation, as there will be fewer copyright-free spaces for music as time goes on.
A similar ruling will also be a disaster for software, as our tools of expression are very restricted: code is based on Boolean algebra and predicate calculus, on practice guides like design patterns, and on books teaching algorithms and data structures.
There are lots of ways to write bad code and only a few for good, correct code. Recognizing this led me to replicating known working code, code I had created, for multiple employers. So whose copyright did I intentionally violate?
I think we are attacking the wrong problem WRT ML and copyright. To me, ML shows the foundation on which copyright is built is a lie. We should use ML to break copyright for code.
You're saying this as though we don't already have lots of regulations on tools to ensure that people use them appropriately.
Forklifts are "agents of humans" but you still need a license to drive one.
It's pretty obvious to me at least that AI bros are using these tools recklessly and inappropriately, without regard for licensing or copyright, and therefore I am proposing that the tools need to be regulated.
>Yes, that’s exactly what happens when you buy a book, or pay for a music subscription. If the work is in the public domain, then global permission to observe and copy it has already been granted.
When you buy a book, you’re not paying a licensing fee. You’re exchanging money for goods. You’re granted very few rights by owning a copy of the work, and they’re almost all to do with distribution. None of those rights is the right to read it.
>You publish in public, you automatically grant licenses for the public to consume and transform it.
By this interpretation, all the artists upset by stable diffusion have given tacit permission for their works to be used as they are published in the public. Even though those works are posted to websites, the artist has not granted any rights to the viewer of the work.
> only natural humans have that ability according to the law
The law is not explicit about this, and we have case law that describes non-human entities as having rights associated historically with personhood. This is definitely not clear, nor is it obvious.
> When you buy a book, you’re not paying a licensing fee. You’re exchanging for goods. You’re granted very few rights to own a copy of the work. But they’re almost all to do with distribution. None of those rights is the right to read it.
You are absolutely buying a license to read the material when you purchase a book. That's why books cost more than the paper they're printed on and why pirated books are illegal. The "distribution" rights you refer to stem from the "first sale" doctrine[0], which acknowledges that the first sale (e.g., you buying a new copy of a book) of a physical object embodying a copyrighted work grants limited distribution rights.
When you buy a book or some other artwork, it is implicitly assumed you will put it in your brain, or your meat neural net. And that your brain could produce something related to this content.
It's not just assumed, it's celebrated when a work of art gathers fans who produce their own, inspired content.
Not sure why it needs to be over-complicated or different for silicon neural nets. But I think it will get very over-complicated, if not politicised, in the following years.
> When you buy a book or some other artwork, it is implicitly assumed you will put it in your brain, or your meat neural net. And that your brain could produce something related to this content.
It is implied that if you are using the work by yourself, or via a tool you made yourself, it's fine.
However, works that you redistribute, whether by copying them yourself or indirectly via tools (said silicon neural nets being one example), instead require a "wide redistribution license agreement", and those are implicitly limited by default unless the work is put under a sorta-public-domain license.
> You publish in public, you automatically grant licenses for the public to consume and transform it.
No you don’t. That would fall under the category of “derivative work”, which is still the intellectual property of the original author under the copyright laws of most jurisdictions.
Unless the resulting "derivative work" is sufficiently transformative. Which, I would argue, training an AI/ML model is.
Therefore, using a training dataset does not constitute copyright violation.
If the AI outputted an exact copy (or a close enough copy that a layman would agree it's a copy), then that particular instance of the AI's output is in violation of copyright. The AI model itself doesn't violate any copyright.
> Therefore, using a training dataset does not constitute copyright violation.
It's not for you to decide that. Different jurisdictions will have their own process for deciding that and none of them are based on the opinions of random commentators on internet message boards.
Also, please bear in mind my comment was a reply to a specific statement (repeated below) and not talking about AI in general:
> You publish in public, you automatically grant licenses for the public to consume and transform it.
^ this statement is not correct for the reasons I posted. AI discussions might add colour to the debate but it doesn't alter the incorrectness of the above statement.
> If the AI outputted an exact copy (or a close enough copy that a layman would agree it's a copy), then that particular instance of the AI's output is in violation of copyright. The AI model itself doesn't violate any copyright.
That assumption needs testing in courts.
As I've posted elsewhere, there have been plenty of cases where copyright holders have successfully sued other creators over new works that bore a resemblance to existing works. It happens all the time. I remember reading a story about how a newly successful author was being handed ideas from fans during a book signing, only for one of her representatives to intercept them each time. When asked later why she took them, the representative said "it's because if any of your future books follow a similar idea, that fan could sue. But if we can prove you haven't read the idea then the fan has no claim" (to paraphrase).
Experts don't all agree on where the line is with similar works created by humans, let alone the implications of copyrighted content being used as training data for computers. And this is true for every jurisdiction I've researched. So to have random people on HN talk as confidently as they do about this being all perfectly legal is rather preposterous. You don't even fully grasp the intricacies of copyright law in your own jurisdiction, let alone the wider world. In fact, this is such a blurred line that I wouldn't be surprised if some cases would have different rulings in different courts within the same jurisdiction. It's definitely not as clear-cut as you allude.
> If you believe AI can create and transform ideas, then you need to lobby for the law to recognise that ability, but right now, only natural humans have that ability according to the law
My experience with ML tools like co-pilot is why I reject copyright claims on ML systems. They are a tool that generates original work based on my instructions, not unlike a paintbrush, Photoshop, or a CNC machine. My instructions were based on my exposure to copyrighted works.
I use co-pilot as an accessibility device enabling me to write code again. Like with speech recognition, co-pilot is a force multiplier IF you change how you work. If you keep using the habits formed by typing, you will get shit results.
The end result of the shift in how I work is now I know how to tell co-pilot how to write code in my style. My co-pilot generated code is no less my code than what I generate by hand. Co-pilot acts as an extension of my brain, not my fingers.
Is my co-pilot generated code copyrightable? I say yes because it is the result of this human's creation and instruction.
>> Do you ask for permission when you get new ideas from HN that aren't your own?
>>You don’t need to. It’s implicitly assumed, by virtue of publishing in a public forum, that the author is providing permission for people to read their comments and ideas, and remix them as they wish.
>Yes, that’s exactly what happens when you buy a book, or pay for a music subscription. If the work is in the public domain, then global permission to observe and copy it has already been granted.
If an AI is not a human (I agree) it's a tool, that a human or company created. If it's a tool, the product belongs to the person who owns or uses (which is an important distinction, but not for this case) the tool. Ownership of the product can then be transferred to a new owner through whatever legal means.
If we agree on this, what we mostly need to resolve seems to be to what extent a human should not be allowed to use publicly available data to make his tool, given that he is allowed to use publicly available data to make anything else.
> Do you ask for permission when you train your mind on copyrighted books? Or observe paintings? Or listen to music?
Plenty of people have been successfully sued when their work was too similar to existing content.
This isn’t a new concept that AI is throwing into contention, it’s literally just companies trying to side step copyright law because of “disruption”.
Source: I work for a company in this field and we do gain permission from creators before training our models on their content. It’s very possible to operate this way but a lot of companies simply choose not to.
Exactly. Somehow it is so 'hard' for many AI companies to ask for permission to use and monetize copyrighted images in the training set these days and instead of asking permission, they attempt to bypass copyright law and give out frequent useless excuses from AI bros like: 'but muh fair use tho', 'oh well genie's out of the bottle it's too late', 'oops, cat's out of the bag, but you can opt out now'.
Little do some of them know that OpenAI was able to get permission from Shutterstock, via a partnership, to use their copyrighted images in the training set for DALL-E 2. [0] There is also a reason why Dance Diffusion was trained only on public domain music and on copyrighted music with actual permission from the authors. [1] If they did otherwise and monetized copyrighted music without permission from musicians or record labels, they would be sued into the ground.
With the recent cases of Getty, Shutterstock, and even as admitted by the CEO of Stability themselves [2], the way forward for using copyrighted images in training sets for commercial purposes is via licensing. Neither Getty nor Shutterstock is looking to ban it, despite the AI bros claiming that these companies are trying to.
If not, just train only on public domain images to avoid these legal issues.
Where in the guidelines does it mention that one cannot say 'tech bro, finance bro, pharma bro, and more recently and most actively the crypto bro'? These have been there for years despite the guidelines existing.
Me saying 'AI bros' is no different. Given it is fine to mention the tech bros, finance bros, crypto bros and the other, then it is also fine to say 'AI bros'.
I am not a lawyer, so the following is only my opinion.
Scale matters, but so does the legality of the thing that scales.
Reading two dozen books by other authors, or studying hundreds of artworks, or visiting the museum of awesome statues every week, in order to get inspired for one's own novel/painting/sculpture, isn't illegal.
So a lawsuit will have a really hard time arguing that it somehow is a problem if it's two dozen billion books/paintings/sculptures. Because such a lawsuit would suddenly need to explain why the smaller scale is also problematic, only less so. And given that this is basically how art has worked ever since the first human had the idea to paint pictures on a cave wall, that's a hard sell.
However, AI learning is not the same as a person learning, just as memorizing a book is not the same as putting it into computer memory. Nobody would sue you for copyright infringement if you memorized a book, song or movie in your head. But the issue is a completely different matter if you make a copy on a hard drive.
You are just anthropomorphizing the model by calling it teaching and then implicitly equating that it's the same thing happening in the human mind. That's your burden to establish when you say it's the same.
Actually, isn't it the burden of the plaintiffs to prove their copyrights are being violated?
Whether or not it's identical to human brains isn't the matter, they'd need to prove how a small 5GB model trained from a huge dataset infringes their rights specifically.
I am not anthropomorphizing anything, because this is literally what happens. The model is taught, by having its predictions tested against examples, how images work.
I have a 1.9 GB mp4 file on my hard drive. It contains 2 hours and 15 minutes of 1080p video data at 24 fps. Assuming it was generated from 4096x2160 source material with 16-bit color depth per channel, the "training data" was 10.32 TB. I bet I could even get a similar size reduction to LAION-2b if I recompressed it to 720p.
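As a quick sanity check, here is that arithmetic in Python (a sketch; every figure is taken from the comment above):

```python
# Sanity check of the napkin math above (all inputs from the comment).
seconds = (2 * 60 + 15) * 60            # 2 h 15 min of footage
frames = seconds * 24                    # 24 fps
bytes_per_frame = 4096 * 2160 * 3 * 2    # 4096x2160, 3 channels, 16 bits each

raw_bytes = frames * bytes_per_frame
print(frames)                  # 194400 frames
print(raw_bytes / 1e12)        # ~10.32 TB of raw "training data"
print(raw_bytes / 1.9e9)       # ~5400x reduction down to the 1.9 GB mp4
```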
Could I not also claim that I created an advanced AI model, which did not copy but learned patterns in the dataset? Modern video compression algorithms are getting quite complicated, after all.
I think no reasonable person would agree with this, but can you prove that the AI model is doing something substantially different?
> Could I not also claim that I created an advanced AI model, which did not copy but learned patterns in the dataset?
Such patterns would enable the video file to decode into a multitude of pictures not originally in the training data. Obviously, a video file cannot do that...it's just compressed data.
Generative models however can generate things that are not in its training set.
And of course, there is a fundamental difference in the source data between compressed video and a generative model: video codecs work with a sorted sequence of images, where most images are slight variations of the ones before them. The training for generative AI doesn't have these properties, the input is not an ordered sequence, and even similar pictures are not sequential variations of one another.
Relying solely on "uncompressed" size does not make for a good metric (this is analogous to the raw input size of the LAION dataset): one could make a reasonable argument that there are not billions (1) of image-pairs that are effectively identical up to a minute shift. I would posit the correct basis would be the Shannon entropy of the "best fit" ordering (minimizing inter-frame diff) versus the lossy-compressed video, and a similar "best fit" ordering for the LAION dataset versus the model.
My suspicion is that one will find that the relative number of "smooth transition" pairs in LAION viz the whole will be very different from the video.
-------
(1) - Napkin math: There are about 194400 frames, so ~37 billion (37791165600) frame-pairs. Assuming you have runs of about 1 second between hard cuts throughout, so an incidence rate of 1/24 for non-smooth transitions, gives us about ~36 billion "smooth transition" frame-pairs. I think it is safe to assume "on the order of" 1 billion, then. This ignores long "action" scenes with significant variance in images throughout, but also ignores longer-than-1-second slower scenes, hence the order-of-magnitude shrink in the assumption as buffer.
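Reproducing footnote (1) in Python (a sketch; it follows the footnote's own loose all-pairs assumption rather than a rigorous count):

```python
# Frame-pair napkin math from footnote (1), using its own assumptions.
frames = 194400
pairs = frames * (frames - 1)   # ordered frame-pairs
smooth = pairs * (23 / 24)      # ~1 hard cut per second of 24 fps footage

print(pairs)                    # 37791165600, i.e. ~37 billion
print(smooth / 1e9)             # ~36.2 billion "smooth transition" pairs
```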
> Nobody would sue you for copyright infringement if you memorized a book, song or movie in your head.
No, but what if you then produce "your own" rendition, or "remix", of that book, song or movie and offer it to the public? E.g. you memorize a collection of Taylor Swift's latest songs, and then start performing a medley of her hits in your local clubs, you may well find yourself in trouble.
You’re talking about facsimile, which is a product of memorization, which is a type of learning. And it is not the only type of learning AI is capable of.
AI, like humans, is capable of both imitation and facsimile. It is far superior at both feats.
Your fallacy is that you are noticing AI is superior at facsimile and erroneously assuming it is “not learning”. You are also ignoring the other amazing learning feats of imitation in front of you.
Parrots are lovely animals, but it’s unclear what you think you’ve accomplished by bringing them up. The fact that they are capable of more than just memorization does not differentiate them from advanced AI models, which are rapidly gaining all sorts of abilities.
Polly the parrot would have a hard time producing a picture of Elmo with a light saber in a Superman costume riding a dragon on the moon in the style of Rembrandt (in under 300ms, at least). I also know a parrot couldn’t write a 500 word story about the image.
You don't need to "understand" art to create it though. Good art, sure, maybe, but not really. Plenty of brilliant musicians who know fuck all about music theory, but they can crank out tunes.
Diffusion would appear to me to work in much the same way. It doesn't understand what's good ("works", creates acceptable output) or why, but it knows it when it sees it, and has the tools to refine it.
> And a parrot is billions upon billions of times more capable than any current AI algos.
I am pretty sure that GPT-3 is a lot more capable than a parrot in transpiling a function written in Python to Golang, or writing a summary to a tech-magazine article.
Same as Stable Diffusion is a LOT more capable than me at drawing, painting and generally making up pretty pictures.
You make it sound like we should assume that getting inspiration from a few hundred or thousand art works that are very famous and highly public is the same as training over nearly every available public piece of art. I see no reason why that should be our null hypothesis.
Humans either learn art by being natural art geniuses, or by receiving instruction and learning through an iterative process (where, again, they might create thousands of art works, but nowhere near the scale here), which is very different.
#2 is pretty interesting. After all, the purpose of copyright law is to encourage creative works. If we have machines that can generate creative works on demand with little effort, what purpose does copyright law serve?
I am not a lawyer, so the following is only my opinion.
Copyright law serves the purpose of protecting people's works from unauthorized parties making copies of them and profiting off them.
It doesn't protect from technology making the production of new works cheaper, faster, more efficient. An artist using photoshop can be, and is allowed to be, many times faster than one using oil and canvas.
I don't mean to suggest those are the only options, or that either one is even practical. The point is to determine where the objection lies: with the method or the outcome.
Hm, okay. In that case, I would say the first is a problem, and the second is also, but differently.
I think the objectionable thing about the second is that the AI knows everyone's styles, and so can use them in creating something new. Even if the AI is restricted to not be able to paint an image in a certain artist style (as the new version of stable diffusion is, for instance) and the art is unique, I think part of the problem is that the AI is still (presumably) leaning on the collective styles of everyone it has trained over.
If we can train an AI over a small dataset, or maybe even a large dataset of old art, or some mix in between, and then maybe fine-tune it with a small sampling of modern art, then I believe it would be unobjectionable, as this is largely how humans do it.
The only difference that matters, is scale. And again, if I want to argue that something done 10000000000 times is legally problematic, I have to be prepared to explain why doing it 10 times is problematic as well, only less so.
The question isn't if a scaled up thing is the same, the question is if a legal thing scaled up suddenly becomes illegal for no other reason than being scaled up.
Nothing new. People differentiate between genocide and murder, for example, or poisoning water supply vs an individual poisoning. Criminal law in quite a few places definitely has scale considerations.
You just gave two examples of where both ends of the scale are illegal, which only strengthens the argument of GP. IANAL, and I'm not stating anything about the reality of the judicial system, but only following the logic of the argument.
My examples were only meant to illustrate that scale is a well-known "thing" in legal systems, and I happened to pick things with two illegal endpoints (IANAL).
You could look at other things like the need for permits for certain things as a function of size and use, if you want simple examples for scale mattering and legal endpoint(s).
I am no lawyer, so the following is only my opinion.
I am completely aware that scale is a "thing" in legal systems. But as I said before: For scale to be important, the unscaled act in itself has to be problematic already.
I recently worked on information extraction from 10,000 documents. GPT-3 needs about 7 days of operation in batch mode on one thread: it takes 40-70 s to read one single document and report the extracted data (10,000 documents × ~60 s ≈ 7 days). One MINUTE per page.
But I think you meant GPT-3 has seen many books during training, not during inference. You should know that training on millions of books is not the only way GPT-3 learns. It is just the foundation of its knowledge.
GPT-3 learns "in-context", which means it can learn a new word or a new task at first sight. It just needs a description or a few examples. This is the most powerful feature of GPT-3 - in-context learning. And when it comes to ICL, it is much like humans: it only sees a few examples, not millions of books.
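To make that concrete, here is a minimal sketch of what in-context learning looks like in practice. The task, the examples, and the model's continuation are invented for illustration; the point is that the "learning" lives entirely in the prompt, with no weight update:

```python
# In-context learning sketch: the "training examples" are part of the input.
prompt = """Rewrite each phrase in pirate speak.

phrase: Hello, friend!        -> Ahoy, matey!
phrase: Where is the library? -> Arr, where be the library?
phrase: Good morning.         ->"""

# Sent to a GPT-3-style completion endpoint, the model would typically continue
# with something like " Top o' the mornin' to ye, arr!" -- a task picked up at
# prediction time from two examples, not from retraining on millions of books.
print(prompt)
```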
> “Do you ask for permission when you train your mind on copyrighted books?”
The nature of ICL is that it happens at prediction time. So GPT-3 would have to explicitly be instructed to learn a specific skill. Should it reject instructions if they are sourced from copyrighted books?
> “You should know that training on millions of books is not the only way GPT-3 learns. It is just the foundation of its knowledge.”
I’m not a lawyer, but to me it seems within the realm of possibility that a U.S. court eventually finds strongly in favor of the copyright holders, the Supreme Court agrees (because Big Tech has so few friends left), and OpenAI will be required to destroy the GPT-3 model and all copies of the training data because they can’t filter out copyrighted works.
Yeah, I think "we can't actually tell which bits of our model are derived from your work, which we copied without your permission onto our system for training purposes" probably makes AI companies more vulnerable rather than less. Other platforms like search and social media have successfully defended themselves by showing willingness to promptly remove copyrighted material, letting copyright holders opt out of indexing, or even negotiating schemes like ContentID so copyright holders get paid each time their work is used.
If GPT-3 can learn in context it means both the training set and the prompt could be in copyright violation. So even a clean model, trained on licensed data, cannot guarantee there will be no copyright issue.
I think the Aereo case is an interesting precedent [0] [1]. An individual DVR-ing over-the-air broadcasts with an antenna was fine. A corporation DVR-ing over-the-air-broadcasts for thousands of customers by using thousands of tiny antennas was not fine.
It's about a law designed by humans to give everyone a chance to make a living and contribute to the common good. You think you have found a loophole in that law that lets you use that work for free and deny authors any compensation.
Bear in mind, AI is not making artists or creative types obsolete - that would be fair game, just like computers made human calculators obsolete. No, this is about abusing other people's work.
Copyright never guaranteed anyone compensation, nor is it a loophole to not pay for copyrighted work just because you saw and learned from something in the public domain.
If using someone else's work for learning is infringement, then that's going to cause a lot of difficulty for all artists. Try making a rock song without listening to rock, or painting some modern art without viewing it, etc.
Copyright exists to protect human authors and promote creation, which in turn leads to learning - from other human beings. Algorithms are not learning; they are automated tools which are consuming creative works and outputting derivative works.
The loophole is to use copyrighted works for free despite no learning taking place - no human being observing and developing their skills based on that work - rather, an algorithm transforming those works into some other useful interpretation of them.
I’m not able to read billions of books in less than an hour.
I think you underestimate the sheer volume of data + conclusions the brain ingests and processes on a daily basis, primarily through unconscious experience.
The training set for GPT-3 is about 500e9 tokens; any given synapse in a human in their lifetime is going to fire about 2e9s * (10% * {100Hz to 1000Hz}) = 20e9 to 200e9 times.
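Spelling that arithmetic out (a sketch; the 10% duty cycle and the 100-1000 Hz firing rates are the parent comment's own assumptions):

```python
# Rough comparison: GPT-3 training tokens vs lifetime firings of one synapse.
gpt3_tokens = 500e9          # ~500 billion training tokens
lifetime_s = 2e9             # ~63 years in seconds
duty_cycle = 0.10            # assumed fraction of time the synapse is active

for rate_hz in (100, 1000):
    firings = lifetime_s * duty_cycle * rate_hz
    print(f"{rate_hz} Hz: {firings:.0e} firings vs {gpt3_tokens:.0e} tokens")
# 100 Hz -> 2e+10 (20e9); 1000 Hz -> 2e+11 (200e9): the same ballpark
```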
> Do you ask for permission when you train your mind on copyrighted books? Or observe paintings? Or listen to music?
The difference is that I buy books, pay to visit museums, and buy music in several formats, or pay for it by accepting advertisements between songs.
It is expected that if I buy a book, I will be allowed to read it without asking for permission.
What I don't do is copy-paste paragraphs of other books to write a new book and claim that it's mine. That's a different situation.
If you pirate a book, learn from it and then create something using the information you learned, would that creation constitute copyright infringement? If so how far does the tainting go? Once you put your eyes on something which you haven't purchased, all future works could potentially be inspired by that experience and should therefore be considered infringement, following your logic.
> would that creation constitute copyright infringement?
No, obviously not, as that would clearly be unworkable and ridiculous. Mainly because we have a very unclear understanding of human creativity, and there’s no way to analyse an individual's mind to understand how they created an idea. Additionally, copyright's reach generally stops at the point of “transformation”: once you take an idea and transform it “enough”, it’s considered a new idea.
The reason none of the above applies to AI is simply because we’ve declared that only humans can transform and produce new ideas. AI aren’t human, thus they’re not afforded the same rights. Arguing about if there’s an inherent difference between AI creations and human creations is pointless, the law doesn’t care, it has already declared that there’s a difference between AI and human.
If you disagree with that declaration, then you need to lobby to change the law. But until the change occurs, your beliefs are meaningless in the eyes of the law.
You see how you’re introducing a viewpoint and assuming it’s true — you’re just saying “humans can create things” and “ai can’t create things”. You don’t even address the possibility that the AI itself is a tool of the human who created it to create things.
I think in order to justify "the AI is a tool that a human is using, just like a paintbrush", you would have to define what meaningful creative process the human has followed while using the tool.
In my opinion things like selecting a training dataset and then writing prompts are not creative processes; they are mechanical processes. Input in, output out, with barely any interaction from the human.
Consider when you commission an artist to make a painting. You give them a "prompt" by explaining what you want. Maybe you even give a "training dataset", a few examples similar to the look and feel of the result you want.
Then they go off and make something. They show you in process stuff and you make suggestions so the next version they show you is closer to what you want. This repeats until you are both happy. Then you own the drawing. Because you paid them for it.
In this case however it's absolutely clear that you did not create the work. You had input into the creation, but the artist was not a tool you are using to realize your own creative vision. They are the creator, you're a customer for them.
> In my opinion things like selecting a training dataset and then writing prompts are not creative processes; they are mechanical processes. Input in, output out, with barely any interaction from the human.
When it's bleeding-edge research, there is a ton of human creativity involved in developing the product and engineering the dataset.
Even if that were true, which I am kind of doubtful of, most applications of these tools are not going to be bleeding edge research and should not be treated as though they were.
The reason why you put your eyes on something is probably that someone had the hope of selling it to you. Or that someone paid for it on your behalf. The difference is that machine learning algorithms never (or rarely) leave a single penny in their training set creators’ pockets, turning “no income” into the default outcome.
Copyright law is not about a logically perfect system, but about creating a general environment in which artistic, academic and other creations can appear and benefit the general population.
> Copyright law is not about a logically perfect system, but about creating a general environment in which artistic, academic and other creations can appear and benefit the general population.
Yes ... and because it's not a logically perfect system, its lifetime has to be limited. One day we should abolish copyright and find a better, more functional way to drive progress.
Copyright at its heart is fine. The original objectives (allowing people to hold a short-term monopoly on their ideas so they can fund further ideas) and the manner in which they're achieved are perfectly fine.
Where it goes wrong is when individuals and corporations believe that such monopolies should be indefinite, and push the monopolies beyond the lifetime of the author. A dead author can’t produce new works, so it’s not clear how allowing such long monopolies increases the amount of creative work produced.
The original primary objective of copyright was to create an environment that could produce an endless supply of public works, freely available to all. It’s only the abuses of copyright over the past 50 years that have destroyed that objective, and ironically it’s copyright holders like Disney that are really starting to suffer the consequences.
Winding back copyright durations to better balance the public and private interests would go a long way to resolving many of our issues with copyright today.
> Yes ... and because it's not a logically perfect system, its lifetime has to be limited.
It’s also worth pointing out that no system of law is “perfectly logical”. It’s almost certainly impossible to produce a perfectly logical system, because humans are inherently illogical, and binding them into a perfectly logical system of law would almost certainly produce more injustices.
It's really not. Economics is very simple at its core. You tax negative externalities and subsidize positive externalities. The discovery of new information is a positive externality. It should be subsidized.
Anything that has infinite supply and zero marginal cost, as Nobel Prize-winning economist Samuelson argued when looking at the problem through the lens of lighthouses[0], should be free to all. By using copyright to make it a monopoly and allowing the extraction of monopoly rents, you drastically reduce the value and reach of the thing that was discovered. Copyright is a hack, and this hack is now fundamentally breaking. Instead of trying to save the hack, we need a full rewrite. If winding back the duration of copyright is correct, the best winding back is to zero.
As we are a remix culture where idea A and idea B combine to create idea C, we drastically reduce the innovation in our economy through reduced discoveries. This failure ends up with large monopoly holders consolidating into bigger and bigger entities in order to right some of this failure, but that only makes the monopoly extraction worse.
The discoverer should be subsidized for the discovery of that information, but it should immediately go to the public domain. How you work out what that subsidy should be is just as abstract as how Spotify works out what each play costs. It is no doubt monstrously complex to figure out the dollar value of some discovery, but it is the economically correct path. Copyright isn't.
If you pirate a book, lossily compress it into a 14-byte content-description vector, decompress that into a book that fundamentally contains at most 14 bytes' worth of information, and then sell it, that would still be piracy, at least partially, depending on how good that 14-byte representation is.
Your statement here is simply ignorance. Generative models do not only provide "copy paste"; they can interpolate and extrapolate from training data. When a human sees a bunch of ideas and mixes them up to produce something slightly different, it doesn't bother anyone. But when a human creates an AI and uses it to do something similar, suddenly it's a problem. I think the burden of proof is on the laypeople here who keep whining about how AI is just copy-paste (which is simply not true, and such a basic simplification it wouldn't even pass an ELI5 truthfulness test). I'm sure this will get plenty of downvotes.
We're not dealing with AGI, but modern models absolutely demonstrate creativity and newness. If it were simple copy-paste, we wouldn't be having this whole conversation, since it would be simple for copyright owners to sue and win in case of infringement.
Even if what you say is true, the person using it is still asking for what they want. Some of these prompts get unique enough that it's unlikely somebody else is going to make another one like it.
What's the difference between acting as if it actually understands and "true understanding"? I'd argue there is none, or at least that it doesn't matter. For instance, there is nothing you could do to prove to me that you aren't just a black box acting on input in a sophisticated manner (e.g. the Chinese room argument[1]), yet I give you the benefit of the doubt. GPT's lack of understanding of math may be a localized lack of understanding, where it does "understand" other topics. I don't require you to display an understanding of quantum physics in order to prove that you're able to understand anything at all.
I'm sure this topic is already the subject of much discussion, but from the sessions I've had with ChatGPT, it's quite obvious it doesn't really "understand" very much in the way humans do. At best it seems to understand what question you want an answer to, but it often fails miserably even in simple cases (try asking it how many letters certain words have, or to give examples of words ending in a particular letter etc.).
But sure, eventually it may overcome those cases and make an excellent mimic of an intelligence with understanding. If it's genuinely able to produce output accurately emulating all the sorts of logical reasoning humans can do then it may well be impossible to distinguish it from "the real thing" (whatever that actually is...)
Humans understand and there is just no comparison. Can an AI make a major novel discovery as humans have? How could they if they don’t understand language?
Presumably AI trainers aren't hacking into Amazon's servers to steal their copyrighted ebook files. All the data they use is publicly available to view by the AI, just as a human might view it. So I don't think your distinction is accurate. I think the question is to what degree AI systems are "inspired" by the content they are trained on, versus merely regurgitating it; to be honest, it's hard to draw a line between the two for people's creative work, never mind that of machines.
Getty and Shutterstock images are watermarked and copyrighted so as not to be reproduced; so, no, they are not confining themselves to copyright. Also, take "AI" that is currently used to sell art on t-shirts, for example: if you specify yellow eyes, it will show you examples of yellow eyes and you choose the one you want. The AI did not produce those samples. The AI did not learn to make pictures of yellow eyes. The AI produced samples of yellow eyes from its repository of collected images that it did not make itself, nor ask permission to use, nor pay for.
> Do you ask for permission when you train your mind on copyrighted books
Yes, that's why I pay a fee to buy/borrow one (or someone pays the fee in the case of a library).
> listen to music
Again, money is exchanged.
> Humans are constantly ingesting gobs of "copyrighted" insights that they eventually remix into their own creations without necessarily reimbursing the original source(s) of their creativity.
Yes, and so long as they are not derived works, it's not a problem.
Copyright is there to allow you and me to develop things and make money from it. It is there to stop people stealing our work, which may have taken years to develop and sell it for a profit with none of the risk.
Large corporations have abused this to make monster profits.
Google has spent billions to try and persuade us that copyright is evil, because they didn't want to pay content producers to host their work (i.e. music and movies on YouTube, and local news sites).
The issue is this: I might have made a website that tells users how to do a specific type of metalwork. I have a free ebook, and I run courses. I have spent many years perfecting the art, creating the tutoring content, and recording videos. It's advertising-supported, and people are asked to consider buying a course to support the creator.
The AI company comes along and scrapes all the content, allows people to regurgitate it, with more or less accuracy.
The creator now gets less traffic, less money and now cant afford to create more content.
The AI people now skim all the money, and the consumer gets less useful information.
Culture isn't free. Someone is paying for it, and if you stop paying them, then it doesn't get created.
You don't need to pay for or "borrow" anything to learn from copyrighted works. Nobody has had that expectation for years, and that is also not what copyright pertains to. It's not that AI breaks into libraries and isn't paying the fees. You can google an image of any great work of art and look at it for as long as you like, for free and take from it what you can and use all of that to create something else and get paid for that thing. The stuff that AI uses to learn is available to you, as a human. That is not being challenged in the least.
As parent said: Everything is derived work. We are remix machines. It is how we learn and how we make money. Now with AI, apparently, we are offended, when something does it better and faster than we can? To me it seems, if we expect AI to pay additional fees, the question is: Why?
I am not saying that it's not an important question. Google has built its entire business around information other people have provided. I would argue most people are quite happy with the existence of something like Google search and see it as a net positive in their lives. Does that make the business part okay? Where do we stand on this in regard to an open web? Is it okay for Google to do what they do (and if they do it well to win the space), or should there maybe be a license where people have to pay the owner whenever they are indexing a website? I don't know. Feels complicated.
> If you stop paying them, then it doesn't get created.
That's an interesting thought. But is it true and, more so, is it a problem? What if humans from here on will only be paid to create stuff that an AI can't?
> You don't need to pay for or "borrow" anything to learn from copyrighted works.
someone pays, just maybe not you. How do you think google/meta/et al offer you a service free at the point of delivery, through charity?
> You can google an image of any great work of art and look at it for as long as you like
See my bit about Google. The copyright still sits with the owner. That image can be removed, should the owner wish, but for various reasons it's too expensive to get Google to respect that.
> I would argue most people are quite happy with the existence of something like Google search
Yes, because it's a symbiotic relationship. I, as a creator, make something that people want to find; Google points them to me, and I get people's attention. I might do that to fluff my ego, or try and convert it to cash through sales or something.
The AI step threatens to remove that relationship. Instead of being passed to me, the AI just pastes shit it's gleaned from mine and other websites, leaving no chance of me getting a reward for making that website.
> You can google an image of any great work of art and look at it for as long as you like, for free and take from it what you can and use all of that to create something else and get paid for that thing.
If the copyright on a given work of art is still active, those pictures were taken and are distributed with the permission of the copyright holder (or they're just pirated). That's one of the reasons it's much easier to find images of classic art (for which the copyright has expired) than it is to find images of contemporary art.
> You don't need to pay for or "borrow" anything to learn from copyrighted works.
What exactly do you think copyright is, and would you be surprised to learn that libraries have purchased the books on their shelves?
> You can google an image of any great work of art and look at it for as long as you like, for free and take from it what you can and use all of that to create something else and get paid for that thing.
If you're referring to piracy, that is very much being kept in check. Otherwise, the vast majority of copyrighted art is only available for payment in various ways (streaming services, museum and theatre access fees, library cards, buying e-books etc).
I don't think there is any defensible reason to have people at large suffer over AI advancement without having a plan for making their lives better.
If AI takes jobs because it's simply superior at them, and that creates friction and anxiety until we have stuff figured out, that's of course sad and we should do our best to soften the process, but I think it's inevitable. The carriage must die. It seems obvious that restrictions on training data are just a distraction and will not move the needle on any interesting time frame.
If however AI does not pay it forward, in an arrangement that makes our collective lives better, I will be the first to work on burning it to the fucking ground.
But, on a lighter note, since that has generally been the direction of human civilization (not linear when zoomed in, but always when zooming out) I remain optimistic.
In the Anglo-Saxon world, I have not seen a significant successful program since the Industrial Revolution that has helped or softened the impact of a new process on an affected lump of people[1].
The weavers were left to rot when the automatic looms came in (they were, in Flanders, East England, and northern France, an incredibly rich and influential class).
Furniture makers were left to rot when steam power tools came in.
Farm labourers were left to starve when steam threshing/harvesting came in.
Enclosure was another tragic note in England.
The Green Shirts were lobbying for "a share of the domestic profit" in the 20s-30s; in the 60s they were convinced that we were going to be working 2 hours a day by now, with robot servants cooking and cleaning for us, and no one living in poverty. Even Orwell wrote on this.
Instead we see productivity in the western[2] world dropping, meaning for every human hour worked we make less money, I suspect in part because of the rise of servant-as-a-service jobs (food/shopping delivery, cleaning, elderly care, etc.), all of which are long hours and low paid.
[1] Well, in the DDR everyone had a job, but people lived in perma-poverty and were likely to be disappeared if they spoke out.
[2] specifically the US and UK, who appear to be snorting financial inequality by the metric fuckton
What I was more so thinking of are the unspecific societal functions that evolved to the benefit of everybody, but more so to those who could not have afforded them beforehand: Quality health care, various forms of social support, more accessible education and food, better road systems. The stuff that makes the charts on education, prosperity and health go from bottom left to top right and child mortality and hunger in the opposite direction.
The injustices of the day do not show in the most important, most long term graphs. As far as I can tell (and I am happy to hear your thoughts) this can only be true because people have benefitted increasingly from things improving, over time.
> It is tragic to me that a person can't see culture as anything but a marketable good.
With respect, that's not what I am saying. I'm saying it has a cost. If people do not have the money to spend on making culture, then it is not created.
Juvenal was a client of someone, and complained about it
Tallis, Allegri, Purcell, Bach, Mozart were all professional composers
The great seats of learning (Ashurbanipal's library, Venice, Alexandria) were all paid for by rulers wanting to show off how good they were.
Wilde and Byron were rich people wafting around bored and making art along the way.
In the 60s-80s it was possible to live in NYC working at a bar or something, and still have time and money to create art. Where can you do that now?
Now you need to be rich, or have time, or get patrons. The internet is a great way to either lower the cost of entry (see music) or get support to create (see Patreon)
> This is just deeply wrong. Culture existed before money
Culture existed when we had time, food and resources to stop worrying about being cold wet and hungry.
> Culture isn't free. Someone is paying for it, and if you stop paying them, then it doesn't get created.
That doesn't seem to be universally true, but rather an end-game of capitalism. There are countless examples of artistry/sculpture/music that were created long before copyright existed, and although they may have been "paid" for previously, those cultural items can be appreciated without needing to pay someone for it.
There are also many contemporary cultural items that were created without monetary recompense that can also be enjoyed without needing to spend money.
> Copyright is there to allow you and me to develop things and make money from it. It is there to stop people stealing our work, which may have taken years to develop and sell it for a profit with none of the risk.
Your use of the word "stealing" is unnecessarily loaded and specifically means that the creator was deprived of physical ownership, which would be incorrect.
> they may have been "paid" for previously, those cultural items can be appreciated without needing to pay someone for it.
You are arguing against your own point here. As I said culture stops being created when there is no money for people to create it.
Should copyright never expire? No. Is 25 years enough? You betcha.
> There are also many contemporary cultural items that were created without monetary recompense
Again you are missing the wider point. For culture to be created you need a mix of people, and those people to feel safe enough, and have enough time and energy to create said culture. They will also need money for materials.
As I suspect you are not on a poverty wage, you will have the time, energy and healthcare to be able to create a new thing. This is not a luxury someone who works two jobs just to make rent has.
> Your use of the word "stealing" is unnecessarily loaded and specifically means that the creator was deprived of physical ownership, which would be incorrect.
Stealing is taking with intent to deprive. I mean specifically what I say.
Taking someone else's work and selling it as your own to make money, whilst depriving that person of credit or an income stream, is morally wrong.
Now there is an argument about corporations abusing copyright (they do) but, throwing it all out only benefits people like google, amazon and facebook.
> Do you ask for permission when you train your mind on copyrighted books?
The law already makes many distinctions between humans and machines. For example, looking out the window to see when your neighbor is going to the supermarket: allowed; using a machine-vision system to store the movements of groups of people into a large database: not allowed.
Also, "training the mind" and "training a machine learning system" are two completely different things, even though the language used is the same.
It seems to me that one side is arguing that people (as in, individual human beings) already do what the AI is being accused of, the other side argues that it's replicating work.
The truth of the matter is that what is taking place is a different thing altogether. We do generally deal in a different way with "machine behavior" because we recognize it being automatic and reproducible matters.
> Humans are constantly ingesting gobs of "copyrighted" insights that they eventually remix into their own creations without necessarily reimbursing the original source(s) of their creativity.
Yes, and humans are being found liable for copyright infringement for doing so. All that's needed to establish liability is access and substantial similarity; the bar for the latter can be very low indeed (see Williams et al. v. Bridgeport Music et al.).
> Do you ask for permission when you train your mind on copyrighted books?
I pay for books directly (cash, credit) or indirectly (school books via taxes). I do pay the Louvre to observe the paintings. I also pay to listen to music, with ads (YouTube) or via subscription (YT Music and Spotify).
But the datasets these tools are using are available to view for free. The AI isn't stealing physical books or paintings, it's viewing the same data that you or I can by sending an HTTP request, for free.
Could an AI view you for free in a street or even through a window? Does that imply it can use that view data to create advertising using your modified likeness, for example?
Just because you can view something for free doesn't mean you can use it any way you want.
> Just because you can view something for free doesn't mean you can use it any way you want.
This whole thread really makes me want to pull my hair out.
The difference between illegally creating an (even temporary) copy of a copyrighted work (e.g. streaming a movie) and creating a derivative work of said copyrighted work: two completely different things, with completely different legal outcomes.
If OpenAI in any shape or form creates a temporary copy (<--- by the copyright definition of what a copy is!) then this needs to be addressed as the former. If OpenAI creates a work that is considered to be a derivative work (<---- by the copyright definition of what a derivative work is!) then that needs to be addressed as the latter.
The crux of this whole thing is: Human minds cannot make a copy of a copyrighted work by definition of copyright laws (in Germany, I presume the same can be said for pretty much all western copyright laws), while anything that a computer does can be construed as making a copy.
> anything that a computer does can be construed as making a copy.
but that's not the point of contention. The training data set has been granted the right to be distributed (by virtue of it being available for viewing already - it's not hidden or secret). The proof is that a human can already view it manually. Let's call this 'public'.
The question is, whether using this public training dataset constitutes creating a derivative work. Is the ML model sufficiently transformative, that the ML model is itself a new work and thus does not fall under the copyright of the original dataset?
>but that's not the point of contention. The training data set has been granted the right to be distributed (by virtue of it being available for viewing already - it's not hidden or secret). The proof is that a human can already view it manually. Let's call this 'public'.
This is wrong. My paintings are publicly available (especially going by your definition, whose origin confuses me). Taking a photograph of my paintings is still a copyright violation. I hope we can ignore all the legal kerfuffle about personal use, as it has no bearing on our discussion. Again -- all of this boils down to what I've said before: bare human consumption does not constitute making a copy; nearly everything else does.
Your second point -- that a copyrighted work automatically grants someone else any rights (especially distribution rights) just by being available to be consumed -- is even more wrong. I'm not going to go further into that, as you can very easily prove yourself wrong by googling it.
>The question is, whether using this public training dataset constitutes creating a derivative work
I'm not well versed in the US copyright laws, but I would assume (strongly so) that this would not be the case. I -- again, for US copyright law -- assume that for something to be considered a derivative work, it needs to include (or be present in other ways) copyrightable (!) parts of the original work(s). In other words, the original work needs to "shine through" the derivative work, in one way or the other. The delta of parameter changes of a ML model would (imo) not constitute such a thing.
Problems with derivative works will come into play when considering the things ML models produce.
But the AI is (supposedly) not making a copy of your painting. It is ingesting it, and adjusting its internal "model of what a good painting looks like" to accommodate the information it gleaned from your work. This seems more similar to what a human might do when they draw inspiration from another's work. The question is: to what extent does the exact image of your painting remain within the AI's data matrices? That, no one knows for sure.
> But the AI is (supposedly) not making a copy of your painting.
You are mixing up the two things that I've mentioned in my original comment. You have to differentiate between creating a copy and creating a derivative work. Both of those things matter when talking about AI, but the former is far more clear-cut.
>The question is - to what extent does the exact image of your painting remain within the AI's data matrices?
And the answer is: it's irrelevant. The model has to be fed a copy of something. That's all that matters. The AI could even reject learning from that something. By the time that something reaches the AI to even do something with it, it's been copied (in the literal sense) who knows how many times, each of those times being a copyright violation.
I see what you're saying, though could you not say the same thing about the browser's internet cache? That copies the file from its original server to the user's local machine in order to display it efficiently.
The poster is wrong about what constitutes a copy (for the purposes of distribution). The temporary copies that reside in your browser's memory or local caches aren't considered violations unless they are publicly accessible.
I would put the same criteria to the copy made for the purpose of AI training. As long as you have the right to view the image, you would also have the right to ingest that image using an algorithm.
Yes, of course. It could even create advertising using my unmodified likeness, and that wouldn't be a problem. A person's appearance in public is public domain, you don't need permission to use it.
I think that's an entirely different scenario. For one, I'm not displaying myself in my front window with the explicit intent of people viewing me. If you replace the AI in your example with a human taking a photograph, I would be equally appalled at the misuse of my image, and I'm very confident I'd have legal recourse to stop it.
Rephrasing of the question: if you have laid eyes on a single pirated work in your lifetime, all your future creations are potentially inspired by that experience. Is every future creation of a human who has laid eyes on a pirated work copyright infringement?
> Do you ask for permission when you train your mind on copyrighted books?
AI is not a mind. It’s a program. We might call it a “mind” as a metaphor, but it’s not really one.
So any justification which presupposes that an AI should be able to do something (really: that the people who are running the AI programs should be able to do something) because it is a "mind" is fallacious and doesn't need to be interrogated.
This is the uncomfortable truth that no one on that side of the argument wants to address.
There's also the fact that Artistic Freedom is now under attack by artists. Not that long ago, artists hated the Music Industry and corporations such as Disney for weaponizing copyright law against Artistic Freedom. Now artists are utilizing that same tactic against other artists.
Thanks for sending your strawman in to do battle with his strawman.
You don't need permission to train on books, but you do need to buy the books or take them from the library one at a time.
"Training" these machines so far is not like human learning, as becomes apparent when they spit out source code that mirrors individual repositories. And you know that humans are required to both remix their own creations and follow copyright law at the same time, and also to adhere to the social and institutional stigmas against extensive, uncreative cut-and-paste paraphrases.
When training AIs on copyright law trains them to obey copyright law, they'll be ready for the Turing test, or even to be called AIs.
> when they spit out source code that mirrors individual repositories
That's not a problem, we already have copyright laws that prevent people from distributing mirrors of copyrighted works. They don't care about how the works were copied.
So you are saying we should prosecute MS because Copilot is distributing the copyrighted work? Or, in other words, why is Copilot spitting out the code to someone else (often with copyright notices removed) not code distribution?
This “ML learning is not like human learning” fallacy is all over the place lately. It’s stupid, and it should stop.
Humans are capable of both facsimile and imitation.
The fact that ML is able to perform facsimile far better than a human can is not evidence that this is “not the same” learning. Only that ML learning is superior. ML is far superior in feats of both imitation and facsimile.
Are you high? How is ML learning superior to human learning?
If I show a 3-year-old a single picture of a tiger and tell him this is a tiger, the child is able to recognize a tiger fairly accurately in real life without further input. Though the child might say that a house cat is a tiger...
ML learning needs millions of pictures to do the same, and still might mistake an elephant for a tiger...
ML is nothing more than graph approximation; there is no logical reasoning.
ML is currently capable of the tiger case you mention. It’s generally called “few-shot” or “one-shot” learning. In the context of an image generation model, having never seen a tiger before, if you show it a few pictures of a tiger, it could immediately draw you thousands of tigers in any variation or scenario you can think of, which is way more than a child can do.
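For readers unfamiliar with the term, here is a rough sketch of what "few-shot" classification looks like in practice. This is a toy illustration only: the random-projection `embed` is a stand-in for a real pretrained encoder (e.g. CLIP), and every name and number here is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
_proj = rng.normal(size=(3072, 64))  # stand-in for a pretrained encoder

def embed(image):
    """Toy embedding: a fixed random projection of a flattened 32x32x3 image.
    A real system would use a pretrained encoder here."""
    v = image.reshape(-1) @ _proj
    return v / np.linalg.norm(v)

def build_few_shot_classifier(support_images, support_labels):
    """Average the embeddings of a few labelled examples into one
    'prototype' per class, then classify by nearest prototype."""
    protos = {}
    for img, label in zip(support_images, support_labels):
        protos.setdefault(label, []).append(embed(img))
    protos = {k: np.mean(v, axis=0) for k, v in protos.items()}
    return lambda img: max(protos, key=lambda k: embed(img) @ protos[k])

# One example per class is enough to build a (crude) classifier.
tiger = rng.normal(size=(32, 32, 3))
cat = rng.normal(size=(32, 32, 3))
classify = build_few_shot_classifier([tiger, cat], ["tiger", "cat"])
print(classify(tiger + 0.05 * rng.normal(size=(32, 32, 3))))  # -> "tiger"
```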
As for the need to train on millions of images for the base model, I believe you are trying to say something about "sample efficiency", and how ML differs from the brain in this regard outside of the few/one-shot contexts (which ML is absolutely capable of). I would argue that the sample efficiency of the brain is actually also quite low, much lower than people assume. It's irrelevant to an argument that ML is not superior, because ML clearly is capable of learning richer, more effective representations in a shorter wall time than we can, whether it is sample efficient or not. And in the sample-efficient few/one-shot contexts (learning what a tiger looks like from one picture), it also outperforms humans in speed, accuracy and creativity. It's not even close.
As for classification errors, ML is capable of some errors we are not, precisely by virtue of being superior at learning representations we are not even close to being capable of learning. But those are edge cases, and they are fixed by various means. In the main cases, ML outperforms humans in speed, accuracy and class complexity, all exponentially.
You said something about graph approximation but it doesn’t make a lot of sense. I’m talking about learning and you’re complaining that machine learning is not “logical reasoning”. Whether ML is currently capable of logical reasoning is another discussion. Certain models do demonstrate some types of it today.
“Graph approximation” is a type of learning task. ML is a billion times better than humans at it so it also doesn’t help you argue that ML isn’t superior (in that regard).
There is no reason to accuse others of being on drugs because you fail to view the world in the same light as them. You can make your point known without doing so.
> Do you ask for permission when you train your mind on copyrighted books? […] Humans are constantly ingesting gobs of “copyrighted” insights
This comment fundamentally and dangerously misunderstands Copyright Law. Insights are not copyrighted, nor are they copyrightable. Copyright law controls who gets to distribute a specific “fixation” or performance of work. It is not, and never was about preventing the spread of ideas. Authors and artists have always intended for you to read/observe/listen to their work when you legally acquire a copy. They just want you to not copy it verbatim, but go do your own original work if you want to distribute or sell something.
The whole problem is that today’s NNs are specifically designed to remember and remix only the fixed performative parts of the work, and they, unlike humans, don’t understand the insights at all. They are just deterministic machines that copy and remix at a large scale. As such, it’s pretty clear the people training AI today should expect to have to ask permission before “training” (copying) other people’s work.
I don't think that's so clear. When you train a deep learning model, you are making it extract the gist or insight of many works and then use that pattern to produce new works. While the NN does not experience the work like a human, it is definitely not memorizing.
A silly example: making GPT write a rap battle between Keynes and Mises goes beyond a performative remix; it is transformational work, and nothing is copied explicitly. If a human were to write it, that would not violate copyright.
I think that to tackle this we need a new lens other than copyright in the long term.
You’re right that it’s not so clear, perhaps I overstated for brevity. I don’t actually think requesting permission is absolutely necessary, what I really think is that there aren’t good reasons AI people shouldn’t at least first try to establish training sets that are unambiguously legal, either through use of public domain work, or through an actual attempt to curate licensing models that allow re-use. We have plenty of precedent for doing this, so people claiming they should have access to everything without permission strikes me as lazy. There’s also the problem that the AI winners already are, and will continue to be, the monopoly tech and media companies who stand to make handsome profits off of the results of their trained networks. Even if you believe the results of their tech is “transformational”, there is no question that it wouldn’t work at all without access to the source material.
The argument that NNs aren’t memorizing is definitely debatable and not necessarily true. They are designed to memorize deltas and averages from examples. They are, at the most fundamental level, building high dimensional splines to approximate their training data, and intentionally trying to minimize the error between the output and the examples. It’s fair to say that “usually” they don’t remember any single training sample, but it’s very easy for NNs to accidentally remember outliers verbatim. The whole reason the lawsuits mentioned in the article are happening is because we keep finding more and more examples where the network has reproduced someone’s specific work in large part. If we’re going to claim that today’s AI is producing original work, then we have to guarantee it, not just assert that it doesn’t usually happen.
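To make the "minimize the error against the examples" recipe concrete, here is a toy sketch in plain NumPy. It has nothing to do with any production model and all numbers are invented; it just shows the standard loop: compute the error against the training samples, nudge the weights to shrink it. The dense cluster of points gets smoothed into a curve, while the single isolated point is fit nearly verbatim.

```python
import numpy as np

rng = np.random.default_rng(1)
# Training samples: 20 points from a smooth curve, plus one isolated outlier.
x = np.concatenate([np.linspace(0, 1, 20), [3.0]])
y = np.concatenate([np.sin(2 * np.pi * np.linspace(0, 1, 20)), [7.0]])

# A tiny one-hidden-layer network.
W1, b1 = rng.normal(size=(1, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)), np.zeros(1)

def forward(inp):
    h = np.tanh(inp[:, None] @ W1 + b1)
    return (h @ W2 + b2).ravel(), h

lr = 0.02
for _ in range(30000):
    pred, h = forward(x)
    err = pred - y                             # error against the examples
    gh = (err[:, None] @ W2.T) * (1 - h ** 2)  # backprop through tanh
    W2 -= lr * (h.T @ err[:, None]) / len(x)
    b2 -= lr * err.mean()
    W1 -= lr * (x[:, None].T @ gh) / len(x)
    b1 -= lr * gh.mean(axis=0)

print(forward(np.array([3.0]))[0])  # ≈ 7: the lone outlier is fit nearly verbatim
```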
> a rap battle between Keynes and Mises goes beyond a performative remix, it is a transformational work, nothing is copied explicitly.
I don’t buy that the work can be called transformational just because the remix doesn’t have any recognizable snippets. GPT is in fact copying individual words explicitly, and it’s putting words together by studying the statistical occurrence of words in context of other words.
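At the crudest possible scale, "studying the statistical occurrence of words in context of other words" looks like this bigram toy. Real models learn vastly richer context than adjacent-word counts, but the statistical principle is the same:

```python
import random
from collections import Counter, defaultdict

corpus = ("the market clears the market sets the price "
          "the price reflects the market").split()

# Count which word follows which.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(word, n=8):
    out = [word]
    for _ in range(n):
        choices = following[out[-1]]
        if not choices:
            break
        out.append(random.choices(list(choices),
                                  weights=list(choices.values()))[0])
    return " ".join(out)

print(generate("the"))  # e.g. "the market sets the price reflects the market ..."
```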
> I think that to tackle this we need a new lens other than copyright
I totally agree with that. This question is legitimately hard. We do need a new lens, but we might have to keep and respect the old one too at the same time. I feel like AI work should acknowledge that difficulty and step up to lead the curation of training sets that are legal wrt copyright by design, rather than ignoring the concerns of the very people who made the work they are leveraging.
So if I visually look at a piece of work and “run” the NN training algorithm in my meat-space brain or even on pen and paper, do I need to ask for permission for doing so? Or is permission required only if silicon chips “run” the algorithm? Asking for a friend.
This isn’t what the comment I replied to was suggesting, which is important because the NN training algorithm isn’t how humans observe creative work, nor how we make insights. But, yes, the same standards apply whether your deterministic machine is silicon based, or meat based. Copyright law applies to both. If you reproduce significant parts of a fixed performative work, then you are in violation of the law. If your algorithm mixes enough snippets from enough sources, then it’s hard to tell, and you probably won’t get caught, but it doesn’t really change the fact that you’re mechanically copying. FWIW, copyright precedent in music seems to allow human-made remixes that involve multiple sources, as long as the work as a whole is original and doesn’t reproduce significant parts of the sources.
You are loading up my question with your own assumptions. My point, specifically, is: if I'm just observing a piece of work and running the NN algorithm in my brain, does this constitute illegal thought? Do my thoughts violate rights? Note that I am not "reproducing" anything (whatever that term means). I am just observing the work and running the algorithm in my brain while silently sitting.
What assumptions are you referring to? It doesn’t seem like you understand Copyright Law, so that’s why I keep trying to explain it. Under Copyright Law, you have to acquire material legally, and it’s illegal to distribute copies you made to other people.
If you’re executing a NN algorithm in your mind, or via pen & paper, then you are copying from the training samples, because that’s what the algorithm does. During training you compute errors against the samples, and update your weights to reduce error. During inference or generation, you use the weights (the results you remembered across all your training data) to produce an output. When your training samples are clustered in the latent space, the network will only remember an average of the samples, but samples that are sparse and don’t have close neighbors are sometimes remembered verbatim because there’s nothing nearby to average from. You can legally run the algorithm all you want on your own. Once you run it and then distribute the output, it might be in violation of Copyright Law if you accidentally reproduced one of the samples. Same is true for traditional human learning, you can free copy ideas legally, but reproducing too closely something that someone else made may be against the law, even if it was accidental.
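The clustered-versus-sparse behaviour can be illustrated with a deterministic stand-in for any smooth approximator fit to training samples (an analogy for the geometry, not a claim about how a transformer works internally):

```python
import numpy as np

# Training samples: a dense cluster plus one isolated outlier.
xs = np.array([0.0, 0.1, 0.2, 0.3, 5.0])
ys = np.array([1.0, 1.2, 0.9, 1.1, 9.0])

def smooth_predict(x, bandwidth=0.5):
    """Kernel-weighted average of the training outputs."""
    w = np.exp(-((x - xs) ** 2) / (2 * bandwidth ** 2))
    return (w @ ys) / w.sum()

print(smooth_predict(0.15))  # ~1.05: the cluster blends into an average
print(smooth_predict(5.0))   # ~9.0: the lone sample comes back essentially verbatim
```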
So we are in agreement that it is not violating copyright laws to run the algorithm on copyrighted works to produce the model, because if it is, my thoughts could be illegal too. In the end only actions such as reproducing the work and distributing it can be a violation. In other words, the end user of the model is the one to be held responsible if they reproduce and distribute the copyrighted material.
You have to acquire the source material legally. You can be in violation of copyright for copying music you didn’t buy. If you acquire work legally, you’re legally allowed to make backup copies for yourself, if you don’t distribute it. You can be in violation of copyright if you distribute something you don’t have the copyrights for.
Thoughts are never illegal wrt US Copyright Law. It’s a straw man to insist on making this point.
> In other words, the end user of the model is the one to be held responsible if they reproduce and distribute the copyrighted material.
No, this is false because it is the creators of the model that 1) did not legally acquire the source material and 2) distributed the network that contains latent copies of the source material that end users can use to reproduce works from.
> You have to acquire the source material legally. You can be in violation of copyright for copying music you didn’t buy.
This is incorrect. As another poster mentioned, it is not illegal to read a stolen book. It is only illegal to steal the book.
Secondly, the source material is acquired legally, since it is open to consumption on the open internet.
Thirdly, the model does not contain "latent copies of the source material". Apply a simple test (the current legal standard): if I showed you the node weights and counts of the network, no person, even one trained in the art, could identify it with a specific piece of work. Therefore it is at best a derivative, reasonably distinct.
> By using a simple test (currently legal standard) that if I showed you the node weights
Nope, this is a straw man and continues to demonstrate a misunderstanding of Copyright Law. There is no such legal standard; where did you get that? If the network can reproduce a work, then it does in fact contain a latent copy. Arguing that you can't see it by inspecting node weights is a straw man. You cannot argue that you're not copying music if you use a new compression algorithm, and then suggest it's distinct and derivative because nobody can read the raw compressed data. That's not how Copyright Law works. If you can approximately re-perform someone else's work, you're in violation. This is true even if you have to run a black-box program to produce the output.
> no person even trained in the art can identify it to a specific piece of work
Ironically, you’re actually admitting that even AI researchers can’t prove the network won’t reproduce someone’s work.
As for the rest, you now seem to be looking for a snarky gotcha; if you don't want to have a discussion, then I'm uninterested in discussing further. I made clear above and in a sibling comment that remixes are a gray area, and this question is complicated. That said, even if AI people do acquire source material legally, they are in fact copying it and distributing it, and that part alone can potentially violate US Copyright Law. This isn't even up for debate, so I don't know why you're attempting to suggest otherwise. The lawsuits mentioned in the article were brought on evidence that networks violated copyrights of specific existing works, and lots of people have found specific examples of violations.
1) The creating of the model does not violate copyright. Claiming otherwise means running the same algorithm in meatspace would violate copyright laws, which implies thoughts violate laws, which is absurd.
2) Distribution of the model does not violate copyright laws, because the models themselves do not contain latent copies of the work. The model itself is not the work, nor a recognizable copy of it, nor can it be reconstituted back into the work. It is a tool, more analogous to Photoshop, where the tool can be used to reproduce copyrighted work, yes, by the end user (where I believe the responsibility lies). But the tool itself is not copyrighted work. Microsoft Word can be used to reproduce copyrighted books, if I'm correct. Or I can hire a smarter tool: a human writer, to produce copyrighted works. Is the writer-for-hire illegal? Is his employability illegal? Of course not. I believe the law will eventually take the position that an AI model is a tool.
Creating the model does violate copyright if the model you create can reproduce someone else's work. Your logic is faulty. Running the algorithm isn't what causes the problem, so there is no implication that thought is the problem: this point is still a straw man argument. If it seems absurd, then shouldn't you re-check your assumptions?
> nor can it be reconstituted back to the work
This is false. It has already happened multiple times that networks reproduced copyrighted material.
How is creating the model in my brain not illegal (as in, my thoughts are not illegal), yet if it runs in silicon it would be illegal? Please give a well-reasoned answer. This is not a strawman, no matter how many times you insist on using that label.
Secondly, you seem to be conflating the "tool itself" with "what the tool can do" as strongly equivalent, i.e. if the tool has the capability to violate laws, then the existence and distribution of the tool itself also violates said law. (Not so.)
Distributing a model created using your brain is illegal, if the model violates copyright law. (Just like copying stuff without using neural networks in meat-space was already illegal.) If you create a program that reproduces copyrighted work, and distribute the program, then the distribution is illegal. That was the same answer I gave at the top. The brain vs silicon silliness is a strawman and has been all along because it doesn’t matter how your program was created, building and running it is not the illegal part under copyright law, distributing it is (and/or using material that you haven’t obtained legally).
> if the tool has the capability to violate laws, then the existence and distribution of the tool itself violates said law.
That’s right if you remove the word “existence”. Distribution of a NN model that violates copyright by reproducing copyrighted works is illegal. That part has been my point in this thread, it seems like you understand now and we agree.
Its "existence" is not illegal under US Copyright Law unless you didn't have the legal right to use the training material, and in that case it's illegal to use the material whether you used a computer or your brain. It doesn't matter how you created the neural network (or even whether you created a neural network); the violation there isn't the act of creating the network, it's the act of stealing and using material you don't have permission to use.
This whole discussion would be a lot less frustrating for you if instead of making assumptions and logic arguments about brains and computers, you took some time to read the copyright legal code. https://www.copyright.gov/
If you understand that the model is a tool, and that as a tool it can be used to generate activity that can violate laws and be used for other perfectly legal activities, then as a broad principle the distribution of said tool is not a violation of said laws.
Cars, phones, guns, knives (practically anything) can be used to generate activities that break the law. They are perfectly legal to distribute. The onus on the legality of the activity lies with the end user.
While it’s true that knives and guns have both legal and illegal uses, it’s another straw man in this context, irrelevant to both neural networks and copyright law. In the case of neural networks, you’re distributing the copied material along with the tool, in the form of network weights, thus breaking the law by distribution whenever the network can reproduce significant portions of any of its individual training samples, or whenever you didn’t have legal permission to use the source training material.
> If you understand that the model is a tool, and that as a tool it can be used to generate activity that can violate laws and be used for other perfectly legal activities, then as a broad principle the distribution of said tool is not a violation of said laws
That statement is incorrect, the logic is flawed. Just because a tool has both legal and illegal uses does not necessarily have any bearing whatsoever on whether the tool’s distribution is legal. Tools that are illegal to distribute can have legal uses, and that does not make them legal to distribute.
> That statement is incorrect, the logic is flawed. Just because a tool has both legal and illegal uses does not necessarily have any bearing whatsoever on whether the tool’s distribution is legal. Tools that are illegal to distribute can have legal uses, and that does not make them legal to distribute.
You're making statements and assuming their truth without reason, evidence, or examples to back them up: the logical fallacy of begging the question. You have also not reasoned how freely available information is "illegal" to read/index/store, among other things.
Not here to win you over. The audience can see how weak your position is. My last response here.
What are you talking about? What I said there wasn’t an assumption, it’s a fact about our laws. Murdering someone with a gun and breaking copyright law distributing copyrighted material are covered by two completely separate and independent laws. Breaking or not breaking one of the laws does not imply anything about other laws. Your statement of “broad principle” is the assumption here that claimed that not breaking one implied you were not breaking the other, which is completely and utterly false. You really really should read up on copyright law before asserting things, and maybe the laws surrounding knives and guns too if you want to use them as examples. Your comments have repeatedly and consistently demonstrated a lack of understanding of the laws we’re discussing.
This is literally what the AI does as well. It didn't walk into a bookstore and steal all the books off the shelf, it read through material made available to it entirely legally.
The thing that authors are trying to argue here is that they should get to control what type of entity should be allowed to view the work they purchased. It's the same as going "you bought my book, but now that I know you're a communist, I think the courts should ban you from reading it".
> they should get to control what type of entity should be allowed to view the work they purchased
No, that's not it. It's more like if I memorized a bunch of pop-songs, then performed a composition of my own whose second verse was a straight lift of a song by Madonna. I would owe her performance royalties. And I would be obliged to reproduce her copyright notice, so that my audience would know that if they pull the same stunt, they're on the hook for royalties too.
There are lots of people arguing against the training itself. And people arguing against all outputs, even when there is no detectable copying. I don't know how you missed those takes. You're arguing the wrong point here. Many people do want to say "no ai can look".
Only if you released it. You could definitely perform it in the shower without owing anything. And the 99% of your compositions that didn't wholesale mirror any specific song would be perfectly fine to release.
Now, moving from holding the model creator culpable to holding the user culpable would obviously be problematic as well, since the user has no way of knowing whether the output is novel or a copy-paste. Some sort of filter would seem to be the solution; it should disregard output that exactly or almost exactly matches any input.
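One naive way such a filter could work is verbatim n-gram overlap against the training corpus; the snippet length and threshold here are made up for illustration, and a production system would need something far more scalable (e.g. hashed n-gram indexes):

```python
def shingles(text, n=8):
    """All n-word snippets of a text, for approximate-overlap matching."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_copied(output, training_corpus, n=8, threshold=0.2):
    """Flag an output if too many of its n-grams appear verbatim
    in any single training document."""
    out = shingles(output, n)
    if not out:
        return False
    return any(len(out & shingles(doc, n)) / len(out) > threshold
               for doc in training_corpus)

# Usage: suppress or regenerate instead of shipping a flagged output, e.g.
# if looks_copied(candidate, corpus): regenerate_or_refuse()  # hypothetical helper
```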
But it's not humans reading it, it's using it to train ML models. There are similarities between humans learning from books and ML models being trained on it, but there are also salient differences, and those differences lead to concerns. E.g., I am concerned about these large tech companies being the gatekeepers of AI models, and I would rather see the beneficiaries and owners of these models also be the many millions or billions of content creators who first made them possible.
It's not obvious to me that the implicit permission we've been granting for humans to view our content for free also means that we've given permission for AI models to be trained on that data. You don't automatically have the right to take my content and do whatever you like with it.
I have a small inconsequential blog. I intended to make that material available for people to read for free, but I did not have (but should have had!) the foresight to think that companies would take my content, store it somewhere else, and use it for training their models.
At some point I'll be putting up an explicit message on my blog denying permission to use for ML training purposes, unless the model being trained is some appropriately open-sourced and available model that benefits everyone.
> You don't automatically have the right to take my content and do whatever you like with it.
Actually, you don't have the right to restrict the content, except as part of what's allowed in copyright law (those rights are spelt out, like distribution, broadcasting publicly, making derivative works).
Specifically, you cannot have the right to restrict me from reading the works and learning from them.
Imagine a hypothetical scenario: I bought your book, and counted the words and letters to compile some sort of index/table, and published that. Not a very interesting work, but it is transformative, and thus you do not own copyright to my index/table. You cannot even prevent me from doing the counting and publishing.
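As a sketch of what such an index might look like (hypothetical code; the point is that only counts survive, none of the book's copyrightable expression):

```python
from collections import Counter

def compile_index(book_text):
    """The hypothetical index/table: pure counts of words and letters."""
    words = book_text.lower().split()
    return {
        "top_words": Counter(words).most_common(10),
        "letter_counts": dict(Counter(c for c in book_text.lower() if c.isalpha())),
        "total_words": len(words),
    }

print(compile_index("Call me Ishmael. Some years ago - never mind how long..."))
```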
See 17 U.S.C. § 106, the section titled "Exclusive rights in copyrighted works".
There are 6 rights.
(1) to reproduce the copyrighted work in copies or phonorecords;
(2) to prepare derivative works based upon the copyrighted work;
(3) to distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending;
(4) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly;
(5) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and
(6) in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission.
> It didn't walk into a bookstore and steal all the books off the shelf, it read through material made available to it entirely legally.
Github ignored the licenses of countless repos and simply took everything posted publicly for training. They didn't care whether it was available to them entirely legally, they just pretended that copyright doesn't exist for them.
Nope, public repos have licenses, often open source licenses that state that you can freely use the code, or change it, but only if the resulting product will also be open source.
Other licenses such as the MIT license require that you name the original creator.
You don't need to accept that license to download and read the code.
A license allows new uses that copyright would otherwise block. Some kinds of AI training are fully local and don't make the AI into a derivative work, so they don't need any attribution and you don't need to accept the license to distribute.
But no license (that I'm aware of) says "You are allowed to read this source code, but you may not produce work as a result of learning from it"; for a start, that would clearly be impractical to enforce.
It's not plagiarism at all. The AI is trained on 5 billion images, yet it stores only 4 GB of data. Thus it is impossible that it stores the actual works. For any image that the AI generates, you can't point to any image in the training data that it is derived from.
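Taking those figures at face value, the arithmetic behind the claim is simple; note it only rules out wholesale storage on average, which is what the replies push back on:

```python
model_bytes = 4e9     # claimed model size: 4 GB
image_count = 5e9     # claimed training set: 5 billion images
print(model_bytes / image_count)  # 0.8 bytes per image, on average
```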
This was about the humans consuming other people's content.
> Humans are constantly ingesting gobs of "copyrighted" insights that they eventually remix into their own creations without necessarily reimbursing the original source(s) of their creativity.
If humans make stuff that is too close to someone else's source materials then it is considered plagiarism and not "inspired by".
> For any image that the AI generates, you can't point to any image in the training data that the image is derived from.
Why can't you point to the Getty Images watermark that it is quite happy to reproduce? Isn't that surely evidence that it doesn't actually understand what it is reproducing?
> The AI is trained on 5 billion images yet it stores only 4gb of data. Thus it is impossible that it stores the actual work.
I have also seen billions of images, therefore I cannot actually be storing the real images in my head, and thus nothing I paint could ever be considered plagiarism. That's brilliant; I think there are a few law firms defending artists who would be looking to hire you.
How did they train the AI without first storing the data? It's not in the model, but it was used without permission in the pipeline that led to creating that AI model.
I don't know if that counts as plagiarism, but there's clearly some use of this copyrighted material that the authors probably didn't envision and did not grant permission for. I have no idea what the law would be in cases like this.
> How did they train the AI without first storing the data?
The data was originally permitted to be copied.
The question isn't whether the training violates copyright, as long as the data set had permission to be viewed (which it must have, since it was public).
The question is whether the final result - the model/weights - is a derivative work of the training data set. If it is a derivative work, then the model must be in violation of copyright. But copyright law allows for sufficiently transformative work to be considered new, rather than derivative. So does training a model using methods like these constitute creating a transformative work?
I can't quite place why I despise this line of reasoning/argument, but boy do I absolutely despise this line of reasoning.
Are we really going to play devil's advocate so much that we consider these early day A"I" tools as equivalent to humans? I personally have absolutely 0 qualms about treating humans and these ML tools as completely separate entities governed by completely different laws. AI SHOULD be heavily restricted, we're already headed not towards any sort of apocalyptic singularity, but a singularity of pure, endless spam spewing forth from every orifice of the internet and elsewhere.
If these megacorps behind this AI push want it to succeed, then they should be paying for access to the images/texts/music/videos/whatever they're trying to harvest en masse. I couldn't care less if an AI learns the same way a human does, or about any other anthropomorphising the AI crowd wants to gaslight everyone with.
>Do you ask for permission when you train your mind on copyrighted books?
Of course not, and given my ability to train my mind on thousands of books in a few minutes and spit out a full book based on that training in whatever style one wants in a few minutes for that as well, it seems especially unfair that people act as though there might be a difference between the two situations.
>Do you ask for permission when you train your mind on copyrighted books? Or observe paintings?
I think this is a specious analogy at best. The two are remarkably different contexts. AI can work at a significantly greater rate. There's also a very large question about whether for profit commercial software should be afforded the same leeway we give to ordinary human behaviour.
Fundamentally the current accommodation of copyright has two main justifications:
1) Protect economic activity
2) Moral right to identify original author of a work
The point of ease of replication speaks to (1) fundamentally breaking. A human can only produce so much output compared to an AI system.
(2) is a much thornier subject, and not one I really feel qualified to speak on.
This is a better argument than your smiley implies. Given a work, we can't (or soon won't be able to) tell whether its creator was a human or an AI. So the only important thing that matters is whether the final work infringes copyright or not. Unless people are seriously arguing that nobody should be able to use AI to produce images even for their own private use.
If machines were totally free to harvest all human work and expression for their owners' benefit, why should anyone ever allow any of their work out in open unless strictly necessary?
Why go down the route of turbocharging new forms of rent seeking?
This tired argument is very much like the "corporations are people" argument that convinces corporate lawyers and judges but literally no one else. Any lay person can tell a corporation is not a person, mostly because of the power imbalance and the difference in their abilities and physical constraints; while a lot of these arguments can be reasoned through, the recognition is intuitive.
Most people looking at AI can tell it is not like a mind or an artist, because of certain intuitive observations which boil down to its surprising abilities, its bizarre faults (drawing hands is still a struggle for most models), and its current limits (you have to hack prompts instead of asking naturally). You can reason about people using these arguments because they are people, but you cannot apply them to NNs, because they are not.
I'd argue that the moment you start using the "people are AIs" argument, you are implying the converse, "AIs are people", and thus assuming the relationship is bidirectional. Then the other qualities you assign to people - "people have rights", "people deserve to be paid for their labor", "people have rights to the work of their own hands" - must also apply to AIs. Therefore the AI tools you are using deserve to be treated with the respect and dignity you previously had to extend to artists and developers, and should be paid for the work they create - that is, if they learn and create art in the same ways people do. Just as a nursery does not own the art a child born there creates, and a university does not own the art an artist who studied there creates, you cannot argue that the work an AI creates belongs to the "owner" or "trainer" of the model, unless you are arguing that slavery is in fact okay in this day and age. All of this, of course, hinges on the supposition that AIs are people and that they learn as people do.
So you cannot have it both ways. You cannot keep treating AIs as people in your arguments but then deny them the agency due to people. The only way that works is if, deep down, you do not believe they are people, or you think people do not deserve rights or compensation for the work of their own hands.
Most people aren't asking for copyright on the AI output, though.
Also animals can be trained and make outputs and nobody accuses them of copyright infringement. That's a much better analogy here than leaping to the idea of treating one of these models like a human.
>Do you ask for permission when you train your mind on copyrighted books? Or observe paintings? Or listen to music? Do you ask for permission when you get new ideas from HN that aren't your own?
AI is not a mind. A mind is a physical object, a brain inside a skull inside a person. An AI is a computer program.
And while a nerd who forgot what grass feels like might confuse the two, the courts won't.
It could be time to give AI human-like rights. Human passports, human rights, 8 hour work schedule, weekends off, vacation time, and of course workers wages.
If an AI reproduces a copyrighted work, it should then be sent to robot jail, and the human who requested the work should be sentenced for conspiracy to commit copyright infringement.
It might however be a bit early to let the horse back in the barn.
Moral principles apply to living things. The living things in question are people at big companies training models to sell them as services from behind paywalls.
Stability AI is a small enough company that it doesn't even have a Wikipedia page. The Stable Diffusion model is freely available to everyone along with the source code.
AI models aren't observing anything. The people who train them are copying other people's works to do so. By saying an AI model is "observing" images, you're begging the question.
Your mind is not infinitely scalable. It's a factor. Your mind is able to be recognised as an author under copyright law. The AI model is not. It's a factor.
I don't think it will benefit "the little guys", because "the little guys" rarely have the resources and time to litigate in the first place, or lobby to lawmakers to make the details work in their favour. Copyright always benefits "the big corps". Everyone is so eager to get one up on Microsoft that they're forgetting the bigger picture.
The fact that the justice system is so inefficient that it doesn't serve people with less than a lawyer's salary worth of money to waste isn't important to the conversation of whether it's fair or not for someone to ignore licensing and repackage your code as an "AI".
If you want to talk about the big picture here, it's about privatizing gains and socializing losses, the goal of every bigcorp, which is just more reason to disallow this abuse.
Of course it’s material to the conversation whether people have resources in practice to use the tools of the justice system. It’s why small claims court is a powerful tool compared to a wrongful termination suit.
What you’re asking for is going to only benefit corporations. They’re the sole entities that will be able to afford the regulation you’re proposing.
It should be up to the rights-holders what they allow their works to be used for.
That way, I can say, "no, Microsoft, you can't pay me enough to allow your bot to train on my works" and at the same time allow you to train books3, thus taking power away from MS. If the rights-holders have no say, the barrier to entry is having big stacks of computers, and big corps win by default already.
For what it’s worth, it was shockingly easy to get access to big stacks of computers, as someone who had very few resources. I hardly had money for IVF, let alone a supercomputer.
TRC (TPU research cloud) makes supercomputers freely available to anyone who will put them to good use. Like, literally you. You don’t even have to have a clear research goal for general access.
It was one of the big surprises of getting into AI. I didn’t expect that at all.
Even without TRC, compute is only getting exponentially cheaper. A 1.5B ChatGPT may sound puny, but I’ve seen how powerful the non-chat variants are.
The justice system is not going to change any time soon, and you can't ignore it. So that is the reality in which we must operate. Besides, it's not as simple as "the justice system sucks", because much of the time it's just a matter of not wanting the headache, and lawsuits will bring headaches in any justice system.
Thank you. It’s legitimately scary to read through these comments. It’s like watching everyone clamor for Stalin to be put in power: not even a good idea in the short term, let alone the long term.
Your backwards appeal to ideology is completely absurd. If it was Google training an AI on Microsoft's code, MS sycophants and PR people would be calling GOOG the rights-ignoring Stalin-loving communists.
Maybe, but it wouldn’t change the truth that the outcome of Microsoft restricting Google would benefit Microsoft, a trillion-dollar corporation, and not you or I. Nor anyone who isn’t a wealthy corporation.
Regardless of whether one agrees or not with paying creators of the training data, I think the deeper issue here is about societal wealth distribution and who gets paid for X now that X is being done very well by AIs. A less equitable world has Google or billionaires getting paid. A more equitable world has the artists.
But I want to argue here that, for the purposes of this latter question, your proposal of copyright enforcement (or anything similar) is too little, too late.
- These "copyright violating" AIs have demonstrated the proof of concept, and the damage is done. Even if these AIs are banned, the companies will just parallel-reconstruct them by running the 80/20 rule: pay tiny amounts to get most of the data. After all, the creators of the data were doing it for free and are in such fierce competition that there's no bargaining power.
- More nefarious AIs will just do transfer learning on intermediate neurons; it would be very difficult to prove stealing there.
- Even if you get the system to work, what about future artists and writers? Are we just creating an entrenched historical group of creatives getting royalties forever?
The distributional problem is not well solved by copyright, and better solved with e.g. corporate taxes, income taxes, VATs.
>- Even if you get the system to work, what about future artists and writers? Are we just creating an entrenched historical group of creatives getting royalties forever?
This is kind of what happened with music, no? In some countries hard drives, SSDs etc all carry an additional tax that is then given to some copyright organization. Of course it's not the artists that mainly benefit from this, but instead it's the people running said organization.
> Even if you get the system to work, what about future artists and writers? Are we just creating an entrenched historical group of creatives getting royalties forever?
Copyright expires, and new artists will create new (copyrightable) art in the future. Unless your assertion is that generative AI is so good no one will make art without it ever again?
If the proposed system works, I expect those entrenched artists will sue young human artists whose work shows signs of learning from previous art. The vast majority of music, books, and movies have clear influences.
The proposed system exists, and humans have had to work in it for some time now. People get routinely sued for copyright infringement if their work is too close to an existing work. The (successful) suit against George Harrison for "My Sweet Lord" is a good example of infringement via influence with no clear malicious intent.
> Even if you get the system to work, what about future artists and writers? Are we just creating an entrenched historical group of creatives getting royalties forever?
The flip side of this is that if we undermine paid creators until there's no incentive for them to create, then the AIs' abilities stagnate on old data, and we as a society drop, or at least diminish, the skillsets that could create new media.
AI can generate stuff humans care to look at only because of the availability of data that humans created for each other to enjoy. As tastes, fashions, zeitgeists and pop culture change among humans, the AI models will always be behind and unable to follow trends completely. I think.
> The flip side of this is that if we undermine paid creators until there's no incentive for them to create
The incentive to create is almost never financial. How many artists finance their creative efforts by working day jobs? Making a living as an artist is more about buying yourself the time to focus on making art than it is about making money. People will continue to create art, however they can, because they must.
I agree that people wouldn't stop making art. Sorry to shift the goalposts here: I do think that there are types of art that are not created except for commercial reasons, and that body of work is what I would expect to get displaced by AI. In fact, it already is; advertising creative media is one example. It's an industry I am involved in, and we are already seeing Dall-E and ChatGPT being used for quickly concepting ideations for clients, etc. I would expect an AI to get worse at meeting commercial needs over time, because of what I said in my original comment. Or at least I'd expect commercial creative media to stagnate if it could only use AI (because no one is making commercial media just for fun).
This is all stuff I am actively thinking about since it is impacting me right now, so I appreciate the discussion and would be happy to be wrong.
1. Art is better when it's not paid. Real artists have day jobs that pay the bills and they create art to express their ideas, not to make money.
2. Paid art isn't going away, it will just change. Certain skillsets will be forgotten, like how landscape painting was replaced by photography. But talented artists will leverage AI tools to create works that are greater than anything that came before.
> Art is better when it's not paid. Real artists have day jobs that pay the bills and they create art to express their ideas, not to make money.
Trying to define who "real" artists are is a folly for the ages. It is the dream of many artists that they get paid for their art, and many achieve it. The starving artist is a mythos of pain and suffering, a good story but hardly good for art. Some of the best composers in history were paid; some of the most influential artists were from wealthy families. They were able to focus on their work without fear about money, and because of this they could excel in technique and execution, which allowed them to produce some of the highest forms of their art in history.
AI models need human creative decisions as part of the process of making art. This is consistent with current copyright law as well as contemporary art theories of authorship and practice…
Eg, Donald Judd’s works are these creative decisions and processes distilled to the most basic of sculptural form.
> - Even if you get the system to work, what about future artists and writers? Are we just creating an entrenched historical group of creatives getting royalties forever?
The boat has long since sailed on this… and it's globally entrenched as a norm of international trade that we are all "ok with" this regime of 75-year or century-plus copyright terms…
And arguably the entire copyright vs AI/ML training datasets debate is founded on the notion that the artists individual copyright will last long enough that it’s going to outlive the average artist. If we look at one of the old copyright regimes, for comparison… in a world where copyright is a short default/implicit/automatic term (14 or 28 years) and the copyright owner can elect to register and pay for extensions (for a more modern twist, preferably combined with increasing incentive to prevent perpetual renewal abuses by Disney, et al)… now imagine how much data from up to 28 years ago there is, the catalogue of art and photographs and text and books and academic writings… all public domain because the authors didn’t consider them of sufficient value… all free for the ML model training… this gets even larger with a 14 year term…
Suffice to say that we are seeing systemic impacts already. Culturally, we're seeing more and more money put behind less and less content, controlled by fewer and fewer people, due to a slow death spiral of copyright stranglehold across multiple industries; the written, visual, audio and video arts are all dominated by large corporations holding IP… yes, individuals continue to create, but other than rare breakthrough chance successes and internet-age viral successes (which are often completely arbitrary/random and have no real quality), these companies decide what will be popular culture…
My prediction is that the AI/ML models will be allowed but heavily scrutinised, under the simple legal doctrine that the user is the one committing the infringement, since the primary purpose of these models is not infringement but unique creation. But suspicion will linger among artists, and it will become a normal part of contracts in the art world… effectively an artist's equivalent of the way police in many places view spray cans: just as the primary purpose of spray paint is not to create illegal graffiti, which is the justification many places used to overturn poorly justified civic bans on possession of spray paint.
I'd like to see any more draconian spread of derivative-work rights (style rights etc.) accompanied by drastic reductions in the automatic copyright term, as the ability to churn out lots of automatic content drastically lowers the value of long, long terms. And the counter-argument that it makes the existing rights more valuable is fucking insane, as we do not need to pass copyright down to the great-great-great-great-grandchildren… the terms are already too long.
What’s great about your comment is that you show that what we need the most is to reduce the power of copyright in such a corporate-centric legal regime.
Personally I’d like to see the right to train statistical models on any works without the permission of the author enshrined in statute and an end to common-law copyright, a return to the Statute of Anne 14/28 time length, and a clear delineation between the “work” as having an author for an eternity but having a “copyright of the work” vastly limited in scope.
Ask yourself, do we want to be extending the reach of large copyright holders like Disney into taking a fee from LLM producers because they COULD be helping people draw Mickey ears on their private creations?
This is Betamax all over again, and luckily that Supreme Court opinion will weigh heavily in the lower courts' judgment of these models as fair use.
It’s strange to me that there’s a lot of overlap between people who think AI training should require explicit consent for every piece of training data, and people who think copyright and patents are insanely restrictive in the music/movies/literature/software world.
It’s also worrying that requiring consent to train an AI model will inevitably lead to requiring consent to make handmade art that’s a little too similar to some other existing artwork (ie, how all art works through reference, training, and inspiration). A world where Getty and Disney control even more than they already do.
All or nothing, in my opinion. Either abolish or severely reduce copyright, or abide by it.
The simple fact of the matter is that Disney and Getty invest a lot of money into these materials being out there in the first place. Open source programmers and artists spend a lot of time producing works for no cost other than some minor courtesies.
AI companies aren't your friend or the little mom 'n' pop shop down the road. They're technology giants backed by billionaires. When it comes to Disney versus Google/Microsoft, I'm against both sides if it means giving up my rights.
Big AI taking your stuff and ignoring copyright law isn't some kind of protest against copyright, it's the very opposite; it shows that copyright doesn't matter if you have the money to defend yourself in court. Violate the MPAA's copyright and you get extradited; violate some random person's copyright and they're told they should feel honoured that people even want to steal their work.
In my opinion, the idea behind the current copyright system would work fine if the terms weren't so ridiculously long. Restrict copyright to five or ten years and I'd be fine with the whole thing. This "70 years after the death of the author" crap is the biggest burden copyright adds.
> All or nothing, in my opinion. Either abolish or severely reduce copyright, or abide by it.
I firmly believe in "practice what you preach". If you declare you firmly believe in A but then do something directly counter to it because it's more convenient in this specific case, that doesn't sit right with me.
Besides, further expanding copyright in this one area will only make it so much harder to reduce it later. And the pro-copyright folks will be able to say "you say you want less copyright, but you vigorously advocated in favour of copyright then, you hypocrite!" (and they wouldn't be entirely wrong, either). All this effort and energy fighting ML tools would be better directed at reducing copyright instead.
I don't disagree with your view on corporations. Do I like what CoPilot is doing? Not really. But at the end of the day: does CoPilot's or ChatGPT's mere existence really take away anything concrete from me? Am I harmed or even inconvenienced by it? Are my rights reduced? Is my code harmed by it? Is my income reduced? I don't really see how it concretely affects me, other than a general "feeling of unfairness".
And I see real risks with all of this: most regular people and small businesses don't have the resources to litigate, as it's expensive and time-consuming, so a "license" that you or I slap on a piece of code is, realistically speaking, just ink on a piece of paper. GPL violations are rampant, and violations of other licenses probably happen even more (but people generally care less about those, so they're not as widely publicized). Who will benefit from more copyright law on their side? The ones with deep pockets and many lawyers on retainer, i.e., the corporations neither of us like. Think creative new copyright lawsuits, such as the "we claim copyright on the Java API" kind of stuff.
I can’t speak for everyone, but personally I find that copyright can be used properly or abused, on both sides (holder/consumer). That doesn’t mean copyright is bad, only that particular categories of claims and usage are. But abusing copyrighted material from millions of little creators at insanely automated scale is another level of evil, especially when those creators explicitly require consent for exactly this type of use.
> worrying that requiring consent to train an AI model will inevitably lead to requiring consent to make handmade art that’s a little too similar to some other existing artwork
That’s the root of the misunderstanding, afaict. We can agree that at-scale processing is bad and that fair use is still okay. A human with a pen (or a text editor) can’t damage copyright at scale by learning terabytes of material in a few weeks and producing the same amount in hours, so they can be excluded from this. Humans who use AI can, so they’re a target.
Look at GitHub training its models on other people's code.
They aren't training it on Microsoft or GitHub code.
> A world where Getty and Disney control even more than they already do.
This is exactly what is currently happening, though: it's okay to rip off the little-guy artist or coder. The argument here is that one big guy stood on another big guy's foot, and as the little folks we shouldn't stand for it either.
I don’t personally think it’s that strange or internally inconsistent. People seem to be saying “within the current copyright system, consent should be required, but I still think the copyright system is broken”.
Copyright is a good thing. The issue with copyright is that it has been extended too far, from 20+20 years (the latter being a manual extension) to 95 years from publication and life+70/80/100 years for the author. -- I understand extending copyright to keep up with longer lifespans, but it should be something like 30+30 or 40+40.
These copyright terms mean that in life+70 countries, only works where a) the author died before 1953 and b) the work was published before 1928 are in the public domain.
An AI training on a given work should comply with the law and with copyrights, just like anyone else. It should also respect the license or other terms the works were released under. -- You could easily silo the data by license, and have a different model per license.
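To make that concrete, here's a minimal sketch of what license siloing could look like; the record format and the train() stub below are made-up placeholders, not any real pipeline:

```python
# A minimal sketch of the "silo by license" idea: bucket training
# records by their declared license and train one model per bucket,
# so each model only ever sees works released under a single license.
# The record format and the train() stub are illustrative assumptions.
from collections import defaultdict

records = [
    {"text": "fn main() {}", "license": "MIT"},
    {"text": "print('hi')", "license": "GPL-3.0"},
    {"text": "SELECT 1;", "license": "CC0"},
    {"text": "int x = 0;", "license": "MIT"},
]

def silo_by_license(records):
    """Group training records by their declared license tag."""
    silos = defaultdict(list)
    for rec in records:
        silos[rec["license"]].append(rec)
    return silos

def train(subset):
    # Placeholder for a real training pipeline.
    return f"model trained on {len(subset)} work(s)"

for license_name, subset in silo_by_license(records).items():
    print(f"{license_name}: {train(subset)}")
```

Each resulting model could then be offered under terms compatible with its silo's license, which is the point of the separation.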
Patents should be a good thing (they allowed inventions to be published instead of being kept secret). However, it is easy for large companies to get patents on trivial things, write overly broad patents, and collate a large number of patents in a domain. That means trying to innovate or compete in a highly patented field like audio or video compression is difficult.
I'm anti intellectual property, but as long as people have to abide by it, I think AI has to, too. Being anti intellectual property doesn't mean I'm in favor of corporations stealing open source code, it means I want the law changed.
I believe that adding data to an AI training data set should require explicit consent, and I derive that belief from being wrong about copyright laws in the 00’s. The artists that tried to stop filesharing were right, and the total collapse of musicians’ livelihoods in the streaming era proves them right. We have the chance now to correct the mistake we made then.
Not sure what you intended to imply, but I don't think those two consents are related enough to worry about. Copyright licenses are usually written like this: "[you are allowed to] use, reproduce, modify, adapt, perform, display, distribute" and so on.
When a new technology is introduced, for example when the compact disc was invented, lawyers get to poke at whether that "distribute" applies to music CDs, or just to vinyl and tapes (because at the time the license was granted, CDs weren't yet a thing! gotcha!).
The answer to this conundrum may vary between countries, and we can have fun discussing it in the context of AI, but it has no bearing on whether handmade art is allowed to be too similar to existing work.
Exactly this. You can hold that the current copyright system is good and fine. That's a moral position that is, in my view, entirely deluded, but it's internally consistent and not really worth having a discussion about. People who conclude that look at the world through a lens so fundamentally incompatible that discussions with the other camp can't be productive.
Or you can (correctly) think it's a huge drag on innovation and human progress.
If you think the latter then hoping in this case for legal precedent to broaden the scope of copyright enforcement is just bizarre logic. This isn't a rule that already exists as such. The case will set a precedent (based on interpretation of existing law) for the future.
Right. Not that long ago artists were rallying against Disney and the Music Industry for weaponizing Copyright against Artistic Freedom. Now it seems some artists have decided to use the same tactic.
It's probably those same people who publish their code under open source licenses instead of giving it to the public domain. I don't understand why people cling so hard onto every worthless little bit of code they write while also sort of half giving it away to almost everyone for almost any purpose.
First of all, don't pretend that copyright depends on how much "worth" a copyrighted piece of work has. Not just the works of Prince and the Beatles deserve copyright.
Then, a lot of open source authors understand that their works are not groundbreaking inventions and want to share them with the world without any fees or costs. And others have groundbreaking inventions and still share them for free with the world.
But there are almost always license terms attached to the work. They can be essentially non-limiting, like public domain or MIT, but they can also impose some minimal requirements like attribution. Why should any entity, especially a huge corporation, not be bound by those conditions? It was mostly those corporations that created today's restrictive copyright. Try drawing an image of Mickey Mouse, putting it on your website, and see what happens.
People who made or bought out the content get to have the content. I don't necessarily see the problem with that except for the ridiculous amount of time copyright remains valid.
If a company invests $250 million into an original movie, I don't see why they shouldn't have some say over their content for at least a couple of years. Not until 2150 or whatever the end date for modern works is supposed to be, but give it some time at least.
OpenAI is the result of billions being thrown around. When it comes to billions, it doesn't matter if they come from Disney, Google, Microsoft or Amazon. None of these companies have our individual rights at heart, they only care about profits.
In this rare occasion, the interests of the people and Disney align. The laws protecting the independent writers/programmers/artists are the same ones that protect Disney.
The tools themselves work on arbitrary data sets. Anyone who can dig up enough public domain/attribution-free pictures/code/text can train their own AI without even coming close to copyright issues. Hell, had these super smart AI people figured out a method of attribution, the data set could include massive amounts of works released under Creative Commons or open source licenses.
I don't see any of these companies caring about artists, but something like Stable Diffusion is more of a weird accident. Like how IBM managed to create a platform that became an open standard, something almost diametrically opposed to its own corporate values.
Good quality data and more of it means better output. Disney is almost certainly doing their own thing internally, benefiting from their ability to use both free as well as their own IP and the capital to hire cheap workers to train it directly.
It's not that I don't understand why artists might be upset about a company scraping copyrighted art, I just think that the longer term effects of legally kneecapping open source variants while handing over the most powerful versions of it to the existing intellectual property giants are A Bad Thing.
> but what the customers of AI people want isn't available under those terms.
What the customers of AI want is accurate predictions from the models, and they can get that even if everyone demanding removal from the training set were removed.
The makers of generative AI could remove every living artist who wants out from the dataset, and the model would still develop a general grasp of color theory, composition, and almost every art style in existence, because the fact of the matter is that there is just that much data out there. Our species has collectively spent DECADES recording, storing and categorizing everything and the proverbial kitchen sink. There are god-knows-how-many petabytes of image data alone, so even if just 1% of it could be used to train generative models, it would still be more than adequate.
And soon after that, there is an explosion of new generated art, filtered through the aesthetic sense of millions of humans, that can just be fed back into the models, to make them better.
The end result is the same: High-quality image generation on a scale hitherto unseen, running even on consumer grade hardware. And what lawsuits will be filed then?
I swear when I see this argument, because it makes me angry.
You’re right, but they didn't, because they were too lazy and cheap to do it that way.
…and that’s why people are angry, and rightly so. Fully licensed models are the future, and it’s both irritating and disappointing that we are where we are right now because the people training these models were too lazy to assemble a training dataset that wasn’t problematic (i.e. full of porn and copyrighted material).
You can argue the “but at the end of the day it’s all the same…” argument if you like, but the lawsuits make it clear that it isn't OK.
They’ve completely messed it up.
There’s a reason the openai api terms of service says that “the Content may be used to improve and train models”; they’re setting themselves up to have a concrete defence for the source training data for their models.
Good job.
Stability can burn in a fire. They’ve really trashed the reputation of generative AI in a way that is going to be very difficult to recover from.
No one outside some artists is going to give a fuck about the legality. Joe Schmuck out there is too busy either not knowing this exists or making funny pictures of Han Solo eating a banana on a toilet made from the skin of Yoda.
That reputation damage you think matters doesn’t exist.
This is the only issue I have with generative image models. I'd be using them myself right now but I'm too disgusted by how the sausage is made. Once the first licensed, properly sourced models are out, they will get my money or time.
How is it much different from a search index? It’s just a new interface to get at some info: rather than you browsing with Google and Firefox, it has already pre-browsed the web for you and displays the content back. If the end user gleans some actual copyrighted work from the search, they still need permission to use it, but it’s also likely just a derivative, or the end user is simply reading an example and learning from it at consumption time. Is a web crawler violating copyright? Or is it the user who sells a copyrighted image?
A search index usually links to the source. Without that a search index is worthless, you can't use content if you don't even know where it comes from and who holds the rights.
Google search links to sources like Wikipedia in its info boxes, because without that you can't know whether the info is reliable or sourced from my brother's coworker's imaginary flat-earther friend.
It's not displaying back the "content". It's training a model with statistics based on the writing that was either paid for by a site publisher in the hope of earning ad revenue, or contributed to the community for free.
If a model were to add attributions to each of its answers, then perhaps the search engine analogy would hold. But they don't (and right now, to my understanding, can't).
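As a thought experiment, here's a naive sketch of what per-answer attribution might look like, matching a generated answer back to its closest training sources. The corpus, URLs and similarity measure are purely illustrative assumptions; doing this faithfully at model scale is, as said, an open problem:

```python
# A naive sketch of per-answer attribution: score the generated answer
# against every document in the training corpus and attach the closest
# sources. Plain string similarity stands in for whatever a real system
# would need; the corpus and URLs are made up for illustration.
from difflib import SequenceMatcher

corpus = {
    "https://example.com/post-1": "how to reverse a list in python",
    "https://example.com/post-2": "setting up nginx as a reverse proxy",
}

def attribute(answer, corpus, top_k=1):
    """Return the corpus URLs most similar to the generated answer."""
    scored = sorted(
        ((SequenceMatcher(None, answer, text).ratio(), url)
         for url, text in corpus.items()),
        reverse=True,
    )
    return [url for _, url in scored[:top_k]]

answer = "you can reverse a list in python with reversed()"
print(answer, "| sources:", attribute(answer, corpus))
```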
The AI art models scrape imagery made with human effort and skill; then more humans label and tag it so it can be indexed (because you can show a computer an image all day long and it still won't 'learn' what it is unless you tag it); then another human puts in a wishlist of the art they want (without any cost, or effort spent learning a skill), and the 'AI' displays back a collage of content matching the wishlist. And then, presto, all sorts of merchandise are available featuring art taken without permission from the people who made it. People do the physical work of creating imagery; AI indexes it.
I’m going to wager that the percentage of people in our society who would like to extend copyright to restrict these tools, favoring the needs of individual copyright holders over the needs of the public domain, is much smaller than you realize.
By the time this reaches judgment and goes through the appeals process, there will be a vast industry of non-infringing uses that are clearly transformative and fair use (Sony v. Universal).
You cannot say that the person using ChatGPT to control the lights in their garage is infringing on anyone’s copyright in any manner whatsoever. The point of copyright is not to gain a permanent monopoly on certain speech. The point of copyright is not to make sure that people are fairly compensated for their work. Their work might be terrible but contain a good idea that is later reimagined in a better way (Baker v. Selden), but that’s for the market to decide.
The courts will probably concur that these models are fair-use and I will agree with their judgement.
Good point! The market isn't exactly trying to answer my parent's question. But it's reasonably close: the litigation is unlikely to be dropped if it would win.
It’s up to the courts to decide if this is a copyright infringement. The EU at least already allows the use of copyrighted material for research purposes into text and data mining, so the main question will hinge on whether or not the result of such research can be commercially exploited.
Oversight? Hardly. You don’t seem to understand the purpose of copyright to begin with.
The purpose of copyright is to progress science and useful arts. Period. Any action taken in the name of copyright that does not progress science and useful arts is unsupported by law.
What else do we know about copyright? A copyright can apply only to creative expressions. While the bar for sufficient creativity is intentionally low, it is non-zero.
Another thing we know is that purely functional expressions are not copyrightable. When does an expression go beyond being a functional expression to become a creative expression? That’s up to a judge. Since code is math, and math by itself cannot be copyrighted, whatever lifts an expression to the level of creative expression must lie beyond the math. Updating a database field, factoring primes, or using data-correction algorithms are not creative expressions.
Now for AI. Only humans may own copyrights. The output of an AI is not copyrightable. But what if the input was copyrighted?
When it comes to software code, AI will value expressions that are commonly used more than uncommon ones. But software code is, by its very nature, an intertwined collection of copyrightable (creative) and non-copyrightable (functional) expressions. If AI values commonly used expressions, those expressions are highly unlikely to be creative enough for copyright protection in the first place.
So we have a circumstance where AI is trained on copyrighted but Open Source code. Yet the code itself comprises both creative (presumably) and functional expressions, with no clear delineation of what is and is not protectable.
Lastly, many authors do not understand what constitutes a creative expression that is protectable by copyright. The amount of work required to create the expression is meaningless. Manipulating data to tease out something interesting is not creative. Let’s face it: most software consists mostly of functional expressions that are not protectable. Back to that “math” problem again!
The big take-away? The purpose of copyright is to progress science and useful arts, not to build walls around ideas and concepts (which, by themselves, are not protectable).
>It's very possible that a judge will rule that AI models do not violate copyright
Would that mean you can simply use one AI (or more) from anyone else to train another AI?
Of course access can always be limited to an API with rate limits and per-request costs, which would make it difficult to straight up copy the whole thing, but it would be hard to justify any legal protections against it.
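For what it's worth, that kind of rate limit is easy to sketch; a simple token bucket makes bulk extraction through an API slow and expensive, even if it can't prevent copying outright (the capacity and refill numbers below are arbitrary assumptions):

```python
# A sketch of API rate limiting via a token bucket: each request spends
# a token, and tokens refill slowly, so extracting a model wholesale
# through its API becomes slow and costly. Capacity and refill rate
# here are arbitrary assumptions for the example.
import time

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost=1):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=5, refill_per_sec=1)
for i in range(8):
    print(f"request {i}:", "allowed" if bucket.allow() else "throttled")
```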
These things should cough up money for the authors whenever something is generated with their data or make the whole thing (source code and models) accessible to everyone. So easy :D.
Which would be easy during a lawsuit: the process of discovery means you get to examine the training dataset.
The allegation isn't that the AI trainers are hiding, but that what AI trainers are doing _itself_ constitutes copyright violation. AKA, they want the right to use the works to train an ai model to be a right that must be explicitly granted.
>AI companies can ask for permission if they want to train their models on other people's works. It's not that hard
Yes, and then people say "no" or "pay me". End result of this is that the only ones with good AI models are megacorporations that will DRM the heck out of it.
Years later those same artists will complain that they now have to pay $1000 a year to Disney/MS/Adobe to create art. Because these megacorporations can afford to pay for it. They're the ones that will benefit the most from this, because it creates an insurmountable moat for them.
Copyright exists to encourage the creation of more art and to progress science. AI is clearly a helpful step in that direction. Humans learn from others' works. Should we make that illegal too?
> Copyright exists to encourage the creation of more art and to progress science. AI is clearly a helpful step in that direction. Humans learn from others' works. Should we make that illegal too?
I find it astonishing that people continue to make this argument. A machine is owned by someone, a human is not. Why should the law treat machines the same way as a human? Sounds like some corporate flim-flam to me.
Copyright is not about protecting people. The purpose of copyright is:
>To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;
The purpose of copyright is not to protect the authors, it is to promote the progress of science and art.
The current situation for AI image generation is pretty much the only way these technologies will be available to everyone. Most other paths simply lead to billion-dollar corporations acting as gatekeepers to this technology. Megacorps can afford to hire artists to generate specific art for their AI models; everyone else cannot.
My point is that this quasi-legal argument about how “humans learn so why can’t machines learn” is a non sequitur. This is a case where quantity has a quality all its own. Copyright law was not invented with anything like this situation in mind.
You end up with billion dollar corporations gatekeeping this technology either way (who else has the capital to best train the models?). This isn’t about the little guy.
>Why should the law treat machines the same way as a human? Sounds like some corporate flim-flam to me.
It shouldn't, and that's why the argument that the algorithm is "learning", and is therefore doing the same thing that is legal for humans to do, is completely fallacious, on top of being plain anthropomorphism.