"But Defendants’ LLMs endanger fiction writers’ ability to make a living, in that the LLMs allow anyone to generate—automatically and freely (or very cheaply)—texts that they would otherwise pay writers to create"
This kind of luddism sees copyright as a way to enrich rights holders, as opposed to "promoting the progress of science and the useful arts".
It appears this lawsuit is complaining that ChatGPT can write fan fiction and they don't like that.
I was on board initially, thinking we were talking about OpenAI ingesting Game of Thrones as training material, but it appears George et al. are just mad because it can make stories with their characters.
This is far from the authorship/copyright problem of AI.
If you read the claims for relief (starting on page 44 of the complaint), it's mostly just standard copyright infringement during training. The claims about ChatGPT writing works that infringe theirs seem to me to be an attempt to head off a fair-use defense: one of the tests for fair use is the effect on the potential market for the original work.
Theft is the word used by their lawyers. Seems fair to use in the title. The difference between theft and copyright infringement isn't important in this case anyway.
Either way, it will be interesting to see how this goes. There are weird arguments on both sides, so the rulings could go in any number of directions.
Here's a point that I struggle with. Let's imagine a point in the future where technology has progressed to the point that a machine can become "assisted memory" for a human. This could be useful for degraded memory conditions or even just to buff up human capabilities. In this scenario how do we deal with licensing and copyright? The "memory" is trained on books, artwork, etc and then human intelligence accesses that computer aided memory and constructs something new.
Seems like this lawsuit could set a precedent that the future I describe would not be allowed.
I think it depends on the use. The kind of AI you're suggesting wouldn't need to ingest the entirety of every book and movie you've seen; it would just need to store a summary and a location for the original file.
It's legal for me to make digital copies of copyrighted works I own so long as they're not redistributed. If I have a local AI maintaining my personal data for reasons other than generating competing works, that shouldn't involve copyright at all.
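To make that concrete, here's a minimal sketch of the kind of summary-plus-pointer store I mean (all names here are hypothetical, just to illustrate the idea):

```python
# Hypothetical sketch: a personal "assisted memory" that stores only
# summaries and pointers to originals, not full copyrighted text.
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    summary: str        # short description in the system's own words
    source_path: str    # where the full original lives on the user's disk

class AssistedMemory:
    def __init__(self):
        self.entries: list[MemoryEntry] = []

    def remember(self, summary: str, source_path: str) -> None:
        self.entries.append(MemoryEntry(summary, source_path))

    def recall(self, query: str) -> list[MemoryEntry]:
        # Naive keyword match; a real system might use embeddings.
        words = set(query.lower().split())
        return [e for e in self.entries
                if words & set(e.summary.lower().split())]

memory = AssistedMemory()
memory.remember("Epic fantasy about warring noble houses", "~/books/agot.epub")
print(memory.recall("fantasy houses"))
```

The point is that the "memory" itself never redistributes the work; it only points back at the user's own legal copy.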
I think the question boils down to what is fair use in these situations. I actually think it would be useful to have a public effort to build a pan-humanity model that claims fair use for public benefit and consumes all the productions of humanity.
What specifically hurts OpenAI is the monetization and commercial benefit from other people’s work. They also fall clearly within the space of civil claims of copyright violation, if not criminal (though the use of models to produce copyrighted material is likely criminal infringement). (IANAL, but a law professor friend made these claims to me, YMMV)
I think this is all mostly huff and puff and frankly going to be irrelevant. At some point anyone with a decent enough home computer will be able to train their own models and use them as they see fit. Any law that tries to stop that is going to be stifling and unwieldy in the extreme.
And it'll be cross border, so even more difficult to enforce.
How is it different from a search index? It takes existing content as input, processes it, and then outputs data structures from it. Those data structures are then used to power full-text search.
An LLM does the same thing, but instead of search results it emits a stream of tokens.
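To make the analogy concrete, here's a toy inverted index, a sketch of the kind of derived data structure a full-text search engine stores instead of a readable copy:

```python
# Toy inverted index: ingests text, stores only derived data
# structures (term -> document postings), not readable copies.
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    index: defaultdict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)

docs = {
    "doc1": "Winter is coming to the North",
    "doc2": "The North remembers",
}
index = build_index(docs)
print(index["north"])   # {'doc1', 'doc2'}
```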
DMCA 2024 - A nice big report button for when generated content is too close to copyrighted content. It is then on the AI company to supplement the training materials around that content, to dilute the generation of content that could be seen as infringing. So instead of George RR Martin prequels with the same names and characters (because of a lack of training materials), it generates something more generic for the input prompt.
The actual complaint is about using their copyrighted works in the training of the LLM without a license. OpenAI is claiming it's fair use, the authors disagree. It's going to take a ruling from a judge to get clarity on the issue, and no matter what it'll be appealed until it hits the SC.
That's what discovery will be for; the complaint alleges that the likely source was LibGen. Most of these authors haven't released DRM-free ebooks, and it seems unlikely that OpenAI has a large-scale book-scanning effort (and even if it did, the authors would likely claim that to be infringement itself).
What if it never accessed the book, but read everything relevant like episode summaries, fan wikis, and forum discussions? It would still be as conversant. Is it still infringement?
Ideas have never been within the scope of copyright, and covering them was never part of its democratic mandate. If creatives want that changed, fine: advocate for a change in the law.
This isn't about ideas, it's about a specific individual's work, given that the reproduced text lifts literal characters out of Martin's books. That has always been covered by IP law. Canonical example: you cannot write a novel about Harry Potter, but you can write a book about a wizard going to a magical school.
If a model generates large amounts of text that is very close to something you've written, because there isn't much else like it, how is that "inspired"? It needs more dilution.
We would have to change the law to allow the kind of ‘inspiration’ you are talking about, which is why there are multiple lawsuits here. That’s what OpenAI is asking for: a redefinition of ‘fair use’. NNs aren’t copying ideas; they train on what copyright calls ‘fixation’, dealing with text, audio, and pixels, not ideas. We keep hoping and looking for understanding in the NNs, but we have ample evidence that they don’t actually understand much, if anything; they are just really good at copying in a way that makes understanding seem plausible to the layperson.
It’s a good idea to make this easier to report, but… shouldn’t it be on the AI company to train on legally acquired content in the first place? It’d be great if the training data were opt-in and curated. Wouldn’t that be better than a shoot-first, ask-questions-later policy? There’s definitely room to improve copyright and room to allow AI to exist, but do we really want to allow AI to ingest all copyrighted material and call it ‘fair use’? That would give them a ridiculous and unprecedented amount of freedom to take any and all content, turn around, and auto-generate enough to obsolete the people who made the training material. It seems like the race is on to supplant Google as the portal for information, and downloading everything in the world and then crying fair use after the fact feels like wishful thinking that more or less admits to copyright violation.
>shouldn’t it be on the AI company to train using legally acquired content in the first place
I don't think so. It's not illegal to look at or learn from copyrighted materials. If you start producing the materials it becomes a different question. I think the same applies to AI.
Your argument doesn’t work because OpenAI has admitted that ChatGPT is producing copyrighted material. They’re trying to carve out an exception for AI, but have already acknowledged that training copies the materials, literally, and that it does not “learn” from them the same way humans do. The intent with AI may be to remix them, but the whole reason there are multiple lawsuits here (as well as with Stable Diffusion and other NNs) is that they have repeatedly demonstrated they sometimes memorize the training data and can reproduce it more or less verbatim. They have violated current copyright law. In that light, we have two primary options: change the law, or enforce the current law. OpenAI is hoping to change the law, but whether they have copied some training data and produced it as output is not even up for debate; this is already the “different question” you referred to.
Or disagreement anyway, about how comparable photocopiers & copyright are to generative models and protection from unauthorized automated style reproduction.
How I look at it:
1. In both cases, reproduced copies or reproduced styles, automation destroys economic incentives for creators to make any sustained effort.
Without economic protection, it isn’t even a question of less motivation. Creators, like everyone else, need to eat.
2. So we protect creative works from complete copies in order to have more creative works.
And it is primarily about automation and mass reproduction.
Nobody is worried about people hand copying Atlas Shrugged.
3. But we also protect copyrighted works from partial copying.
Only copying chapters 1-3? Not allowed.
Only copying the plot but changing all the names, locations, fashion and colors? Not allowed.
4. So now it turns out a different substantial part of a work can be copied via automation. It’s style.
Well, if you can protect a work’s plot from automated copies, why not a work’s style?
It is a substantial piece of a creative work.
Reasons for protecting style come down to protecting any major part of a copyrighted work.
The only thing different now is we have “style reproducers”.
So we have to decide, is this essentially the same situation as copyright addresses, or not?
5. It is.
The exact same trade-offs between protection and incentivization exist for extracted and mass-reproduced style as they do for extracted and reproduced plot.
How many books have taken some of their style or concepts from other books?
Stranger Things borrows liberally from Stephen King, with Spielberg elements, not outright but in spirit and tone. Why isn't Stephen King suing the Duffer brothers for reading his shit and coming up with ideas based on that?
One is that basic plots are copied all the time and there’s a meme that there are only seven basic plots. Of course there’s much more variety at the detail level.
Was Sword of Shannara pretty derivative of Tolkien? Yeah. But I assume it was pretty far from a copyright violation.
So, 7 basic plots. But an actual plot for an original story isn’t just a basic plot, is it? It’s an original work.
Movies are sued all the time for copyright infringement due to substantially copying plot and character elements. [0]
Because these cases tend to each be unique, the line between infringement and non-infringement gets settled very much on a case by case basis.
As a result of this inherent unpredictability, most cases involve the accused settling with the aggrieved party to get the lawsuit dismissed.
This is common in many areas of civil law.
A few examples:
1. *"The Island" (2005)*
- Accusation: Similarities to the 1979 film "Parts: The Clonus Horror."
- Outcome: Settled out of court. [1]
2. *"Frozen" (2013)*
- Accusation: Claimed similarities to a short film named "The Snowman."
- Outcome: Disney settled the case. [2]
3. *"Coming to America" (1988)*
- Accusation: Art Buchwald claimed the movie was based on his script.
- Outcome: Paramount settled for an undisclosed amount. [3]
4. *"The Terminator" (1984)*
- Accusation: Harlan Ellison claimed it was similar to an episode of "The Outer Limits."
- Outcome: Settled out of court, and an acknowledgment was added to later copies. [4]
5. *"Disturbia" (2007)*
- Accusation: Accused of being similar to Alfred Hitchcock's "Rear Window."
- Outcome: Initially dismissed, but a settlement was reached. [5]
Authors currently don't control who or what reads their works, of course.
Personally I currently feel that (at life +70 years) the copyright pendulum has gone too far towards the rights of publishers (not necessarily authors) as is.
That said, I'm open to good arguments to change my mind. Why do you feel that authors should be given this additional right to control what is used for AI training? What would be the public good or public trade-off here?
And they don’t control fair use or promulgating the ideas in the book. That Wikipedia article summarizing the key contents in an editor’s own words? Perfectly legit.
What copyright buys is that no one else can distribute verbatim copies of large amounts of your work. But a lot of other uses are allowed.
It covers the expression of ideas, which in the case of a book is mostly the text as written. And, yes, doing some substitution of character names etc. may still violate copyright, but you certainly can’t keep me from writing an article about the main points you make in your book.
> What would be the public good or public trade-off here?
Consider the aesthetic landscape where creators do not have control over whether their work is used to train an AI versus one where they do. It's hard to predict with certainty, but my model is this:
No control: Anyone's work is fair game to be trained on. If I want to make a prompt of "A graphic novel in the visual style of Moebius, written by Stephen King, set in Westeros", I can get something based on King's and Martin's actual words and Moebius' actual drawings, without compensating them. Neat! However, potential new novelists see that quality novels can just be churned out for free or at low cost, and so actually sitting down to write a new novel becomes a niche, geeky thing to do. There's no money in it. These new novels just get thrown into the ML bin, fodder for the next version.
With control: Novelists and other creators know they can make money from their work because they can make business decisions about how and when their work trains a model. We all get to see more new, professional-quality creativity. Those who want to read Conan as written by Lord Dunsany can still see that, since those works are in the public domain.
Training an AI feels inherently commercial, with intent to commercially distribute, and that sort of use is licensed specially in most domains.
In a sense it's like driving down the highway with a duffel bag of cannabis flower in a state where possessing and traveling with a few ounces is no problem: something commercial is probably happening. Why is that prohibited? Perhaps another debate, but I'm just trying to connect the implied-intent aspect.
If an AI were being trained for strictly academic reasons then I'd agree with fair use and that type of arguments. But if the AI itself has a subscription fee, then whoever is subscribing is also probably using the work for real or anticipated commercial gain. Hence investing money.
True, hobbyists spend money with no intention of gaining commercially, and we may do that at a higher-than-average rate as tech workers because we usually have a decent amount of excess money from our work. But money is pretty scarce for most people and for businesses with set non-investment budgets, so if they're spending it on AI there's little doubt it's with commercial intent.
So in conclusion, I do think there's both merit to the authors' case related to intent to commercialize and room for doing unlicensed non-commercial AI training.
What is inevitable is that copyright is dead. Do you think China will respect western copyrights? They already don't. People would just use LLMs hosted elsewhere.
This is a good thing. We're going to see an explosion of indie games, movies, and more that never could've been made before by a single, dedicated person.
I also tested this with Harry Potter books (which are still not public domain AFAIK). It starts auto-completing the correct words quickly, and then hangs and eventually stops producing output. You can call the API to generate more tokens, but it stops again after producing a few more correct words.
I think for a few high-profile authors (for books like Harry Potter and Game of Thrones), OpenAI probably installed some output filters in order to not get sued too hard. Of course, I can't definitively check that without access to the raw model. Which OpenAI conveniently doesn't provide.
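Roughly, the probing loop looks like this. This is a sketch assuming the completions endpoint and the gpt-3.5-turbo-instruct model; substitute whatever model and prompt you're testing, since I can't verify which models exhibit the hanging behavior:

```python
# Sketch: repeatedly ask for more tokens after each premature stop.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
text = "<first 13 words of the book go here>"  # placeholder prompt

for _ in range(10):
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=text,
        max_tokens=32,
        temperature=0,  # greedy decoding makes memorization visible
    )
    chunk = resp.choices[0].text
    if not chunk.strip():
        break            # the model stopped producing output
    text += chunk

print(text)
```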
The complaint mentions LibGen, Z-Library and Bibliotik, and Sci-Hub in a footnote.
Thought experiment:
What if every person who downloads materials from the above sources claimed that they were doing so only to "train AI"?
Many such persons who download from those sources are probably doing so for noncommercial purposes, for example academic research. Whereas, according to this complaint, OpenAI "intend[s] to earn billions from this technology."
Not disagreeing with your end conclusion, but surely the concept of a limited company exists exactly to have a distinction between legal entities, some of which are not humans but may still violate laws. Take it this way: if the EU fines a company for GDPR violations, it doesn't really fine an individual. Perhaps no individual broke the law explicitly, but as a collective the end result is a law violation.
Technically yes, but how that is handled is up to the country. In the US, a concept known as "corporate personhood" exists. A strange concept, because if a company murders someone, neither the company nor any of its executives goes to prison.
GRR Martin, the author of Game of Thrones, had the audacity to join this lawsuit. The only thing I expect from AI in this context is NOT to reproduce the shitshow GOT ended up being.
I'm wondering what the authors will do if we develop AIs that are able to find new artistic styles that are not in the dataset. Would it still pose a problem to use their content to learn how NOT to imitate them?
Seems it is possible in collaborative filtering:
> Yes, in collaborative filtering, finding empty classes is possible. To recommend items for these gaps, utilize adjacent class information or employ techniques like matrix factorization, content-based filtering, or hybrid systems. These methods predict preferences based on observed patterns, similarities between items, and user preferences, filling in missing data.
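For the curious, here is a minimal matrix-factorization sketch (toy data and illustrative names) of how observed ratings can fill in the empty cells it mentions:

```python
# Minimal matrix factorization: predict missing user-item ratings
# from observed ones via stochastic gradient descent.
import numpy as np

R = np.array([            # 0 marks a missing rating
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

n_users, n_items, k = R.shape[0], R.shape[1], 2
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors

lr, reg = 0.01, 0.02
for _ in range(5000):
    for u, i in zip(*R.nonzero()):             # only observed entries
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

print(np.round(P @ Q.T, 1))  # predictions, including the empty cells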
People misleadingly conflate these two concepts all the time, but no, human learning and machine learning are not equivalent.
If you want to learn to draw in the style of Moebius then go for it. If you want to train an AI on Moebius' work, you should ask permission from Moebius' estate, since they stand to lose revenue by your model.
What if I only train on licensed human art "in the style of Moebius" but legally distinct? You can't copyright a "style". A lot of critics seem to be arguing for a conception of copyright that's far broader than what actually exists, as a subsidy for artists (by which everyone in effect means Disney et al).
You're right, because clearly anyone who takes an AI-generated Moebius-style image and puts it on a blog post about message queues or a recipe or whatever would have just bought a licensed Moebius image otherwise.
I'm replying to "you should ask permission from Moebius' estate, since they stand to lose revenue by your model" - there are no damages. Whether it's infringement or not is not really relevant to that point.
Why do we think that the moebius estate deserves that revenue? Why is that the optimal state of the world, rather than just a capitalist incentive to create more art?
Your issue is with capitalism apparently. Like it or not, creators rely on copyright to make a living and have an incentive to create more, and quality, art.
Copyright protects the work, not the thoughtspace. Unless the LLM recreates their work in a form similar enough to be legally defined as the copyrighted material, there is no copyright issue, period.
Setting a legal precedent where copyright spreads like a fungus and encompasses any thought related to the copyrighted work seems like a horrible idea, one that can only lead to a dystopian future.
> Copyright protects the work, not the thoughtspace.
I think that is a good argument, but it conflates is and ought, and I have two counters:
copyright owners can dictate how their work is used (with some exceptions), and if that use hurts the copyright owner, the owner should have the right to forbid it.
the intent of copyright is to reward and encourage creators and creativity. If a script kiddie can just train a model and duplicate the hard-won aesthetic work of Molly Crabapple or Ralph Steadman or anyone at all, and either dilute the value of it or actually profit from it, what is the incentive for creators to create new work at all?
> copyright owners can dictate how their work is used (with some exceptions), and if that use hurts the copyright owner, the owner should have the right to forbid it.
Consider a poet who publishes poetry in some unique meter, or has some other unique stylistic structure for which they are well known... should they be allowed to sell copies of their poems that can be used for reading only, but prevents usage of those stylistic devices by other authors?
I'm going to assume that we agree the answer is "no, the author should not be able to prevent those uses" at least for human consumers of their works. This is how art has always developed... even though that use "hurts the copyright owner" by diluting the market for works with that style, the owner does NOT have the right to forbid it.
Now, let's say that same poet drew a lot of inspiration from a bunch of out-of-copyright poets. Let's also say that I train an AI model on the poet's inspirations, but NOT on the poet's work directly. Then I ask the AI to write a poem in the style of the poet's inspirations, and to include the unique stylistic device for which that poet is famous. In your world, is this OK?
> Consider a poet who publishes poetry in some unique meter, or has some other unique stylistic structure for which they are well known... should they be allowed to sell copies of their poems that can be used for reading only, but prevents usage of those stylistic devices by other authors?
I don't think this is a fair analogy. Unfortunately, the analogy breaks down because the technology is unprecedented. So, to answer directly: no, the poet cannot copyright the unique meter, but no, machine learning is not that either.
If you need an analogy, think copy machine, not human learning. An LLM can only regurgitate that which it has seen before. Absent the poet, the other poets can still make other poetry, but the LLM literally cannot make poetry that it has not seen before. If it produces a poem with that unique meter then it definitely copied that poet, and was not "inspired by" the poetry. If you wrote poetry inspired by EE Cummings your process for doing that would be very different from an LLM's, which would programmatically use his material.
What about the second part of my post, where the LLM has NOT been trained on the specific meter, but it does have some "concept" (maybe not the right word, but bear with me) of what meter is, so the human prompter can say "write a poem about subject S, with meter M" and get something in the style of that poet, without having been trained on it... sounds like you're OK with that scenario?
Full disclosure: I think I probably disagree with you on some points you've made in this thread, but I'm not going for any gotchas right now, I am just trying to map the contours of what you think is OK and not OK. We're all sort of flying blind on this stuff, so getting a sense of what others are thinking is really important in my mind. Appreciate the engagement.
I think you're coming from a fundamental place of misunderstanding. LLMs don't just regurgitate what they've seen before. After you understand how they work, I think the rest will become clear to you.
This is argued by the people who say it's akin to human learning, but who then turn around and say we don't know enough about human learning. It's an utterly fallacious argument.
> copyright owners can dictate how their work is used (with some exceptions), and if that use hurts the copyright owner, the owner should have the right to forbid it.
Like if I read their books and then write better ones in their style, undermining their profits?
Copyright doesn't let you stop Nazis (or insert an objectionable set of people of your choice here) from reading your books or seeing your art if they obtain a legal copy. You can't just say "I forbid it!". Why should we create a new restriction on our freedoms to allow for that?
Copyright absolutely allows you to control who you sell your work to, who you license it to, and who they can sublicense it to. Copyright allows an author to control who displays, reproduces, performs, etc. their works. The misunderstandings of copyright I've seen here are at times shocking in how incorrect they are, but they seem to be consistent with a lot of the positions taken by the posters who express them. For example, someone upthread said that copyright only protects against verbatim copying!
My issue is not with capitalism but with assuming the present rules of asset ownership are optimal! If we could make housing free - conjure it out of thin air - it would be really bad for landlords. They rely on that income to make a living!
We should still obviously do it. More of a thing that people want is usually good.
Conjuring things out of thin air also tends to have side-effects, and it's better not to stop at the first-order effect of an action before going ahead and "just doing it". Concretely with content generation: if the disregard for copyright leads to a world where people no longer make the effort to produce and think about new things, the only things that you will consume will be produced by AI. Reminds me of The Matrix :-)
Conjuring things out of thin air does not have side effects because it is not possible.
The whole point of the phrase was to describe a hypothetical situation with no side effects to avoid sideways arguments about "but actually here's some bad things that would happen if you did that unrelated to the central argument".
I actually agree completely with that; my initial reply doesn't quite put the focus where it needs to be.
So let me try again: in my view, you shouldn't reason about policies or laws that impact real people by placing yourself outside of reality in an idealized case, because the hard bit is not conjuring ideals but finding a way of making them happen.
It's always a lot messier where the rubber meets the road. People have already died and suffered because of ideals (specifically around asset ownership) that weren't quite thought through but caught on. Take communism as an example.
Part of my point is that such "implementation details" are not as unrelated to the central argument as they seem. This is very different from the software world where it might be ok to assume that in 2 years we'll have the computations be 10x as fast and work out a solution backwards from there.
There is a distinct lack of capitalist incentive to spend your time developing a novel style only to have your style replicated for anyone to use, based on your work, and without permission.
In terms of automation used to destroy incentives for original creators, based on original creators’ works, photocopying machines and generative models are in the same quadrant.
Yes, it is value destroying for the creators. It cheapens their work. However, it gives millions more people access to their ideas. Maybe that’s better.
In this particular hypothetical, it's not value-destroying for the creator; he's dead and presumably has no more interest in money. Whether the heirs of his estate make more or less has no impact on the fact that Moebius will be producing no more art.
My entire career has been selling copyrighted works for good money. And I have been able to confidently do that for many years because of copyright protection.
So “it just delays the inevitable” means what? I have to give my money back later?
The transition from horses to automobiles is a prime example of how protectionism can't halt technological progress. When cars first arrived, they faced stiff opposition, especially from the horse-and-carriage industry. Some countries even introduced protectionist policies, like high tariffs on imported cars or regulations favoring horse-drawn vehicles, to shield their traditional industries. In the end, the end user will decide; or better said, they already have with regard to generative art.
The inevitable is that your industry will change, and the distinction between an "Artist" and an "Operator" will change even more. If most of your clients come to you for the end product, you are most likely an "Operator", i.e. "Do X like I want it to be done." If you are an "Artist", clients come to you because you are either in the Zeitgeist or because people like your way of thinking and the process behind your art. The end product is collaborative.
If you are an "Operator" you will have problems in the near future; if you are an "Artist" you will be fine. It's like in VFX and the mark the writers' strike made on the industry: the "Artists" are all fine because the studios want to retain them, and the "Operators" are left on the street.
And I agree, requiring permission to use copyrighted works in model building will give legs to the current paradigm, but only delay the change.
However, AI is going to upset everyone’s apple carts, so anything that allows changes to happen more smoothly, less disruptively, is probably worth the effort.
AI is likely to devalue all human labor, except for the provenance value of creations (creator, history, associations) and a preference for the human element (many personal services, or services with a personal touch element).
Imagine the kind of hellish world we'd live in if you could just ignore the law and remove the capacity of the market (and by extension all of society) to make human expression possible. Yes, even artists need to be given the possibility of being able to feed themselves.
Imagine what I could accomplish if I was a tech startup flush with cash unencumbered by the law. Surely a planetary hostile takeover would only be a few years and existential gambits away.
As of today, it is by no means clear that a law is being broken and the IP lawyers I know tend to think not. But the courts and perhaps Congress will decide.
I have no idea what you mean by laundering IP. Existing IP is built on all the time in ways that are or are not permissible depending upon the nature of the IP protection, if any, and the nature of the extension.
And sometimes things end up going to court, especially in the context of patents.
And lots of things that aren’t generally protectable like new artistic styles and techniques are co-opted all the time by other artists.
Ghaff is an intellectually dishonest poster expressing nonsensical views about copyright law (that it only protects against verbatim copying); it is not worth the effort to undo his "bullshit asymmetry principle": https://en.wikipedia.org/wiki/Brandolini%27s_law
Human authors cannot read and perfectly memorize millions of books in a day, and are therefore not comparable to computers running machine learning software.
That's not how machine learning works. They don't "perfectly memorize" anything. They do learn much quicker than humans, of course, but that alone doesn't seem like a good argument.
You are looking at a Large Language Model (specifically GPT3.5) output full paragraphs from a book it was trained on. The prompt is the first 13 words of the book, shown with a white background in the first screenshot. The words with green background are the LLM outputs.
The second screenshot is a diff that compares the LLM output with the original book.
The book is Frankenstein. You might argue that it is a public-domain book, but it demonstrates that these LLMs can and do memorize books.
It also works with Harry Potter, but the API behaves weirdly: after quickly producing correct output at the beginning, it suddenly hangs. You can continue generating the correct words by doing more API calls, but it only does a few words at a time before stopping. It clearly knows the right content, but doesn't want to send it all at once.
I think there is some output filtering for "big" authors and stuff that is too famous that they filter in order to avoid getting sued.
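If anyone wants to reproduce the diff from the second screenshot, something like this works (file names are hypothetical; supply your own reference text and saved model output):

```python
# Compare LLM output against the original text and measure
# verbatim overlap, word by word.
import difflib

original = open("frankenstein_excerpt.txt").read()   # your reference copy
generated = open("llm_output.txt").read()            # saved model output

a, b = original.split(), generated.split()
matcher = difflib.SequenceMatcher(None, a, b)
print(f"similarity: {matcher.ratio():.1%}")

for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":                               # show only divergences
        print(tag, " ".join(a[i1:i2]), "->", " ".join(b[j1:j2]))
```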
Do you have a credible source that says an LLM like the ones trained by OpenAI can perfectly memorize millions of books? Or be trained in a single day?
It's provably impossible for a model with 1.76 trillion floating-point parameters (like GPT-4) to memorize millions of books?
How many bytes do you think a million compressed books take? Consider that the way these models are trained is basically by completing the next symbol based on the previous words, which is how most compressors work.
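Back-of-envelope, with admittedly rough assumptions about book size and compression ratio:

```python
# All figures are rough assumptions, not measurements.
million_books   = 1_000_000
bytes_per_book  = 1_000_000            # ~1 MB of plain text per book
compression     = 4                    # ~4:1 for decent text compression
corpus_bytes    = million_books * bytes_per_book / compression

params          = 1.76e12              # rumored GPT-4 parameter count
bytes_per_param = 2                    # fp16
model_bytes     = params * bytes_per_param

print(f"compressed corpus: {corpus_bytes / 1e12:.2f} TB")  # ~0.25 TB
print(f"model weights:     {model_bytes / 1e12:.2f} TB")   # ~3.52 TB
```

So on raw capacity alone, "provably impossible" is far from obvious.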
Are you really trying to argue that there exist humans that can learn facts at a similar pace to a datacenter running GPT training software on petabytes of scraped data?
My point still stands.
"Your honor, I don't know human minds work, but clearly LLMs work the same way" The legal burden of evidence is on LLM proponents to establish that what they are doing is the same as the human mind and therefore should be treated the same way.
That's a bit of a false dilemma. To address copyright issues, we needn't prove that machine learning models learn exactly like humans or not. The more relevant point is that neither human learning nor machine learning has the intent to store or replicate copyrighted material; both aim to generalize from data to produce new content. It is in this way that they are similar.
It's the argument that is being made. Intent isn't a requisite for copyright infringement. Your re-characterization of the argument is so general that it's useless.
I wonder if this might be because Slavboj might be a monist/physicalist, and you might be a dualist[2]? If that's the case, we'd all argue until we're blue in the face if we don't at least recognize this underlying difference. For the record, since I've studied biology, I'm probably closest to some form of mechanism[3] (due to the rejection of vis vitalis[4] in the early 20th c.).
Of course, you could also just be a very skeptical monist mindful of the Kluger Hans effect[5] and working from there.
Let me know which (if any), maybe we can still find middle ground!
It's like saying I can kill people with a swiss army knife so I should be able to own a nuclear bomb since it's _literally_ the same thing, besides the scale but that's a detail right :)
Imagine if, after having read and trained a single, bright student through university, you could clone them a million times and get their clones to churn out content for a penny per 4k tokens.
One solution to this would be to say this:
* The copying done while training a neural net is "fair use", similar to copying the DVD into RAM and onto the screen while watching it.
* The resulting neural net is a derivative work of all copyrighted works used during its training: no copying of the derivative work is allowed without permission of all copyright holders
* The output of the neural net is subject to normal copyright laws for humans: i.e., it's only a violation of copyright if it's obviously a copy.
Basically, you can either train every instance of a neural network separately (like you have to do with humans) or get a license for all your training data.