Your issue is with capitalism, apparently. Like it or not, creators rely on copyright to make a living, and it gives them an incentive to create more, and higher-quality, art.
Copyright protects the work, not the thoughtspace. Unless the LLM recreates their work in a form similar enough to be legally defined as the copyrighted material, there is no copyright issue, period.
Setting a legal precedent in which copyright spreads like a fungus and encompasses any thought related to the copyrighted work seems like a horrible idea, one that can only lead to a dystopian future.
> Copyright protects the work, not the thoughtspace.
I think that is a good argument, but it conflates is and ought, and I have two counters:
copyright owners can dictate how their work is used (with some exceptions), and if that use hurts the copyright owner, the owner should have the right to forbid it.
the intent of copyright is to reward and encourage creators and creativity. If a script kiddie can just train a model and duplicate the hard-won aesthetic work of Molly Crabapple or Ralph Steadman or anyone at all, and either dilute the value of it or actually profit from it, what is the incentive for creators to create new work at all?
> copyright owners can dictate how their work is used (with some exceptions), and if that use hurts the copyright owner, the owner should have the right to forbid it.
Consider a poet who publishes poetry in some unique meter, or has some other unique stylistic structure for which they are well known... should they be allowed to sell copies of their poems that can be used for reading only, but prevents usage of those stylistic devices by other authors?
I'm going to assume that we agree the answer is "no, the author should not be able to prevent those uses" at least for human consumers of their works. This is how art has always developed... even though that use "hurts the copyright owner" by diluting the market for works with that style, the owner does NOT have the right to forbid it.
Now, let's say that same poet drew a lot of inspiration from a bunch of out-of-copyright poets. Let's also say that I train an AI model on the poet's inspirations, but NOT on the poet's work directly. Then I ask the AI to write a poem in the style of the poet's inspirations, and to include the unique stylistic device for which that poet is famous. In your world, is this OK?
> Consider a poet who publishes poetry in some unique meter, or has some other unique stylistic structure for which they are well known... should they be allowed to sell copies of their poems that can be used for reading only, but prevents usage of those stylistic devices by other authors?
I don't think this is a fair analogy. Unfortunately, the analogy breaks down because the technology is unprecedented. So, to answer directly: no, the poet cannot copyright the unique meter, but no, machine learning is not analogous to that either.
If you need an analogy, think copy machine, not human learning. An LLM can only regurgitate that which it has seen before. Absent the poet, the other poets can still make other poetry, but the LLM literally cannot make poetry that it has not seen before. If it produces a poem with that unique meter then it definitely copied that poet, and was not "inspired by" the poetry. If you wrote poetry inspired by EE Cummings your process for doing that would be very different from an LLM's, which would programmatically use his material.
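For what it's worth, the mechanism being argued over here can be made concrete with a toy sketch. The following is a word-level Markov chain, a vast oversimplification of an LLM (real models learn distributed representations, not lookup tables), but it illustrates the "statistical recombination of training text" both sides are describing: every word it emits comes from the corpus, yet the sequences it produces need not appear there verbatim. The corpus and function names are invented for illustration.

```python
import random

def build_bigram_model(text):
    """Map each word in the corpus to the list of words that follow it."""
    words = text.split()
    model = {}
    for a, b in zip(words, words[1:]):
        model.setdefault(a, []).append(b)
    return model

def generate(model, start, length, rng):
    """Sample a sequence by repeatedly picking a random observed successor."""
    out = [start]
    for _ in range(length - 1):
        successors = model.get(out[-1])
        if not successors:
            break  # dead end: no word ever followed this one in the corpus
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = ("the rose is red the violet is blue "
          "the rose is sweet and so are you")
model = build_bigram_model(corpus)
rng = random.Random(0)  # seeded for reproducibility
poem = generate(model, "the", 8, rng)
print(poem)
```

Whether this process counts as "copying" or "being inspired" is exactly the question at issue: the model can only emit words it has seen, but it can chain them into combinations the corpus never contained.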
What about the second part of my post, where the LLM has NOT been trained on the specific meter, but it does have some "concept" (maybe not the right word, but bear with me) of what meter is, so the human prompter can say "write a poem about subject S, with meter M" and get something in the style of that poet without the model having been trained on it... it sounds like you're OK with that scenario?
Full disclosure: I think I probably disagree with you on some points you've made in this thread, but I'm not going for any gotchas right now, I am just trying to map the contours of what you think is OK and not OK. We're all sort of flying blind on this stuff, so getting a sense of what others are thinking is really important in my mind. Appreciate the engagement.
I think you're coming from a fundamental place of misunderstanding. LLMs don't just regurgitate what they've seen before. After you understand how they work, I think the rest will become clear to you.
This is argued by the people who say it's akin to human learning, but who then turn around and say we don't know enough about human learning. It's an utterly fallacious argument.
> copyright owners can dictate how their work is used (with some exceptions), and if that use hurts the copyright owner, the owner should have the right to forbid it.
Like if I read their books and then write better ones in their style, undermining their profits?
Copyright doesn't let you stop Nazis (or insert objectionable set of people of your choice here) from reading your books or seeing your art if they obtain a legal copy. You can't just say "I forbid it!". Why should we create a new restriction on our freedoms to allow for that?
Copyright absolutely allows you to control who you sell your work to, who you license it to, and who they can sublicense it to. Copyright allows an author to control who displays, reproduces, performs, etc. their works. The misunderstandings I've seen about copyright here are at times shocking in how incorrect they are, but they seem to be consistent with a lot of the positions taken by the posters who express them. For example, someone upthread said that copyright only protects against verbatim copying!
My issue is not with capitalism but with assuming the present rules of asset ownership are optimal! If we could make housing free - conjure it out of thin air - it would be really bad for landlords. They rely on that income to make a living!
We should still obviously do it. More of a thing that people want is usually good.
Conjuring things out of thin air also tends to have side-effects, and it's better not to stop at the first-order effect of an action before going ahead and "just doing it". Concretely with content generation: if the disregard for copyright leads to a world where people no longer make the effort to produce and think about new things, the only things that you will consume will be produced by AI. Reminds me of The Matrix :-)
Conjuring things out of thin air does not have side effects because it is not possible.
The whole point of the phrase was to describe a hypothetical situation with no side effects to avoid sideways arguments about "but actually here's some bad things that would happen if you did that unrelated to the central argument".
I actually agree completely with that; my initial reply doesn't quite put the focus where it needs to be.
So let me try again: in my view, you shouldn't reason about policies or laws that impact real people by placing yourself outside of reality in an idealized case, because the hard bit is not conjuring ideals but finding a way to make them happen.
It's always a lot messier where the rubber meets the road. People have already died and suffered because of ideals (specifically around asset ownership) that weren't quite thought through but caught on. Take communism as an example.
Part of my point is that such "implementation details" are not as unrelated to the central argument as they seem. This is very different from the software world where it might be ok to assume that in 2 years we'll have the computations be 10x as fast and work out a solution backwards from there.