I think you have to be practical. It would be difficult to train an AI to consume Harry Potter and compress it but prevent it from recreating it. You can try, and people do, but there are always ways around it.
But it's on an individual prompt basis. It's not like ChatGPT can produce the entirety of the book's text and sell it as a PDF. It's just a device that could reproduce it, much like a word processor is a device on which you can type out the contents of a book as you read it.
So the question is one of practicality. Do we ensure that no copyrighted material is in the training data? Difficult but probably not impossible. But what you can't do is target the content in all its various other forms: descriptions of the plot, reviews, fan fiction, etc. So in the end it's pretty much a lost cause.
So what to do about it? I don't know. In the utilitarian sense, I think the world in which this technology exists in a non-crippled form is a better, richer world than one in which there are all these procedural steps that try to prevent this (and ultimately fail).
What's the harm here? Are people not buying Harry Potter books and just having an LLM painfully recreate the plot? I would imagine Harry Potter fans would be able to explore their love of the media through LLMs, and that would drive more revenue to Harry Potter media, much like fan fiction and pirated music lead to more engagement and concert sales.
In the case of new art, maybe fewer artists get commissioned. But let's be real: Mike Tyson wasn't going to commission an artist to create a Ghibli-style animation of himself anyway, so there's really little harm to artists from LLMs here. If anything, it expands the market and interest.
I'm just going to briefly respond to the part you wrote about art in particular.
We may not have a way to actually quantify the harm that GenAI is doing to creative industries because some of the damage is long-term. Choices are being made right now based on the state of the world. Why would anyone start an art career in this climate? What does art as a profession look like in 5 years? 15 years?
Art is not just the final artifact, and I feel we're surrendering part of our humanity in service of enriching big tech companies.
We proceed towards AGIs that implement proper understanding, and have them read all of the masterpieces and essays and textbooks (otherwise they will be useless), as is fully legitimate in any system that provides for libraries.