You are trying to argue encoding semantics, but at the end of the day the "AI" was completely happy to recite Carmack's fast inverse square root, including the original comments, verbatim.
With the way these AI models work, that data isn’t stored in a database though.
It’s hard for people to understand this concept, but the fact that a model repeated some data verbatim is a happy coincidence (!) based solely on patterns in the data that it has seen before.
I think people also have a hard time with how these models are trained. They vacuum up all sorts of data and learn from it by building vectors that determine how follow-up data should be generated.
Sure, the original creators of this content aren’t being compensated or even recognized for it. I don’t have a good idea on how that should be handled.
For normal humans though, looking at art or reading a book, and later repeating some passage or drawing something from your own memory is not a crime. (Unless you’re sharing the DeCSS source code I guess…)
Slightly changing the topic here, but I do wonder what would happen if someone wrote a program called “Monkeys on Typewriters” that just iterated through various combinations of characters (or bits or pixels) and was able to recreate things verbatim.
Is that random happenstance copyright infringement?
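For what it's worth, a "Monkeys on Typewriters" program along those lines is easy to sketch. The function names, the three-letter alphabet, and the target string below are all made up for illustration; this is just one brute-force way to do it, assuming the target only uses characters from the alphabet.

```python
from itertools import count, product


def monkeys_on_typewriters(alphabet):
    """Yield every string over `alphabet`, shortest strings first."""
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def attempts_until(target, alphabet):
    """Count how many strings are generated up to and including `target`.

    Assumption: every character of `target` is in `alphabet`;
    otherwise this loop never terminates.
    """
    for attempt, candidate in enumerate(monkeys_on_typewriters(alphabet), start=1):
        if candidate == target:
            return attempt


print(attempts_until("cab", "abc"))  # → 32
```

The catch is the size of the search space: there are len(alphabet) ** n strings of length n, so a 100-character passage over a 27-symbol alphabet is one of roughly 10^143 candidates. In practice the enumeration only "recreates" something if you already know what you are looking for.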
> For normal humans though, looking at art or reading a book, and later repeating some passage or drawing something from your own memory is not a crime.
False, actually; memorizing a copyrighted work and reproducing it other than in conditions specifically excepted from copyright protection is a violation of the exclusive rights of the copyright holder to make copies.
Copyright doesn't just apply to mechanical copies which don't have a human brain in the middle of the process.
But it isn’t. It’s just a series of vectors that point to a likely occurrence of the next word or pixel or bit in a sequence.
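To make the "likely next word" idea concrete, here is a toy sketch. It is not how a real model works internally (those use learned weights over tokens, not a table of counts), and the names `train`, `generate`, and the sample sentence are invented for the example. It only keeps follower counts, yet it can still emit its training text word for word when every word has exactly one possible successor.

```python
from collections import Counter, defaultdict


def train(text):
    """Record, for each word, how often each other word followed it."""
    follow = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follow[current][nxt] += 1
    return follow


def generate(follow, start, max_words=20):
    """Repeatedly append the most frequent follower of the last word."""
    out = [start]
    while len(out) < max_words:
        followers = follow.get(out[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)


# Hypothetical training text in which every word appears exactly once.
corpus = "the quick brown fox jumps over a lazy dog"
model = train(corpus)
print(generate(model, "the"))  # → the quick brown fox jumps over a lazy dog
```

Nothing in `model` is a stored copy of the sentence, only counts of what followed what, but the output is identical to the input. Whether that distinction matters legally is exactly what is being argued here.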