Human learning doesn't involve making a copy (or any other use of the exclusive rights) as defined in copyright law, since the human brain is not a fixed medium; AI training does, because digital storage is.
AI training may fall under the Fair Use exception in the US, but it absolutely does not fall through the same gap that exempts human learning from even requiring fair use analysis, since human learning doesn't meet the definitions set out for a violation in the first place.
>Human learning doesn't involve making a copy (or any other use of the exclusive rights) as defined in copyright law, since the human brain is not a fixed medium; AI training does, because digital storage is.
That's just false -- AI models themselves only store parameter weights, which represent a high-level, aggregated understanding across all the data they were trained on, i.e. what human brains do. This is clear from all the examples where you have to painstakingly trick them into producing exact text.
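To make the "aggregated understanding" point concrete, here's a toy sketch (a hypothetical bigram counter, nothing like a real transformer's internals): the learned parameters are statistics derived across all the training texts, and no individual sentence is stored as such.

```python
from collections import Counter

# Toy illustration (hypothetical; real LLMs are far more complex):
# "training" here reduces a corpus to aggregate statistics. The point
# is that the learned parameters are derived from the texts without
# storing any text verbatim.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

# The "parameters": bigram counts aggregated across all works.
params = Counter()
for text in corpus:
    words = text.split()
    params.update(zip(words, words[1:]))

# A statistic shaped by both sentences, attributable to neither alone.
print(params[("sat", "on")])  # 2
```

Whether such aggregate parameters legally count as a "lossy copy" is, of course, exactly the question under dispute.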
And even if they did store something that's "effectively a copy", that's no more copyright infringement than when Google caches a site in order to handle search queries. It's not copyright infringement until they start redistributing [non-fair-use] content from the sites.
Sorry, that was expressed less precisely than I intended.
Human learning doesn't involve fixing anything, copy or not, into a fixed medium. So the question of whether the result would even be a copy if it were fixed, much less whether such a copy would be exempted as fair use, never arises.
AI training does involve fixing something into a fixed medium, which makes it disanalogous to human learning. This raises the question of whether the thing fixed (the parameter weights) is or is not legally a lossy mechanical copy of the training corpus. If it is, that further raises the question of whether incorporating individual copyright-protected works into the training corpus, combined with its use in training, would (before considering exceptions like Fair Use) violate the copyright on those works, and whether, if so, the use nevertheless satisfies the requirements of one or more of those exceptions (Fair Use being the one usually argued for).
> And even if they did store something that's "effectively a copy", that's no more copyright infringement than when Google caches a site in order to handle search queries.
Among the reasons that Google removed outside access to its cached copies is that, as the web evolved, providing them increasingly had a negative revenue impact on the original content providers, which weakens the Fair Use case for caching, since effect on the market for the original work is a fair use factor. (Google's cache was ruled fair use in 2006 based on the factual situation at the time, including implied consent, for which AI model trainers have a much weaker argument, but changes in the facts relevant to fair use can change the outcome.)
But AI training is not so analogous to Google's cache (much less the situation of Google's cache in 2006) that one can simply leap with no analysis from one being fair use to the other in the first place. That's applying wishful thinking, not Fair Use analysis.
>Human learning doesn't involve fixing anything,... AI training does involve fixing something into a fixed medium, which makes it disanalogous to human learning.
No, it breaks one part of the analogy, and a part that has never been considered relevant: how long the learning persists. Yes, computers can store an AI model's weights much longer than any human could. But storing this "impact of having viewed a copyrighted work" has never been a factor in considering something infringing, regardless of how long it's stored. Courts don't consider it infringement if you simply use what you have learned from reading previous novels (the updates to your brain's neural weights) in producing new content.
Your argument is that the "fixedness of storing model weights" (aggregated high-level understanding) can make something infringement. That's without precedent. It would imply that if you're really good at "fixing" (remembering) the style of an author you read 50 years ago, it somehow crosses over into infringement because you "fixed that understanding into a medium" (your brain). That's not how it works at all.
>Among the reasons that Google removed outside access to its cached copies
I wasn't referring to the cached part of a site that Google serves to users, but the undistributed cache that they hold merely to know which sites to point you to, so you're not addressing the analogy. Here Google does store an exact copy (of at least some portions) and even then it's not considered copyright infringement until they start redistributing that content (or at least, too much of it).
My point was that acceptance of this practice further bolsters the case that AI models aren't infringing, because even if they did store exact copies, that's generally not considered infringement until they start serving close-enough copies of the original copyrighted content.
Good thing my MP3 files only store a psycho-acoustic model of that Metallica album!
I mean sure, if you go to painstaking lengths, you can trick your computer into making some noise that seems vaguely similar to the copyrighted work it was trained on, but I trust the consumer to make their own fair use evaluation.