This is just an end to copyright. There is no rigorous definition of "AI" -- it has always been a marketing term for monetising computer-science research. And there is no principled difference between a lossy JPEG, taking its pixels as weights, and the weights of a NN.
So if I just zip up copyrighted images using a NN, then what? They're public domain?
Regulators here are miles away from understanding the implications -- this is what happens when you let companies whose profit motive is selling "AI" be the "Experts" on the topic.
If you don't care about Disney, fine -- so what about your health records? This is also the prelude to an end to privacy.
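To make the "zip up images using a NN" point concrete, here's a toy sketch in numpy -- the 16-pixel "image", the seed, and the learning rate are all arbitrary illustrations -- that deliberately overfits a one-layer linear net to a single sample until its weights store the work:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random(16)  # stand-in for a copyrighted image's pixel values

# One-layer linear "model" with enough capacity to memorize the sample.
W = np.zeros((16, 16))
lr = 0.1  # illustrative learning rate
for _ in range(500):
    out = image @ W
    W -= lr * np.outer(image, out - image)  # gradient step on squared error

reconstruction = image @ W
print(np.max(np.abs(reconstruction - image)))  # effectively zero
```

Scale this up and nothing conceptually changes: an overfit model's weights are an encoding of its training data.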
Completely crazy take. Your health records are not currently protected by copyright law. If someone magically snapped their fingers and eliminated copyright law, the protections on your health data, scant though they may be, would more or less be the same (IANAL).
So to be clear -- you're well aware that, in the case of your private data, you have an interest in preventing it from being used to train AI.
Great. So d'you think you could outline a reason why you wouldn't also have an interest in preventing your creative works from being used?
Either the training data is, as big-ad-tech says, essentially equivalent to generic human experience -- i.e., weakly reproducible; OR it is extremely reproducible, and more equivalent to standard contemporary data compression.
If you're kool-aid'ing the former on copyright, why not the latter on privacy?
Because my private data isn't protected by copyright; it's protected by things like HIPAA, which doesn't care one iota about human experience and applies equally to humans and machines. It's about data sovereignty -- who may access my data and for what purpose. A human is not allowed to share, retain, or reproduce my medical data.
So arguments like "I can get the AI to output my chart verbatim" start carrying weight, because the AI was granted access to data that the humans who created it are not permitted to share in any form whatsoever. Copyright, by contrast, concerns what I may do with the data after it's produced. Copyright is full of exceptions for things that don't count as a reproduction or performance of the work, and this is just one more; it doesn't change the nature of copyright.
It'd be really interesting to open up a movie theater in Japan that just ingests blockbusters through a "do nothing" NN and then be able to screen them royalty free. This decision feels incredibly half baked.
I'd clarify that in my example the screening would be of the potentially random output of a model... just one that was trained only by watching a specific blockbuster movie and is thus extremely likely to just reproduce the source material. My example is obviously extreme, but it gets at the core of the NYT case here in the States... I think it's a bad thing if we allow models to output data nearly indistinguishable from the copyrighted data they were trained on.
W.r.t. the NYT case -- it's my opinion that it's completely reasonable to use a corpus of vetted English prose like the NYT's as a way to train your model to comprehend language -- but if the model also begins to echo the contents of those articles, then that may be a serious breach of the NYT's rights to monetize their work.
That will probably go about as well as distributing an encrypted copyrighted work along with its encryption key and then claiming that none of the bits are the same so you did nothing wrong. Courts historically have not had any problem sorting out nerdy fantasy workarounds of the type often posted on HN.
At that direct level, yeah, probably -- and I do think copyright is dumb and should be, if not abolished, limited to 20 years or whatever. That said, imagine training a network from scratch entirely on Disney's catalog. Even if that model is then prompted to generate new characters, it seems weird to say that Disney's copyright wasn't infringed.
> "zip up copyrighted images using a NN" is Trek-level technobabble.
Look up 'overfitting', neural-network-based compression, etc., or that paper that basically used gzip compression in place of a neural network. Farthest thing possible from 'technobabble' once you understand how inextricably linked compression and 'understanding' are.
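For a concrete taste of the compression/'understanding' link, here's a minimal sketch of Normalized Compression Distance using Python's stdlib gzip module (the sample strings are made up; the repetition just gives the compressor something to chew on):

```python
import gzip

def ncd(a: str, b: str) -> float:
    # Normalized Compression Distance: lower = more shared structure.
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

article = "the court ruled that training data ingestion is permitted " * 4
paraphrase = "the court ruled that training data ingestion is permitted under the new law " * 4
unrelated = "quarterly basketball scores from the regional junior league " * 4

print(ncd(article, paraphrase) < ncd(article, unrelated))  # True
```

A compressor that exploits the shared structure between two texts is doing, in miniature, what people gesture at when they say a model 'understood' its training data.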
Regardless of whether it's "technobabble," it's a misunderstanding of how courts operate. The law is not a formally specified algorithm. If you overfit a NN to produce someone else's work, that's not going to get you off the hook in front of a court.