> But creating a derivative work based on the song? 1. it wouldn't matter as der...

knollimar · 2025-11-11T15:16:29 1762874189

That seems like a really broad interpretation of "technically memorization" that could have unintended side effects (like, say, banning equations that could be used to generate specific lyrics), but I suppose some countries consider loading into RAM a copy already. I guess we're already at absurdity.

cycomanic · 2025-11-11T20:15:46 1762892146

> but I suppose some countries consider loading into RAM a copy already. I guess we're already at absurdity

FYI, most do. Have a look at many software licenses. In particular, Microsoft (which, as we know, invested a lot in OpenAI) will argue that it is.

I would also say it makes sense. If that weren't the case, we could just load a program onto lots of computers using only a single license/installation medium.

knollimar · 2025-11-11T20:30:31 1762893031

I think it's absurd. In my opinion the copy is for copying the usable part (e.g. installation).

Is running a program making a copy? If I run it on some distributed system is it then making more copies than allowed? This gets insane quickly.

I think it's just a bandaid for fixing removable drive installations. These should have had their own laws/rules/etc.

It has knock-on effects, like being able to enforce other IP law against someone you just licensed your software to.

Similarly, I think this is more a case of "interpret the words to get the desired outcome instead of the likely spirit or meaning of the words".

dathinab · 2025-11-12T12:41:16 1762951276

It _really_ isn't absurd.

The law doesn't care what technical trickery you use to encode/compress copyrighted material. If you take data and create an equation based on it which contains it and can trivially reproduce it, then yes, IMHO obviously, this form of embedding copyrighted data is still embedding copyrighted data.

Think about it: if that weren't the case, I could just transform a video into an equation system and then distribute the latest movies, books, whatever to everyone without permission and without violating copyright, even though de facto I'm doing exactly what copyright law is supposed to prevent... (1)

Just because you come up with a clever technical trick to encode copyrighted content doesn't mean you can launder/circumvent copyright law, or any law for that matter. The law mostly cares about outcomes, not technical tricks.
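To make the "equation" point concrete, here is a minimal sketch (my own illustration, not anything from the case) of storing a text as a single integer `N` plus a formula that gives back byte `i`: `byte_i = floor(N / 256^(len-1-i)) mod 256`. The function names and the placeholder text are made up for the example.

```python
def encode_as_number(text: str) -> tuple[int, int]:
    """Pack the UTF-8 bytes of `text` into one big integer N (plus its length)."""
    data = text.encode("utf-8")
    return int.from_bytes(data, "big"), len(data)


def byte_at(n: int, length: int, i: int) -> int:
    """The 'equation': byte i is floor(N / 256^(length-1-i)) mod 256."""
    return (n // 256 ** (length - 1 - i)) % 256


def decode_from_number(n: int, length: int) -> str:
    """Evaluate the equation for every position to get the original text back."""
    return bytes(byte_at(n, length, i) for i in range(length)).decode("utf-8")


# Placeholder text standing in for a protected work (hypothetical example data).
work = "placeholder text standing in for a protected work"
n, length = encode_as_number(work)
restored = decode_from_number(n, length)
```

The number `n` looks nothing like the text, yet `restored` is the same text again. It is still the same content, just in another encoding.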

Maybe even more importantly, LLMs are, under the hood, basically compression systems at their core: by not giving them enough entropy to store all the information, you force them to generalize, and with that they happen to create an illusion of sentience.

E.g. what is the simplest case of training a transformer? You put in data to create the transformer state (which has much lower entropy), then reproduce the data from that state, and then you find a "transformation" for which this works as well as possible across a huge amount of different data. That is a compression algorithm!!! And sure, in reality it's more complex: you don't train it to compress a specific input, but more like a dictionary of "expected" input->output mappings, where the output parts need to be fully embedded, i.e. memorized, in the algorithm in some form.

LLMs are basically obscure, multi-layered, high-dimensional lossy compression systems which compress a simple input->output mapping (i.e. a database) defined by all the entries in their training data. It is a compressed mapping which, because its entropy is forcibly limited, has to compress through generalization...

And since when does compression let you avoid copyright??

So if you want it to be handled differently by law because it isn't used as a compressed database, you have to special-case it in law.

But it is used as a compressed database: in this case, for example, it was used to look up lyrics based on some clues. That's basically a lookup in a lossy, compressed, obscure database system, no matter how you would normally think about LLMs.
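A rough sketch of what "lookup in a compressed database" means in the plainest sense. It uses `zlib`, so unlike an LLM it is lossless and does no generalization; the keys, the placeholder outputs, and the function names are all hypothetical.

```python
import json
import zlib


def build_store(pairs: dict) -> bytes:
    """Compress an input->output mapping into an opaque blob of bytes."""
    return zlib.compress(json.dumps(pairs, sort_keys=True).encode("utf-8"))


def lookup(store: bytes, clue: str):
    """Return the first stored output whose prompt contains the clue, else None."""
    pairs = json.loads(zlib.decompress(store).decode("utf-8"))
    for prompt, output in pairs.items():
        if clue.lower() in prompt.lower():
            return output
    return None


# Hypothetical stand-ins for (prompt, protected text) pairs.
PAIRS = {
    "song about rain, first line": "placeholder line one " * 20,
    "song about trains, first line": "placeholder line two " * 20,
}
STORE = build_store(PAIRS)
result = lookup(STORE, "trains")
```

The blob in `STORE` is smaller than the original mapping and unreadable by eye, but a clue still gets the stored text back out.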

(1): And in case it's not clear, this doesn't mean every RNG is a violation just because under some unknown seed it would probably reproduce copyrighted content. The RNG wasn't written "based on" the copyrighted content.