Sure, a ROT13 encoding is a derivative work, because the entire original work is still there, just encoded. Ditto for FFT. Large language models are not that.
Sometimes parts of the original works are still encoded - we've seen that when some code is reproduced verbatim - and I'm sure that happens to people as well, i.e. they see some algorithm and, down the road, have to write something similar and end up reproducing the exact same thing.
Once they iron out those wrinkles, it's not clear to me that a large language model is a directly reversible function of the original works. At least, not any more than a human learning from reading a bunch of code and then going on to have a career selling his skills at writing code.
Edit: by which I mean, LLMs are lossy encodings, not lossless encodings.
> Ditto for FFT. Large language models are not that.
They're not, but the "giant table of token frequencies and associative keywords" reminded me of doing an FFT on images, and I wanted to communicate the idea that transformations like this can actually retain the original information and reproduce it back through the inverse transform.
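To make that concrete, here's a minimal sketch (assuming NumPy): the 2D FFT of an image looks nothing like the image, yet the inverse transform recovers it exactly.

```python
# Minimal sketch (assumes NumPy): the forward FFT "encodes" the image,
# the inverse FFT gets the original back - nothing is lost.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))             # stand-in for a source image

spectrum = np.fft.fft2(image)            # looks nothing like the image
recovered = np.fft.ifft2(spectrum).real  # inverse transform

print(np.allclose(image, recovered))     # True, up to floating-point error
```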
> by which I mean, LLMs are lossy encodings, not lossless encodings
Exactly. And while I doubt most training data is recoverable, "lossy encoding" is still a spectrum. As you move away from lossless, it's not obvious when, or whether at all, the result becomes clear of the original authors' copyright. Compare e.g. with JPEG, which employs a less sophisticated lossy encoding - no matter how hard you compress a source image, the result would still likely retain the copyright of the source image's author, because provenance matters.
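A hedged sketch of the "spectrum" point (assuming Pillow; "source.jpg" is just a placeholder): re-encode the same image at ever lower JPEG quality and the reconstruction error grows, yet the output never stops being a function of the source.

```python
# Hedged sketch (assumes Pillow; "source.jpg" is a placeholder file name):
# re-encode the same image at decreasing quality and watch the error grow -
# lossiness is a dial, not a binary property.
import io
import numpy as np
from PIL import Image

source = Image.open("source.jpg").convert("RGB")
original = np.asarray(source, dtype=np.float64)

for quality in (95, 50, 10, 1):
    buffer = io.BytesIO()
    source.save(buffer, format="JPEG", quality=quality)
    size = buffer.tell()  # bytes written by the encoder
    decoded = np.asarray(Image.open(buffer).convert("RGB"), dtype=np.float64)
    mse = np.mean((original - decoded) ** 2)
    print(f"quality={quality:3d}  size={size:7d} B  MSE={mse:.1f}")
```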
> And while I doubt most training data is recoverable, "lossy encoding" is still a spectrum. [...] Compare e.g. with JPEG
I'll just finally note that LLMs are not lossy encodings in the same sense as JPEG. LLMs are closer to human-like learning, where learning from data enables us to create entirely new expressions of the same concepts contained in that data, rather than acting as pure functions of the source data. That's why this will be interesting to see play out in the courts.
My belief is that there is no fundamental difference here. That is, learning is a form of compression; learning concepts is just a more complex way of achieving much greater (if lossy) compression. If the courts see it the same way, things will get truly interesting.
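A minimal sketch of what I mean (the corpus is a placeholder): a bigram model "learns" the character statistics of a text, and an entropy coder driven by those learned statistics would compress the text down to the model's cross-entropy - better learning, fewer bits per character.

```python
# Minimal sketch of "learning is compression" (the text is a placeholder):
# a bigram model "learns" character statistics; an entropy coder driven by
# those statistics would compress the text down to the model's cross-entropy.
import math
from collections import Counter, defaultdict

text = "the quick brown fox jumps over the lazy dog " * 50

# "Learning": count which character tends to follow which.
bigrams = defaultdict(Counter)
for prev, cur in zip(text, text[1:]):
    bigrams[prev][cur] += 1

# "Compression": code length under the learned model vs. a uniform code.
bits = 0.0
for prev, cur in zip(text, text[1:]):
    p = bigrams[prev][cur] / sum(bigrams[prev].values())
    bits += -math.log2(p)

print(f"uniform code : {math.log2(len(set(text))):.2f} bits/char")
print(f"bigram model : {bits / (len(text) - 1):.2f} bits/char")
```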
Yes, learning concepts is a form of compression, but I'm not sure that implies there's no "fundamental" difference. I see it as akin to a programming language having only first-order functions vs. having higher-order functions. Higher-order functions give you more expressive power but not any more computational power.
You could say a higher-order program can "just" be transformed into a first-order program via defunctionalization, but I think the expressive difference is in and of itself meaningful. I hope the courts can tease that out in the end, and we'll see whether LLMs cross that line, or whether we need something even more general to qualify.
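For the curious, a hedged sketch of that transformation (Python purely for illustration, names made up): the higher-order version passes functions around, while the defunctionalized version replaces each of them with a plain data tag plus a single apply dispatcher - same computation, different expressive register.

```python
# Hedged sketch of defunctionalization (names are made up for illustration):
# the higher-order version passes functions as values; the first-order version
# replaces each of them with a plain data tag plus one "apply" dispatcher.
from dataclasses import dataclass

# Higher-order: functions are first-class values.
def compose(f, g):
    return lambda x: f(g(x))

double = lambda x: x * 2
increment = lambda x: x + 1
print(compose(double, increment)(10))   # 22

# First-order (defunctionalized): each function value becomes a constructor.
@dataclass
class Double: pass

@dataclass
class Increment: pass

@dataclass
class Compose:
    f: object
    g: object

def apply(fn, x):
    if isinstance(fn, Double):
        return x * 2
    if isinstance(fn, Increment):
        return x + 1
    if isinstance(fn, Compose):
        return apply(fn.f, apply(fn.g, x))
    raise TypeError(fn)

print(apply(Compose(Double(), Increment()), 10))  # 22
```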
> I see it as akin to a programming language having only first-order functions vs. having higher-order functions.
Interesting analogy, and I think there are a couple of different "levels" at which to look at it. Fundamentally, they're the same thing under Turing equivalence, and in practice one can be transformed into the other - but then, I agree there is a meaningful difference for the humans who have to read or think in those languages. Additionally, if those are typical programming languages, you can't really have code in the "weaker" language self-upgrade to the point where the upgraded language has the same expressive power as the "stronger" one. If the "weaker" one is Lisp, though, you can lift it like this.
In this sense I see traditional compression algorithms - like the ones we use for archiving, images and sound - as being like those typical weaker languages: there's a fixed set of features they exploit in their compression. But human learning vs. neural network models (or sophisticated enough non-DNN ML) is to me like Lisp vs. that stronger programming language, or even Lisp vs. a better Lisp - both can arbitrarily raise their conceptual level as needed. It's still fundamentally compression / programming Turing machines, though.
> that happens to people as well, i.e. they see some algorithm and, down the road, have to write something similar and end up reproducing the exact same thing.
And if such an algorithm is copyrighted, that would be infringing! It doesn't matter if you copy on purpose or by chance.