Some years ago I had an idea to have a method of file sharing with strong plausible deniability from the sharer.
The idea, in stage one, was to split a file into chunks and XOR each one with a freshly generated random chunk (equivalent to a one-time pad). The XORed chunks and the random chunks both then get shared around the network, with nobody hosting both parts of a pair.
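A rough sketch of that first stage, to make the pairing concrete. The names `split_file` and `join_file` and the chunk size are made up for illustration, and distributing the two halves to different hosts is left out entirely:

```python
import secrets

CHUNK_SIZE = 16  # hypothetical; a real system would use far larger chunks


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_file(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[bytes, bytes]]:
    """Return one (pad, masked) pair per chunk of the input."""
    pairs = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        pad = secrets.token_bytes(len(chunk))  # fresh random chunk
        pairs.append((pad, xor_bytes(chunk, pad)))
    return pairs


def join_file(pairs: list[tuple[bytes, bytes]]) -> bytes:
    """Recover the original bytes by XORing each pair back together."""
    return b"".join(xor_bytes(pad, masked) for pad, masked in pairs)
```

Anyone holding only the `pad` or only the `masked` half of a pair has bytes that look random; it takes both halves to get the chunk back.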
The next stage is that future files inserted into the network would not create new random chunks but randomly use existing chunks already in the network. The result is a distributed store of chunks each of which is provably capable of generating any other chunk given the right pair. The correlations are then stored in a separate manifest.
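Stage two might look something like the following. This is only a sketch under my own assumptions: the store is a plain dict keyed by SHA-256 hash, chunks are fixed-size with zero padding on the last one, and `insert_file`, `fetch_file` and the manifest layout are invented names, not anything from a real system:

```python
import hashlib
import secrets

CHUNK_SIZE = 16  # hypothetical fixed chunk size


def chunk_id(chunk: bytes) -> str:
    """Content address for a chunk (assumed: SHA-256 hex digest)."""
    return hashlib.sha256(chunk).hexdigest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def insert_file(store: dict, data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Add a file to the store, reusing existing chunks as pads where possible."""
    pairs = []
    for i in range(0, len(data), chunk_size):
        block = data[i:i + chunk_size].ljust(chunk_size, b"\0")
        existing = list(store.values())
        if existing:
            pad = secrets.choice(existing)        # reuse a chunk already in the network
        else:
            pad = secrets.token_bytes(chunk_size)  # bootstrap: nothing to reuse yet
            store[chunk_id(pad)] = pad
        masked = xor_bytes(block, pad)
        store[chunk_id(masked)] = masked
        pairs.append((chunk_id(pad), chunk_id(masked)))
    # The manifest records which chunks pair up, plus the true length.
    return {"length": len(data), "pairs": pairs}


def fetch_file(store: dict, manifest: dict) -> bytes:
    """Rebuild a file from the store using its manifest."""
    blocks = (xor_bytes(store[p], store[m]) for p, m in manifest["pairs"])
    return b"".join(blocks)[:manifest["length"]]
```

Once the store is non-empty, each inserted block adds at most one new chunk, and the same stored chunk can end up referenced by the manifests of unrelated files.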
It feels like such a system is some kind of entropy coding system. In the limit the manifest becomes the same size as the original data. At the same time though, you can prove that any given chunk contains no information. I love thinking about how the philosophy of information theory interacts with the law.
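The "contains no information" part is the usual one-time-pad argument. As a sketch, assume the pad R is uniform over n-bit strings and independent of the plaintext chunk P, and let C = P ⊕ R be the shared chunk (the symbols are mine):

```latex
\Pr[C = c \mid P = p] \;=\; \Pr[R = c \oplus p] \;=\; 2^{-n}
\qquad \text{for all } c, p \in \{0,1\}^n
```

So C has the same uniform distribution whatever P was, which means C on its own is independent of P. That covers a chunk masked with a fresh pad; once pads are existing chunks drawn from the network, the independence assumption has to be checked rather than taken for granted.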
I think this touches on the core mismatch between the legal perspective and technical perspective.
Yes, on a technical level, those chunks are random data. On the legal side, however, those chunks are illegal copyright infringement because that is their intent, and there is a process that allows the intent to happen.
Except you've a heckin' problem with Stable Diffusion because you have to argue that the intent is to steal the copyright by copying already existing artworks.
But that's not what people use Stable Diffusion for: people use Stable Diffusion to create new works which don't previously exist as that combination of colors/bytes/etc.
Artists don't have copyright on their artistic style, process, technique or subject matter - only on the actual artwork they output or reasonable similarities. But "reasonable similarity" covers exactly that intent - an intent to simply recreate the original.
People keep talking about copyright, but no one's trying to rip off actual existing work. They're doing things like "Pixar style, ultra detailed gundam in a flower garden". So you're rocking up in court saying "the intent is to steal my client's work" - but where is the client's line of gundam horticultural representations? It doesn't exist.
You can't copyright artistic style, only actual output. Artists are fearful that the ability to emulate style means commissions will dry up (this is true) but you've never had copyright protection over style, and it's not even remotely clear how that would work (and, IMO, it would be catastrophic if it was - there's exactly one group of megacorps who would now be in a position to sue everyone because try defining "style" in a legal sense).
> because you have to argue that the intent is to steal the copyright by copying already existing artworks
Copyright infringement can happen without intending to infringe copyright.
Various music copyright cases start with "Artist X sampled some music from artist Y, thinking it was transformative and fair use". Courts, in some of these cases, have found something the artist _intended_ to be transformative to in fact be copyright infringement.
> You can't copyright artistic style, only actual output
You copyright outputs, and then works that are derived from those outputs are potentially covered by that copyright too. Stable Diffusion's outputs are clearly derived from the training set, basically by definition of what neural networks are.
It's less clear that they're copyright-infringing derivative works, but it's far less clear-cut than you're making it sound.
That's an interesting essay and I agree it goes to the heart of the question. There's clearly an interesting question, even in the colour domain: is someone infringing copyright if the data they themselves are sharing has a perfectly legitimate colour that is the basis of their sharing? That's the plausible deniability bit that's so important: "Yes your honour, I did share that chunk of random data, but I did so because it's part of this totally legitimately coloured file I was wanting to share. I had no idea that someone added a new colour to the block. Obviously, I'm only sharing the original colour block; prove otherwise". At some point, the court has to decide the colour of the block from the perspective of the accused, which allows a basis for deniability.