Entropy is a tricky word: legend has it that von Neumann persuaded Shannon to use it for the logarithmic information measure because “no one knows what it means anyways”.
These days we have KL divergence and information gain and countless other ways to be rigorous, but you still have to be kind of careful about “macro” vs. “micro” states; it’s just a slippery concept.
Is some 7B-parameter NN that was, like, Xavier- or He-initialized (or whatever the Fortress of Solitude people are doing these days) more or less unique than it is after you push an exabyte of Harry Potter fan fiction through it?
I think that’s an interesting question even if I (we) haven’t yet posed it in a rigorous way.
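To make the question slightly more concrete, here is a back-of-the-envelope sketch of one naive way you might even start to put a number on it: estimate the differential entropy of a layer’s weights with a plain histogram, once at a Xavier-style Gaussian init and once after a synthetic “training” perturbation. Everything here is an assumption for illustration only: the layer size, the `histogram_entropy` helper, and especially the stand-in “trained” weights, since we obviously don’t have the exabyte of fan fiction (or the 7B model) handy.

```python
import numpy as np

def histogram_entropy(weights, bins=256):
    """Crude differential-entropy estimate (in nats) of a weight tensor,
    via a histogram density: h ~ -sum_i p_i * log(p_i / width_i)."""
    w = weights.ravel()
    counts, edges = np.histogram(w, bins=bins)
    widths = np.diff(edges)
    p = counts / counts.sum()
    nz = p > 0  # skip empty bins to avoid log(0)
    return float(-(p[nz] * np.log(p[nz] / widths[nz])).sum())

rng = np.random.default_rng(0)
fan_in, fan_out = 1024, 1024  # toy layer, not 7B parameters

# Xavier/Glorot-style init: zero-mean Gaussian, variance 2 / (fan_in + fan_out).
w_init = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

# Stand-in for a "trained" layer: the init plus a sparse, heavy-tailed nudge.
# Purely synthetic; a real comparison would use actual pre/post-training checkpoints.
mask = rng.random(w_init.shape) < 0.1
w_trained = w_init + 0.02 * rng.standard_t(df=3, size=w_init.shape) * mask

print("h(init)      ~", histogram_entropy(w_init))
print("h('trained') ~", histogram_entropy(w_trained))
```

This treats every scalar weight as an i.i.d. draw from one distribution, which is exactly the kind of macro-vs-micro sloppiness flagged above, but it at least gives you a number to argue about.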