

Sampling from a probability distribution is equivalent to lossless decompression - tripzilch
http://blog.sigfpe.com//2012/01/lossless-decompression-and-generation.html

======
ColinWright
See also: Arithmetic coding

<http://en.wikipedia.org/wiki/Arithmetic_coding>

~~~
tripzilch
Yes, but did you read the article? The cool thing is that if you feed the
arithmetic decoder uniform random bits/numbers, you get a random sample
from the distribution it models. Maybe it's obvious to others, but it was a
clever reversal that made me go "huh!" :-)

~~~
ColinWright
I guess I should've written a more complete response.

This idea as described in the article only works for specific distributions:
Huffman coding is a prefix code, assigning a whole number of bits to each
symbol, so it is only optimal when every symbol probability is a power of
1/2. Think of a two-symbol alphabet with P(A)=0.1 and P(B)=0.9. Huffman
coding is very poor on this, and your decoding of a random stream definitely
does not have the desired characteristics.
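A quick sanity check of that failure mode (my own sketch, not from the article):
with only two symbols, the Huffman code is forced to spend one bit on each, so
decoding uniform random bits gives a 50/50 symbol stream no matter what the
target probabilities are.

```python
import random

# With a two-symbol alphabet the only Huffman code is A <- '0', B <- '1'
# (one bit per symbol). Decoding uniform random bits therefore yields
# each symbol half the time -- nowhere near P(A)=0.1, P(B)=0.9.
random.seed(0)
code = {'0': 'A', '1': 'B'}  # the only possible Huffman code here
bits = [random.choice('01') for _ in range(100_000)]
sample = [code[b] for b in bits]
freq_a = sample.count('A') / len(sample)
print(f"observed P(A) = {freq_a:.2f}")  # close to 0.5, not the target 0.1
```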

If you want to make it better you need to step to something like Shannon-Fano
coding over blocks of symbols, which approximates the distribution much more
closely.

On the other hand, you can step sideways to Arithmetic Coding and get exactly
the distribution you want in a single clean and elegant model.

