
Lossless decompression and the generation of random samples - joelg236
http://blog.sigfpe.com/2012/01/lossless-decompression-and-generation.html
======
DanBC
> _Huffman coding tries to compress text one letter at a time on the
> assumption that each letter comes from some fixed and known probability
> distribution. If the algorithm is successful then we'd expect the
> compressed text to look like a uniformly distributed sequence of bits. If it
> didn't then there'd be patterns that could be used for further compression._

This can be gently confusing when you're using different compression systems
(bits vs. bytes):

([https://groups.google.com/d/topic/lz4c/DcN5SgFywwk/discussion](https://groups.google.com/d/topic/lz4c/DcN5SgFywwk/discussion))

Someone was compressing very large log files with LZ4. They then compressed the
output again and got a further reduction in size.

> _The fundamental reason is that these highly repetitive byte sequences, with
> very small and regular differences, produce repetitive compressed sequences,
> which can therefore be compressed further._ - Yann Collet
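
A quick way to see the effect for yourself (a rough sketch of my own, not from
the linked thread; it assumes the third-party lz4 Python package is installed):
compress some highly repetitive, log-like lines with LZ4, which has no
entropy-coding stage, then run zlib over the result.

    # Rough sketch: repetitive log-like data shrinks again after LZ4,
    # because LZ4's output still carries repetitive structure that a
    # second compressor can exploit. Requires: pip install lz4
    import zlib
    import lz4.frame

    # Highly repetitive "log" lines with small, regular differences.
    log = "".join(
        f"2012-01-15 12:00:{i % 60:02d} INFO request id={i} status=200\n"
        for i in range(100_000)
    ).encode()

    once = lz4.frame.compress(log)
    twice = zlib.compress(once)

    print(len(log), len(once), len(twice))  # each stage is typically smaller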

------
hadronzoo
The same is true of arithmetic coding, which separates the probability model
from the encoding process. Feed an arithmetic coder a stream of random bits
and it will efficiently sample from your model. See section 6.3:
[http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/105.124.pdf](http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/105.124.pdf)
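
Toy illustration of that point (my own sketch, not MacKay's algorithm: it skips
renormalization and just uses exact fractions): feed uniformly random bits to an
arithmetic-style decoder and the symbols it emits follow the probability model.

    # Toy arithmetic-style decoder: a point x built from uniformly random
    # bits is repeatedly located inside the model's sub-intervals, so the
    # decoded symbols are samples from the model. No renormalization,
    # exact fractions only -- an illustration, not a practical coder.
    import random
    from fractions import Fraction

    MODEL = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

    def sample(model, n_symbols, n_bits=64, rng=random):
        # Interpret n_bits random bits as a point x in [0, 1).
        x = Fraction(rng.getrandbits(n_bits), 2 ** n_bits)
        lo, hi = Fraction(0), Fraction(1)
        out = []
        for _ in range(n_symbols):
            # Find the symbol whose sub-interval of [lo, hi) contains x,
            # then zoom into that sub-interval (one decoding step).
            cum = lo
            for sym, p in model.items():
                width = (hi - lo) * p
                if cum <= x < cum + width:
                    out.append(sym)
                    lo, hi = cum, cum + width
                    break
                cum += width
        return out

    print("".join(sample(MODEL, 20)))  # ~half 'a', a quarter each 'b' and 'c'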

------
Chris2048
I simply don't understand this; I don't think I have the right background. Any
good starting places for learning more about this topic?

~~~
susi22
I'd suggest diving into discrete probability for a few hours. That should be
enough to understand this post. Huffman coding is a little more advanced, but
you can understand the algorithm (if not the theory) with just that probability
background.
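
If it helps, here's a tiny sketch of the post's trick (my own toy example, not
from the post) using a hand-built Huffman code for the dyadic distribution
{a: 1/2, b: 1/4, c: 1/8, d: 1/8}: walking the code tree with fair coin flips
produces symbols with exactly those probabilities, which is the "decompress
random bits to get samples" idea.

    # Decode uniformly random bits through a fixed Huffman tree; the
    # resulting symbols follow the distribution the code was built for.
    import random

    # Code tree as nested pairs (bit 0 -> left, bit 1 -> right);
    # leaves are symbols.
    TREE = ("a", ("b", ("c", "d")))

    def decode_random_symbols(n, rng=random):
        out = []
        for _ in range(n):
            node = TREE
            while isinstance(node, tuple):       # walk until we reach a leaf
                node = node[rng.getrandbits(1)]  # one fair coin flip per edge
            out.append(node)
        return out

    syms = decode_random_symbols(100_000)
    for s in "abcd":
        print(s, syms.count(s) / len(syms))  # ~0.5, 0.25, 0.125, 0.125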

