

Compressing log files *twice* improves ratio dramatically - willvarfar
https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

======
pwg
This is not at all a surprising outcome. Because the compressors have a
limited match range and emit chunks of literal data, a very redundant input
file produces an output file that is smaller but still somewhat redundant.
The subsequent passes are simply operating on the redundancy created by the
compression algorithm itself.

Try compressing a file of nulls and look at the output with hexdump or a hex
editor. You'll see the patterns left over by the compression algorithm:

    
    
    dd if=/dev/zero of=test bs=1024 count=102400
    gzip -9v test
    hexdump -C test.gz | head       # lots of redundancy
    gzip -9v < test.gz > test.gz.gz
    hexdump -C test.gz.gz | head    # looks more random

~~~
willvarfar
I'm imagining that the byte-aligned LZ matches in the first pass (e.g. one
match emitted for the timestamp and another for the remainder of the log
line) use relative offsets, so they actually expose a deeper pattern to the
second pass.
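
One rough way to poke at that hypothesis is to synthesize a log whose lines
share a fixed message but carry an incrementing timestamp-like field, then
compress the first-pass output again (a sketch using gzip rather than lz4,
with a made-up file name and line format; GNU seq assumed):

    # fake log: incrementing "timestamp" field, identical remainder on each line
    seq -f '%012.0f ERROR connection timed out talking to backend-7' 1 200000 > fake.log

    gzip -9c fake.log    > fake.log.gz       # first pass
    gzip -9c fake.log.gz > fake.log.gz.gz    # second pass
    ls -l fake.log fake.log.gz fake.log.gz.gz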

