Tell HN: The joy of 100 to 1 backup compression: 8.3M to 70K
4 points by andrewstuart on Aug 21, 2014 | 5 comments
Dear HN

I wish to share the joy deep in my heart that I have found from using lrzip http://ck.kolivas.org/apps/lrzip/

My Postgres database dumps are compressed more than 100 to 1. So nice.

  (venv2.7)username@ip-x-x-x-x:~/backupscripts/16303$ ls -lah
  total 8.4M
  drwxrwxr-x 2 user 4.0K Aug 21 .
  drwxrwxr-x 3 user 4.0K Aug 21 ..
  ---------- 1 user 8.3M Aug 21 singlepageguru_db_21-Aug-2014-06:00
  ---------- 1 user  70K Aug 21 singlepageguru_db_21-Aug-2014-06:00.lrz
  (venv2.7)username@ip-x-x-x-x:~/backupscripts/16303$
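
The exact backup script isn't shown above, but the step is essentially dump then compress. A rough sketch only (database name taken from the listing, timestamp format reconstructed from the filename; lrzip keeps the original file and writes a .lrz alongside it):

  # Sketch: dump the database as plain SQL, then compress with lrzip.
  # Plain-text pg_dump output is full of repeated keywords and column
  # values, which is the kind of long-range redundancy lrzip targets.
  STAMP=$(date +%d-%b-%Y-%H:%M)
  pg_dump singlepageguru_db -f "singlepageguru_db_$STAMP"
  lrzip "singlepageguru_db_$STAMP"    # produces singlepageguru_db_$STAMP.lrz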


This compression ratio only means your data file contains a great deal of redundancy: consistent, repeated patterns that a compressor can exploit.

You're describing lrzip as though it's responsible for the compression ratio. I assure you, the credit lies with the data, not with the compression algorithm or application.

For the purposes of a discussion of compression, data can be described in terms of its entropy. If data has high entropy, it will resist compression. With low-entropy data like your example, a compression algorithm can produce a high ratio, and the resulting file then has high entropy, meaning it will resist any further compression.

If you want to produce an impressive result for someone who doesn't understand compression, use a plain-text file for the example, one containing many repeated words and phrases. If you want to make a compression application look bad, try to compress something already compressed, like a JPEG graphic.
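
You can see both cases for yourself in a couple of minutes. A small sketch, assuming GNU coreutils and lrzip are installed (filenames are placeholders, exact sizes will vary):

  # Low-entropy input: a megabyte of the same sentence repeated.
  yes "the quick brown fox jumps over the lazy dog" | head -c 1M > redundant.txt
  # High-entropy input: a megabyte of random bytes.
  head -c 1M /dev/urandom > random.bin
  lrzip redundant.txt    # shrinks to a tiny fraction of its size
  lrzip random.bin       # the .lrz ends up barely smaller, if at all
  ls -lh redundant.txt.lrz random.bin.lrz

The first file is the repetitive plain-text case described above; the second behaves like an already-compressed JPEG.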

Further reading:

http://en.wikipedia.org/wiki/Entropy_(information_theory)#Da...


It still makes me happy.


What kind of results do you get for other popular archive formats like rar or tar-bzip2?
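
If anyone wants to run that comparison themselves, something along these lines would do it (dump filename taken from the listing above; rar is the separate non-free tool):

  DUMP=singlepageguru_db_21-Aug-2014-06:00
  lrzip "$DUMP"                        # -> $DUMP.lrz
  tar -cjf "./$DUMP.tar.bz2" "$DUMP"   # ./ stops tar treating the colon as a remote host
  gzip -c "$DUMP" > "$DUMP.gz"         # gzip, keeping the original
  rar a "$DUMP.rar" "$DUMP"            # requires the rar command
  ls -lh "$DUMP"*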


What's your Weissman?


It must be close to 5.2



