Tell HN: The joy of 100 to 1 backup compression: 8.3M to 70K
4 points by andrewstuart on Aug 21, 2014 | 5 comments
Dear HN

I wish to share the joy deep in my heart that I have found from using lrzip http://ck.kolivas.org/apps/lrzip/

My Postgres database dumps are compressed more than 100 to 1. So nice.

  (venv2.7)username@ip-x-x-x-x:~/backupscripts/16303$ ls -lah
  total 8.4M
  drwxrwxr-x 2 user 4.0K Aug 21 .
  drwxrwxr-x 3 user 4.0K Aug 21 ..
  ---------- 1 user 8.3M Aug 21 singlepageguru_db_21-Aug-2014-06:00
  ---------- 1 user  70K Aug 21 singlepageguru_db_21-Aug-2014-06:00.lrz
  (venv2.7)username@ip-x-x-x-x:~/backupscripts/16303$
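
The exact backup script isn't shown above, but the step is essentially dump then compress. A rough sketch only (database name taken from the listing, timestamp format reconstructed from the filename; lrzip keeps the original file and writes a .lrz alongside it):

  # Sketch: dump the database as plain SQL, then compress with lrzip.
  # Plain-text pg_dump output is full of repeated keywords and column
  # values, which is the kind of long-range redundancy lrzip targets.
  STAMP=$(date +%d-%b-%Y-%H:%M)
  pg_dump singlepageguru_db -f "singlepageguru_db_$STAMP"
  lrzip "singlepageguru_db_$STAMP"    # produces singlepageguru_db_$STAMP.lrz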


This compression ratio only means your data file contains a great deal of redundancy: consistent, repeated patterns that a compressor can exploit.

You're describing lrzip as though it's responsible for the compression ratio. I assure you, the credit lies with the data, not with the compression algorithm or application.

For the purposes of a discussion of compression, data can be described in terms of its entropy. If data has high entropy, it will resist compression. With low-entropy data like your example, a compression algorithm can produce a high ratio, and the resulting file then has high entropy, meaning it will resist any further compression.

If you want to produce an impressive result for someone who doesn't understand compression, use a plain-text file for the example, one containing many repeated words and phrases. If you want to make a compression application look bad, try to compress something already compressed, like a JPEG graphic.
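
You can see both cases for yourself in a couple of minutes. A small sketch, assuming GNU coreutils and lrzip are installed (filenames are placeholders, exact sizes will vary):

  # Low-entropy input: a megabyte of the same sentence repeated.
  yes "the quick brown fox jumps over the lazy dog" | head -c 1M > redundant.txt
  # High-entropy input: a megabyte of random bytes.
  head -c 1M /dev/urandom > random.bin
  lrzip redundant.txt    # shrinks to a tiny fraction of its size
  lrzip random.bin       # the .lrz ends up barely smaller, if at all
  ls -lh redundant.txt.lrz random.bin.lrz

The first file is the repetitive plain-text case described above; the second behaves like an already-compressed JPEG.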

Further reading:

http://en.wikipedia.org/wiki/Entropy_(information_theory)#Da...


It still makes me happy.


What kind of results do you get for other popular archive formats like rar or tar-bzip2?
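
If anyone wants to run that comparison themselves, something along these lines would do it (dump filename taken from the listing above; rar is the separate non-free tool):

  DUMP=singlepageguru_db_21-Aug-2014-06:00
  lrzip "$DUMP"                        # -> $DUMP.lrz
  tar -cjf "./$DUMP.tar.bz2" "$DUMP"   # ./ stops tar treating the colon as a remote host
  gzip -c "$DUMP" > "$DUMP.gz"         # gzip, keeping the original
  rar a "$DUMP.rar" "$DUMP"            # requires the rar command
  ls -lh "$DUMP"*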


What's your Weissman?


It must be close to 5.2



