Tips for benchmarking a compressor (cbloomrants.blogspot.com)
33 points by deverton on May 16, 2016 | 3 comments

> Consider excluding almost-incompressible files. This is something you should consider for final shipping application

This is definitely a tip on how to use a compressor rather than how to benchmark it. It should not be taken as a suggestion to skip incompressible files when benchmarking: certain fast compression algorithms, such as LZ4, process incompressible data faster than compressible data, while other algorithms run extra slowly on it. Including incompressible data such as JPEGs when benchmarking a compression algorithm lets you distinguish between the two.
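To make that concrete, here is a minimal sketch of how one might compare throughput on the two kinds of input. It uses zlib from the Python standard library purely as a stand-in, since LZ4 is not in the standard library. The helper name `throughput_mb_s`, the 1 MiB buffer size, and the repeat count are all assumptions made for illustration, and the numbers it prints will depend on your machine.

```python
import os
import time
import zlib

def throughput_mb_s(data: bytes, compress=zlib.compress, repeats: int = 5) -> float:
    """Best-of-N compression throughput in MB/s for one input buffer.

    `compress` is any callable taking bytes and returning bytes; zlib is
    only a stand-in here, swap in the compressor you actually care about.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        compress(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse timer.
    return len(data) / max(best, 1e-9) / 1e6

SIZE = 1 << 20  # 1 MiB, an arbitrary size chosen for the sketch
pattern = b"the quick brown fox jumps over the lazy dog\n"
compressible = (pattern * (SIZE // len(pattern) + 1))[:SIZE]
incompressible = os.urandom(SIZE)  # random bytes: effectively incompressible

if __name__ == "__main__":
    for name, buf in [("compressible", compressible),
                      ("incompressible", incompressible)]:
        print(f"{name:>14}: {throughput_mb_s(buf):8.1f} MB/s")
```

Passing a different callable as `compress` lets the same harness run against another compressor, which is the comparison that matters here.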

Also, JPEGs are not automatically incompressible. The squash benchmark results claim that ncompress is capable of compressing the JPEG that the Snappy developers use to test their compression algorithm to 79%, which is well into compressible territory:


You can confirm that LZ4 is faster on incompressible data than compressible data using the JPEG and just about anything else. As far as LZ4 and most of the compressors there are concerned, the JPEG is incompressible. ncompress is the most notable exception.

A file full of random binary numbers would probably work better than a JPEG. In any case, you need to check your compression ratios when evaluating whether something is compressible or incompressible. Just assuming that a certain type of file is incompressible does not work in general.
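A rough sketch of that check, assuming zlib as the compressor and a hypothetical cutoff of 0.95 for "almost incompressible" (both are my choices for illustration, not anything from the article or the squash benchmark):

```python
import os
import zlib

def compression_ratio(data: bytes, compress=zlib.compress) -> float:
    """Compressed size divided by original size (1.0 means no savings)."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return len(compress(data)) / len(data)

def is_incompressible(data: bytes, threshold: float = 0.95,
                      compress=zlib.compress) -> bool:
    """True if this compressor saves less than (1 - threshold) on this input.

    The 0.95 threshold is an arbitrary assumption; pick one that matches
    what "not worth compressing" means for your application.
    """
    return compression_ratio(data, compress) >= threshold

random_data = os.urandom(1 << 20)      # random bytes
text_data = b"benchmark " * 100_000    # highly repetitive text

if __name__ == "__main__":
    for name, buf in [("random", random_data), ("text", text_data)]:
        print(f"{name}: ratio={compression_ratio(buf):.3f} "
              f"incompressible={is_incompressible(buf)}")
```

The point is that the verdict comes from measuring the ratio with the compressor under test, not from the file extension. The same file can land on either side of the threshold depending on which compressor you pass in.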

Wow, what a great set of tips. This is also a great guide for benchmarking any software, so long as its execution is highly deterministic like compression algorithms are.

This will probably be my go-to reference when someone shows me a benchmark with bad methodology. If nothing else, it should give them an appreciation of the multitude of things that can affect a benchmark result.

> Make sure to test on 3D video files.

