
Tips for benchmarking a compressor - deverton
http://cbloomrants.blogspot.com/2016/05/tips-for-benchmarking-compressor.html
======
ryao
> Consider excluding almost-incompressible files. This is something you should
> consider for final shipping application

This is definitely a tip on how to use a compressor rather than how to
benchmark it. It should not be taken as a suggestion to avoid testing
incompressible files, because certain fast compression algorithms such as LZ4
process incompressible data faster than they process compressible data. Other
algorithms run much slower on incompressible data. Including incompressible
data such as JPEGs when benchmarking a compression algorithm lets you
distinguish between the two.

Also, JPEGs are not automatically incompressible. The squash benchmark results
claim that ncompress is capable of compressing the JPEG that the Snappy
developers use to test their compression algorithm to 79%, which is well into
compressible territory:

[https://quixdb.github.io/squash-benchmark/](https://quixdb.github.io/squash-benchmark/)

You can confirm that LZ4 is faster on incompressible data than compressible
data using the JPEG and just about anything else. As far as LZ4 and most of
the compressors there are concerned, the JPEG is incompressible. ncompress is
the most notable exception.

A file full of random bytes would probably work better than a JPEG.
In any case, you need to check your compression ratios when evaluating whether
something is compressible or incompressible. Just assuming that a certain type
of file is incompressible does not work in general.
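
A rough sketch of that check in Python, assuming zlib from the standard library as a stand-in for whatever compressor you are actually evaluating (it is not LZ4, and its speed behavior may differ). The `measure` helper, the 1 MiB sample size, and the two sample inputs are arbitrary choices for illustration, not anything from the article or the squash benchmark:

```python
import os
import time
import zlib


def measure(data: bytes, level: int = 6) -> tuple[float, float]:
    """Return (compressed size / original size, throughput in MB/s)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(compressed) / len(data)
    mb_per_s = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
    return ratio, mb_per_s


SIZE = 1 << 20  # 1 MiB per sample; an arbitrary size for this sketch
LINE = b"the quick brown fox jumps over the lazy dog\n"

samples = {
    # Output of the OS random source: should be incompressible.
    "random bytes": os.urandom(SIZE),
    # The same line repeated and trimmed to SIZE: highly compressible.
    "repeated text": (LINE * (SIZE // len(LINE) + 1))[:SIZE],
}

results = {name: measure(data) for name, data in samples.items()}

for name, (ratio, speed) in results.items():
    # A ratio near (or slightly above) 1.0 means the input did not compress.
    print(f"{name:14s} ratio={ratio:.3f} speed={speed:.1f} MB/s")
```

To classify a real file, read its bytes and pass them to `measure` instead of the synthetic samples. A single timed run like this is noisy, so treat the speed column as a hint and repeat the measurement before drawing conclusions from it.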

------
Cyphus
Wow, what a great set of tips. This is also a great guide for benchmarking any
software, so long as its execution is highly deterministic, as compression
algorithms are.

This will probably be my go-to reference when someone shows me a benchmark with
bad methodology. If nothing else, it should give them an appreciation of the
multitude of things that can affect a benchmark result.

------
achr2
> Make sure to test on 3D video files.

