A poor entropy coding result in Zstandard (forwardscattering.org)
16 points by Ono-Sendai 11 months ago | hide | past | favorite | 11 comments



> So the theoretical minimum is ~1.2 MB. Zstd is only achieving ~1.4 MB, which is nowhere near the theoretical limit.

> Interestingly, 7-zip compresses the data much better than zstd, reaching a size of 1,298,590 B.

This isn't surprising to me. Zstd is designed for real-time compression at I/O speeds (~1 GB/s), so it will probably leave some entropy on the table in exchange for speed, whereas LZMA (7-Zip's default) is designed for high compression ratios.


Might do better if you swizzled it so all the upper 8 bits are stored contiguously, then all the lower 8 bits -- zstd likes bytes.
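The byte-plane split being suggested can be sketched like this (my own illustration: stdlib zlib stands in for a byte-oriented compressor, and the residuals are a made-up Gaussian stand-in for the post's heightmap data, not the author's actual input):

```python
import random
import struct
import zlib

random.seed(0)
# Hypothetical stand-in for the post's filtered heightmap: roughly
# normally distributed signed 16-bit residuals (an assumption).
residuals = [max(-32768, min(32767, int(random.gauss(0, 300))))
             for _ in range(100_000)]
raw = struct.pack(f"<{len(residuals)}h", *residuals)

# De-interleave into byte planes: all low bytes first, then all high
# bytes (little-endian, so even offsets hold the low bytes).
planes = raw[0::2] + raw[1::2]

# zlib here is just a stand-in for any byte-oriented backend; the
# high-byte plane is mostly 0x00/0x01/0xFF, so it compresses well
# on its own, while interleaving mixes it with near-uniform low bytes.
interleaved_size = len(zlib.compress(raw, 9))
planar_size = len(zlib.compress(planes, 9))
```

On data like this the planar layout should come out smaller, since the compressor's per-block symbol statistics no longer mix two very different byte distributions.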


Yes I think you are right. From looking at the zstd source, as far as I can tell the entropy coder operates on bytes only.


His method for computing entropy is not sound. It works for an independent and unpredictable sequence of bytes, but the actual data is a bunch of signed 16-bit residuals that are presumably normally distributed. Really, he should be measuring the entropy of the original heightmap, but instead he is measuring the entropy of the heightmap after applying a couple of confounding transformations (a predictive filter, splitting words into byte pairs, and a random shuffle) that act to increase the apparent entropy.
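The order-0 entropy estimate being discussed can be sketched like this (my own toy illustration, not the author's code): modeling the same data as i.i.d. bytes versus as i.i.d. 16-bit symbols yields different "theoretical minimums", because splitting words into byte pairs discards the correlation between the two halves of each value.

```python
import math
from collections import Counter

def order0_entropy_bits(symbols):
    """Shannon entropy in bits per symbol, assuming i.i.d. symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Toy residual stream: little-endian 16-bit values 0, 1, -1, 2 repeated.
data = bytes([0, 0, 1, 0, 255, 255, 2, 0] * 1000)

per_byte = order0_entropy_bits(data)          # model as bytes
per_word = order0_entropy_bits(
    [int.from_bytes(data[i:i + 2], "little", signed=True)
     for i in range(0, len(data), 2)]         # model as 16-bit symbols
)
```

Here the byte model costs 2 × `per_byte` = 3.5 bits per original value, while the word model needs only 2.0 bits per value, so the byte-pair view inflates the apparent entropy exactly as described.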

Calculating the exact entropy of data is at least as difficult as breaking encryption. Encrypted data should be indistinguishable from random (full-entropy) noise, but its true entropy is only the entropy of the plaintext plus the entropy of the encryption key.


The entropy of the original heightmap is much higher, around 15 bits per symbol.


I would not write this off as a failure of zstd.

The post doesn't say anything about the particular command-line options passed to each compressor's frontend.

zstd has higher compression levels that approach LZMA's ratio, and you can additionally train a dictionary for compressing a particular kind of file.


I tried all zstd compression levels; all perform poorly. (Author of the blog post here.)


Try TurboRC using 16-bits format: https://github.com/powturbo/Turbo-Range-Coder


Wonder if he tried zstd --long=31

Or, given that he basically mentions FCM/PPM, zpaq.


+1, came here to mention the same. If you need near-optimal compression: zpaq or its variants.

Plus, heightmap data is probably pretty hard to compress, assuming normally distributed residuals, etc.

85% of optimum at I/O speeds with zstd, especially at default options, is pretty good.


What benefit does zstd give me over the rest? Is there any?



