I have found that a very good approach is to apply some very simple transformations, such as delta encoding of timestamps, and then let a good standard compression algorithm such as zstd or deflate take care of the rest.

Delta encoding of timestamps helps a lot though, because it makes the redundancy more visible to a general purpose compression algorithm.
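To make that concrete, here is a minimal sketch, not the code from the linked post. It assumes 64-bit millisecond timestamps sampled exactly once per second (made-up data) and uses Python's zlib, which implements deflate. The helper names are hypothetical.

```python
import struct
import zlib

def delta_encode(values):
    """Keep the first value, then store each difference to the previous value."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def pack_i64(values):
    """Serialize as little-endian signed 64-bit integers."""
    return struct.pack(f"<{len(values)}q", *values)

# Assumed sample data: 10,000 timestamps, one per second, in milliseconds.
timestamps = [1_600_000_000_000 + 1000 * i for i in range(10_000)]

raw = zlib.compress(pack_i64(timestamps), 9)
delta = zlib.compress(pack_i64(delta_encode(timestamps)), 9)

print("deflate on raw timestamps:", len(raw), "bytes")
print("deflate on deltas        :", len(delta), "bytes")
```

With perfectly regular sampling the delta stream is almost all the same 8-byte pattern, which is exactly the kind of redundancy deflate picks up on.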

I used this for the telemetry storage of the Columbus module of the International Space Station, back in ~2010, and then a few times since.

http://blog.klaehn.org/2018/06/10/efficient-telemetry-storag...

https://github.com/Actyx/banyan




> delta encoding of timestamps, and then letting a good standard compression algorithm such as zstd or deflate take care of the rest

This can cut both ways. If the integer data is irregularly distributed, you may increase entropy and give the compressor a harder time than it would have had just looking for slightly longer common substrings.

I've lost count of the number of compression experiments where some fancier encoding that greatly reduced redundancy in the uncompressed representation also happened to worsen (or at least have no effect on) the compressed output size. I'd even be tempted to say it makes things worse more often than not.
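A small sketch of how that can happen, using made-up data rather than any real experiment: values that jump irregularly between a few fixed levels. The raw stream only ever contains four distinct values, while the delta stream contains many more, so the compressor has more symbols to model. The levels, seed, and helper names are all assumptions for illustration.

```python
import random
import struct
import zlib

def delta_encode(values):
    """Keep the first value, then store each difference to the previous value."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def deflate_size(values):
    """Size in bytes after packing as signed 64-bit integers and deflating."""
    return len(zlib.compress(struct.pack(f"<{len(values)}q", *values), 9))

# Assumed data: 5,000 samples hopping at random between four fixed levels.
rng = random.Random(0)
levels = [10, 500, 7000, 90000]
values = [rng.choice(levels) for _ in range(5000)]
deltas = delta_encode(values)

print("distinct values:", len(set(values)))
print("distinct deltas:", len(set(deltas)))
print("deflate, plain :", deflate_size(values))
print("deflate, delta :", deflate_size(deltas))
```

Whether the compressed size actually gets worse depends on the data and the compressor, so the printed sizes are the thing to look at.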


I tried this with real data (satellite and Columbus module telemetry). At least for the timestamps, it is always a big win, because values are typically sampled at a precisely regular, or at least somewhat regular, frequency.

For sample values, it is not as clear cut. The best way I found is to just try both approaches and use the one that results in better compression, if you can afford it. But for values I usually don't bother.
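The "try both and keep the smaller one" idea could look roughly like this. It is a sketch under assumptions, not the format I actually used: values are taken to fit in signed 64-bit integers (and so are their differences), zlib stands in for whatever compressor you prefer, and the one-byte tag plus the function names are invented for the example.

```python
import struct
import zlib

def _pack(values):
    return struct.pack(f"<{len(values)}q", *values)

def _unpack(data):
    return list(struct.unpack(f"<{len(data) // 8}q", data))

def _delta(values):
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def _undelta(deltas):
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def compress_best(values):
    """Compress both ways and keep the smaller; the first byte records the choice."""
    plain = zlib.compress(_pack(values))
    delta = zlib.compress(_pack(_delta(values)))
    if len(delta) < len(plain):
        return b"\x01" + delta   # tag 1: delta encoded
    return b"\x00" + plain       # tag 0: stored as-is

def decompress_best(blob):
    """Read the tag byte and undo the delta step only if it was applied."""
    values = _unpack(zlib.decompress(blob[1:]))
    return _undelta(values) if blob[0] == 1 else values
```

The cost is compressing every block twice, which is why it only makes sense if you can afford it.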



