Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Given that it's BWT, the difference should be the most prominent on codebases with huge amounts of mostly equivalent files. Most compression algorithms won't help if you get an exact duplicate of some block when it's past the compression window (and will be less efficient if near the end of the window).

But here's a practical trick: sort files by extension and then by name before putting them into an archive, and then use any conventional compression. It will very likely put the similar-looking files together, and save you space. Done that in practice, works like a charm.



Handy tip for 7-Zip, the `-mqs` command line switch (just `qs` in the Parameters field of the GUI) does this for you. https://7-zip.opensource.jp/chm/cmdline/switches/method.htm#...


Ooh, that’s neat. How much improved do you get from this? Is it more single or double digit % diff?




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: