Given that it's BWT, the difference should be the most prominent on codebases wi... | Hacker News


SeptiumMMX 9 months ago | on: Bzip3: A spiritual successor to BZip2

Given that it's BWT, the difference should be most prominent on codebases with huge numbers of nearly identical files. Most compression algorithms won't help if an exact duplicate of some block falls outside the compression window (and will be less efficient if it falls near the end of the window). But here's a practical trick: sort files by extension and then by name before putting them into an archive, and then use any conventional compression. This will very likely put similar-looking files next to each other and save you space. I've done that in practice, and it works like a charm.
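The sorting trick above could be sketched roughly like this in Python. This is only an illustration, not the commenter's actual tooling: the helper names `sort_key`, `sorted_files`, and `make_sorted_archive` are hypothetical, and the choice of a tar archive with xz compression is an assumption standing in for "any conventional compression".

```python
import tarfile
from pathlib import Path


def sort_key(path: Path):
    # Extension first, then file name, then the full path as a tie-breaker,
    # so files of the same type with similar names end up adjacent.
    return (path.suffix.lower(), path.name.lower(), path.as_posix())


def sorted_files(root):
    # Collect every regular file under root, ordered by extension, then name.
    root = Path(root)
    return sorted((p for p in root.rglob("*") if p.is_file()), key=sort_key)


def make_sorted_archive(root, archive_path):
    # Assumption: tar + xz is used here; any stream compressor applied to
    # the same file order should benefit in the same way.
    root = Path(root)
    with tarfile.open(archive_path, "w:xz") as tar:
        for path in sorted_files(root):
            # Members are added in sorted order, not directory order.
            tar.add(path, arcname=path.relative_to(root).as_posix(), recursive=False)
```

Calling `make_sorted_archive("src", "src.tar.xz")` would write the files in sorted order; the archive should be written outside the directory being archived so it does not pick itself up. How much this saves depends entirely on the data.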

hcs 9 months ago

Handy tip for 7-Zip, the `-mqs` command line switch (just `qs` in the Parameters field of the GUI) does this for you. https://7-zip.opensource.jp/chm/cmdline/switches/method.htm#...

ku1ik 9 months ago

Ooh, that’s neat. How much improvement do you get from this? Is it more of a single- or double-digit % difference?
