
Parallel Gzip – Pigz - keyboardman
https://leimao.github.io/blog/Parallel-Gzip-Pigz/
======
goodside
This is a low-effort post, showing basic CLI usage and nothing else. Contrary
to what the author implies, pigz does not actually decompress in parallel,
which underscores the importance of benchmarking things, especially when
you're only considering them for performance reasons. See:
[https://github.com/madler/pigz/issues/36](https://github.com/madler/pigz/issues/36)

For compression, pigz is only useful for compressing a single large file in a
scenario where you _must_ use gzip. For large collections of files, wrapping
normal gzip with GNU Parallel would be faster. In scenarios where you control
both compression and decompression, you'd be better off picking a more modern
compression algorithm than gzip. Zstandard is almost always the right choice,
but this benchmark is a great resource for comparing other options:
[https://quixdb.github.io/squash-benchmark/](https://quixdb.github.io/squash-benchmark/)
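
A rough sketch of the many-files case, in Python rather than GNU Parallel: one
file per task, each compressed with the ordinary gzip module. The names
`gzip_one` and `gzip_many` are made up for illustration, and this assumes
CPython's zlib bindings release the GIL while compressing, so plain threads
should be enough to keep several cores busy. Benchmark it before trusting it.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor


def gzip_one(path):
    """Compress one file to path + '.gz' and return the output path."""
    out = path + ".gz"
    with open(path, "rb") as src, gzip.open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams the file, no full read into memory
    return out


def gzip_many(paths, workers=None):
    """Compress every file in paths, one file per worker task.

    Parallelism comes from handling several files at once; each individual
    file is still compressed sequentially, exactly as plain gzip would do it.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip_one, paths))
```

The outputs are ordinary `.gz` files, so nothing downstream needs to know they
were produced concurrently.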

------
LeoPanthera
See also:

pbzip2:
[https://github.com/ruanhuabin/pbzip2](https://github.com/ruanhuabin/pbzip2)

plzip:
[https://www.nongnu.org/lzip/plzip.html](https://www.nongnu.org/lzip/plzip.html)

And possibly the cleverest of the parallel compressors, tarlz, which, while
still being backwards compatible with tar, also allows parallel
_decompression_, something none of the others can do:
[https://www.nongnu.org/lzip/tarlz.html](https://www.nongnu.org/lzip/tarlz.html)

~~~
Yajirobe
The source code looks a little extreme... Is it auto-generated?

[https://github.com/ruanhuabin/pbzip2/blob/master/pbzip2.cpp#...](https://github.com/ruanhuabin/pbzip2/blob/master/pbzip2.cpp#L3367)

~~~
trav4225
Seems pretty standard to me. (Although I also typically only write C in C89,
so modern C might be a lot prettier.)

~~~
Yajirobe
Where are the 20-line functions, the modularity, the refactored nested ifs?

~~~
trav4225
Like I said, pretty standard! ;-)

------
skrause
You basically only need to learn about "zstd -T0". Better and faster algorithm
and multi-threading directly included.

~~~
dispat0r
Sadly, that also doesn't work for decompression.

~~~
goodside
Everybody's needs are different, but zstd is so fast the lack of multithreaded
decompression doesn't matter very often. I've only ever found parallel
decompression useful with bz2 files (which are insanely CPU-intensive to
decompress), using lbzip2.

Zstd also has experimental code in the `contrib` dir for parallel (and, more
importantly, seekable) decompression:
[https://github.com/facebook/zstd/issues/395](https://github.com/facebook/zstd/issues/395)

~~~
dispat0r
Thanks for the information. You're right, zstd is pretty fast even on a single
core, but on multicore systems with NVMe drives, multithreaded decompression
would still be beneficial.

------
ed25519FUUU
Looks great! I’m still surprised these tools (gzip, etc.) haven’t been updated
to take advantage of multiple cores.

~~~
bennofs
In the gzip case, the reason is that the compressed stream format is
inherently sequential: you need to decompress everything before a given byte
before you can decompress that byte. So parallelizing is not really possible
without changing the format.

Newer compression formats have multiple streams that can be decompressed in
parallel, or have blocks that are compressed independently without
dependencies (bzip2 can be decompressed in parallel due to independent
blocks).
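
To make the independent-blocks point concrete, here is a minimal Python sketch.
It is not how bzip2 lays out blocks on disk: the block size, the function names,
and the choice to keep the blocks as a plain list are all assumptions made for
illustration. The only property it shows is that a block compressed with no
reference to its neighbours can be decompressed without them, in any order, on
any worker.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 128 * 1024  # hypothetical block size, picked for the example


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and compress each one on its own."""
    return [
        bz2.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def decompress_blocks(blocks, workers=4):
    """Decompress independent blocks concurrently and reassemble them.

    pool.map returns results in input order, so the output is the original
    byte sequence even if the workers finish out of order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(bz2.decompress, blocks))
```

A plain deflate stream offers no such split points, since later data can refer
back to earlier data, which is the sequential dependency described above.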

~~~
ed25519FUUU
So is Pigz not backward compatible with gzip?

~~~
goodside
Pigz is backward compatible — it breaks the input into 128K chunks, compresses
each in parallel, and concatenates them. Gzip supports naive concatenation of
compressed files so the result can be decompressed by standard gzip.

Pigz does not implement parallel decompression, but pigz-compressed files
could, in principle, be decompressed in parallel. I assume there's not much
urgency for this because gzip decompression is fast enough: >3x faster than
compression and >10x faster than bzip2 decompression on default settings.
(See: [https://quixdb.github.io/squash-benchmark/](https://quixdb.github.io/squash-benchmark/))
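
The concatenation property is easy to check from Python. This is a sketch of
the idea only, not pigz's actual implementation or output layout: it compresses
each chunk as a separate gzip member and joins the members. `chunked_gzip` and
the 128K `CHUNK` constant are names assumed for the example.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK = 128 * 1024  # chunk size taken from the description above


def chunked_gzip(data, chunk=CHUNK, workers=4):
    """Compress each chunk as its own gzip member, then concatenate them."""
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, pieces))  # order is preserved
    return b"".join(members)
```

`gzip.decompress(chunked_gzip(data))` should give back `data`, because a
standard gzip reader treats back-to-back members as one stream. The cost is
some compression ratio, since each chunk starts with no history from the
previous one.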

