
Parallel decompression of gzip-compressed files - nkrumm
https://arxiv.org/abs/1905.07224
======
nkrumm
GitHub: [https://github.com/Piezoid/pugz](https://github.com/Piezoid/pugz).

From the readme:

"Contrary to the pigz program which does single-threaded decompression (see
[https://github.com/madler/pigz/blob/master/pigz.c#L232](https://github.com/madler/pigz/blob/master/pigz.c#L232)),
pugz found a way to do truly parallel decompression. In a nutshell: the
compressed file is split into consecutive sections, processed one after the
other. Sections are in turn split into chunks (one chunk per thread), which
are decompressed in parallel. A first pass decompresses chunks and keeps
track of back-references (see e.g. our paper for the definition of that term),
but is unable to resolve them. Then, a quick sequential pass is done to
resolve the contexts of all chunks. A final parallel pass translates all
unresolved back-references and outputs the file."
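
To make the quoted three-pass structure more concrete, here is a small, hypothetical C++17 sketch (not pugz's actual code). Real DEFLATE decoding is replaced by a toy stream of literals and LZ77-style back-references, and all names (Chunk, decode_chunk, resolve_chunk) are invented. Pass 1 decodes chunks in parallel and records any back-reference it cannot resolve, a cheap sequential pass works out which bytes precede each chunk, and a final parallel pass fills in the gaps.

```cpp
// Hypothetical sketch of the three-pass scheme quoted above; NOT pugz code.
// Real DEFLATE decoding is replaced by a toy token stream (literals and
// LZ77-style back-references). Build: g++ -std=c++17 -pthread sketch.cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <thread>
#include <utility>
#include <variant>
#include <vector>

// Toy "compressed" token: literal bytes, or a back-reference (distance, length)
// into the already-decompressed output.
struct BackRef { std::size_t distance, length; };
using Token = std::variant<std::string, BackRef>;

struct Chunk {
    std::vector<Token> tokens;                               // this thread's slice
    std::string output;                                      // pass-1 output, gaps left as '?'
    std::vector<std::pair<std::size_t, BackRef>> unresolved; // (offset in output, ref)
};

// Pass 1: decode a chunk in isolation. References that reach before the chunk
// start cannot be resolved yet and are only recorded.
void decode_chunk(Chunk& c) {
    for (const Token& t : c.tokens) {
        if (const auto* lit = std::get_if<std::string>(&t)) {
            c.output += *lit;
        } else {
            const BackRef& r = std::get<BackRef>(t);
            if (r.distance <= c.output.size()) {
                // In-chunk reference: copy byte by byte (overlap is legal in LZ77).
                std::size_t start = c.output.size() - r.distance;
                for (std::size_t i = 0; i < r.length; ++i) c.output += c.output[start + i];
            } else {
                // Needs context from an earlier chunk: leave a gap for pass 3.
                c.unresolved.push_back({c.output.size(), r});
                c.output.append(r.length, '?');
            }
        }
    }
}

// Pass 3: with the preceding chunks' bytes known, fill in the gaps.
// (pugz's sequential pass 2 resolves each chunk's context so this pass always
// has real bytes to copy; the toy simply assumes that already holds.)
void resolve_chunk(Chunk& c, const std::string& prefix) {
    for (const auto& [offset, r] : c.unresolved) {
        std::size_t src0 = prefix.size() + offset - r.distance; // absolute source position
        for (std::size_t i = 0; i < r.length; ++i) {
            std::size_t src = src0 + i;
            c.output[offset + i] = src < prefix.size() ? prefix[src]
                                                       : c.output[src - prefix.size()];
        }
    }
}

int main() {
    // Two chunks; the second back-references data produced by the first.
    std::vector<Chunk> chunks(2);
    chunks[0].tokens = {std::string("abcabc"), BackRef{3, 3}}; // -> "abcabcabc"
    chunks[1].tokens = {std::string("xy"), BackRef{5, 4}};     // reaches into chunk 0

    // Pass 1 (parallel): decode every chunk independently.
    std::vector<std::thread> workers;
    for (auto& c : chunks) workers.emplace_back(decode_chunk, std::ref(c));
    for (auto& t : workers) t.join();

    // Pass 2 (sequential, cheap): give each chunk the bytes that precede it.
    std::vector<std::string> prefix(chunks.size());
    for (std::size_t i = 1; i < chunks.size(); ++i)
        prefix[i] = prefix[i - 1] + chunks[i - 1].output;

    // Pass 3 (parallel): translate the unresolved back-references.
    workers.clear();
    for (std::size_t i = 0; i < chunks.size(); ++i)
        workers.emplace_back(resolve_chunk, std::ref(chunks[i]), std::cref(prefix[i]));
    for (auto& t : workers) t.join();

    for (const Chunk& c : chunks) std::cout << c.output;
    std::cout << "\n"; // prints "abcabcabcxyabcx"
}
```

The toy hands each chunk its whole preceding prefix; in real DEFLATE a back-reference can reach at most 32 KiB back, so presumably the actual sequential pass only has to carry a 32 KiB window from chunk to chunk, which is what keeps it quick.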

------
LinuxBender
Somewhat related: for bzip2, I use pbzip2, which uses all the cores, or as many
as you specify. [1] It is in the EPEL repo for RHEL/CentOS/Fedora.

[1] - [https://linux.die.net/man/1/pbzip2](https://linux.die.net/man/1/pbzip2)

