
A parallel implementation of gzip for modern multi-processor multi-core machines - odeke-em
https://github.com/madler/pigz
======
dmourati
I learned about pigz in the appendix of O'Reilly's High Performance MySQL. I
used it, along with other techniques from that book, to improve our MySQL
backup/restore time by 7x. This in turn won first place at a company hackathon.
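
Roughly, the pigz part is just a matter of putting it into the dump/restore
pipeline; a sketch with a made-up database name:

    # sketch: compress the dump with all cores, decompress on restore ("mydb" is a placeholder)
    mysqldump --single-transaction mydb | pigz > mydb.sql.gz
    pigz -dc mydb.sql.gz | mysql mydb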

~~~
wahnfrieden
Have you compared with mydumper / myloader?

------
maxpert
I love it! I just wish somebody would make a GPU-based compression library
(it doesn't have to be gzip or bzip2). Every mobile device today ships with a
GPU, and there are some techniques out there, like this one
([http://on-demand.gputechconf.com/gtc/2014/presentations/S445...](http://on-demand.gputechconf.com/gtc/2014/presentations/S4459-parallel-lossless-compression-using-gpus.pdf)),
but I am still waiting for a solid implementation.

~~~
yzh
You may find work from my colleagues interesting:

Parallel Lossless Data Compression on the GPU:
[http://www.idav.ucdavis.edu/publications/print_pub?pub_id=10...](http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1087)

Fast Parallel Suffix Array on the GPU:
[http://escholarship.org/uc/item/83r7w305](http://escholarship.org/uc/item/83r7w305)

I think a fast compression library could be built on top of the second paper
(the suffix array paper). We have one implementation in our library CUDPP 2.0;
you can try it out if you like:
[https://github.com/cudpp/cudpp/blob/master/src/cudpp/app/sa_...](https://github.com/cudpp/cudpp/blob/master/src/cudpp/app/sa_app.cu)

------
Rapzid
Used extensively while a system engineer at a hosting company; 10/10, would
use again. Excellent utility if you have the cpu cycles to spare and need to
cut time; gzip is almost always your bottleneck. Didn't seem to quite scale
linearly, but what does.

--rsyncable support too.

------
Twirrim
pigz is extremely fast and very capable, plus it's packaged and provided for
almost every mainstream Linux distribution, and it can act as a drop-in
replacement for gzip since it supports the same flag syntax.
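
For example (a rough sketch, with made-up file and directory names), you can
usually substitute it anywhere gzip would appear:

    # standalone, same flags as gzip (names are placeholders)
    pigz -9 big.log
    pigz -d big.log.gz

    # with GNU tar, either via -I or by piping
    tar -I pigz -cf backup.tar.gz /some/dir
    tar -cf - /some/dir | pigz > backup.tar.gz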

On the bzip2 side there is pbzip2, which is also a drop-in replacement:
[http://compression.ca/pbzip2/](http://compression.ca/pbzip2/)

~~~
treffer
pbzip2 only does parallel compression. lbzip2 can parallelize compression and
decompression.

With both pbzip2 and lbzip2 I got errors every now and then while everything
worked with bzip2. YMMV.

Also note that bzip2 is block-based (due to the BWT) and thus does not
compromise compression ratios the way parallelized gzip implementations do.

At 32 cores you can saturate Gbit links (even with good compression ratios!).
I hope we will see some more bzip2 love in the future due to its perfect fit
for parallelism.
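
For example (a sketch, with a made-up host and paths), streaming a backup
across the wire with lbzip2 doing the work on all cores at each end:

    # compress with every core on the sender, decompress in parallel on the receiver
    # "backuphost", /data and /restore are placeholders
    tar -cf - /data | lbzip2 | ssh backuphost 'lbzip2 -d | tar -xf - -C /restore'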

~~~
0x0
Errors sound deeply concerning. What kind of errors? Is it too risky to use
[pl]bzip2 for archival purposes?

~~~
treffer
Well, errors at a rate of about one every few terabytes, mostly during
decompression, and they always caused a program abort IIRC.

But that was a few years ago, so my best advice would be: retest.

You are testing your archives anyway, right? ;-)

~~~
paxcoder
>You are testing your archives anyway, right? ;-)

Umm no, I handle errors which I expect to be reported to me.

------
mgerdts
pigz parallelizes compression. I've made changes so that it can parallelize
decompression as well.

[https://github.com/mgerdts/pigz](https://github.com/mgerdts/pigz)

This feature is present in Solaris 11.3 and later. Actually, it is in some
later patches to 11.2 as well. I added it to speed up suspend and resume of
kernel zones.

------
spullara
My guess is that this compresses less efficiently as you would have to shard
the dictionaries. Might be close though for large files. I was surprised that
there were no speed or efficiency comparisons in the README.
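
A quick way to check both on your own data (file name is a placeholder):

    # compare wall-clock time and compressed size; "bigfile" is made up
    time gzip -9 -c bigfile | wc -c
    time pigz -9 -c bigfile | wc -c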

~~~
kbaker
The max window size for zlib is 32 KB, so I don't think the default sharding
at 128 KB would change much. You can pass the -b parameter if you find that a
bigger shard works better on your data.

If you are looking for details of the design of pigz, there is a very well-
documented overview in the source of pigz.c:

[https://github.com/madler/pigz/blob/master/pigz.c#L187](https://github.com/madler/pigz/blob/master/pigz.c#L187)
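
E.g. (hypothetical file name), to see whether a larger block size changes
anything for your data:

    pigz -c bigfile > default.gz        # 128 KiB blocks (the default); "bigfile" is made up
    pigz -c -b 512 bigfile > b512.gz    # 512 KiB blocks
    ls -l default.gz b512.gz            # compare sizes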

~~~
chii
had a quick scroll thru that source code - i didnt know you could implement a
try/catch in C via macros...mind blown.

~~~
TimJYoung
This book (C Interfaces and Implementations: Techniques for Creating Reusable
Software) has a whole section on it:

[https://www.amazon.com/Interfaces-Implementations-Techniques...](https://www.amazon.com/Interfaces-Implementations-Techniques-Creating-Reusable/dp/0201498413)

------
tobias3
And the same for LZMA:
[https://github.com/vasi/pixz](https://github.com/vasi/pixz)

(it's relatively easy to remember those commands)

~~~
0x0
The main purpose of this "pixz" appears to be its chunking of the compressed
data so that it is partially decompressible (i.e. random access). "xz" already
has -T/--threads= for multithreaded processing (although it does seem like
pixz has a different default: "all cores" instead of "1 thread").

~~~
vasi
xz in multithreaded mode supports random access too, at least theoretically.
But there's no reasonable way with xz to actually find the file you want in a
tarball; that's the bit pixz provides.

Another nice thing about pixz is it does parallel decompression, as well as
compression.

(Disclaimer: I'm the original author of pixz.)
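
If I'm remembering the pixz CLI correctly (archive and path names here are
made up), the tarball-aware usage looks roughly like:

    pixz some.tar some.tpxz                   # compress a tarball, appending a file index
    pixz -l some.tpxz                         # list the files recorded in the index
    pixz -x some/path < some.tpxz | tar x     # decompress only the blocks containing that file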

~~~
mgerdts
I was thinking about that "no reasonable way" comment. When you uncompress the
first block, you will find the first tar header. From that you know the
uncompressed offset of the next tar header. If the compressed stream does
support random access, you should be able to uncompress a block (assuming the
uncompressed block size is a multiple of 512 bytes) to get to the next tar
header. You can repeat this until you reach the file you are looking for.

With large files, this approach would be of huge value. If the files tend to
be no larger than block_size - 512, there will be no speedup.

Of course, this would need to be implemented directly in tar, not by piping
the output of a decompression command through tar.

------
bpchaps
Why doesn't parallelized gzip get more attention? For larger files, it can be
quite a pain to watch a multicore machine sit mostly idle. Something like this
works pretty well, but I've never seen it done outside of my own stuff:

    split -l1000000 --filter='gzip > $FILE.gz' <(zcat large_file.txt.gz)
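
A handy property (assuming split's default xaa, xab, ... output names) is that
the per-chunk outputs are independent gzip members, so they can simply be
concatenated back into one valid .gz:

    cat x*.gz > recombined.gz      # gzip tools accept concatenated members
    zcat recombined.gz | wc -l     # sanity-check the result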

------
lateralux
I'm using pbzip2 to solve this problem.
[http://compression.ca/pbzip2/](http://compression.ca/pbzip2/)

~~~
Slartie
I was doing that too, until I tried lbzip2
([http://lbzip2.org/](http://lbzip2.org/)). It does basically the same thing
and is also a drop-in replacement, but it is noticeably faster!

------
axelfontaine
Previous discussion 6 years ago:
[https://news.ycombinator.com/item?id=1233317](https://news.ycombinator.com/item?id=1233317)

------
dorfsmay
If you know that what you are compressing is text, and space is more important
to you than time, then please use lzip over gzip. For a parallel version:

[http://www.nongnu.org/lzip/plzip.html](http://www.nongnu.org/lzip/plzip.html)
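
Rough usage (file name made up), with -n setting the number of worker threads:

    plzip -9 -n 8 -c dump.sql > dump.sql.lz    # compress with 8 threads; "dump.sql" is a placeholder
    plzip -d -c dump.sql.lz > dump.sql         # decompress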

~~~
0x0
Is this gunzip-compatible? If not, then as long as gzip remains the only
viable/compatible compression mode for HTTP, I think advances in gzip
compression are still valuable.

~~~
dorfsmay
No, it's not. Lzip is more useful for storing text long term, or for sending
very large text files as email attachments.

Text log files and database dumps in text form are usually very large; lzip
can reduce their size by a factor of 5 to 10 compared to gzip.

------
pselbert
It's amazing to see what somebody like Mark Adler can achieve when they focus
on a specific niche. You often hear about the importance of specialization
when positioning a business, but it isn't mentioned as much for individuals.
His work is an ideal case of specialization: if I wanted to ask an expert for
advice on compression, or was looking for an outside expert on compression,
that is immediately who I'd think of.

------
discreditable
You can also use 7-zip for multithreaded gz/xz/bz2/7z/zip compression.
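
From memory (so treat the exact flags as approximate; archive names are made
up), multithreading is controlled with -mmt, e.g.:

    7z a -mmt=on archive.7z somedir/           # 7z/LZMA2 using all cores
    7z a -txz -mmt=on data.tar.xz data.tar     # xz container, multithreaded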

------
sshaginyan
Why not just use GNU parallel?

`parallel gzip ::: file1 file2`

I wish there were a standard for this kind of thing, so that an app could
check for existing child processes and act accordingly (IPC).

~~~
0x0
That doesn't help if you have only one big file or stream.

~~~
ole_tange
    cat bigfile | parallel --pipe -k gzip > out.gz

    parallel --pipepart -a bigfile -k gzip > out.gz

The latter is faster.

------
teej
I use this extensively in schlepping data around for analytics work. Any data
that moves over the network going into or out of a data store gets compressed
with pigz.

------
haughter
How does the strategy for parallelization of gzip in this project differ from
the strategy used for LZMA2 parallelism as implemented by 7z?

------
slashcom
[http://lbzip2.org/](http://lbzip2.org/) also exists for bzip2. It works
really well, especially since bz2 files are split into discrete blocks which
can be compressed and decompressed independently.

------
lmeyerov
We use this in production at Graphistry, super useful for latency-sensitive
dynamic media applications! (In this case, we needed to add an auto-tuner +
node bindings.)

------
axelfontaine
The big challenge seems to be parallel gunzip, as that requires a special
stream with an index to work, and no general-purpose solution is available so
far.

------
Hydraulix989
Probably most useful for servers, where performance really does matter at
scale and the content type is gzip.

~~~
MichaelGG
Web servers handle multiple connections, so this isn't so applicable there.
Maybe for special-purpose ones zipping large datasets on the fly?

~~~
lmeyerov
We combine both strategies for content: multicore native encoding + decoding
:)

------
noipv4
For bzip2 lovers the tool is pbzip2 ;)

------
donatj
Huh. I was living under the incorrect assumption that gzip was inherently
single-threaded.

------
abhishivsaxena
Any idea how this would compare to gzip on a microserver? I'm thinking of an
Atom C2750 bare-metal server from Packet.

~~~
opcenter
Looks like that has 8 cores? I would imagine it would make a huge difference
depending on other loads on the system. I gave it a try on my file server at
home, which has an AMD E-350 processor with 2 cores, and it shaved a good 42%
off the total time:

      > time gzip -v xubuntu-16.04-desktop-amd64.iso
      xubuntu-16.04-desktop-amd64.iso:  1.5% -- replaced with xubuntu-16.04-desktop-amd64.iso.gz
      gzip -v xubuntu-16.04-desktop-amd64.iso  119.07s user 4.66s system 98% cpu 2:05.22 total

      > time pigz -v xubuntu-16.04-desktop-amd64.iso
      xubuntu-16.04-desktop-amd64.iso to xubuntu-16.04-desktop-amd64.iso.gz
      pigz -v xubuntu-16.04-desktop-amd64.iso  128.19s user 6.64s system 184% cpu 1:12.97 total

