
Happy new year and goodbye bzip2 - Morgawr
https://www.kernel.org/happy-new-year-and-good-bye-bzip2.html
======
jaytaylor
Good riddance; for text files, gzip gives mediocre compression and is fast,
lzma (xz) gives high compression and is slow, and bzip2 gives medium-high
compression and is _really_ slow [0].

[0] http://tukaani.org/lzma/benchmarks.html
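The trade-off the benchmarks describe can be sketched with Python's stdlib bindings for the same three formats. This is just an illustration; the sample data and exact ratios are my own, and real-world numbers depend heavily on the input.

```python
import bz2
import gzip
import lzma

# Redundant sample text; real-world ratios depend heavily on the data.
data = b"The quick brown fox jumps over the lazy dog. " * 4000

sizes = {
    "gzip -9": len(gzip.compress(data, compresslevel=9)),
    "bzip2 -9": len(bz2.compress(data, compresslevel=9)),
    "xz -6": len(lzma.compress(data, preset=6)),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes ({100 * size / len(data):.2f}% of original)")
```

Timing the three calls with `time.perf_counter()` on a larger corpus shows the speed gap the comment is referring to.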

~~~
pedrocr
Your link shows bzip2 having a more balanced compression time than lzma, which
is often 3-4 times slower. So for a tarball, xz does indeed sound better, since
you compress once and decompress many times. For compressing HTTP responses or
things of that sort, bzip2 might be better, but for those cases gzip is
probably already the best choice.

~~~
fiffig
Here is another benchmark with lz4 and lzo as well as lzma1, although those
aren't very relevant for the tarball test. Gzip -9 is on par with bzip2 -1
while almost twice as fast. And it uses practically no memory(!).

http://pokecraft.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

~~~
acdha
-1 is an edge case: if you're memory-constrained, bzip2 isn't a good fit. Otherwise, you pay an extremely small increase in CPU time to use -9, where bzip2 will massively outperform gzip on most workloads.

------
itry
I was always fascinated by compression. It feels like no big progress was ever
made in lossless compression. How would today's best compression algorithms
compare to the ones used in the C64 era? And compared to the first zip
algorithm implemented in PKZIP?

~~~
Sharlin
Well, there are clear information-theoretic limits on how much you can
compress something, so any improvements are necessarily going to be
incremental and converging to the theoretical limit. The more special-purpose
the compression algorithm, the more assumptions it can make, the better it can
perform on the sort of data it's designed for, at the expense of behaving
poorly in the general case. Indeed, it can be shown that _any_ possible
lossless compression algorithm, on average, will yield an output _larger_ than
its input; it's just that most "interesting" data contains plenty of
redundancy that's reliably compressible.
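The "larger than its input" point is easy to demonstrate: random bytes have no redundancy to exploit, so the container overhead makes the "compressed" output bigger. A minimal sketch using gzip (any format with a header would show the same effect):

```python
import gzip
import os

# 100 kB of (pseudo)random bytes: essentially incompressible.
random_data = os.urandom(100_000)
compressed = gzip.compress(random_data)

# Deflate's stored-block framing plus the gzip header make the
# output slightly larger than the input.
print(len(random_data), "->", len(compressed))
```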

~~~
Someone
Also, it is often better to measure an improvement relative to the maximum
improvement possible, rather than as a percentage of the file size.

For compression, let's say that the best possible compressor compresses a file
to 30% of its size, and the current compressor reaches 50%. Then an
improvement to 45% should not be seen as 'only 5%', or as '10% smaller', but
as '25% of the maximum possible improvement'. A follow-on step that gets you
to 40% would be an even larger improvement of 33%.

That, IMO, is a reasonable way to somewhat compensate for the fact that the
low hanging fruit gets plucked by those who come first.

And yes, there is a problem there. That 'best possible compressor',
theoretically, can produce extremely small files. Maybe your Wikipedia dump
happens to be equal to the binary expansion of sin(1/e + 34/sqrt(PI)) to a few
billion digits, but how are you going to find out? So, for most files, we
don't really know what that best compression is.

------
renownedmedia
With typical files, XZ Utils creates output about 30% smaller than gzip and
15% smaller than bzip2.

http://tukaani.org/xz/

------
Myrth
Next up: lrzip?

http://www.techradar.com/us/news/software/applications/best-linux-compression-tool-8-utilities-tested-933098/2

------
bcrack
The most appropriate compression algorithm to use is highly case-dependent
(data structure, compression vs decompression cpu/wall-clock time, compression
ratio, system memory, bandwidth, etc.). I'm sure the people at kernel.org have
their reasons for converting to xz.

The following link has some interesting (albeit non-exhaustive) benchmarks,
comparing various compression algorithms (both serial and parallel
implementations).
[http://vbtechsupport.com/1614/](http://vbtechsupport.com/1614/)

------
xuhu
Why ditch bzip2 and not gzip? It was clearly done for storage space.

~~~
clarry
For compatibility and speed, gzip is a winner. It's fast and it's everywhere
(plus it compresses text well enough that it makes sense to use in place of
non-compressed files).

For strong compression, there are better options than bzip2.

In a sense, bzip2 is the "worst of both worlds".

~~~
ksec
Tl;dr, Thx for the summary

------
thaJeztah
XZ sounds interesting. Does anybody know if work is in progress to add XZ
support in web browsers as an alternative to gzip compression? Are any web
servers already supporting xz?

------
slashdotaccount
I used to create WinRAR backup archives with recovery records or recovery
volumes for DVD backups. Are there other (free and open) archive formats with
such features? Or maybe stand-alone tools?

~~~
happyhappy
If I understand your question correctly, you should look into PAR2.

------
d0ugie
did someone say WebP? :)

------
manuw
Did the bzip2 maintainer steal someone's girlfriend or something?

