
Stop using gzip - imoverclocked
http://imoverclocked.blogspot.com/2015/12/for-love-of-bits-stop-using-gzip.html
======
geofft
The trouble with this is that, as a software author, _it doesn't really
matter_ if it takes 70 seconds instead of 33 to install my software. 70
seconds is fast enough, for someone who's already decided to start downloading
something as involved as Meteor; even if it took one second it wouldn't get me
more users. And it would have to take over 5-10 minutes before I start losing
users.

On the other hand, having to deal with support requests from users who don't
have any decompressor other than gzip will cost me both users and my time.
Some complicated "download this one if you have xz" or "here's how to install
xz-utils on Debian, on RHEL, on ..." will definitely cost me users, compared
to "if you're on a UNIXish system, run this command".

From a pure programming point of view, sure, xz is better. But there's nothing
convincing me to make the _engineering decision_ to adopt it. The practical
benefits are unnoticeable, and the practical downsides are concrete.

~~~
rileymat1
Is it likely that a user has gzip on a system but not tar itself? From the
article:

What about tooling?
OSX: tar -xf some.tar.xz (WORKS!)
Linux: tar -xf some.tar.xz (WORKS!)
Windows: ? (No idea, I haven't touched the platform in a while... should WORK!)

~~~
zorked
Tar does not implement decompression. If you don't have xz installed it won't
work.

~~~
masklinn
That doesn't seem correct: on OS X 10.11 `tar xf` can extract `.tar.xz` yet
doesn't fork an xz. AFAIK 10.11 doesn't even come with xz.

~~~
jessaustin
tar can link the xz lib without forking.

~~~
masklinn
So tar does "implement decompression" (and compression, by delegating the work
to libarchive) and it can work even "if you don't have xz installed".

~~~
jessaustin
It would require liblzma, but you are correct that the library is a separate
thing from the executable xz.

~~~
masklinn
> It would require liblzma

Yep, in the same way it requires libz and libbz2.
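
A quick way to check which case you're in, if curious (a rough sketch;
/usr/bin/tar on OS X is bsdtar built on libarchive, while GNU tar on most
Linux distros execs the external compressor instead of linking it):

    
    
        # OS X: show the dylibs /usr/bin/tar (bsdtar) is linked against
        otool -L /usr/bin/tar
    
        # Linux: GNU tar usually links no compression libraries at all,
        # because it execs gzip/xz/bzip2 as external programs
        ldd "$(command -v tar)"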

------
cellularmitosis
I'd argue that bzip2 is a better example of a compression algorithm which no
one needs anymore.

Considering these features:

    
    
      * Compression ratio
      * Compression speed
      * Decompression speed
      * Ubiquity
    

And considering these methods:

    
    
      * lzop
      * gzip
      * bzip2
      * xz
    

You get spectrums like this:

    
    
      * Ratio:    (worse) lzop  gzip bzip2  xz  (better)
      * C.Speed:  (worse) bzip2  xz  gzip  lzop (better)
      * D.Speed:  (worse) bzip2  xz  gzip  lzop (better)
      * Ubiquity: (worse) lzop   xz  bzip2 gzip (better)
    

So, xz, lzop, and gzip are all the "best" at something. Bzip2 isn't the best
at anything anymore.

~~~
randerson
bzip2 can take advantage of any number of CPU cores when compressing.

~~~
hrez
Really? How? My bzip2 has no option for it, and when tested it stuck to one
CPU. xz, on the other hand, has:

    
    
      -T, --threads=NUM   use at most NUM threads; the default is 1; set to 0
                          to use the number of processor cores

~~~
mappu
Side note: `xz` only gained the -T option in a stable release less than 12
months ago (5.2.0, released 2014-12-21), so it hasn't made it into every
distro yet.
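
For reference, roughly what the parallel invocations look like (a sketch
assuming xz >= 5.2 and pbzip2 are installed; bzip2 itself is single-threaded,
the parallelism comes from the separate pbzip2 tool):

    
    
        # xz: -T0 means "one thread per core" (requires xz >= 5.2)
        tar -cf - somedir | xz -T0 > somedir.tar.xz
    
        # parallel bzip2 via pbzip2; -p picks the number of processors
        tar -cf - somedir | pbzip2 -p4 > somedir.tar.bz2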

------
gizmo
One of the great things about gz archives is that the --rsyncable flag can be
used to create archives that can be rsynced efficiently if they change only
slightly, such as SQL dumps and log files. Basically the file is cut into a
bunch of chunks, and each chunk is compressed independently of the other
chunks. xz doesn't seem to have an equivalent feature because the standard
implementation isn't deterministic[1].
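
A minimal sketch of the workflow (the database name and destination host are
placeholders; plain gzip only understands --rsyncable where the Debian patch
is applied, and pigz has an equivalent flag):

    
    
        # create an rsync-friendly dump (patched gzip or pigz)
        mysqldump mydb | gzip --rsyncable > dump.sql.gz
    
        # later runs of rsync only resend the chunks that actually changed
        rsync -av dump.sql.gz backup-host:/dumps/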

Changing from one compression format to another seems harmless, but it always
pays to think carefully about the implications.

[1]:
[https://www.freebsd.org/cgi/man.cgi?query=xz&sektion=1&manpa...](https://www.freebsd.org/cgi/man.cgi?query=xz&sektion=1&manpath=FreeBSD+8.3-RELEASE)

~~~
mappu
The `--rsyncable` patch never got upstreamed, and in recent Debian the feature
is totally broken (rsync needs to transmit ~100% of the file again).

`pigz` has a similar flag that works reliably, though.

~~~
kevinoid
Are you referring to [https://bugs.debian.org/cgi-
bin/bugreport.cgi?bug=708423](https://bugs.debian.org/cgi-
bin/bugreport.cgi?bug=708423) ? If so, it was fixed in December 2013 and
doesn't affect the current Debian release (Jessie), although it does affect
the previous release (Wheezy) and there is an open request to backport the fix
[https://bugs.debian.org/cgi-
bin/bugreport.cgi?bug=781496](https://bugs.debian.org/cgi-
bin/bugreport.cgi?bug=781496)

------
jzwinck
There are many more concerns to address than just compression ratio. Even the
ratio argument is questionable, because some people have really fast networks
but we all have basically the same speed of computers. So a 4x CPU time and
memory pressure penalty may be much worse on a _system_ than a 2x stream size
increase. Another use case is a tiny VM instance: not every machine today has
half a gigabyte of RAM to spare. Embedded systems, too.

Another way compression formats can win you much more than a 2x space
reduction is by supporting random access within their contained files. Gzip
sort of supports this if you work hard at it. Xz and bzip2 appear similar
(though the details are different). I achieved a 50x speedup with this in real
applications, and discussed it a bit here:
[http://stackoverflow.com/questions/429987/compression-
format...](http://stackoverflow.com/questions/429987/compression-formats-with-
good-support-for-random-access-within-archives)
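
On the xz side, the building block is multi-block output; a rough sketch
(assumes xz >= 5.2, and tools such as pixz take the same idea further by
adding an index over the blocks):

    
    
        # compress as independent 16 MiB blocks rather than one solid stream
        xz -T0 --block-size=16MiB bigfile
    
        # show the block layout of the result
        xz --list -vv bigfile.xz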

~~~
acqq
Thanks for the random access discussion!

And you are right for embedded! .xz just doesn't work there.

I've also found that on faster systems, for some uses of mine where I want the
compression to take as little time as possible and the total round-trip time
matters (compression plus decompression), gzip -1 gives the best resulting
size for the reasonably short time I want to spend.

~~~
geographomics
I've come across quite a lot of firmware on embedded Linux devices that uses
LZMA (the xz compression algorithm) to compress the kernel, u-boot, and/or
filesystems. One memory optimisation for these, as they are typically being
decompressed straight into RAM, is for the decompressor to refer to its output
as the dictionary rather than building a separate one, as would be the case in
decompressing to the network or disk.

------
LeoPanthera
He didn't mention the biggest difference between gzip and xz - RAM usage. At
maximum compression, you need 674 MiB free to make a .xz file, and 65 MiB to
decompress it again. That's not much on most modern systems, but it's quite a
lot on smaller embedded systems.

Admittedly, in most cases, that isn't much excuse though.

~~~
gizmo
It can also lead to disaster on a web server when Linux decides to OOM-kill a
critical part of the infrastructure like the database server or memcached.
Then you can get a cascading problem of services failing, all because of a
careless unzip statement. (I've been there.)

~~~
phaemon
You can set exclusions for the OOM killer to prevent this. See:

[http://backdrift.org/oom-killer-how-to-create-oom-
exclusions...](http://backdrift.org/oom-killer-how-to-create-oom-exclusions-
in-linux)
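
The knob behind that is per-process. A minimal sketch (mysqld is just an
example process, and writing the value needs root; -1000 tells the kernel
never to OOM-kill it):

    
    
        # exempt the database server from the OOM killer
        echo -1000 > /proc/$(pgrep -o mysqld)/oom_score_adj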

~~~
Spooky23
....or just use gzip and get 90% of the value minus the high probability that
that setting will be fubar at an inconvenient time.

------
hdmoore
Summary: Compatibility and decompression speed are more important than
compression ratio for many use cases. Gzip is nearly universal, whereas lz4,
xz, and parallel bzip2 are not.

The challenge of sharing internet-wide scan data has unearthed a few issues
with creating and processing large datasets.

The IC12 project[1] used zpaq, which ended up compressing to almost half the
size of gzip. The downside is that it took nearly two weeks and 16 cores to
convert the zpaq data to a format other tools could use.

The Critical.IO project[2] used pbzip2, which worked amazingly well, except
when processing the data with Java-based tool chains (Hadoop, etc). The Java
BZ2 libraries had trouble with the parallel version of bzip2.

We chose gzip with Project Sonar[3], and although the compression isn't great,
it was widely compatible with the tools people used to crunch the data, and we
get parallel compression/decompression via pigz.

In the latest example, the Censys.io[4] project switched to LZ4 and threw data
processing compatibility to the wind (in favor of bandwidth and a hosted search
engine).

-HD

1\. [http://internetcensus2012.bitbucket.org/images.html](http://internetcensus2012.bitbucket.org/images.html)
2\. [https://scans.io/study/sonar.cio](https://scans.io/study/sonar.cio)
3\. [https://sonar.labs.rapid7.com/](https://sonar.labs.rapid7.com/)
4\. [https://censys.io/](https://censys.io/)

------
bitwize
Me, I wish people would stop using RAR. It's proprietary and doesn't have a
real compression advantage vs. e.g., 7-Zip, bzip2, or xz.

------
ak217
For anyone looking to stop making compromises, I recommend pixz. It's binary
compatible with xz, and is better at compression speed, decompression speed,
and ratio than both gzip and xz on multicore systems. I've adopted it in
production to great benefit.
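
A rough sketch of what that looks like in practice (flags as I recall them
from the pixz README; GNU tar's -I / --use-compress-program just points tar at
an external filter):

    
    
        # create and extract xz-compatible archives through pixz
        tar -I pixz -cf backup.tpxz somedir/
        tar -I pixz -xf backup.tpxz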

~~~
justinmayer
Totally agree with this. As someone with a commit bit to the project, as well
as a long-time user, I'd like to second the recommendation. Pixz is a terrific
parallel XZ compression/expansion tool. I find it indispensable for logs and
database backups. Link:
[https://github.com/vasi/pixz](https://github.com/vasi/pixz)

Fish shell users can take advantage of the Extract and Compress plugins I
wrote, which utilize Pixz if installed:
[https://github.com/justinmayer/tackle/tree/master/plugins/ex...](https://github.com/justinmayer/tackle/tree/master/plugins/extract)

------
KirinDave
Being a Windows user these days, I am getting kinda frustrated with how little
effort everyone puts into even googling for 20 seconds to find the Windows
solution.

7zip is the program you want to handle most everything, with both gui and
command line options: [http://www.7-zip.org/](http://www.7-zip.org/)

Given how radically MS is trying to reform itself to be an open-source
friendly company and how ineffectually inoffensive they've been the last 5
years, can we at least try and throw them a bone or two?

~~~
justinmayer
The article is not talking about Windows, so folks here aren't either. Why are
you surprised that folks are uninterested in Windows?

I've preferred Mac systems for longer than most of the HN crowd has been
alive, so I understand what it's like to feel ignored and in the minority. For
years, Mac users were treated as pariahs. The tables have turned, and as
someone who has been in your situation, I should have great empathy for your
predicament.

And I do, but your tone in general -- and your last sentence in particular --
makes it very hard to empathize. Microsoft used its near-monopoly status to
stifle innovation for years, and many of us have figurative scars that will
never heal. You seem to think that they are making great strides (while I see
them as half-hearted overtures), but either way I'm not about to "throw them a
bone." Their decades of misdeeds, in my eyes, will not be expiated so easily.

Perhaps Microsoft will someday be worthy of forgiveness, either from the
perspective of morality (e.g., Mozilla) or product excellence (e.g., Apple).
Until that day, Microsoft will continue to reap what they have sown, given no
more attention than they have earned.

~~~
KirinDave
> And I do, but your tone in general -- and your last sentence in particular
> -- makes it very hard to empathize. Microsoft used its near-monopoly status
> to stifle innovation for years, and many of us have figurative scars that
> will never heal. You seem to think that they are making great strides (while
> I see them as half-hearted overtures), but either way I'm not about to
> "throw them a bone." Their decades of misdeeds, in my eyes, will not be
> expiated so easily.

I feel like Apple has forgotten what was important, created then ruined a
market, and lost everything that made it interesting (long before Steve Jobs
passed away, by the way). Which cuts all the more deeply because back in the
early 2000s they were walking the walk and taking a lot from NeXT's culture of
developer friendliness. I grew up deeply invested in Macs and NeXT, which
makes the realization painful, but... Apple wants to annihilate maker culture
as it monetizes its platform. It's also stopped caring about design on a grand
scale, instead appealing to very shallow notions of "visual simplicity".

That's all gone now, and they're consequently useless to me. I'd rather
patronize a company currently doing the right thing after a troubled past than
pretend a previously aligned company was still there.

It should be very telling that Apple AND Google's flagship hardware
announcement of 2015 was something that Microsoft has been doing for years.

And if Microsoft suddenly goes evil again? Fuck them, I'll drop them and move
somewhere else. Not Linux, unless the distros pull their act together, but I'm
sure a competitor will emerge. Or I'll make one.

~~~
justinmayer
You make some good points here, which I completely understand. Hopefully the
future will bring better options for us all.

~~~
KirinDave
I know that feeling too, and thanks for accepting that I feel differently. And
sometimes I ask myself "What the hell am I doing with this Surface book?" I
won't pretend I don't have doubts.

It's sort of a rough time for devs right now even as we enjoy unprecedented
prosperity and recognition. Big businesses are attempting to monetize and
control every aspect of developers' work.

------
orionblastar
When I owned an Amiga they kept on changing the archive format to find a
better one that saved space.

They had arc, pak, zip, zoo, warp, lharc, and every Amiga BBS I got on used a
different archive format. Everyone had a different opinion on which archive
format compressed things in the best way.

I think eventually they decided on lharc when they started to put PD and
shareware files on the Internet.

Tar.gz is used because there are instructions for it everywhere and it seems
like a majority of free and open source projects archive in it. It is a more
popular format than the others right now. That might be because it is an older
format and has had more ports done.

But I really like 7zip; it seems to produce smaller archives. Before 7Zip I
used to use RAR, but WinRAR wasn't open source and 7Zip is, so I switched.

With high speed Internet it doesn't seem to matter much anymore unless the
file is over a gigabyte in size. Even then BitTorrent can be used to download
the large files. I think BitTorrent has some sort of compression included with
it, if I am not mistaken: it compresses pieces to smaller sizes over the
torrent network and then expands them when the client downloads them, if
compression is turned on and both clients support it.

~~~
cesarb
> When I owned an Amiga they kept on changing the archive format to find a
> better one that saved space.

It happened on DOS too: ZIP, ARJ, RAR, ...

That was back in the days of floppy disks (which usually had at most 1440 KiB)
and small hard disks (a few tens of megabytes). Even a few kilobytes could
make a huge difference.

As storage and transfer speeds grew, "wasting" a few kilobytes is no longer
that much of an issue, and other considerations like compatibility become more
important. Furthermore, many new file formats have their own internal
compression, so compressing them again gains almost nothing regardless of the
compression algorithm.

The reason both ZIP and GZIP became ubiquitous is, IMO, that the compression
algorithm both use (DEFLATE) was released as guaranteed to be patent-free,
back in a time when, IIRC, most of the alternatives were either patented or
compressed worse. As a consequence, everything that needed a lossless
compression method chose DEFLATE (examples: the HTTP protocol, the PNG file
format, and so on).

------
anonova
> So, who _does_ use xz?

Arch Linux started using lzma2 compression for their packages nearly 6 years
ago!

[https://www.archlinux.org/news/switching-to-xz-
compression-f...](https://www.archlinux.org/news/switching-to-xz-compression-
for-new-packages/)

~~~
pdkl95
It's very common to see xz files in Gentoo as well.

    
    
        ls /usr/portage/distfiles/      \
          | sed 's/.*[.]//g'            \
          | sort | uniq -c | sort -n -r \
          | head -n 6
    
           3377 gz
           3051 xz
           1656 bz2
            295 zip
            194 tgz
            107 jar

------
samstokes

        OSX: tar -xf some.tar.xz (WORKS!)
        Linux: tar -xf some.tar.xz (WORKS!)
    

I had no idea tar could autodetect compression when extracting. (I wonder if
this is GNU tar only, or whether the OSX default tar can do it too?) I've been
typing `tar zx` or `tar jx` for too long.
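
For what it's worth, the article's examples above suggest both do; on the
creation side, GNU tar can also pick the compressor from the output suffix (a
quick sketch, and I believe bsdtar has the same flag):

    
    
        # extraction: the compression format is detected from the file itself
        tar -xf some.tar.xz
    
        # creation: -a (--auto-compress) chooses the compressor from the suffix
        tar -caf some.tar.xz somedir/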

~~~
pdkl95
I highly recommend using atool[1], and never worrying about extracting
archives again. It's a wrapper around basically every compression/archive tool
in remotely common use.

Bonus: it decompresses to a safely-named subdirectory, but moves the contents
of that subdirectory back to the current directory if the archive contained
exactly one file. Highly convenient without any risk of accidentally expanding
1000 files into the current directory.

After creating this macro, I've basically never had to care about how to
decompress/unarchive anything.

    
    
        # 'x' for 'eXpand'
        alias xx='command atool -x'
    
        # use
        % cd $UNPACK_DIR    # (optional) (can be the PARENT dir)
        % xx foo.zip        # or .tar.{gz,xz} or whatever
        foo.zip: extracted to `foo' (multiple files in root)
        % cd foo/
        % ls | wc -l 
        3
    

atool actually has many other useful features, but it's worth it just for the
extractor.

[1] [http://www.nongnu.org/atool/](http://www.nongnu.org/atool/)

~~~
chx
Looks awesome by description, but the last release is from 2012, which makes
me slightly worried.

~~~
pdkl95
Old doesn't mean bad. Sometimes, it means "finished".

About the only thing a small tool like this would need updated at this point
is support for a new compressor. (That 2012 release mainly added support for
plzip.)

~~~
lqdc13
There are usually compatibility issues between the Linux zip utility and the
Mac one. It has to do with zip not being backward compatible.

How does atool deal with cases where there are two versions of the same
extractor?

~~~
pdkl95
I'm not familiar with that issue, but you can set the path to any extractor in
/etc/atool.conf or ~/.atoolrc

    
    
        # ~/.atoolrc
        path_zip /path/to/preferred/bin/zip
    

See atool(1) for details.
[http://linux.die.net/man/1/atool](http://linux.die.net/man/1/atool)

As far as I know, different zip formats are not auto-detected. However, it
does (optionally) use file(1) to detect the file format, which can be
overridden with the 'path_file' option, so a hack may be possible?

------
bhouston
I wish lzma (xz) were integrated into the browser and curl as an Accept-
Encoding. It would be amazing for us (clara.io), and I am sure for a lot of
others.

~~~
wmf
Browsers are getting Brotli, which is comparable to xz:
[https://groups.google.com/a/chromium.org/forum/#!msg/blink-d...](https://groups.google.com/a/chromium.org/forum/#!msg/blink-
dev/xdVm8c2GOMQ/DsIZc8mhkPcJ)

~~~
bhouston
My tests with brotli suggest that it is overrated - it is slow and has poor
compression ratios compared to xz. It confuses me why it is being pushed so
hard...

[https://github.com/google/brotli/issues/165](https://github.com/google/brotli/issues/165)

~~~
wmf
Eh, Google gave one example where Brotli does well and you gave one where it
does poorly; we're not exactly in science territory here.

~~~
kijin
Yeah, the example given by GP involves large binary streams. Brotli was
designed for small text documents with lots of English words in them, as we
often see on the web.

~~~
zurn
Where does it say that Brotli is for small English text documents? I didn't
see anything like that in the draft spec or the Google blog post.

The spec doesn't say much on the subject but has this item in the Purpose
section: "Compresses data with a compression ratio comparable to the best
currently available general-purpose compression methods and in particular
considerably better than the gzip program"

~~~
wmf
Brotli includes a built-in dictionary that contains a lot of English words,
HTML tags, etc. so it will give better compression for that kind of input.

~~~
zurn
Yes, it has that optimization for short data (though it's not restricted to
English), but the PR and specs say it's still meant to be a general-purpose
compressor. And it does very well on most types of large data.

------
sbuttgereit
I think this is one of those things where the author is pretty much 100% right
and it just won't happen. Habits are hard to break and in many cases, the
negatives just don't impose a high enough cost to matter.

There are times when I do seriously look for the optimum way to do things like
this and then there's most of the time I just want to spend brain cycles on
more important problems.

~~~
yborg
The author is not 100% right; as is always the case with this, it depends on
the data you are compressing. Here is a Stack Exchange question with some
relevant experiments: [https://superuser.com/questions/581035/between-xz-gzip-
and-b...](https://superuser.com/questions/581035/between-xz-gzip-and-
bzip2-which-compression-algorithim-is-the-most-efficient)

I believe that the biggest driver of using old-school ZIP or GZIP is the fact
that everyone knows that everything can decompress these formats. And in a
modern world of terabyte disks in every laptop, multicore multi-Ghz CPUs, and
megabit bandwidth, it isn't worth the effort of using a format that saves an
additional 20% on compressed size at the cost of someone not being able to
decompress it.

~~~
bhouston
On typical source trees and mesh data xz is in the range of twice as good at
compressing as gzip. That is very significant imo.

~~~
barrkel
That's only really much good if you're in the business of archiving things
like that. For most people, source trees are ad-hoc downloads for patch fixes,
oddball platform compiles, etc. And then the universality of gzip is better
than any marginal space savings from xz.

------
jmspring
I mentioned it when it came up on another thread. Compare apples and apples --
use one of the standard corpuses when running benchmarks.

Ian Witten put together the Calgary corpus -
[https://en.m.wikipedia.org/wiki/Calgary_corpus](https://en.m.wikipedia.org/wiki/Calgary_corpus)

------
profquail
Windows users: 7-zip can extract .xz files should you need to (article didn't
mention a Windows solution).

~~~
mappu
Although 7-Zip can't "look through" a .tar.{x,g}z file - browsing a .tar.xz
will require fully decompressing the .tar to a temporary location.

~~~
Redoubts
Tar isn't great about any of those files either; it's just building a list
while decompressing to /dev/null.

[http://serverfault.com/q/59795](http://serverfault.com/q/59795)

------
mkj
Which version of RHEL does "Linux" include? The world isn't all recent Ubuntu
releases.

~~~
bbatha
RHEL 6 doesn't include it. So that's most of enterprise deployments...

~~~
mrmondo
It's such a shame that so many slow-moving 'enterprises' still have RHEL 6
servers; it's so incredibly outdated - not only does it limit what they can
do, but it negatively affects people's impressions of Linux.

------
Phemist
For me, as a Python user, I've found that gzip is currently the only
compression format that allows streaming compression/decompression. I don't
want to have to store hundreds of gigabytes of data and THEN compress it,
rather than compressing it right during file generation. I haven't found any
other compression lib that supports this out of the box.

~~~
Tiksi
I generally use gzip for everything because it's everywhere and good enough,
but xz and bzip2 also support streaming; in fact anything that tar compresses
does, afaik.

    
    
        > dd if=/dev/urandom bs=1M count=5 | gzip > test
        5+0 records in
        5+0 records out
        5242880 bytes (5.2 MB) copied, 0.438033 s, 12.0 MB/s
    
        > dd if=/dev/urandom bs=1M count=5 | xz > test
        5+0 records in
        5+0 records out
        5242880 bytes (5.2 MB) copied, 1.52744 s, 3.4 MB/s
    
        > dd if=/dev/urandom bs=1M count=5 | bzip2 > test
        5+0 records in
        5+0 records out
        5242880 bytes (5.2 MB) copied, 0.804324 s, 6.5 MB/s

~~~
Ded7xSEoPKYNsDd
The algorithms support streaming, but that doesn't mean the implementations in
(in this case) Python libraries necessarily do. Although I can't understand
why they wouldn't: presumably they just wrap the same C libraries everybody
else uses, and a streaming interface wrapping another stream (i.e. a file or
socket) should feel very natural and easy.

------
jph
I prefer compressing with gzip because it's on more systems, works well even
with low RAM, and enables fast rsync for updates.

------
daemonk
Decompressing takes 4 times as long? I wonder if that is slow enough to create
a bottleneck in processing. Not everyone uses compression for purely archival
purposes. In the genomics field, most sequencing data are gzipped to save disk
space. And most programs used to process the sequencing data can take in the
gzipped files directly.
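
Where a tool can't read the .gz directly, piping the decompression into it at
least overlaps the two steps instead of staging an uncompressed copy on disk.
A rough sketch (the aligner name and flags are placeholders):

    
    
        # decompress on the fly and feed the tool over a pipe
        gzip -dc reads.fastq.gz | some_aligner --stdin > out.sam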

~~~
faho
There might still be use cases left for gzip, but this article is specifically
about software tarballs. And in that case, I have to agree with it.

------
jack9
geezip is fun to say. Until there's a catchy name for "crosszip"/xz/whatever,
I think we're preaching to the wrong choir. There's a human element in
toolchains. Address it.

------
zipzipzipzip
It's a shame that algorithm improvements would necessitate a shift away from
the name "gzip". It would be better if the intent to compress/decompress were
orthogonal to the features of the implementation (compression ratio, speed,
split-ability, etc...)

------
zbuf
The article misses (but the comments here touch on) that all compression
algorithms have a built-in obsolescence, even the fancy shiny xz.

It's not the algorithms per se that go obsolete, but their use in specific
cases, until all are diminished. Whether lossy or lossless, eventually other
technological advancements render them unnecessary.

And it seems that the strongest algorithm is usually the earliest to be widely
adopted; these are almost never toppled.

Just like .gz, look at MP3 or JPEG -- 'better' alternatives exist, but the
next widely adopted step will be to eliminate that compression entirely. The
first radio station playout systems used hardware MPEG audio compression, and
the next most widespread step was uncompressed WAVs. Even video pipelines
based on uncompressed frames are becoming more widespread. Eventually the
complexity and unpredictability of compression is shunned for simplicity.

Read the gzip docs and the focus is around compression of text source code, a
key use case at the time but barely considered these days -- tar.gz source
archives exist almost only out of habit; they could just as well be tar.

~~~
yaur
Media codecs are a little different because there is a significant cost to
replacing hardware that only supports the old standard. It seems to me that
AAC is becoming pretty ubiquitous and probably will be the go-to standard for
the next 10-20 years. We are also at a point where the vast majority of users
aren't going to notice the difference between a 100kbps MP3 vs AAC stream,
which is going to be less than 1% of an HE stream, so there is little
incentive to innovate by the players with the deepest pockets. Until network
capacity is free (or at least cheap compared to the cost of CPE) pipelines
based on uncompressed media are not going to be a thing outside of the
production end of things.

For source tarballs, though, there is basically zero cost to switching, since
you can download a new compressor in under a minute and should be able to
assume that your users are pretty sophisticated. The incentives are similar to
media in that the transfer cost has to be weighed against the cost of CPE,
except that since users supply the CPE the cost is effectively zero and
compression will probably always make sense.

------
faragon
gzip is fast, gzip -1 is even faster, gzip has low memory requirements, and
gzip is widely adopted. Those are the reasons gzip is still being used, and
why gzip has a future. I.e. the gzip "ecosystem" is rich and useful, despite
not being the best compressor in terms of compressed size.

P.S. There are gzip-compatible implementations with tiny per-connection
encoding memory requirements (< 1KB).

------
skywhopper
tl;dr: xz compresses better but is significantly slower. This isn't the
deepest analysis of potential tradeoffs you might be able to find.

A few reasons why gzip is still useful to have around:

* Speed is critical for many applications, and so size can take a backseat when performance is critical or resources are low.

* gzip is basically guaranteed to be available everywhere in utility and library forms.

* Download speeds vary, and the faster your pipe, the less the archive size factor will matter, and the faster-but-worse compression might win out in other comparisons.

* xz doesn't compress every type of data this much better than gzip. I've dealt with scenarios where the difference is consistently less than 2%, and the extra time xz spends is actually a tremendous waste.

Sure, for package downloads where xz files will be significantly smaller it
makes sense to save the bandwidth, time and storage space. But it's not 100%
cut and dry.

------
InclinedPlane
Ahh, kids. "Let's all start adopting this new thing that has existed for a few
years that's slower and uses more RAM because we're wasting literally _tens_
of megabytes all the time!"

~~~
outworlder
Tens of megabytes, times the number of people downloading it, can get pretty
significant pretty fast. And costly.

~~~
InclinedPlane
Fortunately, the same is not true of time or memory.

~~~
chris_wot
On the server, it would increase the amount of time and memory required.

~~~
InclinedPlane
Oh, right, that seems obvious. Wonder why it's not obvious to everyone...

~~~
chris_wot
_Very_ obvious.

------
jeffdavis
I don't see any reason the author made the jump from "xz is better than gzip"
to "stop using gzip!".

What is the compelling thing here that makes him feel that this is a moral
imperative?

------
imslavko
Can node-tar untar .tar.xz? This is what Meteor uses to parse tarballs.

[https://github.com/npm/node-tar](https://github.com/npm/node-tar)

~~~
voltagex_
Correct me if I'm wrong, but there doesn't appear to be any kind of
compression supported.

------
Animats
Amusingly, "news.ycombinator.com" serves its pages with .gz compression. Even
if you send an HTTP header that demands plain text only.

~~~
vive-la-liberte

        $ ncat -C --ssl news.ycombinator.com 443 <<EOF
        GET / HTTP/1.1
        host: news.ycombinator.com
        
        EOF
    

Works for me, no compression. Maybe you messed something up or maybe there's a
non-compliant proxy between you and the rest of the internet?

~~~
chris_wot
What if you put in:

    
    
      accept-encoding: identity

~~~
vive-la-liberte
Uncompressed still.

------
djhworld
I asked this question on StackExchange a few months ago
[http://unix.stackexchange.com/questions/183465/fastest-
way-o...](http://unix.stackexchange.com/questions/183465/fastest-way-of-
working-out-uncompressed-size-of-large-gzipped-file)

That's my problem with GZIP, in this particular use case anyway.
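
The underlying issue, as I understand it, is that gzip only records the
uncompressed size as a 32-bit field at the end of the stream, so the quick
query is only trustworthy below 4 GiB; anything bigger has to be streamed
through to count. A quick sketch:

    
    
        # instant, but reads a 32-bit field: wrong (mod 2^32) for files >= 4 GiB
        gzip -l big.log.gz
    
        # exact, but has to decompress the whole thing
        gzip -dc big.log.gz | wc -c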

------
fz7412
If I'm sending the contents of a website to a client in .xz format, will
browsers be able to decompress it?

~~~
masklinn
No.

------
bshanks
There's also lzip, which apparently uses a compression algorithm similar to
xz's but is built with partial recovery of corrupted archives in mind (so it's
more useful for long-term archival or backup storage). It's made by the same
guy who made ddrescue.

------
otterley
The only people who care about compression ratios are:

(1) People who still use 56k modems to download content

(2) People who host extremely popular downloads and who want to minimize their
outbound bandwidth bills

If you're not one of these two, you almost certainly care more about
compatibility and compression time than compression ratio. gzip continues to
win on both those fronts, which explains why it's still the most popular
compression format other than ZIP (which is a better choice than gzip if you
frequently need to extract a single file from a compressed archive).
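
For instance, pulling a single member out of each (a rough sketch; the member
path is just illustrative):

    
    
        # zip keeps a central directory, so only the requested member is read
        unzip -p release.zip docs/README.md > README.md
    
        # a .tar.gz has no index; tar must decompress from the start to find the member
        tar -xzf release.tar.gz docs/README.md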

Until there's a compression tool released that can compress at wire speed
(like gzip) _and_ has a significantly better compression ratio, don't expect
the landscape to change much.

------
ultim8k
Aha! Thanks for sharing.

------
JustSomeNobody
Is xz greppable?

~~~
vectorjohn
No. Neither is gz. In fact, how could any compression algorithm possibly be
greppable?

~~~
labster
Apparently it is possible, as zgrep can search gzipped files.

~~~
pdkl95
The 'xzgrep' script (and related xzdiff, xzless, xzmore scripts) are part of
the standard xz package, though they are an optional feature so YMMV between
distros.

    
    
        ~ $ xx /usr/portage/distfiles/xz-5.0.8.tar.gz
        ~ $ cd xz-5.0.8/
        ~/xz-5.0.8 $ ./configure --help | grep -A1 scripts
          --disable-scripts       do not install the scripts xzdiff, xzgrep, xzless,
                                  xzmore, and their symlinks

------
foxhop
tar ... tape archive

------
necessity
What about availability? I often find myself having to download and compile
(de)compression software because the authors of some other software I need
decided to ship it in something other than the standard (.tar.gz), which is
available on basically all *nix boxes.

------
modarts
no

------
GalacticDomin8r
ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/10.2/

------
eximius
Ick, the invocation is `tar tJvf`? Granted, I can alias it, but a capital J is
just about the worst option letter I can think of.

~~~
LeoPanthera
I can never remember the shortcuts so I always do:

tar -cv "folder" | xz >"out.tar.xz"

~~~
breadbox
GNU tar also has easy-to-remember long options for all the major compression
programs: --gzip, --xz, --lzop, --lzma, --lzip.
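
e.g. (a quick sketch with GNU tar; no compression flag is needed on extraction
since the format is auto-detected):

    
    
        tar --create --xz --file project.tar.xz project/
        tar --extract --file project.tar.xz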

