
Huh, my results are drastically different:

    $ dd if=/dev/urandom bs=512 count=2048 | base64 | gzip | wc
    ...
    4088   31928 1059085

    $ dd if=/dev/urandom bs=512 count=2048 | xxd -p | gzip | wc
    ...
    5019   33798 1231268

1 megabyte of random data consistently results in ~1 megabyte of compressed base64 text, ~1.2 megabytes of compressed hex.
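A minimal Python sketch of the same experiment, assuming `base64.encodebytes` is a fair stand-in for the `base64` CLI (both wrap at 76 columns) and `binascii.hexlify` for `xxd -p`. Note that hexlify does not wrap lines the way `xxd -p` does, so the sizes will not match the `wc` output above exactly. The helper name `compressed_sizes` is hypothetical.

```python
import base64
import binascii
import gzip
import os


def compressed_sizes(raw: bytes) -> dict:
    """Gzip the base64 and hex encodings of `raw` and report the sizes."""
    # encodebytes inserts a newline every 76 chars, like GNU base64's default.
    b64 = base64.encodebytes(raw)
    # hexlify does NOT wrap lines, unlike `xxd -p` (60 hex chars per line).
    hexed = binascii.hexlify(raw)
    return {
        "raw": len(raw),
        "base64": len(b64),
        "base64+gzip": len(gzip.compress(b64)),
        "hex": len(hexed),
        "hex+gzip": len(gzip.compress(hexed)),
    }


if __name__ == "__main__":
    # 512 * 2048 bytes = 1 MiB, matching the dd invocation above.
    sizes = compressed_sizes(os.urandom(512 * 2048))
    for name, size in sizes.items():
        print(f"{name:>12}: {size:>9} bytes ({size / sizes['raw']:.3f}x)")
```

With genuinely random input, neither compressed encoding should come out meaningfully smaller than the raw bytes.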



Repeating the exercise using a photograph (https://imgur.com/B4tqkrZ):

    $ pv lock-your-screen.png | base64 | gzip | wc
     2.1MiB 0:00:00 [  13MiB/s] [================================>] 100%
       8641   48904 2264151


    $ pv lock-your-screen.png | xxd -p | gzip | wc
     2.1MiB 0:00:00 [4.41MiB/s] [================================>] 100%
      10109   49956 2573293

Here is the same against the jpg version (I've no idea why I've kept both a jpg and a png of the same image around, especially given that jpg is a much better-suited format):

    $ pv lock-your-screen.jpg | base64 | gzip | wc
     377KiB 0:00:00 [19.2MiB/s] [================================>] 100%
       1420    8373  392796


    $ pv lock-your-screen.jpg | xxd -p | gzip | wc
     377KiB 0:00:00 [9.36MiB/s] [================================>] 100%
       1487    8935  441077

In both cases base64 is both faster and compresses to a smaller size.


This test isn't very informative because both .png and .jpg are already compressed formats, with "better than gzip" strength, so gzip/deflate isn't going to be able to compress the underlying data.

You only see some compression because gzip is just backing out some of the redundancy added by the hex or base64 encoding, and the way the Huffman coding works favors base64 slightly.
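To put a number on the redundancy gzip is backing out, here is a sketch that measures the order-0 Shannon entropy of each encoding of random bytes. It assumes uniformly random input from `os.urandom`, and it does not model deflate's actual Huffman tables, only the bound an ideal symbol-by-symbol coder could reach. The helper `entropy_bits_per_symbol` is hypothetical.

```python
import base64
import math
import os
from collections import Counter


def entropy_bits_per_symbol(data: bytes) -> float:
    """Empirical order-0 Shannon entropy of `data`, in bits per byte/char."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


raw = os.urandom(3 * 100_000)       # multiple of 3, so base64 adds no '=' padding
b64 = base64.b64encode(raw)         # 4 chars per 3 bytes, 64-symbol alphabet
hexed = raw.hex().encode("ascii")   # 2 chars per byte, 16-symbol alphabet

h_b64 = entropy_bits_per_symbol(b64)    # close to log2(64) = 6 bits per char
h_hex = entropy_bits_per_symbol(hexed)  # close to log2(16) = 4 bits per char

# Either way, an ideal order-0 coder gets back to about 8 bits per original byte:
# base64 wastes ~2 bits per char, hex wastes ~4 bits per char.
print(f"base64: {h_b64:.4f} bits/char -> {h_b64 * len(b64) / len(raw):.4f} bits/byte")
print(f"hex:    {h_hex:.4f} bits/char -> {h_hex * len(hexed) / len(raw):.4f} bits/byte")
```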

Try with uncompressed data and you'll get a different result.

Your speed comparison seems disingenuous: you are benchmarking "xxd", a generalized hex dump tool, against base64, a dedicated base-64 encoder. I wouldn't expect their speeds to have any interesting relationship with the best possible speed of a tuned algorithm.

There is little doubt that base-16 encoding is going to be very fast, and trivially vectorizable (in a much simpler way than base-64).
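A toy illustration of why base-16 is the simpler case: every input byte maps to two output characters through a 256-entry table with no state carried between bytes, whereas base64 has to regroup each 24-bit chunk into four 6-bit indices. The names `HEX_TABLE`, `hex_encode` and `base64_group` are hypothetical, and Python says nothing about SIMD speed; this only shows the data dependencies.

```python
# One lookup per input byte; output position depends only on input position.
HEX_TABLE = [f"{b:02x}".encode("ascii") for b in range(256)]


def hex_encode(data: bytes) -> bytes:
    """Base-16 encode `data`: each byte independently becomes two hex chars."""
    return b"".join(HEX_TABLE[b] for b in data)


def base64_group(chunk: bytes) -> list[int]:
    """Split one 3-byte group into four 6-bit alphabet indices, as base64 must."""
    n = int.from_bytes(chunk, "big")  # 24 bits spanning three input bytes
    return [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
```

Each 6-bit index straddles byte boundaries, which is what makes a vectorized base64 encoder need shuffles and shifts that a hex encoder does not.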


> both .png and .jpg are already compressed formats, with "better than gzip" strength

FWIW, PNG and gzip both use the DEFLATE algorithm, so I wouldn't call PNG's compression "better than gzip".

Source: https://en.wikipedia.org/wiki/DEFLATE

> This has led to its widespread use, for example in gzip compressed files, PNG image files and the ZIP file format for which Katz originally designed it.


Like any domain-specific algorithms, PNG uses deflate as a final step, after applying image-specific filters which take advantage of typical 2D features. So on image data PNG will generally do much better than gzip, and on any data it will essentially always do at least as well (perhaps I should have said that originally). In particular, the worst-case PNG compression (e.g., if you pass it random data, or text or something) is to use the "no op" filter followed by the usual deflate compression, which will end up right around plain gzip.
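A minimal sketch of the filter-then-deflate idea, assuming an 8-bit grayscale image (one byte per pixel) and a synthetic noisy gradient as hypothetical test data. It implements only PNG filter types 0 (None) and 1 (Sub) and skips the chunk/CRC framing; real encoders also choose a filter per scanline heuristically rather than always using Sub.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 64


def make_noisy_gradient(seed: int = 0) -> list[bytes]:
    """Hypothetical 8-bit grayscale image: diagonal gradient plus small noise."""
    rng = random.Random(seed)
    return [bytes((x + y + rng.randrange(3)) & 0xFF for x in range(WIDTH))
            for y in range(HEIGHT)]


def filter_none(row: bytes) -> bytes:
    # PNG filter type 0 (None): filter-type byte, then the scanline unchanged.
    return b"\x00" + row


def filter_sub(row: bytes) -> bytes:
    # PNG filter type 1 (Sub) with 1 byte per pixel: each byte minus the byte
    # to its left, modulo 256; the first byte's left neighbour counts as 0.
    out = bytearray([1])
    prev = 0
    for b in row:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)


def deflated_size(rows: list[bytes], row_filter) -> int:
    # PNG stores the filtered scanlines back to back in one zlib stream.
    return len(zlib.compress(b"".join(row_filter(r) for r in rows), 9))


image = make_noisy_gradient()
print("None filter + deflate:", deflated_size(image, filter_none))
print("Sub filter  + deflate:", deflated_size(image, filter_sub))
```

On smooth data like this the Sub residuals cluster around a handful of small values, so deflate has much less to encode than with the raw pixel values. On random data no filter helps, which is the "no op" worst case described above.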

Now, "at least as good" is enough for my point: by compressing a .png file with gzip you aren't going to see additional compression in general. When compressing a base-64- or hex-encoded .png file, the additional compression you see is largely just the result of removing the redundancy of the encoding, not any compression of the underlying image.
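A quick way to check this point without a real image: deflate some synthetic text as a stand-in for an already-compressed file, then gzip the result directly and via base64. The word list and sizes are made up for illustration, and zlib output is only assumed to behave like a PNG's compressed payload.

```python
import base64
import gzip
import random
import zlib

# Hypothetical stand-in for an already-compressed file such as a PNG.
rng = random.Random(42)
words = ["lock", "your", "screen", "png", "jpg", "base64", "hex", "gzip"]
text = " ".join(rng.choice(words) for _ in range(50_000)).encode("ascii")
compressed = zlib.compress(text, 9)

regzipped = gzip.compress(compressed)   # gzip the compressed bytes directly
b64 = base64.encodebytes(compressed)    # ~4/3 larger, plus a newline per 76 chars
b64_gzipped = gzip.compress(b64)        # gzip mostly undoes the base64 expansion

print("compressed:        ", len(compressed))
print("gzip(compressed):  ", len(regzipped))
print("base64(compressed):", len(b64))
print("gzip(base64(...)): ", len(b64_gzipped))
```

Expect gzip(compressed) to land around the size of its input, and gzip(base64(...)) to shrink back toward that size rather than meaningfully below it.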


Ooops, that should read "Like many domain-specific algorithms" not "Like any ..."


My data wasn't quite random! It repeats every 1KB, much smaller than zlib's window size (32KB).
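For reference, RFC 1951 limits deflate back-references to a 32 KiB window. The sketch below (the helper `gzip_ratio` is hypothetical) builds inputs by repeating one random block, so a 1 KiB period falls inside the window and collapses, while a 64 KiB period lies outside it and does not.

```python
import gzip
import os


def gzip_ratio(period: int, total: int = 256 * 1024) -> float:
    """Gzip `total` bytes made by repeating one random block of `period` bytes."""
    block = os.urandom(period)
    data = (block * (total // period + 1))[:total]
    return len(gzip.compress(data)) / len(data)


# Deflate can only refer back up to 32 KiB, so only the short period is found.
print(f"1 KiB period:  {gzip_ratio(1024):.4f}")
print(f"64 KiB period: {gzip_ratio(64 * 1024):.4f}")
```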


A good rule of thumb might be that if your results show that you are able to consistently compress supposedly random data to less than the size required for just the random binary bits, you should either recheck your numbers, verify your random number generator, or quickly file for a patent!
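The pigeonhole count behind this rule of thumb: a lossless compressor must map distinct inputs to distinct outputs, but there are fewer bit strings shorter than n bits than there are n-bit inputs. Counting the strings of length at most n - k also bounds the fraction of n-bit inputs that can shrink by k or more bits:

$$
\sum_{j=0}^{n-1} 2^j = 2^n - 1 < 2^n,
\qquad
\frac{\sum_{j=0}^{n-k} 2^j}{2^n} = \frac{2^{n-k+1} - 1}{2^n} < 2^{1-k}.
$$

So for uniformly random input, saving even one byte (k = 8) is possible for fewer than 1 in 128 inputs.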


You should add googling “Shannon” “entropy” and “information theory” to that list ;-)



