
Show HN: Neural Image Compression Demo - tonic_section
https://colab.research.google.com/github/Justin-Tan/high-fidelity-generative-compression/blob/master/assets/HiFIC_torch_colab_demo.ipynb
======
tonic_section
Hi everyone, I've been working on an implementation of a model for learnable
image compression together with general support for neural image compression
in PyTorch. You can try it out directly and compress your own images in Google
Colab [1] or check out the source on GitHub [2].

This project is based on the paper "High-Fidelity Generative Image Compression"
by Mentzer et al. [3] - this was one of the most interesting papers I've read
this year! The model is capable of compressing images of arbitrary size and
resolution to bitrates competitive with state-of-the-art compression methods
while maintaining a very high perceptual quality. At a high-level, the model
jointly trains an autoencoding architecture together with a GAN-like component
to encourage faithful reconstructions, combined with a hierarchical
probability model to perform the entropy coding.
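
Schematically, the generator's objective combines those three pieces: a rate
term from the hierarchical prior, a distortion term, and a GAN term. A minimal
PyTorch sketch of that combination (the function name, coefficients, and the
choice of MSE plus an optional perceptual metric are illustrative, not the
repo's exact loss):

    import torch
    import torch.nn.functional as F

    def generator_loss(rate_bpp, x, x_hat, disc_fake_logits,
                       lam=0.1, beta=0.15, perceptual=None):
        # Distortion: MSE here; the paper also adds a perceptual metric (LPIPS).
        distortion = F.mse_loss(x_hat, x)
        if perceptual is not None:
            distortion = distortion + perceptual(x_hat, x)
        # Non-saturating GAN term that pushes reconstructions towards "realistic".
        gan = F.binary_cross_entropy_with_logits(
            disc_fake_logits, torch.ones_like(disc_fake_logits))
        # Rate term: estimated bits-per-pixel under the learned prior.
        return lam * rate_bpp + distortion + beta * gan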

What's interesting is that the model avoids the compression artifacts
associated with standard image codecs by subsampling high-frequency detail
while preserving the image's global features very well - for example, the
model learns to sacrifice faithful reconstruction of things like faces and
writing, and to spend those 'bits' elsewhere to keep the overall bitrate low.

The overall model is around 700MB - so transmitting the model itself wouldn't
be particularly feasible. The idea is that both the sender and receiver have
access to the model and only transmit the compressed messages between
themselves.

If you have any questions or notice something weird I'd be more than happy to
address them.

---

[1] Colab Demo: [https://colab.research.google.com/github/Justin-Tan/high-
fid...](https://colab.research.google.com/github/Justin-Tan/high-fidelity-
generative-compression/blob/master/assets/HiFIC_torch_colab_demo.ipynb)

[2] GitHub: [https://github.com/Justin-Tan/high-fidelity-generative-
compr...](https://github.com/Justin-Tan/high-fidelity-generative-compression)

[3] Original paper: [https://hific.github.io/](https://hific.github.io/)

[4] Sample reconstructions: [https://github.com/Justin-Tan/high-fidelity-
generative-compr...](https://github.com/Justin-Tan/high-fidelity-generative-
compression/blob/master/assets/EXAMPLES.md)

~~~
garblegarble
Would this work for a lossless / near lossless approach by having a final pass
storing a delta between the compressed image and the original pixels, or do
you think they diverge too much on a purely pixel-for-pixel basis for this to
be valuable?
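
One crude way to check this empirically would be to losslessly compress the
pixel-wise residual and compare its size against the HiFiC bitstream. A small
sketch, assuming `original` and `reconstruction` are H x W x 3 uint8 arrays
(zlib is just a stand-in for a proper lossless residual coder, and the names
are mine):

    import zlib
    import numpy as np

    def residual_size_bytes(original: np.ndarray, reconstruction: np.ndarray) -> int:
        # Signed per-pixel difference between the source and the HiFiC output,
        # squeezed with a generic lossless coder as a rough upper-bound check.
        delta = original.astype(np.int16) - reconstruction.astype(np.int16)
        return len(zlib.compress(delta.tobytes(), 9))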

~~~
godelski
The model uses a GAN, which does not learn the exact PDF. So it's not
lossless, but as you can see from the images it gets extremely visually
accurate results.

From the README

> The generator is trained to achieve realistic and not exact reconstruction.
> It may synthesize certain portions of a given image to remove artifacts
> associated with lossy compression. Therefore, in theory images which are
> compressed and decoded may be arbitrarily different from the input. This
> precludes usage for sensitive applications. An important caveat from the
> authors is reproduced here:

> "Therefore, we emphasize that our method is not suitable for sensitive image
> contents, such as, e.g., storing medical images, or important documents."

~~~
est31
> "Therefore, we emphasize that our method is not suitable for sensitive image
> contents, such as, e.g., storing medical images, or important documents."

As an example of this going wrong previously, Xerox once implemented scanner
compression based on deduplicating repeated parts of documents. Obviously,
numbers contain tons of repeated symbols (digits). The problem was that the
scanner software deduplicated different digits with each other, leading to
wrong numbers in the scanned output.

[http://www.dkriesel.com/en/blog/2013/0802_xerox-
workcentres_...](http://www.dkriesel.com/en/blog/2013/0802_xerox-
workcentres_are_switching_written_numbers_when_scanning)

------
atorodius
Nice demo! Thanks for porting our paper to PyTorch, and stay tuned for the
official implementation, coming soon.

Also check out the previous discussion on HN:

[https://news.ycombinator.com/item?id=23652753](https://news.ycombinator.com/item?id=23652753)

------
whatever1
Not a compression expert, but my eyes have been trained to ignore color
gradient issues and minor pixelation as long as the outlines of shapes are
clearly defined. While this approach does a better job of preserving color
detail and avoiding pixelation, it significantly distorts the shapes
themselves (see the clock in the last example). It makes the images look like
Google Maps 3D renders of sorts. How finely can you tune the target
compression ratio? Maybe with a less aggressive target these distortions would
not be so evident?

~~~
tonic_section
During training, you can set a target bitrate by heavily penalizing examples
which exceed the target rate in the rate-distortion objective - so the model
should learn to produce compressed representations at or below this bitrate.
However, this constraint is only enforced in aggregate over the entire
dataset - like many ML systems, there is no guarantee of behaviour for
individual examples, either within or outside the training set. Despite this,
the model appears to respect the target rate well, even on out-of-sample
images.
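
A minimal sketch of that kind of hinged penalty (the function name and
constants are illustrative, not the values used in the repo):

    def rate_weight(bpp, target_bpp, lam_high=16.0, lam_low=0.25):
        # Examples whose estimated bitrate exceeds the target are penalized
        # much more heavily, so the average rate over the dataset settles at
        # or below the target.
        return lam_high if bpp > target_bpp else lam_low

    # inside the training loop, schematically:
    # loss = rate_weight(bpp, target_bpp=0.30) * bpp + distortion_terms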

One shortcoming is that the current model is non-adaptive, meaning the target
rate is fixed - so to achieve different compression rates you would have to
train multiple models in different rate regimes. In the Colab demo you can
select between 3 different models trained with target bits-per-pixel (bpp)
rates of 0.14, 0.30, and 0.45 bpp, respectively - higher rates correspond to
higher-fidelity reconstructions, at the expense of a lower compression ratio.
The default is the `HiFIC-med` model (this is what all the samples in the
README were generated with), but the model trained at the highest bitrate
should have less obvious imperfections.
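
For reference, the mapping between the three pretrained configurations and
their target rates (names are indicative of the README's naming, not exact
identifiers):

    HIFIC_MODELS_BPP = {
        "hific-low": 0.14,  # most aggressive compression
        "hific-med": 0.30,  # Colab default; used for the README samples
        "hific-hi":  0.45,  # highest fidelity, lowest compression ratio
    }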

Some of the distortion can also be attributed to the entropy coding process
rather than the model itself - currently the system clips values that fall
outside a certain probability range, which introduces artificial distortion.
A fix is in the pipeline, though.
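
To illustrate where that distortion comes from (a sketch, with made-up bounds
rather than the repo's actual symbol table):

    import torch

    def clip_to_coder_support(latents_q: torch.Tensor, lo: int = -15, hi: int = 15):
        # A vectorized rANS coder works over a fixed symbol table, so quantized
        # latents outside its support are clipped before coding; those positions
        # decode to the clipped value rather than the true one, which shows up
        # as the artificial distortion described above.
        return latents_q.clamp(lo, hi)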

------
fireattack
The result seems pretty poor to me?

(I just used the example image that is already in the notebook)

Original: [https://i.imgur.com/Q66mHTD.png](https://i.imgur.com/Q66mHTD.png)
Result: [https://i.imgur.com/4R6qn8e.png](https://i.imgur.com/4R6qn8e.png)

There are lots of random spots on the image, and the brightness level changes
completely.

Sure, 5232 kB to 124 kB is impressive, but people would probably prefer a
badly compressed JPEG over this, since at least JPEG artifacts are predictable
(and if the image isn't displayed at 100%, the artifacts are less obvious,
unlike the brightness change and spots in this result).

Edit: I just saw the result at
[https://hific.github.io/](https://hific.github.io/) for the same picture, and
that one has none of these flaws (no brightness change, no weird spots here
and there), with an even smaller file size. Why?

~~~
tonic_section
Hey, thanks for bringing the brightness issue to my attention - it turns out I
wasn't normalizing the output correctly. I just pushed a fix, and the output
images no longer have the brightness change.
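
For context, a minimal sketch of the kind of output normalization involved,
assuming the decoder's output is nominally in [0, 1] (my guess at the class of
bug, not the exact change that was pushed):

    import torch

    def to_uint8(x_hat: torch.Tensor) -> torch.Tensor:
        # Clamp before scaling so stray out-of-range values don't shift the
        # overall brightness when the tensor is cast to uint8.
        return (x_hat.clamp(0.0, 1.0) * 255.0).round().to(torch.uint8)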

As for the random spots, that's an artifact of the entropy coding algorithm.
In principle this step is lossless, but there is some distortion because I'm
using a custom vectorized version of an rANS encoder, and it's hard to encode
overflow values in a vectorized fashion - I'm working on this, though. If you
can live with really slow decoding times (2-3 minutes), you can disable
vectorization to eliminate these small imperfections entirely.

As for the comparison to the official model, that's mainly down to compute
constraints versus Google (this is just my weekend project). My model uses a
smaller architecture and was trained for only 4e5 steps, versus the 2e6 steps
they reported in the paper - and even then it took 4+ days on AWS! The model
is also trained on the OpenImages dataset, which is presumably much smaller
and noisier than the massive internal dataset Google used.

~~~
fireattack
Just curious, is the change on the model side? I didn't see anything relevant
in the notebook's revision history [1].

[1] [https://colab.research.google.com/github/Justin-Tan/high-
fid...](https://colab.research.google.com/github/Justin-Tan/high-fidelity-
generative-compression/blob/master/assets/HiFIC_torch_colab_demo.ipynb)

------
est31
Reddit thread with the creators of HiFiC answering questions:
[https://www.reddit.com/r/MachineLearning/comments/hgkup5/r_h...](https://www.reddit.com/r/MachineLearning/comments/hgkup5/r_highfidelity_generative_image_compression/)

------
tombh
Here's a close-up comparison so you can clearly see the differences:
[https://i.imgur.com/TP1QQpJ.png](https://i.imgur.com/TP1QQpJ.png)

Edit: original is on the right

------
xiphias2
I would be interested in a comparison with a neural JPEG decoder that tries
to restore the original image as closely as possible.

It's incredibly hard to change the default image format on the web, but
there's an opportunity to swap libjpeg for a decoder with much more realistic
output images.

------
browserface
Damn, I got an error at cell 19.

Otherwise it seemed to work.

~~~
tonic_section
What was the error? I tried to make the demo notebook as robust as possible -
you should be able to execute all cells in sequence once, then execute cells
out of sequence, etc., without trouble, but it's hard to legislate for every
error in Jupyter-like notebooks.

~~~
s-macke
In the "# Setup model" step, I get an error in the call to 'prepare_model':

    UnpicklingError: invalid load key, '<'.
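
For what it's worth, that error usually means the file handed to torch.load
starts with a '<' character - typically an HTML error page (e.g. from a failed
or rate-limited download) rather than an actual checkpoint. A quick check,
with a hypothetical file name:

    with open("hific_med.pt", "rb") as f:
        print(f.read(32))  # b'<!DOCTYPE html>...' would confirm a bad download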

~~~
0-_-0
I get the same

