
Generative Adversarial Networks for Extreme Learned Image Compression
https://data.vision.ee.ethz.ch/aeirikur/extremecompression/
======
rasz
The picture is not compressed, it's hallucinated from a vague memory of the
real thing, a mere dream. Cars vanish, buildings change wall structure, even
the license plate receives fake text absent from the source material.

It's a giant guesswork of what was there originally. Reminds me of Xerox
scanners lying about scanned-in numbers:
http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning

~~~
yorwba
All lossy compression algorithms hallucinate. That's the whole point: reducing
image size by dropping some information and then hallucinating a plausible
replacement to decompress.

The only difference is that this compression is better at hallucinating, so
you don't get ringing artifacts or blocks, but some internally consistent
alternate reality.

If you don't want to lose data you should not use lossy compression at all.
JPEG can erase the distinction between digits as well.

~~~
skybrian
Okay, but still, some kinds of changes are better than others. They should
probably start testing for this in the visual perception tests: is lost
information greyed out in a visible way? Are words and digits always fuzzed,
or replaced?

Because it turns out that fuzziness and compression artifacts have a higher-
level meaning: when you see them, you know something has been lost. That's an
important (if inadvertent) signal. We need to make sure the artifacts don't go
away.

~~~
mcbits
It could be a problem if it hallucinates the wrong license plate number at a
crime scene. If all you want is a gigantic 8K resolution stock photo of a
woman holding her baby in front of a laptop without devouring 10 MB of the
user's data cap, it may be fine if the woman has a slightly different (but
still highly detailed) hair style.

~~~
skybrian
It would always decompress the same way, so for artistic purposes, if you
preview it and it looks good, it is good.

But if you're using photography to look at things in the world, that's a whole
different story.

------
iTokio
The trick is not so much about compression, but rather about image generation.
The trade-off is completely different from usual lossy algorithms: a highly
compressed image might still retain high visual quality, but with completely
different details and textures.

Kinda like what would happen if you used a perfect painter with a blurry
memory.

~~~
Tossrock
This is quite visible in their demonstration slider-image; the building in the
back-right changes from a brick building with glass windows to a stucco-ish
building with a bunch of exterior duct-work.

~~~
fludlight
Also, the car behind the bus disappears. It looks like it's been photoshopped
out. This is unexpected behavior from a compression algorithm. Users are
conditioned to expect the quality to degrade uniformly across the whole image.

~~~
Veedrac
In fairness that detail isn't preserved in the other formats either. The new
issue is merely the illusion of accuracy.

~~~
rasz
The car is visible in WebP.

------
bcheung
I'd be curious to see how different levels of quantization affect the image.
From the paper it looks like the quantization is applied in the latent feature
space. I wonder if it has effects similar to the celebrity GANs we have seen,
where interpolating in the latent space results in morphing from one face to
another. Could be funny if compression doesn't result in something blocky or
distorted, but instead replaces objects with other objects that look similar
to them.

This seems to be for static images, but it gets me wondering if an RNN could
be used to get better motion prediction than the current "hard coded"
solutions.

Also, the more specific the domain, the better the compression, since it can
specialize. I'm wondering about the practical applications of this. Do we have
different baselines that can be used for different use cases?

~~~
make3
it also means that you could get a meaningful average of two images by
averaging their compressed forms (aka the latent state z) and decoding, just
like with the celebrities :)
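
Something like this toy sketch, with random linear maps standing in for the
learned encoder and decoder (purely illustrative, not the paper's networks):

    # toy "average in latent space, then decode" sketch
    import numpy as np

    rng = np.random.default_rng(0)
    d_pix, d_lat = 64 * 64, 32                     # made-up sizes
    W_enc = rng.standard_normal((d_lat, d_pix)) / np.sqrt(d_pix)
    W_dec = rng.standard_normal((d_pix, d_lat)) / np.sqrt(d_lat)

    def E(x):                                      # stand-in encoder
        return W_enc @ x

    def G(z):                                      # stand-in decoder/generator
        return W_dec @ z

    x1, x2 = rng.random(d_pix), rng.random(d_pix)  # two "images"
    z_avg = 0.5 * (E(x1) + E(x2))                  # average the latent states
    x_mix = G(z_avg)                               # decode a blend of the two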

------
return1
I wonder how Pied Piper would respond to that. This could be a good idea for
video compression that is "monothematic". Hmmm, I can think of a video
industry that is monothematic...

------
mmastrac
Just waiting for this to show up in a video compression standard. With the
right network it could be just as fast to decompress, though probably insanely
slow to compress.

~~~
dschn_dstryr
No, the tradeoff is actually the other way around. Encoding with a neural
network can potentially be faster than the exhaustive tree searches that are
done in current compression methods. On the other hand, current decoders are
fairly dumb and extremely optimized for speed; neural networks will probably
have trouble competing. In the design of video codecs, an increase in decoding
time is considered at least 10x more costly than the same increase in encoding
time.

source: I worked with both HEVC and neural network based compression.

------
stochastic_monk
I think the title of the paper should state that it is for _lossy_ image
compression, which would make clear how it works and what task it performs.

I would be surprised if there wasn't a way to provide a learned, lossless
method of compression, but that would be a very different paper and result.

------
tmpmov
Take the following as coming from a dilettante... I'm still trying to
understand the remainder of the paper but felt like writing on the basics of
the encoder/decoder/quantizer setup they mention.

I found this particularly interesting "To compress an image x ∈ X , we follow
the formulation of [20, 8] where one learns an encoder E, a decoder G, and a
finite quantizer q."
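
As far as I can tell the pipeline is simply: compress with z = q(E(x)),
decompress with x̂ = G(z), where q snaps each latent value to one of a small,
finite set of centers. A toy numerical sketch of the shapes (my own stand-ins,
not the paper's actual networks):

    import numpy as np

    rng = np.random.default_rng(0)
    centers = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # hypothetical quantizer levels

    def E(x):                      # stand-in encoder: shrink the image to a small latent
        return x.reshape(-1)[:16]  # (the real E is a learned conv net)

    def q(z):                      # finite quantizer: snap each value to the nearest center
        idx = np.abs(z[:, None] - centers[None, :]).argmin(axis=1)
        return centers[idx]        # only these few symbols need to be stored

    def G(z_hat):                  # stand-in decoder/generator
        return np.tile(z_hat, 4)   # (the real G hallucinates a full image back)

    x = rng.random((8, 8))         # a tiny "image"
    z_hat = q(E(x))                # what actually gets compressed
    x_hat = G(z_hat)               # the reconstruction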

I feel like this is related to some standard human memorization/learning
techniques. Example: I'm learning the guitar fretboard note placement in E
standard. It's difficult for me to visualize the first 4 frets on a 6-string
guitar with notes on each fret.

To help me memorize the note placement I develop various mnemonic devices
(both lossy and lossless). I know I've memorized the fretboard sufficiently
when I can visualize it.

Attempting to translate my reading of the paper I believe the following
analogy is apt. My "encoder" operates on a short term image when I close my
eyes after looking at a fret diagram. It produces semantic objects, i.e. an
ordered sequence of "letters" or pairs of letters (letters that are
horizontally, vertically or diagonally aligned). The quantizer takes these
objects and looks at the order/distribution. The quantizer places more
importance on some of the semantic objects than others (the fourth fret has 4
natural notes before an accidental). My decoder is interpreting the
stored/compressed note information to try to produce the image. It may be off
substantially, so I correct and repeat the process.

The process of optimizing what the semantic objects are, the weight each gets,
and how I use them to derive the original image seems like a fairly good
representation of what I do (though at least some of that appears to be fixed
in the learning algorithm typically). Of course, analogies are just that and
mine doesn't take into account the discriminator or the remaining "heart" of
the paper.

I think the heart of the paper is that they're trying to determine through
GANs a good way to both store the image and recover it while reducing bits per
pixel and increasing the quality of reproduction. Using some classical terms,
the GAN algorithm thus tweaks the compressor, the data storage format and the
decompressor to optimize what should be "hard-coded" in the
compressing/decompressing process or program vs what will be stored as a
result of the compression program.

Very handwavey but I think the general idea is right?
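
If it helps, here is how I picture the adversarial part, as a toy training
loop (tiny made-up networks, not the authors' actual architecture or losses,
and with the quantizer omitted since it needs a differentiable relaxation):
the decoder G plays the GAN generator, and a discriminator D is trained to
tell real images from reconstructions, which pushes G toward realistic-looking
detail even when the stored latent is very small.

    import torch
    import torch.nn as nn

    d_pix, d_lat = 64, 8
    E = nn.Linear(d_pix, d_lat)    # stand-in encoder
    G = nn.Linear(d_lat, d_pix)    # stand-in decoder / generator
    D = nn.Sequential(nn.Linear(d_pix, 16), nn.ReLU(), nn.Linear(16, 1))

    opt_eg = torch.optim.Adam(list(E.parameters()) + list(G.parameters()), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(100):
        x = torch.rand(32, d_pix)  # batch of toy "images"
        x_hat = G(E(x))            # reconstruct (quantization omitted here)

        # discriminator learns to separate real from reconstructed
        d_loss = bce(D(x), torch.ones(32, 1)) + bce(D(x_hat.detach()), torch.zeros(32, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # encoder/decoder try to fool D while staying close to the original
        g_loss = bce(D(x_hat), torch.ones(32, 1)) + 10.0 * (x_hat - x).pow(2).mean()
        opt_eg.zero_grad(); g_loss.backward(); opt_eg.step()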

~~~
bcheung
An encoder / decoder architecture learns a more "efficient" representation. It
tries to find features it can use that are useful for describing the
variations in the input data (images) that it has seen.

For example, if trained on faces, it will learn features for things like eyes
and mouths. So the image can be encoded as "put a mouth of this type with this
width at this location" rather than operating at the level of pixels.

If trained on text, it might learn features related to letters and typography
(boldness, italics, size, spacing). So it might encode things as "Helvetica,
16pt, italics".

This is a gross oversimplification, and things rarely map exactly to concepts
humans would use, but hopefully it communicates the concept.
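
A crude way to see the size argument is to count bytes for the "Helvetica,
16pt, italics" example above (the numbers are entirely made up, just to show
the order-of-magnitude gap between pixels and a semantic description):

    # toy comparison: raw pixels vs. a handful of rendering parameters
    raw_bytes = 200 * 50 * 3   # a 200x50 RGB crop of text, uncompressed
    semantic = {"font": "Helvetica", "size_pt": 16, "italic": True, "text": "hello"}
    semantic_bytes = sum(len(str(k)) + len(str(v)) for k, v in semantic.items())

    print(raw_bytes, "bytes of pixels vs roughly", semantic_bytes, "bytes of description")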

------
FrozenVoid
For photo-quality material this will mean detail loss, but some media
(animation / clip art / compressed video) could benefit greatly if the
reconstruction algorithm is fast enough. They should compare it with the
AV1/x265 codecs.

------
pornel
It seems that some form of neural network based compression is the future, but
how to go from academic one-off implementation to a widely deployable codec?

------
ttoinou
Was thinking about this use case of neural networks for months... Glad to read
a paper about it. Wonder how to adapt it to video.

~~~
bcheung
It's very interesting. I've heard it said in online lectures that it does a
sort of compression but nobody really uses it for that because existing
algorithms perform much better. Guess this is no longer true.

------
fredguth
Loved the site. Great way to present research, will be even better with source
code or a notebook.

