
Real-Time Adaptive Image Compression - cardigan
http://www.wave.one/icml2017
======
trevyn
Nice work, but it's disingenuous not to include a BPG (HEVC) image for
comparison -- BPG is close to state-of-the-art, not WebP; even their own
SSIM charts show this.

Interesting that decoding is slower than encoding. Also curious about
performance on CPU.

This approach may also be susceptible to "hallucinating" inaccurate detail;
you can see a little bit of this on the upper-right of the girl's circled
eyelid compared to the original Kodak image. See also:
[http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning](http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning)
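
One way to check for this kind of hallucination is to diff the reconstruction
against the original. A minimal sketch, assuming scikit-image and Pillow are
available; the file names are placeholders:

```python
# Sketch: locate hallucinated or erased detail by comparing a
# reconstruction against the original. File names are placeholders.
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

orig = np.asarray(Image.open("kodim15_original.png").convert("L"), dtype=np.float64)
recon = np.asarray(Image.open("kodim15_waveone.png").convert("L"), dtype=np.float64)

# A single global SSIM score hides local errors; the full map shows where they are.
score, ssim_map = structural_similarity(orig, recon, data_range=255.0, full=True)
print(f"global SSIM: {score:.4f}")

# Regions where the codec invented or erased detail show up as low-SSIM blobs.
row, col = np.unravel_index(np.argmin(ssim_map), ssim_map.shape)
print(f"worst-matching region around (row={row}, col={col})")
```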

~~~
tomaskafka
> This approach may also be susceptible to "hallucinating" inaccurate detail

Yes! This. I can see us giving up decisions to 'AI' without realizing that
it's just a loose-association machine. Would I want to have my mortgage rate
adjusted by a 'loose association'?

[https://twitter.com/keff85/status/862690920805916672](https://twitter.com/keff85/status/862690920805916672)

------
vladdanilov
As someone interested in the field, this does not look much different from
state-of-the-art video codecs: variable-size blocks, wavelets, a predictor,
arithmetic coding. The main difference is that the predictor is trained on
real data -- but symbol dictionaries built from real data are already used in
modern compressors like brotli and zstd.
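
To make the "trained predictor feeding an entropy coder" idea concrete, here
is a toy adaptive binary arithmetic coder. This is a sketch of the general
technique, not WaveOne's coder; the probability model is the only piece a
learned codec would replace, and exact rational arithmetic keeps the sketch
short and correct at the cost of speed.

```python
# Toy adaptive binary arithmetic coder: the model is the only learned part.
from fractions import Fraction

class AdaptiveModel:
    """Add-one (Laplace) adaptive estimate of P(bit = 0)."""
    def __init__(self):
        self.counts = [1, 1]

    def p0(self):
        return Fraction(self.counts[0], self.counts[0] + self.counts[1])

    def update(self, bit):
        self.counts[bit] += 1

def encode(bits):
    """Shrink [0, 1) around the message; return a rational inside the interval."""
    low, high = Fraction(0), Fraction(1)
    model = AdaptiveModel()
    for b in bits:
        split = low + (high - low) * model.p0()
        low, high = (low, split) if b == 0 else (split, high)
        model.update(b)
    return (low + high) / 2

def decode(code, n):
    """Replay the same model; the interval walk recovers the bits."""
    low, high = Fraction(0), Fraction(1)
    model = AdaptiveModel()
    out = []
    for _ in range(n):
        split = low + (high - low) * model.p0()
        b = 0 if code < split else 1
        low, high = (low, split) if b == 0 else (split, high)
        model.update(b)
        out.append(b)
    return out

msg = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
assert decode(encode(msg), len(msg)) == msg
```

Swap `AdaptiveModel` for a network that predicts P(next symbol | context) and
you have the learned-entropy-coding idea in miniature; the block structure
and transforms around it are the parts existing codecs already standardize.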

Most codecs have been tuned for a mean opinion score (MOS). MS-SSIM is not a
metric you can fully rely on [1] [2]; in my experiments it performed poorly.

I think the Google team's effort will have a much bigger impact [3], just by
combining all the recent practical improvements in image compression.

Meanwhile, they could have optimized the images on their website a little
better. I saved ~15% with my soon-to-be-obsolete tool [4].

[1] [https://encode.ru/threads/2738-jpegraw?p=52583&viewfull=1#post52583](https://encode.ru/threads/2738-jpegraw?p=52583&viewfull=1#post52583)

[2] [https://medium.com/netflix-techblog/toward-a-practical-perceptual-video-quality-metric-653f208b9652](https://medium.com/netflix-techblog/toward-a-practical-perceptual-video-quality-metric-653f208b9652)

[3] [https://encode.ru/threads/2628-Guetzli-a-new-more-psychovisual-JPEG-encoder?p=52198&viewfull=1#post52198](https://encode.ru/threads/2628-Guetzli-a-new-more-psychovisual-JPEG-encoder?p=52198&viewfull=1#post52198)

[4] [http://getoptimage.com](http://getoptimage.com)

------
amelius
From their website:

> Lubomir holds a Ph.D. from UC Berkeley, 20 years of professional experience,
> 50+ issued patents and 5000+ citations.

I just hope this type of research isn't going to end in patent encumbrance,
as it did with JPEG and MPEG.

These techniques are right around the corner, no matter who invents the file
formats.

So if their idea is to lock these general ideas down with more patents, I'd
want them to stop their research and let people with more open intentions
research this further.

~~~
sitkack
This also looks like a meta-algorithm: an algorithmic way to generate
domain-specific compressors. Potentially, anything this thing creates would
also be covered by patents.

------
tomaskafka
A rarely discussed danger of all machine learning models: if they don't know
the answer, they'll just make something up.

Here's a Google Translate example:
[https://twitter.com/keff85/status/862690920805916672](https://twitter.com/keff85/status/862690920805916672)

I wouldn't like to lose part of a parcel in a lawsuit because an adaptive
algorithm made up some details in an aerial photograph so that it compresses
better...

------
maaark
So is this only good for ridiculously low target file sizes? No one in their
right mind is going to compress a "480x480 image to a file size of 2.3kB".

What I want to see is an acceptable looking JPG next to a WaveOne image of the
same size. Or an acceptable looking WaveOne next to a JPG of the same size.

How small is good enough? How good is small enough?
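
For what it's worth, producing a matched-size JPEG for such a comparison is
easy: binary-search libjpeg's quality setting until the output fits the byte
budget. A sketch using Pillow, with a placeholder input file and the 2.3 kB
figure from the page:

```python
# Sketch: encode a JPEG at (just under) a target byte budget by
# binary-searching the quality setting. Input path is a placeholder.
import io
from PIL import Image

def jpeg_at_size(img, target_bytes):
    """Highest-quality JPEG of img that fits in target_bytes, or None."""
    lo, hi, best = 1, 95, None
    while lo <= hi:
        q = (lo + hi) // 2
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=q)
        if buf.tell() <= target_bytes:
            best, lo = buf.getvalue(), q + 1  # fits: try higher quality
        else:
            hi = q - 1                        # too big: lower quality
    return best

img = Image.open("kodim15.png").convert("RGB")
data = jpeg_at_size(img, 2300)
print(len(data) if data else "no quality level fits the budget")
```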

~~~
espadrine
The use of a small image is for illustrative purposes. Saying "this image is
fewer bytes than that one" is less striking than "look at those two images:
they have the same size, but one is ugly".

------
SimplyUnknown
I wonder how this compares to FLIF. I've also tried compressing images based
on shape and structure, but by approximating those with skeletons.

I'm struggling a bit with their performance comparison. The graphs they
present are very pretty and promising, but for the presented images we're
left quite in the dark. They dump some images, theirs looks prettier, and the
authors give us _some_ indication of quality, but it's not conclusive
evidence that their method produces better images. Typically, when different
compressed images are presented, two things can vary: quality and file size.
In the presented images both seem to vary, without telling us which is which.
Also, there is no baseline to compare against, either in terms of file size
or of what the images should look like. Sure, we humans can make a very
educated guess, but it's just sloppy not to include the uncompressed original
image.

I will be fully convinced when I can try it for myself on my own image set.

~~~
bhouston
FLIF is lossless, so it's very, very different.

~~~
vanderZwan
Err... FLIF is intended for lossless compression, but it _can_ be lossy (it
doesn't perform as well as intentionally lossy codecs, although that might
also be a matter of optimising for it).

However, it also has a kind of adaptive, ML-ish approach, so it might be
technically similar.

> _FLIF is based on MANIAC compression. MANIAC (Meta-Adaptive Near-zero
> Integer Arithmetic Coding) is an algorithm for entropy coding developed by
> Jon Sneyers and Pieter Wuille. It is a variant of CABAC (context-adaptive
> binary arithmetic coding), where instead of using a multi-dimensional array
> of quantized local image information, the contexts are nodes of decision
> trees which are dynamically learned at encode time. This means a much more
> image-specific context model can be used, resulting in better compression._

[http://flif.info/](http://flif.info/)
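
To give a feel for "contexts as decision-tree nodes", here is a rough
sketch. The splits below are hand-written, whereas MANIAC learns the tree per
image at encode time, so this shows the shape of the idea rather than the
algorithm itself:

```python
# Rough sketch of decision-tree context modeling: route each pixel through
# property tests to a leaf, and keep separate adaptive statistics per leaf.

class ContextTree:
    def __init__(self):
        # leaf name -> [zeros, ones], add-one smoothed adaptive counts
        self.leaves = {}

    def _leaf(self, left, top):
        # Route on properties of already-decoded neighbor residuals.
        # MANIAC would learn these splits (and when to grow new ones).
        if abs(left - top) > 16:
            return "edge"
        if max(abs(left), abs(top)) > 4:
            return "textured"
        return "flat"

    def p0(self, left, top):
        """Estimated P(bit = 0) given the local context."""
        c = self.leaves.setdefault(self._leaf(left, top), [1, 1])
        return c[0] / (c[0] + c[1])

    def update(self, left, top, bit):
        c = self.leaves.setdefault(self._leaf(left, top), [1, 1])
        c[bit] += 1
```

The `p0` estimates would feed an arithmetic coder; because contexts are
leaves of a tree rather than cells of a fixed quantized array, the model's
granularity can adapt to the image.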

------
svantana
Very impressive work, though it seems like a mistake to focus on compression,
which gets less valuable as storage and bandwidth get cheaper. You need only
look at the staying power of JPEG, which is far from the state of the art yet
isn't going anywhere. Why? The demand for replacing it is not strong enough.

They obviously have some good image priors here; if I were them, I would
consider applying this tech to other image-related problems, like image
manipulation or image search. Although competition is heating up quickly in
these fields...

~~~
CyberDildonics
It isn't 'very impressive work'; it's marketing for Silicon Valley
(impressive marketing, though).

Literally the first sentence of the linked page:

"Even though over 70% of internet traffic today is digital media, the way
images and video are represented and transmitted has not evolved much in the
past 20 years (apart from Pied Piper's Middle-Out algorithm)."

EDIT: It's embarrassing that not one person in this thread seems to have
actually read any of the paper. Even with obvious evidence that this is
fiction, people still don't want to believe it.

~~~
bhouston
There are real people behind it:

[https://scholar.google.ca/citations?user=reEAEWsAAAAJ&hl=en](https://scholar.google.ca/citations?user=reEAEWsAAAAJ&hl=en)

[https://scholar.google.ca/citations?user=OXFjRnEAAAAJ&hl=en](https://scholar.google.ca/citations?user=OXFjRnEAAAAJ&hl=en)

Company entry on LinkedIn: [https://www.linkedin.com/company-beta/12953035/](https://www.linkedin.com/company-beta/12953035/)

~~~
CyberDildonics
It literally quotes a fictional TV show in the synopsis and directly in the
paper. Are you seriously not getting that it is fiction?

Why would they have this in the actual PDF?

"Finally, Pied Piper has recently claimed to employ ML techniques in its
Middle-Out algorithm (Judge et al., 2016), although their nature is shrouded
in mystery"

This is promotion: they're getting a fake paper to permeate the internet. If
I had to guess, I'd say they're making a statement about reproducing results
in academia and not taking a single paper as gospel. If so, I think they're
making their point pretty well.

~~~
bhouston
Okay, it may be. It's certainly something that can be done at a technical
level, given enough training data and enough data on the client side to do
reconstructions.
reconstructions.

I guess once you get popular enough you can get TV consultants who can propose
real solutions as TV props. Heh.

------
bhouston
Deep learning will be a great way to do compression for sure, for audio,
video, and images alike. I could see people downloading "knowledge sets" for
these decompressors. Looking at Google Earth? Download the supplemental
"knowledge set" for overhead shots of cities and countryside. Looking at
people? Download the supplemental "knowledge set" for faces and clothing,
etc.

Basically, for each domain you want to do well in, you need a knowledge set
trained on that domain's data. Then you need a discriminator on the
compression side to classify an image, or subregions of an image, into those
categories.

If you can make the knowledge sets downloadable on demand and then cached,
you can be incredibly efficient over the long term while keeping the initial
download very small. I think evolvable knowledge sets also ensure that the
codec stays flexible enough to handle currently unforeseen situations. Nobody
wants a future where a DL-based image/video compression tool only knows a few
pre-determined sets and is mediocre on everything else.
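
A hypothetical sketch of the discriminator-plus-knowledge-set dispatch
described above, to make the shape of the idea concrete. Every name and URL
here is invented, and `classify`/`load_decoder` stand in for models that
would have to be trained:

```python
# Hypothetical sketch of classify-then-load-a-knowledge-set dispatch.
# Nothing like this exists today; all names are invented for illustration.
import urllib.request
from pathlib import Path

CACHE = Path.home() / ".cache" / "knowledge_sets"
REGISTRY = "https://example.com/knowledge-sets"  # placeholder URL

def fetch_knowledge_set(domain):
    """Download on demand, then cache: pay the bandwidth cost once per domain."""
    CACHE.mkdir(parents=True, exist_ok=True)
    local = CACHE / f"{domain}.weights"
    if not local.exists():
        urllib.request.urlretrieve(f"{REGISTRY}/{domain}.weights", str(local))
    return local

def decompress(payload, classify, load_decoder):
    domain = classify(payload)              # e.g. "aerial", "faces", "text"
    weights = fetch_knowledge_set(domain)   # cached after first use
    return load_decoder(weights).decode(payload)
```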

~~~
thesz
The "knowledge set" would be a small matrix - the one that allows to decode
whatever encoder has put into compressed data. I guess it will be in volume
range of 8x8 16-bit floats or so. Maybe three to six such matrices per
channel.

------
discreditable
This encoder seems to have some weird distortions that are most visible in the
aerial shots. Compare [1] and [2]. The lines on the basketball court are
distorted and curved. If you look closely, there is also curvature added to
the sidewalks where there isn't any. In case those links break, I'm referring
to the top row of aerial images.

[1] [https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b040/590bade05016e1ca29d9d74f/5910f88203596eba4ac0e444/1494284425679/bloomington28_3_crop_aerial_bpp0.1_disc_reconst_WO.png](https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b040/590bade05016e1ca29d9d74f/5910f88203596eba4ac0e444/1494284425679/bloomington28_3_crop_aerial_bpp0.1_disc_reconst_WO.png)

[2] [https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b040/590bade05016e1ca29d9d74f/5910f88086e6c0368e048208/1494284423491/bloomington28_3_crop_aerial_bpp0.1_disc_reconst_JP2.png](https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b040/590bade05016e1ca29d9d74f/5910f88086e6c0368e048208/1494284423491/bloomington28_3_crop_aerial_bpp0.1_disc_reconst_JP2.png)

~~~
tomaskafka
Yep - and see how it almost made the white car in the shadow, in the
middle-left, disappear. "We've deleted your only alibi so that our data would
compress better..."

~~~
AstralStorm
A lossy codec being lossy is a feature, not a drawback.

------
espadrine
> _While we are slightly faster than JPEG (libjpeg) and significantly faster
> than JPEG 2000, WebP and BPG, our codec runs on a GPU and traditional codecs
> do not — so we do not show this comparison._

This is great news!

I'd actually like to see the plot, though. (Both for encoding and decoding.)
It stands to reason that a neural network can optimize image compression, as
it can encode high-level information like "this is a face". But encoding /
decoding speed is the sticking point, so I feel successes there should be
emphasized.

The necessity of having a GPU doesn't seem problematic nowadays; everything
has one. Testing it with a mobile-grade GPU would be interesting.

~~~
SimplyUnknown
Thing is, "running on GPU" might mean "uses CUDA" which would make it more
problematic

------
boromi
I'm going to need to see this code in practice to believe it.

~~~
madez
I didn't find the code. Did you have more luck?

~~~
CyberDildonics
It's a joke paper, a marketing stunt for Silicon Valley. I would bet they
could get it accepted to some journals/conferences too, since it looks
extremely convincing.

------
rothron
Seems like a slightly unfair comparison. Training the compressor moves data
from the images into the compressor, making the bits-per-pixel evaluation
slightly more iffy.

~~~
huhtenberg
Not really.

As long as the decompressor needs just an image file and no other data, it's
fair game.

~~~
bhouston
How large is the decompressor to download?

Is this image compression tool good at images it was not trained on?

How bad does it get in those situations?

Is this training data fixed into the codec forever? Will there be slightly
different image codecs with different training data? That would be sort of
hellish.

~~~
rothron
You'd need some bits in the file telling you which trained decoder you need to
get the proper image out.
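
Those bits could be as simple as a short header naming the trained decoder
and its version. A hypothetical container layout (invented for illustration,
not any real format):

```python
# Hypothetical container: a few header bytes name the trained decoder
# (and its version) that the file depends on. Not a real format.
import struct

MAGIC = b"WONE"  # invented magic number

def pack(model_id, version, payload):
    # magic | model id (uint32, big-endian) | model version (uint16) | payload
    return MAGIC + struct.pack(">IH", model_id, version) + payload

def unpack(blob):
    assert blob[:4] == MAGIC, "unknown container"
    model_id, version = struct.unpack(">IH", blob[4:10])
    return model_id, version, blob[10:]

model_id, version, payload = unpack(pack(7, 3, b"compressed bits"))
print(model_id, version)  # -> 7 3
```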

What would the image of that girl look like if they used WaveOne Aerial?

How is this not cheating if the example image they used to compare algorithms
is in the training data?

------
mcraiha
One big reason for "hardcoded" encoders and decoders is that they're much
easier to implement in hardware.

One can improve, e.g., H.265 somewhat easily if a software-only solution is
an option. But if you need a cheap hardware-only solution, then an approach
that requires ML seems a bit too expensive.

------
CyberDildonics
Does no one realize this is a joke / marketing?

Directly from the paper's PDF:

"Finally, Pied Piper has recently claimed to employ ML techniques in its
Middle-Out algorithm (Judge et al., 2016), although their nature is shrouded
in mystery."

~~~
akx
Or, you know, that could just be a humorous reference to the TV series, while
this is a real implementation.

~~~
web007
This is my read too, since they're citing Judge et al. rather than the
characters or the characters' papers. It's in the same vein as when Dropbox's
Lepton article cited middle-out as a humorous bit of self-promotion.

------
bhouston
Is this open source or something that you are aiming to license?

------
creo
Where is PNG?

~~~
H4CK3RM4N
I think this is lossy compression, so it doesn't really overlap with PNG in
terms of performance/file-size goals.

~~~
creo
Oh, right. I didn't consider that.

------
hojijoji
For some reason they do not show the uncompressed image for comparison.

~~~
syberspace
They mention the Kodak dataset [1] in the second paragraph. It seems to be
Kodak image 15 [2].

Edit: as for the other images, it would indeed be nice to see those.

[1] [http://r0k.us/graphics/kodak/](http://r0k.us/graphics/kodak/)

[2] [http://r0k.us/graphics/kodak/kodim15.html](http://r0k.us/graphics/kodak/kodim15.html)

