
Lossless Image Compression Through Super-Resolution - beagle3
https://github.com/caoscott/SReC
======
crazygringo
This is utterly fascinating.

To be clear -- it stores a low-res version in the output file, uses neural
networks to predict the full-res version, then encodes the difference between
the predicted full-res version and the actual full-res version, and stores
that difference as well. (Technically, multiple iterations of this.)
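
In code terms, something like this (a toy sketch: nearest-neighbor upscaling
stands in for SReC's learned predictor, and the residuals would really be
entropy-coded using the network's predicted probabilities):

    import numpy as np

    def downsample(img):
        # 2x2 integer average pooling (stand-in for the codec's downsampling)
        h, w = img.shape
        return img.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3)) // 4

    def predict_upsample(low):
        # toy predictor: nearest-neighbor upscale; SReC uses a CNN here
        return np.kron(low, np.ones((2, 2), dtype=low.dtype))

    def encode(img):
        low = downsample(img)
        residual = img - predict_upsample(low)  # small values, cheap to code
        return low, residual

    def decode(low, residual):
        # exact reconstruction: prediction + stored difference
        return predict_upsample(low) + residual

    img = np.random.randint(0, 256, (64, 64)).astype(np.int32)
    assert np.array_equal(decode(*encode(img)), img)  # lossless round-trip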

I've been wondering when image and video compression would start utilizing
standard neural network "dictionaries" to achieve greater compression, at the
(small) cost of requiring a local NN file that encodes all the standard image
"elements".

This seems like a great step in that direction.

~~~
OskarS
It's a really cool idea, but I don't know if this would ever be a practical
method for image compression. First of all, you could never change the neural
network without breaking the compression, so you can't ever "update" it. Like:
what if you figure out a better network? Too bad! I mean, I guess you could,
but then you'd need to version the files and keep copies of all the networks
you've ever used, and this gets messy quickly.

And speaking of storing the networks: I don't know that you would ever want
to pay the memory hit of keeping the entire network in memory just to
decompress images or video, nor the performance hit decompression takes. The
trade-off here is reduced drive space for massively increased RAM and CPU/GPU
time. I don't know of any case where you'd want to make that trade-off, at
least not at this magnitude.

Again though: it's an awesome idea. I just don't know that it's ever going to
be anything other than a cool ML curiosity.

~~~
daveguy
I think the idea is that the network is completely trained and encoded along
with the image and delta data. A new network would just require retraining and
storing that new network along with the image data. It doesn't use a global
network for all compressions.

~~~
codeflo
I don't think this would work; the size of the network would likely dominate
the size of the compressed image.

~~~
mstade
Wouldn't the network be part of the decoder?

~~~
mastre_
Yes, and this is why you couldn't update the network. Still, much like how
various compression algos have "levels," this standard could be more open in
that regard: new networks could be added (sort of what others above refer to
as versions), and the image could just specify which network it uses. Maybe
have a central repo from which the decoder could pull a network it doesn't
have (i.e. I make a site and encode all 1k images on it using my own network;
you pull the network to your browser once so you can decode all 1k images).
You could even support a special mode where the image explicitly includes the
network to be used for decoding along with the image data (this could make
sense for very large images, as well as for specialized/demonstration/test
purposes).

All in all, a very interesting idea.

~~~
mstade
I wonder what the security implications of all this are; it sounds dangerous
to just run any old network. I suppose if it's sandboxed enough, with very
strongly defined inputs and outputs, then the worst that could happen is you
get garbled imagery?

------
acjohnson55
Interesting. It sounds like the idea is fundamentally about factoring out
knowledge of "real image" structure into a neural net. In a way, this is
similar to the perceptual models used to discard data in lossy compression.

~~~
dehrmann
I wonder if there's a way to do this more like traditional compression;
performance is a huge issue for compression, and taking inspiration from a
neural network might be better than actually using one. Conceptually, this is
like a learned dictionary that's captured by the neural net; it's just
fuzzier.

~~~
retrac
Training the model is extremely expensive computationally, but using it often
isn't.

For example, StyleGAN takes months of compute time on a cluster of high-end
GPUs to train to get the photorealistic face model we've all seen. But
generating new faces from the trained model takes mere seconds on a low-end
GPU or even a CPU.

------
propter_hoc
This is really interesting but out of my league technically. I understand that
super-resolution is the technique of inferring a higher-resolution truth from
several lower-resolution captured photos, but I'm not sure how this is used to
turn a high-resolution image into a lower-resolution one. Can someone explain
this to an educated layman?

~~~
mywittyname
From peeking at the code, it seems like each lower-res image is a scaled-down
version of the original, plus a tensor that is used to upscale back to the
previous image. The resulting tensor is saved and the scaled image is used as
the input to the next iteration.

The decode process takes the last image from the process above, and
iteratively applies the upscalers until the original image has been
reproduced.

Link to the code in question:
[https://github.com/caoscott/SReC/blob/master/src/l3c/bitcodi...](https://github.com/caoscott/SReC/blob/master/src/l3c/bitcoding.py#L106)
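
The recursive structure looks roughly like this (a hypothetical sketch reusing
the toy downsample/predict_upsample helpers from the sketch further up the
thread; the real code entropy-codes each residual rather than storing it raw):

    # encode: peel off residuals until only a tiny base image remains
    def encode_pyramid(img, levels=3):
        residuals = []
        for _ in range(levels):  # img dims must be divisible by 2**levels
            low = downsample(img)
            residuals.append(img - predict_upsample(low))
            img = low
        return img, residuals  # tiny base image + one residual per scale

    # decode: upscale-predict and correct, one level at a time
    def decode_pyramid(base, residuals):
        img = base
        for res in reversed(residuals):
            img = predict_upsample(img) + res
        return img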

~~~
peter_d_sherman
If we substitute "information" for "image", "low information" for "low
resolution" and "high information" for "high resolution", perhaps compression
could be obtained generically on any data (not just images): take a
high-information bitstream, use a CNN or CNNs (as per this paper) to convert
it into a shorter, low-information bitstream plus a tensor, and then an
entropy (difference) series of bits.

To decompress, reverse the CNN on the low-information bitstream with the
tensor.

You now have a high-information bitstream which is _almost_ like your
original.

Then use the entropy series of bits to fix the difference. You're back to the
original.

Losslessly.

So I wonder if this, or a similar process can be done on non-image data...
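
(A quick sketch of the generic recipe, with a trivial "previous byte"
predictor standing in for the CNN; this is essentially what PNG's filtering
plus DEFLATE already does:)

    import zlib
    import numpy as np

    def compress(data: bytes) -> bytes:
        arr = np.frombuffer(data, dtype=np.uint8)
        # predictor: "each byte equals the previous byte"; a learned model
        # would predict far better and shrink the residuals further
        pred = np.concatenate(([0], arr[:-1])).astype(np.uint8)
        residual = arr - pred  # wraps mod 256, so it is invertible
        return zlib.compress(residual.tobytes())

    def decompress(blob: bytes) -> bytes:
        residual = np.frombuffer(zlib.decompress(blob), dtype=np.uint8)
        # invert the prediction: a running sum mod 256 rebuilds the bytes
        return np.cumsum(residual, dtype=np.uint8).tobytes()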

But that's not all...

If it works with non-image data, it would also say that, mathematically,
low-information (lower) numbers could be converted into high-information
(higher) numbers with a tensor and entropy values...

We could view the CNN + tensor as a mathematical function, and we can view the
entropy as a difference...

In other words:

 _Someone who is a mathematician might be able to derive some identities, some
new understandings in number theory from this_...

~~~
valine
Convolution only works on data that is spatially related, meaning data points
that are close to each other are more related than data points that are far
apart. It doesn't give meaningful results on data like spreadsheets where
columns or rows can be rearranged without corrupting the underlying
information.

If by non-image data you mean something like audio, then yes it could probably
work.

------
ilaksh
I asked a question about a similar idea on the CS Stack Exchange in 2014.
[https://cs.stackexchange.com/questions/22317/does-there-
exis...](https://cs.stackexchange.com/questions/22317/does-there-exist-a-data-
compression-algorithm-that-uses-a-large-dataset-distribu)

They did not have any idea and they were dicks about it as usual.

------
Der_Einzige
This technology is super awesome... and it's been available for a while.

A few years ago, I worked for #bigcorp on a product which, among other things,
optimized and productized a super-resolution model and made it available to
customers.

For anyone looking for it: it should be available in several open-source
libraries (and closed-source #bigcorp packages) as an already-trained model
which is ready to deploy.

------
trevyn
On the order of 10% smaller than WebP, substantially slower encode/decode.

~~~
baq
Is WebP lossless?

~~~
dubcanada
It supports both lossless and lossy modes -
[https://en.wikipedia.org/wiki/WebP](https://en.wikipedia.org/wiki/WebP)

------
6510
Reminds me of this.

[https://en.wikipedia.org/wiki/Jan_Sloot](https://en.wikipedia.org/wiki/Jan_Sloot)

Gave me a comical thought if such things can be permitted.

You split into RGB and b/w, turn the pictures into blurred vector graphics.
Generate and use an incredibly large spectrum of compression formulas, made up
of separable approaches, each sorted in such a way that one can dial into the
most movie-like result.

3D models for the top million famous actors and 10 seconds of speech, then
deepfake to infinite resolution.

Speech to text with plot analysis since most movies are pretty much the same.

Sure, it won't be lossless, but replacing a few unknown actors with famous
ones and having a few accidental happy endings seems entirely reasonable.

------
m3at
Related, for another domain - lossless text compression using LSTMs:
[https://bellard.org/nncp/](https://bellard.org/nncp/)

(This is by Fabrice Bellard; one wonders how he can achieve so much.)

------
Animats
This is a lot like "waifu2x".[1] That's super-resolution for anime images.

[1] [https://github.com/nagadomi/waifu2x](https://github.com/nagadomi/waifu2x)

------
asciimike
Reminds me of RAISR ([https://ai.googleblog.com/2016/11/enhance-raisr-sharp-
images...](https://ai.googleblog.com/2016/11/enhance-raisr-sharp-images-with-
machine.html)).

I remember talking with the team; they had production apps using it, reducing
bandwidth by 30% while adding only a few hundred KB to the app binary.

------
LoSboccacc
And what's the size of the neural network you have to ship for this to work?
Has anyone done the math on the break-even point compared to other compression
tools?

e: actually, a better metric would be how much it compresses compared to doing
the resolution increase with plain Lanczos in place of the neural net, keeping
the delta part intact.
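
Something like this would give that baseline (a rough sketch using Pillow on a
grayscale image; compare the residual's compressed size, plus the compressed
low-res image, against the NN version's output size):

    import zlib
    import numpy as np
    from PIL import Image

    def lanczos_residual_bytes(path: str) -> int:
        img = np.asarray(Image.open(path).convert("L"), dtype=np.int16)
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w]  # crop to even dimensions for clean 2x scaling
        small = Image.fromarray(img.astype(np.uint8)).resize(
            (w // 2, h // 2), Image.LANCZOS)
        pred = np.asarray(small.resize((w, h), Image.LANCZOS), dtype=np.int16)
        residual = (img - pred).astype(np.uint8)  # mod 256; invertible
        return len(zlib.compress(residual.tobytes()))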

------
nojvek
Does anyone know how much better the compression ratio is compared to PNG,
which is also a lossless encoder?

------
hinkley
I wonder how well this technique works when the depth of field is infinite.

Out-of-focus parts of an image should be pretty darned easy to compress using
what is effectively a thumbnail.

That said, the idea of having an image format where 'preview' code barely has
to do any work at all is pretty damned cool.

------
tjchear
Would massive savings be achieved if an image-sharing app like, say,
Instagram were to adopt it, considering a lot of user-uploaded travel photos
of popular destinations look more or less the same?

~~~
chickenpotpie
My guess is that it would be much more expensive unless it's a frequently
accessed image. CPU and GPU time is much more expensive than storage costs on
any cloud provider.

~~~
kevinventullo
Wouldn't it be cheaper if the image is _infrequently_ accessed? I'm thinking
in the extreme case where you have some 10-year-old photo that no one's looked
at in 7 years. In that case the storage costs are everything because the
marginal CPU cost is 0.

~~~
chickenpotpie
It depends if the decompression is done on the server or on the client. If the
client is doing the decompressing it would be better to compress frequently
accessed images because it would lower bandwidth costs. If the server does the
decompressing it would be better for infrequently accessed images to save on
CPU costs.

------
fxtentacle
I believe a big issue with this will be floating-point differences. Because
the network is essentially recursive, tiny errors in the initial layers can
grow to yield an unrecognizably different result in the final layers.

That's why most compression algorithms use fixed-point mathematics.

There are ways to quantize neural networks to make them use integer
coefficients, but that tends to lose quite a lot of performance.
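
Roughly the fix, in sketch form (hypothetical; real network quantization
schemes are considerably more involved):

    import numpy as np

    SCALE = 256  # 8 fractional bits of fixed-point precision

    def quantize(weights: np.ndarray) -> np.ndarray:
        # round float weights to fixed-point integers once, at export time
        return np.round(weights * SCALE).astype(np.int64)

    def fixed_point_layer(x_q: np.ndarray, w_q: np.ndarray) -> np.ndarray:
        # integer multiply-accumulate is bit-exact on every platform, unlike
        # float32, whose results can vary across GPUs and BLAS builds
        return (x_q @ w_q) // SCALE

With encoder and decoder running the same integer arithmetic, the predicted
probabilities match bit-for-bit, so the entropy coder never falls out of sync.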

Still, this is a very promising lead to explore. Thank you for sharing :)

------
eximius
Is this actually lossless - that is, the same pixels as the original are
recovered, guaranteed? I'm surprised such guarantees can be made from a neural
network.

~~~
tasty_freeze
The way many compressors work is that, based on recent data, they try to
predict the immediately following data. The prediction doesn't have to be
perfect; it just has to be good enough that only the difference between the
prediction and the exact data needs to be encoded, and encoding that delta
usually takes fewer bits than encoding the original data.

The compression scheme here is similar. Transmit a low res version of an
image, use a neural network to guess what a 2x size image would look like,
then send just the delta to fix where the prediction was wrong. Then do it
again until the final resolution image is reached.

If the neural network is terrible, you'd still get a lossless image recovery,
but the amount of data sent in deltas would be greater than just sending the
image uncompressed.
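
You can see the guarantee in a few lines (a toy sketch; any predictor that
depends only on data the decoder already has will round-trip exactly):

    import numpy as np

    def encode(img, low, predict):
        # send `low` plus this delta; the decoder reruns the same predictor
        return img.astype(np.int16) - predict(low)

    def decode(low, delta, predict):
        return (predict(low) + delta).astype(np.uint8)

    # even a useless predictor round-trips exactly; the delta is just bigger
    bad = lambda low: np.zeros((2 * low.shape[0], 2 * low.shape[1]), np.int16)

    img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
    low = img[::2, ::2]  # stand-in for a real downsampled image
    assert np.array_equal(decode(low, encode(img, low, bad), bad), img)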

~~~
eximius
Ah, I understand! I wasn't aware that was how they worked!

------
jbverschoor
I thought super-resolution uses multiple input files to "enhance" - for
example, extracting a high-res image from a video clip.

~~~
s_gourichon
They reformulate the decompression problem as a super-resolution problem
conforming to what you just wrote. Instead of getting variety through the
frames of a video clip, they use the generalization properties of a neural
network.

"For lossless super-resolution, we predict the probability of a high-
resolution image, conditioned on the low-resolution input"
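
That conditional probability is what the entropy coder consumes: the better
the network's prediction, the fewer bits the coder spends per pixel. A minimal
sketch of the accounting (hypothetical shapes; probs[i, v] is the model's
predicted probability that pixel i takes value v):

    import numpy as np

    def ideal_code_length_bits(probs: np.ndarray,
                               pixels: np.ndarray) -> float:
        # Shannon cost of coding each pixel under the model's distribution;
        # an arithmetic coder gets within a fraction of a bit of this sum
        p = probs[np.arange(pixels.size), pixels.ravel()]
        return float(-np.log2(p).sum())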

------
ackbar03
This is interesting, but I'm not sure the economics of it will ever work out.
It'll only be practical when the computation costs become lower than the
storage costs.

~~~
The_Colonel
Think YouTube or Netflix - it's compressed once and then delivered a hundred
million times to consumers.

~~~
ackbar03
but if its something that's requested / viewed a lot thats probably something
don't want to be compressing/decompressing all the time. Neural networks still
take quite a lot of computational power and require GPUs.

If its something you don't necessarily require all the time its still probably
cheaper to just store it instead of run it through a ANN. You just need to
look at the prices of a GPU server compared with storage costs on AWS and the
estimated run time to see there is still a large difference.

I mean, I could be wrong (and I'd love to be, since I looked at a lot of SR
stuff before), but that's sort of the conclusion I reached back then, and I
don't really see that anything has significantly changed since.

~~~
The_Colonel
> But if it's something that's requested/viewed a lot, that's probably
> something you don't want to be compressing/decompressing all the time.

You're compressing only once. Decompression is usually way less expensive, and
it happens on the client, so there's no additional cost to Netflix/YouTube.

Of course, this doesn't mean YouTube must use it on all videos, including the
ones with 10 views.

------
dvirsky
How do ML-based lossy codecs compare to state-of-the-art lossy compression?
Intuitively it sounds like something AI would do much better. But this is
rather cool.

~~~
MiroF
They perform better, from what I've read.

~~~
qayxc
Depends entirely on your definition of "better".

In terms of quality vs bit rate, ML-based methods are superior.

In terms of computation and memory requirements, they're orders of magnitude
worse. It's a trade-off; TINSTAAFL.

~~~
MiroF
> memory requirements

Agreed, although this bit is unclear - the compressed representations of the
ML-based methods take up much less space in memory than traditional methods,
but yes - the decompression pipeline is memory-intensive due to intermediary
feature maps.

------
slaymaker1907
Looks like FLIF has a slight edge on compression ratio according to the
paper, but SReC beats out the other common compression schemes, which is
impressive.

------
Animats
How does it work for data other than Open Images, if trained on Open Images?
If it recognizes fur, it's going to be great on cat videos.

------
pbhjpbhj
It seems like "lossless" isn't quite right; some of the information (as
opposed to just the algo) seems to be in the NN?

Is a soft-link a lossless compression?

It's like the old joke about a pub where they optimise by numbering all the
jokes... the joke number alone isn't enough, but it can be used to losslessly
recover the joke - it's just using the community's storage to hold the data.

~~~
chickenpotpie
As long as you get back the exact same image you put in, it's lossless.

~~~
yters
In that sense, I can losslessly compress everything down to zero bits, and
recover the original artifact perfectly with the right algorithm.

~~~
ebg13
You're ignoring two things:

1) that the aggregate savings from compressing the images need to outweigh
the initial cost of distributing the decompressor.

2) to be lossless, decompression must be deterministic and unambiguous, so you
can't compress _everything_ down to zero bits; you can compress only _one_
thing down to zero bits, because otherwise you wouldn't be able to
unambiguously determine which thing is represented by your zero bits.

~~~
yters
In each case I pick out an algorithm beforehand that will inflate my zero bits
to whatever artifact I desire.

~~~
ebg13
"I will custom write a new program to (somehow) generate each image and then
distribute that instead of my image" is not a compression algorithm. But I
think you'd do well over at the halfbakery.

~~~
yters
It works well if it's the only image!

~~~
ebg13
Now you're chasing your own tail. You've gone from "I can losslessly compress
everything" to "I can losslessly compress exactly one thing only".

~~~
yters
I'm arguing it's the same as this image compression technique. They rely on a
huge neural network which must exist wherever the image is to be decompressed.

If I'm allowed to bring along an unlimited amount of background data, then I
can compress everything down to zero bits.

In contrast, an algorithm like LZ78 can be expressed in a five-line Python
script and performs decently on a wide variety of data types.

~~~
ebg13
> _If I'm allowed to bring along an unlimited amount of background data, then
> I can compress everything down to zero bits._

If by "background data" you mean the decompressor, this is patently false. No
matter how much information is contained in the decompressor (the algorithm +
stable weights that don't change), you can only compress one thing down to any
given new representation (low-resolution image + differential from rescale
using stable weights).

If by "background data" you mean new data that the decompressor doesn't
already have, then you're ignoring the definition of compression. Your
compressed data is all the bits sent on the fly that aren't already possessed
by the side doing the decompression, regardless of obtuse naming schemes.

> _I'm arguing it's the same as this image compression technique._

That's wrong, because this scheme doesn't claim to send a custom image
generator instead of each image, which is what you're proposing.

