
Neural Enhance – Super Resolution for images using deep learning - eejr
https://github.com/alexjc/neural-enhance
======
ENGNR
We enhanced the image like on CSI and look, the defendant's face!

"Because my photos were used heavily in the dataset..."

Jury: So guilty

~~~
dr_zoidberg
Defense should train a network with faces of the jury and then show how the
same technique, run by their biased network, now shows each of them in the
scene of the crime :)

------
nullc
Comparison using nearest neighbor, instead of a more reasonable linear filter,
or-- heaven forbid-- some basic edge-directed interpolator... is a little
cheaty.
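For what it's worth, the gap being described is easy to reproduce. A toy numpy sketch (mine, not from the repository) of nearest-neighbor versus bilinear upscaling:

```python
import numpy as np

def upscale_nearest(img, factor):
    # Repeat each pixel `factor` times along both axes (blocky result).
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def upscale_bilinear(img, factor):
    # Sample the source at fractional coordinates and blend the four
    # surrounding pixels by their distance weights (smooth result).
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    return ((1 - wy) * (1 - wx) * img[np.ix_(y0, x0)]
            + (1 - wy) * wx * img[np.ix_(y0, x1)]
            + wy * (1 - wx) * img[np.ix_(y1, x0)]
            + wy * wx * img[np.ix_(y1, x1)])

img = np.array([[0.0, 1.0], [1.0, 0.0]])
print(upscale_nearest(img, 2).shape)  # (4, 4)
```

The bilinear version produces intermediate grey values where nearest neighbor keeps hard block edges, which is why a nearest-neighbor baseline flatters any other method.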

~~~
dharma1
Agreed, it would have been nice to show other upscaling algorithms. But
neural-net super-resolution generators can still produce significantly more
detail at 4-8x, as shown here:
[http://arxiv.org/abs/1609.04802](http://arxiv.org/abs/1609.04802)

~~~
nuclai
(Author here.) Yeah, I knew this would come up but decided to proceed with the
pixelated comparison anyway. I couldn't get the GIFs to reflect the results
because of 8-bit quantization/dithering. The images show the neural network
inputs and outputs, not a comparison with other super-resolution algorithms
(still fascinating :-).

I'm working on the Docker image now; that should help anyone with
interest/experience in the field compare results easily.

------
Leynos
A friend of mine suggested that an approach similar to this could be used to
upscale old standard definition TV shows (specifically, those shot on video
rather than film). I'd imagine that multiple specially trained networks would
be employed for different parts of the image (trained on pictures of
individual performers or types of set/background). Pleased to see that this is
possible. Is there anyone doing something along those lines already?

~~~
Mithaldu
It should also be possible to train it on itself to improve moving scenes by
using the motion itself as temporal super-sampling, just like the human eye
does.

~~~
mm_alex
This works quite well, and does not necessarily require any NN/machine
learning. See the YouTube video for this paper:
[https://www.disneyresearch.com/publication/scenespace/](https://www.disneyresearch.com/publication/scenespace/)
tl;dr: a simple brute-force weighted average of samples from many frames,
combined with a noisy/low-quality depth-from-motion estimate, can be used to
de-noise, increase resolution and otherwise manipulate video footage. Very
cool paper with great results from a simple technique.
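As a rough illustration of that brute-force idea (my own numpy sketch, with made-up uniform weights instead of a real depth/motion confidence estimate): averaging many aligned noisy frames cancels the noise while keeping the structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# A clean "scene" and several noisy observations of it, as if taken
# from consecutive video frames that have already been aligned.
clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
frames = [clean + rng.normal(0.0, 0.1, clean.shape) for _ in range(16)]

# Per-frame confidence weights; uniform here for simplicity, but in the
# paper's setting they would come from the depth-from-motion estimate.
weights = np.ones(len(frames))

# Weighted average across frames: noise cancels, structure stays.
denoised = np.tensordot(weights, np.stack(frames), axes=1) / weights.sum()

err_single = np.abs(frames[0] - clean).mean()
err_avg = np.abs(denoised - clean).mean()
print(err_avg < err_single)  # True: averaging reduced the noise
```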

------
ingenter
Here's a list of various image interpolation techniques with a similar goal:

\-
[http://www.wisdom.weizmann.ac.il/~vision/SingleImageSR.html](http://www.wisdom.weizmann.ac.il/~vision/SingleImageSR.html)

\-
[http://chiranjivi.tripod.com/EDITut.html](http://chiranjivi.tripod.com/EDITut.html)

\-
[http://www.tecnick.com/pagefiles/appunti/iNEDI_tesi_Nicola_A...](http://www.tecnick.com/pagefiles/appunti/iNEDI_tesi_Nicola_Asuni.pdf)

\-
[http://www.eurasip.org/Proceedings/Eusipco/Eusipco2009/conte...](http://www.eurasip.org/Proceedings/Eusipco/Eusipco2009/contents/papers/1569192778.pdf)

\- [http://bengal.missouri.edu/~kes25c/](http://bengal.missouri.edu/~kes25c/)

[http://bengal.missouri.edu/~kes25c/nnedi3.zip](http://bengal.missouri.edu/~kes25c/nnedi3.zip)

[http://forum.doom9.org/showthread.php?t=147695](http://forum.doom9.org/showthread.php?t=147695)

\-
[http://arxiv.org/pdf/1501.00092v2.pdf](http://arxiv.org/pdf/1501.00092v2.pdf)

[http://waifu2x.udp.jp/](http://waifu2x.udp.jp/)

[https://github.com/nagadomi/waifu2x](https://github.com/nagadomi/waifu2x)

[http://waifu2x-avisynth.sunnyone.org/](http://waifu2x-avisynth.sunnyone.org/)

[https://github.com/sunnyone/Waifu2xAvisynth](https://github.com/sunnyone/Waifu2xAvisynth)

\- [http://i-programmer.info/news/192-photography-a-imaging/1010...](http://i-programmer.info/news/192-photography-a-imaging/10100-no-comment-super-resolution.html)

[https://github.com/david-gpu/srez](https://github.com/david-gpu/srez)

\-
[http://arxiv.org/pdf/1609.04802v1.pdf](http://arxiv.org/pdf/1609.04802v1.pdf)

~~~
andai
My question is off-topic, but how do you keep lists of URLs like that? Do you
just use text files? I'm struggling with too much to read.

~~~
oh_sigh
The key when you have too much to read is losing links, not retaining them
better.

------
placebo
It definitely makes a significant qualitative improvement, making the picture
appear more in sync with what our brain interprets as a higher-resolution
picture, but my first thought is whether this particular example goes beyond
aesthetics. Is there really any instance where this method could, for
instance, turn an unintelligible picture of a license plate into something in
which the characters can be recognised? More generally, I wonder whether there
has been any research on the limits - i.e., what the combined minimal size of
the information stored in the neural network plus the information on its
inputs needs to be before the output can be said to be true to the source with
probability x?

~~~
PeterisP
Well, it doesn't create any information that wasn't in the original data
(nothing can do that, you can only lose information in processing) so if e.g.
the characters can be recognized in the processed image of a licence plate,
then by definition they could have been recognized from the original data as
well in some manner.

However, such processing can make things more easily interpretable by
_humans_. A rough analogy is turning up the contrast - given a very dark image
of a licence plate where the black parts are totally black (#000000) and the
white parts are just very dark (#010101), the characters definitely can be
recognized, even though a human in normal conditions would just see it as
totally black, and processing would help.
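That analogy is easy to demonstrate. A minimal sketch (mine, on a synthetic image) of such a linear contrast stretch:

```python
import numpy as np

# A nearly-black licence-plate crop: background #000000, glyphs #010101.
plate = np.zeros((8, 8), dtype=np.uint8)
plate[2:6, 2:6] = 1  # the "characters", one level above pure black

# Linear contrast stretch: map the image's own [min, max] onto [0, 255].
lo, hi = plate.min(), plate.max()
stretched = ((plate.astype(float) - lo) / (hi - lo) * 255).astype(np.uint8)

print(stretched.max())  # 255: the invisible detail is now full white
```

No information was created; the one bit per pixel that was already there is just remapped to a range humans can see.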

~~~
placebo
> Well, it doesn't create any information that wasn't in the original data
> (nothing can do that, you can only lose information in processing)

I'm not sure this is correct. In a sense, it _does_ contain information that
wasn't in the original inputs - i.e. information added by the weights in the
neural network, which itself was obtained from an enormous amount of previous
samples. Of course, even the largest and best-trained neural network won't be
able to tell the license number given 2 pixels of information, but I am
curious as to the theoretical limits of what can be achieved in extreme cases
with very little information as input and a neural network that has almost
limitless resources.

------
anilgulecha
This is amazing. The surprise is that while the higher-resolution images seem
real, they are reconstructions based on previous learning, and can be very
different from the actual image.

Nice test images to include would have been an original image, a downsampled
version, and the reconstructed image. If the author is reading this, could
they add this to the README?
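The suggested comparison is easy to mock up. A toy numpy sketch (mine, using nearest-neighbor upscaling as a stand-in for the network's output):

```python
import numpy as np

def downsample(img, factor):
    # Box filter: average each factor x factor block of pixels.
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upscale_nearest(img, factor):
    # Crude reconstruction: repeat each pixel along both axes.
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

original = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
small = downsample(original, 2)            # what the network would receive
reconstructed = upscale_nearest(small, 2)  # stand-in for the network output

# Per-pixel error shows how far any reconstruction drifts from ground truth.
print(round(float(np.abs(reconstructed - original).mean()), 4))  # 0.0714
```

With ground truth in hand you can score the reconstruction instead of just eyeballing it, which is exactly what the requested README triptych would enable.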

Sci-fi on TV is making it to the real world :)

~~~
ehsanu1
Would also be interesting to see some pixel art run through this. It probably
won't work that well given that it's trained on real downsampled photos, but
who knows.

------
kartan
This technique is akin to hiring an artist to draw a high-resolution version
of your pixelated photos.

A good example of this is "Day of the Tentacle Remastered"
([http://dott.doublefine.com/](http://dott.doublefine.com/)). The new game
looks extremely similar to the old one, but it has been redrawn.

As someone suggested, you should be able to take an old TV show, train the
neural network with HD pictures of the cast, and let it redraw the show in its
own "artistic" interpretation of the images.

------
WhitneyLand
This approach can be equaled or bettered with no machine learning.

This example allows easy comparison between common techniques. Choose image 7
to see an example with a person:
[https://dl.dropboxusercontent.com/u/2810224/Homepage/publica...](https://dl.dropboxusercontent.com/u/2810224/Homepage/publications/2015/SuperResolution_CVPR_2015/supp/Urban_SRF_4.html)

~~~
nuclai
(Author here.) Did you see the faces example on the GitHub page? It was a
domain-specific network trained adversarially for that purpose, but I have yet
to see any super-resolution of that quality with or without machine learning.

Most other approaches don't even try to inject high-frequency detail into the
high-resolution images because the PSNR/SSIM benchmarks drop. Until those
metrics/benchmarks are dropped, there'll be little more progress in super-
resolution.

------
hcarvalhoalves
The example w/ the Japanese ideograms is impressive; it seems to actually make
a difference in readability.

[https://github.com/alexjc/neural-enhance/blob/master/docs/St...](https://github.com/alexjc/neural-enhance/blob/master/docs/StreetView_example.gif)

------
eigengrau
How does this compare to waifu2x?

------
kuschku
How much RAM do I need to run this?

It dies for me with a memory allocation error after eating 28GB of RAM…

~~~
andai
Wow, how much do you have?

~~~
kuschku
On this system I've got 32GB, of which about 2GB were used by the OS itself
and another 2GB by Firefox; that's why it stopped at around 28GB.

~~~
nuclai
(Author here.) Maybe it's worth moving to a GitHub issue. Try `--model=small`.
The demo server limits the number of pixels to around 320x200 or 256x256 and
can do only 4 at the same time to fit in RAM.

------
beautifulfreak
I do photo restorations on Reddit, where people often submit blurry photos
that sharpening just can't fix. It would be great if this were offered as an
online service.

~~~
nuclai
Yes, it sounds possible with this code — but would require training a new
network. Do you have a link to some examples?

~~~
beautifulfreak
Deblurring requests turn up frequently on
[https://www.reddit.com/r/estoration/](https://www.reddit.com/r/estoration/)
and
[https://www.reddit.com/r/picrequests/](https://www.reddit.com/r/picrequests/)

------
thenomad
This looks amazing.

Question for the more experienced deep learning folk: if I wanted to use this
to upscale textures for a game, would I have to train it on the same _type_ of
texture? In other words, additional wood textures when upscaling wood, brick
textures when upscaling brick, and so on?

~~~
nuclai
(Author here.) If you have the luxury to train on domain-specific textures,
the results will definitely be better. That's why I included all the training
code in the repository as well—to allow for this kind of solution.

If you scroll down on GitHub to the faces examples, those are achieved by a
domain-specific network. I suspect you'll similarly get extremely high-quality
results if you have good input images.

------
manav
I've seen a number of neural network approaches for super-resolution like
waifu2x, but I haven't seen something general-purpose that's better than
bicubic/Fourier/nearest neighbor.

Would be nice if the author did a comparison.

~~~
nuclai
(Author here.) My biggest insight from this project is that super-resolution
with neural networks benefits significantly from being domain specific. If you
train on broader datasets, it does pretty well but has to make compromises.
Many recent papers do a comparison in terms of pixel similarity (PSNR/SSIM),
and under those metrics the quality appears to drop because high-frequency
detail is punished (even though it may look better perceptually).
Reference: [http://arxiv.org/abs/1609.04802](http://arxiv.org/abs/1609.04802)

On GitHub, below each GIF there's a demo comparison, but on the site you can
also submit your own to try it out (click on title or restart button). Takes
about 60s currently; running on CPU as GPUs are busy training ;-)
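To illustrate the PSNR point with a toy sketch (mine, on synthetic images, not the paper's benchmark): an image blurred toward the mean can out-score one with crisp but slightly wrong detail.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    # Peak signal-to-noise ratio in dB: higher means closer in a per-pixel
    # sense, which is not the same thing as looking sharper to a human.
    mse = np.mean((reference - estimate) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
truth = rng.random((32, 32))

blurry = truth * 0.9 + truth.mean() * 0.1                # smoothed toward the mean
noisy_detail = truth + rng.normal(0, 0.08, truth.shape)  # crisp but "wrong" detail

# The image with injected high-frequency detail scores worse on PSNR even
# though a human might prefer it -- the benchmark problem described above.
print(psnr(truth, blurry) > psnr(truth, noisy_detail))  # True
```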

~~~
webmaven
_> super-resolution with neural networks benefits significantly from being
domain specific. If you train on broader datasets, it does pretty well but has
to make compromises._

To what extent could the need for this trade-off be overcome with a larger
network?

------
return0
Train this using a huge facial database such as the one US immigration holds
and you have the perfect human detector, able to identify you even from
nighttime security cameras.

------
shash7
Granted, it won't actually be sharpening the images, but for 99% of use cases
it would be awesome!

~~~
nuclai
(Author here.) Unlike most other non-GAN (generative adversarial network)
approaches to super-resolution, it does try to inject high-frequency detail;
see the faces example on GitHub. But I tuned that parameter down a bit in the
released models so it performs better generally.

~~~
jamesluo
Hi, what was the parameter you used for that faces example? It's really
impressive.

