Pixel Recursive Super Resolution (arxiv.org)
107 points by somerandomness on Feb 6, 2017 | 46 comments



I hate to be a debbie downer but the results don't look particularly great to me. For comparison, see "AffGAN" [1] and "LAPGAN" [2] for better (IMO) super-resolution results using GAN-like techniques. Granted, all three papers are applied in somewhat different settings (different input/output resolutions, different datasets), so direct comparison is difficult.

[1] https://openreview.net/pdf?id=S1RP6GLle

[2] https://arxiv.org/abs/1506.05751


Are there any turnkey, easy to use software packages that use LAPGAN or AffGAN for image enhancement? Or are these techniques purely in the research realm at this point?


Does research always have to yield a result that's better than any other approach? In my understanding it's worthwhile to pursue different ways of approaching a problem even if some of them don't work as well as others (but you don't know that beforehand).


Basically, this is a law of survival in the academic jungle, depressing as that is.


You're not the only one who feels this way; I wasn't very impressed either by the adversarial-network-synthesized moving GIFs that everyone went crazy over a few months ago.


It's really cool to see a writeup for this on real-life images. There is similar work at Flipboard (http://engineering.flipboard.com/2015/05/scaling-convnets/), for anime scaling (https://github.com/nagadomi/waifu2x), and for sprite scaling (can't find the reference I'm thinking of).

This tech has been around for several years, and some variation was presented in concert with the Boston Marathon investigation. https://arstechnica.com/information-technology/2013/05/hallu...

(Not clear if this was used as part of the investigation, or if it could be used for future investigations)


Kind of alarming to see this being proposed as an investigative tool. Isn't the entire point of this area of image processing that the network is creating plausible information where there is none? That's great for creating higher-resolution versions of entertainment assets, but it would seem categorically inappropriate for forensic science.


Exactly. It's not extracting information, it is (as the Boston Marathon link says) hallucinating the additional information. It's an artist's impression, not CSI's magic 'enhance'.

In terms of the danger of this kind of image confabulation, it seems similar to the block-based compression on Xerox photocopiers which sometimes changed numbers in scanned documents: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...


I expected that to be a 1970s computing shanty but D.Kriesel spotted it in 2013 and it took until 2015 to get it properly patched. Incredible.


> it would seem categorically inappropriate for forensic science

It would be an absolutely wrong use, because it replaces the information missing from the original pixels with information the software learned earlier and recombined.

If the software was trained on a picture of an innocent person, it would produce that face instead of the face of the person who is actually guilty. It can only guess, and only based on what it was trained on.
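To make the "it can only guess" point concrete, here is a minimal sketch in plain Python. It assumes a simple 2x2 box-average downscale (not necessarily what any camera or the paper uses), and the pixel values are made up for illustration. Two different high-resolution patches collapse to the same low-resolution patch, so nothing in the low-resolution pixels can tell an upscaler which one was the original.

```python
def box_downsample(img, factor):
    """Average each factor x factor block of a 2D list of numbers."""
    out = []
    for r in range(0, len(img), factor):
        row = []
        for c in range(0, len(img[0]), factor):
            block = [img[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# Two visibly different 4x4 "high-resolution" patches (hypothetical values).
hr_a = [
    [0, 8, 4, 4],
    [8, 0, 4, 4],
    [2, 2, 6, 0],
    [2, 2, 0, 6],
]
hr_b = [
    [8, 0, 4, 4],
    [0, 8, 4, 4],
    [2, 2, 0, 6],
    [2, 2, 6, 0],
]

lr_a = box_downsample(hr_a, 2)
lr_b = box_downsample(hr_b, 2)

print(hr_a == hr_b)  # False: the originals differ
print(lr_a == lr_b)  # True: the low-res versions are identical
```

Any super-resolution model handed that low-resolution patch has to pick one of the candidates using its training data as a prior, which is exactly why the output is a plausible guess and not evidence.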


It's bad enough when people don't notice that they filled in the gaps based on pure speculation and prejudice; it will be even worse when the same happens with the "authority" of a computer.

Maybe there could be a moral obligation for the designers of the training set: trained with the right data, if you say "enhance" once too often the model might overfit into a scene from Star Trek ("after shooting, video footage shows man in command uniform firing at random redshirts, people in medical uniforms miraculously unharmed").


However, it is possible to extract high-resolution images from multiple frames of low-resolution video.


Yep. Happens because the phase of the sampling grid changes on different frames, which effectively increases resolution.
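A toy 1D sketch of the sampling-phase idea, in plain Python. It assumes ideal point sampling, a static scene, and exactly known shifts between frames, none of which hold for real video, where you would also have to estimate motion and deal with blur and noise. The function names are made up for illustration.

```python
def sample(signal, step, phase):
    """One low-res 'frame': every step-th sample, starting at phase."""
    return signal[phase::step]

def interleave(frames, step):
    """Merge frames taken at phases 0..step-1 back onto the fine grid."""
    total = sum(len(f) for f in frames)
    out = [None] * total
    for phase, frame in enumerate(frames):
        out[phase::step] = frame
    return out

scene = [3, 1, 4, 1, 5, 9, 2, 6]      # the "true" high-res signal
frames = [sample(scene, 2, p) for p in range(2)]

print(frames[0])                      # [3, 4, 5, 2]
print(frames[1])                      # [1, 1, 9, 6]
print(interleave(frames, 2) == scene) # True
```

Each frame alone has half the samples, but because the grids are offset from each other, together they cover every position. Unlike single-image super-resolution, the extra detail here is measured, not invented.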


That's also why the effective resolution of 1080i is more than simply 540 lines. (1080i for a sitcom has a higher effective resolution than 1080i for a sporting event, which I think is also why ESPN chose to transmit in 720p rather than 1080i, at least for some time, but I could be wrong about that.)


Definitely glad to see Waifu2x mentioned here, in the age of increasingly common paywalled full-resolution "imagery" (ahem..), it's a godsend to anyone who isn't using a monitor from the early 00's.

It's also nice to just add a little extra quality to older... imagery. Works surprisingly well for non-2D stuff too.


Interesting tidbit: the first author is Ryan Dahl, creator of Node.js, now a Google Brain Resident.


I caught that. I was wondering what ry was up to these days!


Another recent "super resolution" method (RAISR) from Google Research:

https://arxiv.org/abs/1606.01299

https://research.googleblog.com/2016/11/enhance-raisr-sharp-...

>Given an image, we wish to produce an image of larger size with significantly more pixels and higher image quality. This is generally known as the Single Image Super-Resolution (SISR) problem. The idea is that with sufficient training data (corresponding pairs of low and high resolution images) we can learn set of filters (i.e. a mapping) that when applied to given image that is not in the training set, will produce a higher resolution version of it, where the learning is preferably low complexity. In our proposed approach, the run-time is more than one to two orders of magnitude faster than the best competing methods currently available, while producing results comparable or better than state-of-the-art.

>A closely related topic is image sharpening and contrast enhancement, i.e., improving the visual quality of a blurry image by amplifying the underlying details (a wide range of frequencies). Our approach additionally includes an extremely efficient way to produce an image that is significantly sharper than the input blurry one, without introducing artifacts such as halos and noise amplification. We illustrate how this effective sharpening algorithm, in addition to being of independent interest, can be used as a pre-processing step to induce the learning of more effective upscaling filters with built-in sharpening and contrast enhancement effect.
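For anyone wondering what "learn a set of filters from pairs of low and high resolution images" can look like in the simplest case, here is a rough sketch in plain Python. This is not RAISR's actual pipeline, which the quoted abstract does not spell out; it only shows fitting one small 1D filter by least squares so that filtered "blurry" samples match "sharp" ones. The data, the 3-tap size, and all names are assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def learn_filter(blurry, sharp, taps=3):
    """Least-squares filter w so that sum_k w[k] * patch[k] ~ sharp[i]."""
    half = taps // 2
    ata = [[0.0] * taps for _ in range(taps)]   # normal matrix A^T A
    atb = [0.0] * taps                          # right-hand side A^T b
    for i in range(half, len(blurry) - half):
        patch = blurry[i - half:i + half + 1]
        for r in range(taps):
            atb[r] += patch[r] * sharp[i]
            for c in range(taps):
                ata[r][c] += patch[r] * patch[c]
    return solve(ata, atb)

# Synthetic training pair: "sharp" is built from "blurry" with a known
# sharpening filter, so we can check that learning recovers it.
true_filter = [-0.5, 2.0, -0.5]
blurry = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0, 9.0, 0.0]
sharp = list(blurry)
for i in range(1, len(blurry) - 1):
    sharp[i] = sum(w * x for w, x in zip(true_filter, blurry[i - 1:i + 2]))

learned = learn_filter(blurry, sharp)
print([round(w, 6) for w in learned])
```

The appeal of this family of methods is that applying the learned filter at run time is just a few multiply-adds per pixel, which fits the speed claim in the abstract.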


BTW, if you're interested in using advanced scaling methods in GPU-accelerated video playback, check out the madVR and MPDN projects:

http://forum.doom9.org/showthread.php?t=146228

http://forum.doom9.org/showthread.php?t=171120


Are there any simple explanations of this technique? The paper is a bit dense in some parts.


Forgive me if I've missed something here, but these were only trained against synthetic images (images that were scaled down using various formulas). Because of this, I'd expect it not to work as well as it could on actual images taken by sensors.

Do any datasets even exist where the images are at sensor pixel level?

That way the model would 'know' about imaging effects (I can't think of any specifically mechanical effects that could be in play here right this second) etc?

Or am I way off base here....
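A small sketch of why the choice of downscaling formula matters, in plain Python. It assumes two common choices, plain decimation and box averaging, without claiming either is what the paper used, and the patch values are made up. The same high-resolution patch produces different low-resolution inputs, so a model trained on pairs from one formula learns to invert that formula specifically.

```python
def decimate(img, factor):
    """Keep every factor-th pixel in each direction (no filtering)."""
    return [row[::factor] for row in img[::factor]]

def box_average(img, factor):
    """Average each factor x factor block."""
    out = []
    for r in range(0, len(img), factor):
        row = []
        for c in range(0, len(img[0]), factor):
            block = [img[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# Hypothetical 4x4 high-res patch: a simple diagonal gradient.
hr = [
    [0, 4, 8, 12],
    [4, 8, 12, 16],
    [8, 12, 16, 20],
    [12, 16, 20, 24],
]

# Two candidate (low-res, high-res) training pairs from the same patch.
pair_decimated = (decimate(hr, 2), hr)
pair_averaged = (box_average(hr, 2), hr)

print(pair_decimated[0])  # [[0, 8], [8, 16]]
print(pair_averaged[0])   # [[4.0, 12.0], [12.0, 20.0]]
```

A real sensor applies its own pipeline (optics, noise, debayering, compression), which matches neither formula, so some drop in quality on real photos seems like a reasonable thing to expect.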


No, I think you are correct. I think the result for the CelebA dataset is a toy. But many results in this area are toys, e.g. deep dream.


You could exploit debayering artefacts.

In fact I wonder if any imaging sensor vendors run R&D trying to come up with novel neural net based debayering approaches - this could be a cheap way of bumping image quality/perceived resolution.


Zoom! Enhance!

https://youtu.be/LhF_56SxrGk

(sorry couldn't help myself)



I usually downvote cliches but in this case it's an absolutely appropriate thing to cite.


I wonder, could you craft a shader for tree-foliage from this?

Given the background and the leaf texture + alpha, instead of rasterizing, anti-aliasing, and then using z-baked light sources and probe reflections to light it semi-correctly, what would a neural net implementation look like?

Would you even notice the mistakes in a constant flickering scenery like this?


There seems to be an error on figure 7 in the third row: The face image for "ground truth" is a duplicate of the "Ours" result.


Specific observations like this are best forwarded to the authors, who are unlikely to see your (entirely valid) observation here.


Any actual implementations of this (or other superresolution algorithms) to play with?


There's the waifu2x project [1] which also uses neural networks for super resolution. There's also the MPDN extensions project [2] which has various kinds of image scaling methods. It depends what kind of algorithms you want to play with really.

[1]: https://github.com/nagadomi/waifu2x

[2]: https://github.com/zachsaw/MPDN_Extensions


Amazing, but also a bit scary. This will surely be used for retroactive identification from existing photographs, and will be a free gift for authoritarian law enforcement. Of course such extrapolative technologies are subject to challenge, but criminal juries have a tendency to accept forensic claims at face value notwithstanding their actual scientific reliability. It's partly because of this that if I ever found myself on trial for a crime I didn't commit I'd probably waive my right to a jury trial - laypeople are far too easily fooled.


The results are synthesised/generated - there is no way to use this for face recognition from low res images because the result, while plausible, is not real


So what? That's never stopped people before. If it's good enough to be useful, it will be used. That's how things are in the real world.

http://www.livescience.com/49929-faulty-forensic-science-fai...

https://ncforensics.wordpress.com/2013/03/04/thousands-of-ca...


I wonder if something like this could be used to boost the fidelity of radio signals that are picked up by SDR (software defined radio).

Here are some manual techniques people currently use to hunt signals on SDR.[1] A lot of what they do is visual, and enhanced visual fidelity of potential signals would definitely be a big help, if it worked.

[1] - https://www.youtube.com/watch?v=9fXnwkK2kQI


So far I think this has only been optimized for anime girls: http://waifu2x.udp.jp/


It works surprisingly well for non-anime-styled stuff too. Just about any 2D art with decently-defined edges upscales beautifully.


They don't have a comparison to GANs which is weird.


Imagine how cool it would be if this were implemented in JS and used on websites to increase the resolution of low-quality pictures.


Technically, the work should be compared with the well-known GAN SR work it cites as [18] to show its power.


The celeb sets look like nightmare fuel.


nice. wish pixelCNN/wavenet wasn't so computationally heavy to train and run


We'll have an Intel vs NVidia arms race kicking off this year. And... There are probably some major algorithmic speedups on the table still.


Nvidia v AMD, Nvidia will win

Intel doesn't make GPUs.


No, but they intend to go head to head on deep learning performance. https://newsroom.intel.com/news-releases/intel-ai-day-news-r...


Zoom. Enhance.



