
New AI Imaging Technique Reconstructs Photos with Realistic Results - dsr12
https://news.developer.nvidia.com/new-ai-imaging-technique-reconstructs-photos-with-realistic-results/
======
exhilaration
In case you only read the HN comments, don't miss the amazing video from the
article:
[https://www.youtube.com/watch?v=gg0F5JjKmhA](https://www.youtube.com/watch?v=gg0F5JjKmhA)

~~~
acmecorps
I thought I was the only one. Came for the news, stayed for the comments.

------
SlowRobotAhead
The example with the eyes being removed works nicely on the attractive girl,
it adds sexy eyes back in.

Works less nicely on the old man, where it adds the same sexy eyes.

~~~
azinman2
The left eye (her right) doesn’t look realistic to me

~~~
apendleton
It looks weird when they only do one, but when they do both it becomes
symmetrical again and looks less weird.

------
Viliam1234
Here is some prior art:
[https://en.wikipedia.org/wiki/Censorship_of_images_in_the_So...](https://en.wikipedia.org/wiki/Censorship_of_images_in_the_Soviet_Union#/media/File:Soviet_censorship_with_Stalin2.jpg)

------
lepouet
Adobe often shows things like this, maybe a little less advanced... It looks
nice with a small picture in a video, but for now you always end up
disappointed when you try it on high-res pictures where quality is important...

~~~
fudged71
Having used photoshop content-aware-fill quite a bit, what was very
interesting in the video with the eyes and the hairline was that it wasn't
just filling in with what was already in the image, but also what it thought
should be in the image. It's not just locally aware but also... world aware
(if given the proper training data).

------
agumonkey
personal opinion: I feel we're reaching the existential uncanny valley.
computation can "do" everything and .. I feel dead inside thinking about it.

~~~
matte_black
The thing to remember about uncanny valley is that it is a valley, and once
you cross through you will start rising up again to the next peak. _That_ is
where peaceful bliss awaits, where technology works so perfectly it is no
longer unsettling or scary. That is the goal, and that is where we are heading
more and more. The uncanny valley will soon be behind us.

~~~
arbie
Unless the edge of the valley we crossed was actually The Cliff of Insanity
and there _is_ no end to it.

------
state_less
This is pretty cool. I'm looking forward to the day that not only can we
reconstruct what was removed, but reconstruct what's not in the photo (image
sensor), but could be inferred. Can we reconstruct the 3D scene and fill in
the parts the camera can't see? For example, if it's a front view of a person,
make a 3d model of a person and texture the backside clothing.

It seems like if we put together all the advancements these deep inference
engines are producing, we may be able to reconstruct a reality, and allow
people to walk around in it using VR glasses, or less advanced 3D shooter
style controls.

It could be like a live earth view, any photo with a timestamp can contribute
to this view and we'd create a sweet 4D model of earth across time and space.

~~~
vincnetas
This is not reconstruction. It's guessing from what 'i' have seen before. So
this would never reconstruct anything unseen, in a broad sense.

~~~
nartz
Partially correct, but not necessarily true. Speaking at an extremely high
level, the model is currently trained to fill in the blank space with
something like the _most probable_ option per pixel, based on the training set
data. However, it is conceivable that it could be trained to also, say, insert
an object into a scene, based on other characteristics found in the scene. In
other words, it could be trained to maximize a joint goal, where the second
goal involves generating an object.
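A joint objective like that could be sketched roughly as below. This is a toy
illustration, not the paper's actual loss: the function name, the `obj_score`
term, and the `alpha` weighting are all made up for the example.

```python
import numpy as np

def inpainting_loss(pred, target, mask, obj_score=0.0, alpha=0.1):
    """Toy joint objective: per-pixel reconstruction error inside the
    hole, plus a weighted secondary term (e.g. an object-generation
    score). `mask` is 1 inside the hole, 0 elsewhere."""
    hole = mask.astype(bool)
    recon = np.mean((pred[hole] - target[hole]) ** 2)  # fill-in error only
    return recon + alpha * obj_score  # secondary goal folded in

# tiny usage example on a 4x4 "image" with a 2x2 hole
target = np.ones((4, 4))
pred = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1
print(inpainting_loss(pred, target, mask))  # 0.0: perfect fill
```

Training would then push the network to trade off faithful fill-in against
whatever the second goal rewards, depending on the weight.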

------
dschuetz
Whenever NVIDIA issues a research paper on how well their products work for AI
applications like this, I feel like instantly buying some of their more
expensive _scientific_ products.

------
dnautics
The keyframe in the YouTube link is creepy because when they obliterate the
eye section there's maybe only weak linkage? The two sides don't talk to each
other and fill in different eyes.

~~~
daveguy
Keyframe is only half creepy. The left eye was the one filled in by the
algorithm; the right eye is filled in similarly a few seconds later. For human
faces it seems to substitute very generic replacements of features (taking no
more cue from the surrounding photo than: eye-shaped thing goes here,
chin-shaped thing goes here, brown hair goes here, etc.). The video is
definitely worth watching. For human faces it seems to take less cue from
surroundings than for inanimate scenes, although it probably just seems that
way because we are so sensitive to peculiarities in images of the face.

Edit: it's somewhere between a surrounding texture fill and a semantic /
context based reconstruction. Texture fill would produce blank skin for an
eye. Ideal reconstruction would take into account appropriate wrinkles,
symmetry, expected bone structure. It works better for still life / scenes
than for faces.

------
JoeCoder_
How long until I can use this as a Photoshop plugin?

~~~
gefh
There's been content aware fill for a long time:
[https://youtu.be/Ge9jsJZ3lA0?t=245](https://youtu.be/Ge9jsJZ3lA0?t=245) It's
not the same backing tech but for practical purposes it's as good.

~~~
ttoinou
Current inpainting algorithms are good for removing wires or adding textures.
The new NVIDIA technique should be able to synthesize new information not in
the original image.

------
bob_theslob646
I'm guessing a whole line of business will be created from software like this
existing? Some sort of validation engine which would verify if the photo is
realistic or shopped.

I understand that you can tell in some of the examples in the video, but as it
gets better, it may become really hard to distinguish real from fake.

~~~
acdha
This industry already exists (search for “photo forensics”) and I'm certain
you're right to suspect it'll be booming soon. ML should also be good at
faking some of the characteristics which current tools check for, so we must
be looking at an arms race for years.

One of the big challenges I'm expecting will be the categories of attack: it
seems plausible that we'd be able to limit abuse in the case where the
original image is made by someone who isn't malicious by making it easy for
viewers to find the originals (perceptual hashing with some sort of
distributed ledger or signature system, and getting major services like
Facebook to use it) but I haven't seen a convincing suggestion for how to deal
with the case where the original is created by the attacker and thus any
validation system would only show what they submitted it with. It seems like
that'd fall back on much more failure-prone techniques — e.g. you could rely
on public information to convince most people that, for example, Barack Obama
didn't pledge allegiance to ISIS at a public rally but most people aren't
going to have enough rock-solid documentation to prove a negative. If an
attacker said that politician X was having an affair in a hotel room, it'd
probably seem convincing to many people unless they screwed up and left proof
of e.g. using stock footage, landmarks from a different city, wrong time of
year, etc.
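The perceptual-hashing idea can be illustrated with a toy average hash:
downsample to a tiny grayscale thumbnail, threshold against the mean, and
compare fingerprints by Hamming distance. A real system would use something
far more robust (e.g. pHash or PDQ); this sketch just shows the principle.

```python
def average_hash(pixels):
    """Toy perceptual hash: threshold a flat list of 64 grayscale
    values (an 8x8 thumbnail) against their mean, giving a 64-bit
    fingerprint where each bit marks a brighter-than-average pixel."""
    mean = sum(pixels) / len(pixels)
    return sum(1 << i for i, p in enumerate(pixels) if p > mean)

def hamming(h1, h2):
    """Number of differing bits; a small distance means the two
    images are perceptually similar."""
    return bin(h1 ^ h2).count("1")

original = [10] * 32 + [200] * 32          # half dark, half bright
tampered = original[:]
tampered[0] = 250                          # edit one "pixel"
print(hamming(average_hash(original), average_hash(tampered)))  # 1
```

A registry of such fingerprints for known-original images would let viewers
check a suspicious photo against the original, which is exactly the
non-malicious-source case above; it does nothing for attacker-created
originals.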

------
hogehoge
Try it [http://blog-
imgs-80.fc2.com/o/p/t/opticalillusion48/shinoda_...](http://blog-
imgs-80.fc2.com/o/p/t/opticalillusion48/shinoda_03.png)

------
potlee
Can someone who understands this tell us what is the catch here?

~~~
johndough
You'll probably hit memory limits if you go much beyond 512x512-sized holes.

Additionally, computation times grow quickly with higher resolution and you
already need a high end GPU for this resolution to get a reasonably
interactive response time.

You'll also need a favorably licensed pretrained model, or a few tens of
thousands of training images and masks.

So all in all, I can't see any deal breakers, but I'd probably still use
PatchMatch instead.

~~~
rhcom2
For reference the GPU they're using for this paper is the NVIDIA V100 GPU, a
datacenter GPU costing $8,000.

~~~
penagwin
To be fair, while V100s perform very well for machine learning, for the same
price you could buy almost a dozen 1080 Tis or a few Titans (whatever the
current one is), which together would certainly be much faster.

They say they used V100s but not how many; if they needed a large number, then
never mind.

~~~
rhcom2
The paper says they only ran it on a single V100, I was expecting multiple
GPUs as well.

------
ashraymalhotra
How would it work if the mask was covering the mouth region? Afaik teeth have
always been difficult to reproduce...

------
devit
Any actual software?

Seems amazing for face shot retouching.

------
subcosmos
They're at it a-GAN!

~~~
merpnderp
GAN is just a brilliant concept. I look forward to all the new tools to come
out of this.

~~~
p1esk
They didn't use GAN for this.

~~~
gmiller123456
While I didn't read the paper in detail, there are a lot of papers in the
references section about GANs. And referring back to the text that cites them
makes it pretty obvious they did use GANs.

~~~
p1esk
Perhaps you should read the paper _in detail_ before arguing. Eq. 7 makes it
_pretty obvious_ there's no discriminator.

~~~
subcosmos
it's just a PUN guys

and yes, this is just an acronym

------
khuss
Nice breakthrough. What could be some potential applications?

~~~
jackweirdy
The first thing that comes to mind is restoring the images that were recovered
(damaged) at the end of this process (follow through to part 3 for full
examples of images):

[http://www.bbc.co.uk/rd/blog/2017-12-morecambe-wise-video-
fi...](http://www.bbc.co.uk/rd/blog/2017-12-morecambe-wise-video-film-archive-
restoration)

[https://www.bbc.co.uk/rd/blog/2017-12-morecambe-wise-
video-x...](https://www.bbc.co.uk/rd/blog/2017-12-morecambe-wise-video-xray-
microtomography)

[http://www.bbc.co.uk/rd/blog/2017-12-morecambe-wise-film-
rec...](http://www.bbc.co.uk/rd/blog/2017-12-morecambe-wise-film-recovery-
processing-algorithm)

------
ashraymalhotra
Is the code for this open source?

