New AI Imaging Technique Reconstructs Photos with Realistic Results (nvidia.com)
219 points by dsr12 9 months ago | 70 comments

In case you only read the HN comments, don't miss the amazing video from the article: https://www.youtube.com/watch?v=gg0F5JjKmhA

I thought I was the only one. Came for the news, stayed for the comments.

The example with the eyes being removed works nicely on the attractive girl, it adds sexy eyes back in.

Works less nicely on the old man, where it adds the same sexy eyes.

It's trained on celebrities, hence the bias :p

Note it's not any old man, it's Ernest Borgnine, who played Dominic Santini in Airwolf.

The left eye (her right) doesn’t look realistic to me

It looks weird when they only do one, but when they do both it becomes symmetrical again and looks less weird.

Adobe often shows things like that, maybe a little less advanced... This is nice with a small picture in a video, but for now you always end up disappointed when you try on high res pictures where quality is important...

Having used photoshop content-aware-fill quite a bit, what was very interesting in the video with the eyes and the hairline was that it wasn't just filling in with what was already in the image, but also what it thought should be in the image. It's not just locally aware but also... world aware (if given the proper training data).

personal opinion: I feel we're reaching the existential uncanny valley. computation can "do" everything and .. I feel dead inside thinking about it.

The thing to remember about uncanny valley is that it is a valley, and once you cross through you will start rising up again to the next peak. That is where peaceful bliss awaits, where technology works so perfectly it is no longer unsettling or scary. That is the goal, and that is where we are heading more and more. The uncanny valley will soon be behind us.

Unless the edge of the valley we crossed was actually The Cliff of Insanity and there _is_ no end to it.

I don't know so much about valleys and such. I just think about how people rely on visual and audio evidence and how we might reach a place where it can be hard to guarantee that we are not being manipulated with doctored materials. This may ultimately be the utility of any kind of technology that documents all steps in processing from camera capture on. (erm blockchain?)

I feel that I would be waiting in the queue to buy a GPU to expand my own brain someday.

I'm super not sold on this. I've learned so much without any help, even when crippled by illness. I believe the superhuman thing is just a marketing trick to pursue the tech growth.

Computers are clearly different and complementary to our own brains. We most definitely don't know how to replace our brains with better hardware, but having some kind of implant that would allow me to maintain a perfect calendar in my head instead of on my phone or computer would be really helpful, as would being able to do complex arithmetic at the speed of thought instead of the speed of typing equations into a calculator/Mathematica/etc. It's not about replacing anything we already have with something better, it's adding new tightly integrated features.

speed in mathematics always reminds me of the Gauss legend for the Sum of 1 to N. If he had a computing machine, we may not know his formula :)

You'd like the short story 'Profession', by Isaac Asimov.

> I've learned so much without any help

"so much" by what standards? Our reference for knowledge acquisition is biased to what we perceive as our limits.

For example we've collectively spent several millennia improving our game of Go theory and knowledge, and then a machine trained for a few days discovered new ideas we hadn't considered.

Of course it's biased, and why do I care about the absolute scale of things ?

Would you buy a McLaren F1 to go grocery shopping ? this is how the mind expanding thing feels to me. Just clean and use your mercedes sedan.

Mind expansion is for those who want to go beyond "grocery shopping".

you're not alone.

Now it might be highly dependent on my age and life events. It shapes my perception of past, present and future. But I'm really not excited by the future tech sale. All I see is waste.

Well seeing all things are finite, including the universe itself, I'm not really bummed by the fact that eventually consciousness extremely more powerful than humans will exist.

But I feel that like you I think humanity has a really enormous value and beauty -- beauty that hasn't run its course yet and is perhaps partially independent of "raw intelligence".

I have a few human beings in particular that I admire and love so much that I'd be immeasurably bummed out if we were replaced by some kind of hyperintelligent simpleton.

It's still quite far though, I believe. I think humans are not too far from the practical physical limits of complex cognition. But almost surely the day will come that we'll be replaced with something else...


On a side note, I've been exploring different branches of philosophy to quench existential dread. Lately I've been more and more intrigued by Tegmark's 'Mathematical Universe Hypothesis'. My argument is, it is too absurd for this universe to be everything that will ever be or has been. Why this particular thing and why do we exist at all? The only plausible answers to me are two:

1) Brute fact, just accept and shut up. (Sean Carroll et al's view)

2) Everything that is mathematically possible exists (Max Tegmark et al).

I'm leaning toward believing (2) is more convincing atm. The unresolved problem is if you can define something of a probability distribution on the set of all possible universes that would explain ourselves as "roughly typical". Doesn't look like an insurmountable problem.

How does this relate to existential dread? Basically, we're just this one universe. Others exist (with greater or lower likelihood/numbers ?) with all sorts of things happening, all sorts of beings and variations of humans. It soothes me in a strange way.

Anyway, I guess I think way too much about existential questions, although by now I have some personal resolutions (some more some less satisfying) for most questions I've faced. The cosmological question aforementioned is a quite satisfying answer I've found, but there are even worse ones that still keep me up at night... (please don't ask me what they are unless you are a rationalism masochist like me!)

my thinking is quite simple:

- most of my deepest happy moments had probably few ties to technology and were mostly a blend of ignorance-fueled magic (that is the "few".. I did enjoy video games to death as a kid[1]) and human relationship in plain nature.

- the more tech, the less of the previously mentioned I see. For the social marketing spread, it feels quite individualistic in effect (know more, see more, save more, even sharing more is often twisted into a shallow game of upvote)

therefore I'm not thrilled by tech

[1] I did love tech to bits but growing up this feeling vanished a lot. I also realize the beauty is really subjective. We feel that it's a more valuable thing because it's "hard science" based. chemistry, electronics... But when you peel the layers you realize that a lot of it was mostly socially driven and imagination. I often think that for people that lived in forest long ago, this feeling of beauty was probably here, but projected onto what they had there, plants, animals, gems.

Yea I get what you mean. I'd say you're a humanist, as I believe am I. But I have some counterpoints to offer.

I'd say the big part of your disillusionment is part of being American (in general living in cutthroat capitalism). The entire system is made for you to equate money with happiness; and tech/luxury is often put as the things you actually buy with said money to achieve said happiness.

That's of course complete bullshit. A convenient illusion carefully orchestrated by almost every institution and company directed at you -- every company wants you to consume from them (so they want you to equate their products and your money with happiness), and practically every institution wants you to make more money or give them more money.

In reality, happiness is more or less environmentally unconditional. You could be happy in practically every situation -- I find it likely that past generations which were less wealthy, had less stuff, etc. were not less happy in general, and in particular you can see there are some relatively poor countries near the top of world happiness index list[1], and there are much better explanatory variables than per capita GDP, like wealth inequality (explaining why the US isn't on top) and probably general quality of life like public healthcare and natural parks. And this index probably includes variables biased towards comfort and luxury (part of its definition of "well being"), which I believe don't necessarily make you happier. Happiness is mostly a "good" internal condition, an internal satisfaction. Endearing social relationships, not being stressed about a million things, a positive outlook on life, those sorts of things are likely much more powerful than anything you could buy to guarantee happiness (and in general enabling those is very cheap). But the "system" doesn't want you to achieve this, otherwise your economic hyper-productivity is in jeopardy. You're not supposed to reach the carrot, it has to be permanently just outside your reach.

While that may seem like an attack on technology, this argument isn't related to technology at all. Technology is just being exploited as a carrot to make you believe it is the be-all, end-all of happiness, while usually all it can do is alter your external conditions, make you a little more comfortable, make you spend less energy. Nothing causally dictative on happiness. But technology still has value outside cutthroat economic competition.

Technology empowers you to change your internal state of mind. For the economy, this is a spurious, undesired, byproduct: they don't want to risk you steering away from the productivity paradigm. But nonetheless, technology is a tool (or the tool) to enable reasoning, good basic living conditions, and an environment for meaningful social interactions -- basis of a good definition of happiness.

Why is reasoning so important? Reasoning is what, in turn, enables you to change your state of mind reliably, to achieve, or strive to achieve, what is actually "good", to understand what happiness even is, to experience the world and its beauty. Beauty resides inside the observant mind, not in nature, the characteristics of nature (and various human constructs) are merely the subject of observation.

Here I make an assumption that knowing more can indeed show you how to be happy, and not actually just torment you. Since it seems you've faced existential questions before, this assumption does seem shaky. But I firmly believe that the solution cannot be simply putting on blinders and refusing the truth -- I believe that in the end those ailments are temporary and can be relieved by digging deeper. In the worst case, if in the end after ultimate contemplation, you've absolutely convinced yourself that some things are better not being known and not reasoned about, you can always find ways of passing this to others and attempt to forget. But I don't think this will ever be truly necessary. You can uncover the truths about what makes you really happy, what are the forms of happiness, what are the forms of pleasure, what is a truly "good" form of happiness (e.g. I wouldn't accept happiness from seeing the ruin of others even if instinctive), and how to achieve it.

Another point is that while our life is quite short, modern society enabled by technology enables us to uncover the basic truths about life, the universe and everything. That is inherently valuable in my opinion, it has a particular beauty. We're this puny form of life living in a tiny rock in a massive, possibly infinite, universe, and yet we've discovered the basic tenets of reality. We are not completely blind and oblivious. If life was even shorter, I might agree that one should not preoccupy with those considerations and just bask in hedonism, but I'm glad we have some time to dig deeper and really See.

I have more to say (and my points aren't perfectly articulated here), but I want to keep this reasonably short. I'll gladly continue the discussion later if you're interested.

In conclusion I plead with you to see, and take action, that technology can be used contrary to the nihilistic nature of the "system" to really enable our humanity to shine. Let us not over glorify nature because although we are natural, so is this path (it is a form of evolution after all). In this sense we must fight nature to maintain humanity, because I truly believe we have something special going, even though we're not perfect. Care for other human beings, be cared for in return, and we can prolong this goodness for a long time.

My contribution to this is participation in an art and technology group: http://amudi.com.br/ But ultimately I think the only reliable way to assure this path is through particular forms of government, legislation and regulation (more akin to Swiss or Scandinavian democracies than the US).

[1] https://en.wikipedia.org/wiki/World_Happiness_Report

[2] You mentioned videogames, one of my fondest memories is playing an internet game, an mmorpg called Tibia. It enabled lots of interaction of all sorts between players, and many beautiful aspects like exploring mysteries, creating stories and adventuring -- and it is very much a product of 21st century technology.

Can't answer fully right now but first I do appreciate and share most of your thinking process, so I am.. very happy about it.

About universal truth, it's true, some knowledge has beauty (some mathematics, some physics etc). But I find there's always religion and false gods no matter how much knowledge has been uncovered by human societies. Societies aren't the individuals, and populations move influenced by what I still would call gods. Technology has become the latest one.

This is pretty cool. I'm looking forward to the day that not only can we reconstruct what was removed, but reconstruct what's not in the photo (image sensor), but could be inferred. Can we reconstruct the 3D scene and fill in the parts the camera can't see? For example, if it's a front view of a person, make a 3d model of a person and texture the backside clothing.

It seems like if we put together all the advancements these deep inference engines are producing, we may be able to reconstruct a reality, and allow people to walk around in it using VR glasses, or less advanced 3D shooter style controls.

It could be like a live earth view, any photo with a timestamp can contribute to this view and we'd create a sweet 4D model of earth across time and space.

This is not reconstruction. It's guessing from what 'i' have seen before. So this would never reconstruct anything unseen. In a broad sense.

Partially correct, but not necessarily true. Speaking extremely high level, the model is currently trained to fill in the blank space with something like the most probable options per pixel, based on the training set data. However, it is conceivable that it could be trained to also, say, insert an object into a scene, based on other characteristics found in the scene. In other words, it could be trained to maximize a joint goal, where the second goal involves generating an object.
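To make the "joint goal" idea concrete, here is a minimal sketch in Python with NumPy. It is not the paper's loss: `hole_l1`, `joint_loss`, `object_score` and `object_weight` are hypothetical names, and the object term assumes some separate detector that returns a confidence in [0, 1] that the desired object appears in the filled image. Written as a loss to minimize, the joint objective is just a weighted sum of the two terms.

```python
import numpy as np

def hole_l1(pred, target, mask):
    """Mean absolute error over the hole only (mask: 1 = known pixel, 0 = hole)."""
    hole = 1.0 - np.asarray(mask, dtype=np.float64)
    diff = np.abs(np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64))
    # Guard against an empty hole so we never divide by zero.
    return float((hole * diff).sum() / max(hole.sum(), 1.0))

def joint_loss(pred, target, mask, object_score, object_weight=0.5):
    """Fill-in term plus a hypothetical 'did the object show up' penalty."""
    # object_score is an assumed detector confidence in [0, 1]; higher is better.
    return hole_l1(pred, target, mask) + object_weight * (1.0 - object_score)
```

With `object_weight=0` this reduces to plain hole filling; raising it trades pixel fidelity for the presence of the object.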

I think I follow what you're saying, we aren't going to be able to construct what aliens look like on a distant planet (or some other unknown). We're essentially looking at the information we have, which is why I expect continued interest in exploring for new information. Even as we continue to improve our reconstructions of 4D.

Like so:


There's similar work at (at least) Berkeley and Stanford.

That's cool. I liked the use of scene rendering to supply training data to the network.

It'd be nice to see texture prediction on some of the voxels, so painting the occluded voxels in the scene as well as texturing those in the image.

Texture accuracy could be measured by rendering the other side of the bed and see how close the texture predictions were.
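One simple way to score "how close" is a pixel metric such as PSNR between the predicted texture and the render from the other side. A minimal sketch, assuming both are same-shaped arrays with values in 0..255; the function name `psnr` and the choice of metric are mine, not something from the linked work.

```python
import math

import numpy as np

def psnr(predicted, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means the prediction is closer."""
    predicted = np.asarray(predicted, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    mse = float(np.mean((predicted - rendered) ** 2))
    if mse == 0.0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)
```

PSNR only rewards pixel-exact agreement, so a plausible but different fabric pattern would still score badly; a perceptual metric may suit this better.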

Now this would be quite a challenge, but if you could train a network to give D, given RGB, you'd have RGBD and could maybe use internet video to create some structure. Use something like a SLAM algorithm to get camera position, then detect when a model is viewed from the occluded side and get a lot of texture prediction data using real world internet video.

Whenever Nvidia issues a research paper on how well their products work for AI applications like that in their paper I feel like instantly buying some of their more expensive scientific products.

The keyframe in the YouTube link is creepy because when they obliterate the eye section there's maybe weak linkage? They don't talk to each other and fill in different eyes on each side.

Keyframe is only half creepy. The left eye was the one filled in by the algorithm. The right eye is filled in similarly a few seconds later. For human faces it seems to substitute very generic replacements of features (and take no more cue from the surrounding photo than: eye shaped thing should go here, chin shaped thing should go here, brown hair should go here, etc.). The video is definitely worth watching. For human faces it seems to take less cue from surroundings than inanimate scenes. Although it probably just seems that way because we are so sensitive to peculiarities in images of the face.

Edit: it's somewhere between a surrounding texture fill and a semantic / context based reconstruction. Texture fill would produce blank skin for an eye. Ideal reconstruction would take into account appropriate wrinkles, symmetry, expected bone structure. It works better for still life / scenes than for faces.

How long until I can use this as a Photoshop plugin?

There's been content aware fill for a long time: https://youtu.be/Ge9jsJZ3lA0?t=245 It's not the same backing tech but for practical purposes it's as good.

Current inpainting algorithm is good for removing wires or adding textures. The new nVidia technique should be able to synthesize new information not in the original image

I doubt Adobe's content aware fill could replace someone's eye or a few of the other examples in the video.

About -10 years ago (at least that's about when a gimp script doing a version of this was put out, Smart Remove / Heal Selection -- https://www.youtube.com/watch?v=3h1gZJsjKxs -- I recall Adobe demoing similar stuff at various points though I suppose this is the first time someone's used a generative neural net to fill things in rather than just nearby pixels)


I'm guessing a whole line of business will be created from software like this existing? Some sort of validation engine which would verify if the photo is realistic or shopped.

I understand that you can tell in some of the examples in the video, but as it gets better, it may be really hard to distinguish real from fake.

This industry already exists (search for “photo forensics”) and I'm certain you're right to suspect it'll be booming soon. ML should also be good at faking some of the characteristics which current tools use so we must be looking at an arms race for years.

One of the big challenges I'm expecting will be the categories of attack: it seems plausible that we'd be able to limit abuse in the case where the original image is made by someone who isn't malicious by making it easy for viewers to find the originals (perceptual hashing with some sort of distributed ledger or signature system, and getting major services like Facebook to use it) but I haven't seen a convincing suggestion for how to deal with the case where the original is created by the attacker and thus any validation system would only show what they submitted it with. It seems like that'd fall back on much more failure-prone techniques; e.g. you could rely on public information to convince most people that, for example, Barack Obama didn't pledge allegiance to ISIS at a public rally but most people aren't going to have enough rock-solid documentation to prove a negative. If an attacker said that politician X was having an affair in a hotel room it'd probably seem convincing to many people unless they screwed up and left proof of e.g. using stock footage, landmarks from a different city, wrong time of year, etc.
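For anyone unfamiliar with perceptual hashing, here is a rough sketch of the simplest variant, an "average hash", in Python with NumPy. This is an illustration only: it assumes the image is already a 2D grayscale array, the names `average_hash` and `hamming_distance` are made up for the example, and production systems use sturdier hashes. The point is that small edits such as a brightness change leave the hash nearly unchanged, so a service could look up the original by nearest hash.

```python
import numpy as np

def average_hash(gray, hash_size=8):
    """Return hash_size*hash_size bits: True where a block is brighter than average."""
    gray = np.asarray(gray, dtype=np.float64)
    height, width = gray.shape
    # Crop so the image splits evenly into hash_size x hash_size blocks.
    bh, bw = height // hash_size, width // hash_size
    cropped = gray[:bh * hash_size, :bw * hash_size]
    # Average each block, which downsamples the image to hash_size x hash_size.
    blocks = cropped.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (blocks > blocks.mean()).ravel()

def hamming_distance(hash_a, hash_b):
    """Number of differing bits; small distances suggest the same underlying image."""
    return int(np.count_nonzero(hash_a != hash_b))
```

Usage would be something like `hamming_distance(average_hash(original), average_hash(suspect))`, treating a distance below some threshold as "derived from the same photo". Note this only helps in the benign-original case described above; it says nothing when the attacker supplies the original.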

Can someone who understands this tell us what is the catch here?

You'll probably hit memory limits if you go much beyond 512x512-sized holes.

Additionally, computation times grow quickly with higher resolution and you already need a high end GPU for this resolution to get a reasonably interactive response time.

You'll also need a favorably licensed pretrained model or a few tens of thousands of training images and masks.

So all in all, I can't see any deal breakers, but I'd probably still use PatchMatch instead.

For reference the GPU they're using for this paper is the NVIDIA V100 GPU, a datacenter GPU costing $8,000.

To be fair, while the V100 performs very well for machine learning, for the same money you can buy almost a dozen 1080ti's or a few titans (whatever the current one is), which would certainly be much faster.

They say they used V100 but not how many, if they needed a large number then nevermind.

The paper says they only ran it on a single V100, I was expecting multiple GPUs as well.

Since they programmatically generate the masks you wouldn't need those, just the set of training images. So it wouldn't be too hard to find since you're not looking for paired images, just a bunch of images of faces/landscapes/whatever you're trying to inpaint.
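As a rough illustration of generating masks programmatically, here is a sketch that punches random brush strokes into an all-ones mask. This is not the paper's mask generator, just an assumed random-walk scheme; the function name and all parameters (`num_strokes`, `max_steps`, `brush`) are hypothetical.

```python
import numpy as np

def random_irregular_mask(height, width, num_strokes=5, max_steps=40, brush=3, seed=None):
    """Return a uint8 mask: 1 = keep pixel, 0 = hole for the network to fill."""
    rng = np.random.default_rng(seed)
    mask = np.ones((height, width), dtype=np.uint8)
    for _ in range(num_strokes):
        # Each stroke starts at a random position.
        y = int(rng.integers(0, height))
        x = int(rng.integers(0, width))
        for _ in range(int(rng.integers(1, max_steps + 1))):
            # Punch a small square hole around the current brush position.
            mask[max(0, y - brush):y + brush + 1, max(0, x - brush):x + brush + 1] = 0
            # Take a random step, clamped to stay inside the image.
            y = int(np.clip(y + rng.integers(-brush, brush + 1), 0, height - 1))
            x = int(np.clip(x + rng.integers(-brush, brush + 1), 0, width - 1))
    return mask
```

A training pair is then just `(image * mask[..., None], image)` for an H x W x 3 image, which is why unpaired photos are enough.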

The catch is it is more like an artist's interpretation. Anyone expecting that it will truly fix old damaged pictures to be like they were is going to be disappointed. Anyone just wanting to fill in some gaps will be excited. (Same for just procedural generation of some images, I suppose. Would be neat to see what it can do from a very low fidelity outline of a house/forest/etc.)

As opposed to it being truly magical, I guess?

I think the expectation has to be set that this is great for creative use cases. Not so much for forensic style ones.

To that end, "reconstruct" is not so much reconstructing what was lost, but more "fills in holes" of photos. If you read the title expecting something to literally be reconstructed, you are likely to feel lied to.

How would it work if the mask was covering the mouth region? Afaik teeth have always been difficult to reproduce...

Any actual software?

Seems amazing for face shot retouching.

They're at it a-GAN!

GAN is just a brilliant concept. I look forward to all the new tools to come out of this.

They didn't use GAN for this.

While I didn't read the paper in detail, there are a lot of papers in the references section about GANs. And referring back to the text that cites them makes it pretty obvious they did use GANs.

Perhaps you should read the paper in detail before arguing. Eq. 7 makes it pretty obvious there's no discriminator.

it's just a PUN guys

and yes, this is just an acronym

Nice breakthrough. What could be some potential applications?

The first thing that comes to mind is restoring the images that were recovered (damaged) at the end of this process (follow through to part 3 for full examples of images):




Hate watermarks? Remove them with this service and no one will know you got your image from 9gag...

Clever idea

Demosaic JAV!

Deep fakes

Is the code for this open source?

