
How an A.I. ‘Cat-And-Mouse Game’ Generates Believable Fake Photos - gk1
https://www.nytimes.com/interactive/2018/01/02/technology/ai-generated-photos.html?_r=0
======
oillio
Good article and great tech. However, I don't know if I believe the results
are as good as they claim. Many of the pictures look a bit off to me, like
they all have dead eyes. Maybe celebrities generally look like that anyway, so
it is being true to form. :)

In particular, I think this guy is missing a pretty significant part of his
head: [https://static01.nyt.com/newsgraphics/2017/12/26/ai-faces/8e...](https://static01.nyt.com/newsgraphics/2017/12/26/ai-faces/8e890dbb781d60b011c9be1f9a169ed2583a0dda/finished/7.png)

~~~
visarga
GANs were invented by Ian Goodfellow, and have been called the most interesting
idea in ML of the last decade. I met him on Reddit a few years ago. I was
supposed to get private ML tutoring from him, right around the time Andrew Ng
opened the first Coursera course. I didn't get the lessons because I gave up
and eventually took the MOOC. But it's amazing to know we share the same forums
and sometimes exchange a comment or two.

The great idea behind GANs is that they replace one of the hardest-to-understand
parts of a neural net, the loss function, with another neural net, thus making
the loss function learnable. This opens the door to a kind of unsupervised
learning that was impossible to make work before. GANs are also very important
because they are almost like reinforcement learning (actor + critic = RL,
generator + discriminator = GAN), and RL is supposed to be the way to AGI.

The most famous problems with GANs are instability during training and mode
collapse, which is like a student studying specifically for an exam (and not
for general understanding), optimising for the test instead of the real thing.
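The generator-vs-discriminator idea can be sketched end to end on a toy problem. Below, a linear generator tries to mimic samples from N(4, 1) while a logistic discriminator tries to tell real from fake, with gradients derived by hand from the standard GAN objective. Everything here (the 1-D data, learning rate, step counts) is illustrative, not how real image GANs are configured:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def real_batch(n):
    # "Real" data: samples from N(4, 1) that the generator must mimic.
    return rng.normal(4.0, 1.0, n)

a, b = 1.0, 0.0   # generator G(z) = a*z + b
w, c = 0.0, 0.0   # discriminator D(x) = sigmoid(w*x + c)
lr, n = 0.05, 64

for step in range(2000):
    # Discriminator update: ascend mean(log D(x)) + mean(log(1 - D(G(z)))),
    # pushing D(real) toward 1 and D(fake) toward 0.
    x, z = real_batch(n), rng.normal(0.0, 1.0, n)
    g = a * z + b
    d_real, d_fake = sigmoid(w * x + c), sigmoid(w * g + c)
    w += lr * (np.mean((1 - d_real) * x) - np.mean(d_fake * g))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator update: ascend mean(log D(G(z))) (the "non-saturating" loss),
    # i.e. move the fakes toward where the discriminator says "real".
    z = rng.normal(0.0, 1.0, n)
    d_fake = sigmoid(w * (a * z + b) + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

print(a, b)  # fake samples have mean ~ b and std ~ |a|
```

With these settings the fake mean `b` drifts from 0 toward the real mean of 4, while `a` tends to shrink, since a logistic discriminator in 1-D cannot penalize a variance mismatch: a small taste of the mode-collapse problem.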

~~~
claytonjy
> RL is supposed to be the way to AGI

Could you expand on that? The more I read from folks like LeCun & Chollet, the
more they seem to disagree strongly. Just this week Yann posted about
unsupervised modeling (with or without DL) being the next path forward, and
described RL as essentially a roundabout way of doing supervised learning.

~~~
bitL
RL/DRL assumes the world is Markovian, i.e. given the current state, the past
doesn't matter, which is way too simple. It requires a huge number of
tries/episodes and a properly tuned exploration-exploitation ratio. It is
loosely based on biological reinforcement learning, so there might be a basis
in reality, as there is with convolutional neural networks and the visual field
maps in the visual cortex (even if it's a very rough approximation). DRL is the
technique that allows modeling decisions; so for prediction you have
CNNs/RNNs/FCNs, for generation GANs, and for decisions DRL; together they are
the closest thing to AGI we have right now.

~~~
halflings
> RL/DRL assumes the world is Markovian, i.e. given the current state, the
> past doesn't matter, which is way too simple.

There are plenty of RL papers using RNNs and various types of memory networks.

~~~
bitL
Likely as value function approximators for one piece of the whole algorithm
(as is the case with DQN/DDQN). However, the main algorithm is likely using a
variation of the Bellman equation, which assumes the Markov property and gives
strong guarantees about convergence.

~~~
gwern
If you're using DQN or pretty much anything in DRL, you don't have any
guarantees about convergence in the first place, and using a RNN does give you
the history summary you need (at least up to the minimum error achievable with
that fixed-length summary, not that that is any more likely to converge than
the overall DRL algo is).

~~~
bitL
I meant that under the Markov assumption, the value iteration used to solve
the Bellman equation is guaranteed to converge. So it makes math people happy,
even if the property doesn't hold in the real world or in the problem they are
trying to solve, and the "deep" in DRL is just heuristics, though it works
surprisingly well in many cases.
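The guarantee being referred to can be shown concretely: on a problem that really is Markovian, the Bellman optimality backup is a gamma-contraction, so value iteration converges to a unique fixed point regardless of initialization. A minimal sketch on a made-up 3-state chain (the states, rewards, and gamma are all illustrative):

```python
# transitions[s][action] = (next_state, reward); the next state depends only
# on (s, action) -- that is the Markov property the guarantee relies on.
transitions = {
    0: {"left": (0, 0.0), "right": (1, 0.0)},
    1: {"left": (0, 0.0), "right": (2, 1.0)},
    2: {"left": (2, 0.0), "right": (2, 0.0)},  # absorbing terminal state
}
gamma = 0.9
V = {s: 0.0 for s in transitions}  # any initialization works

# Value iteration: repeatedly apply the Bellman optimality backup
#   V(s) <- max_a [ r(s, a) + gamma * V(s') ]
# until the largest change falls below a threshold.
for sweep in range(1000):
    delta = 0.0
    for s, actions in transitions.items():
        v_new = max(r + gamma * V[s2] for (s2, r) in actions.values())
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < 1e-9:
        break

print(V)  # converges to {0: 0.9, 1: 1.0, 2: 0.0}
```

If the true dynamics depend on history beyond the current state, this backup is being applied to the wrong model, which is the objection: the math holds, but the modeling assumption may not.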

------
minimaxir
For those looking for a well-commented you-don't-need-a-PhD-to-understand
implementation of GANs + variants (using Keras), I recommend the examples in
this repo: [https://github.com/eriklindernoren/Keras-GAN](https://github.com/eriklindernoren/Keras-GAN)

------
coldcode
Scary to think where this will be in 10 years. Perhaps even video evidence
will be hard to believe any more. How do you convict someone if this
technology is mature?

~~~
adventured
And/or, how do you prove someone's innocence if anyone can generate a
believable fake crime video?

Supporting counter-evidence will become that much more important.

~~~
Grasshoppeh
You might be able to use this same technology to counteract this.

If you generate content, you would have a base for training an AI to spot
actual fake content. You could use videos and pictures like these to train a
learning AI to spot discrepancies, then report its findings in detail.

Makes me wonder if there is a future in forensics for this type of technology.

~~~
andrewla
This network was trained using an adversarial approach. What that means is
that a second network that does exactly what you say was used to train the
first.

They kept training until they created images that could reliably fool the
discriminator. A more powerful discriminator would just be used to create
better fakes.

------
sago
I don't understand the images associated with that article. They purport to
show the progressive refinement of the output over a series of days. But the
figure changes dramatically from image to image, all the way to the end of the
run.

At the very least it seems the output is not stable: a human has to decide
when to stop the Wheel of Fortune. For the NNs I'm used to, it looks more like
a series of images taken from different training sets or parameter settings.

Caveat: I've done a lot of ML, but not GANs specifically. Is this common? How
do you solve the 'where to stop' problem if the output is so unstable?

~~~
pk78
I am not sure if "unstable" is the word I would use. Sure, even after training
for days the GAN still produces some not-so-realistic images, but the rate at
which it generates those gradually decreases over the training period, and the
images get more "realistic".

>How do you solve the 'where to stop' problem if the output is so unstable?

Looking at the discriminator loss would be a good start for that.
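One way to operationalize that (a hedged sketch, not from the article): with the standard cross-entropy GAN loss, a discriminator that can no longer tell real from fake outputs ~0.5 everywhere, so its loss hovers around ln 2 ≈ 0.693. A toy stopping rule, with a made-up `should_stop` helper and made-up thresholds:

```python
import math

LN2 = math.log(2.0)  # ~0.693: the loss of a discriminator reduced to guessing

def should_stop(d_losses, window=5, eps=0.05):
    """Heuristic stop: the last `window` discriminator losses all sit
    within `eps` of ln 2, i.e. D has been at chance level for a while."""
    if len(d_losses) < window:
        return False
    return all(abs(l - LN2) < eps for l in d_losses[-window:])

# Example: loss rising from D winning (low loss) toward chance level.
history = [0.31, 0.42, 0.55, 0.66, 0.70, 0.69, 0.68, 0.71, 0.70]
print(should_stop(history))  # -> True: the last 5 values hover near 0.693
```

In practice this is only a hint: a discriminator loss near ln 2 can also show up when training has gone wrong in other ways, so people usually eyeball samples too.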

~~~
sago
It's not the quality I was referring to. Look at the main image sequence. The
images from 0 to, say, Day 5 show the kind of progressive refinement I
expected: the network is improving its image over time. Each image is a
refinement of the previous.

But compare the images from Day 5 to the end. Eye colour is changing and then
changing back. As is the background. And the hair colour. The position of the
parting. Whether the mouth is closed or showing teeth. Day 16 is not an
intermediate point between Day 9 and Day 18.

If it runs for another couple of days, would we get another version like Day
16?

That's what I mean by instability.

~~~
pk78
Ah, I understand what you are saying. The instabilities could be explained by
the batches sampled during those training days and the generator's input.
Training a GAN is not very straightforward and even minor changes in batch
sampling could produce vastly different generated images.

------
IIAOPSW
One application I imagine for this is a future of game development similar to
the movie Inception, where an "architect" designs the layout and setting and
then all the details are filled in ad hoc by the computer's "imagination".

Today it's faces that feel familiar but aren't real. Tomorrow it's whole cities
that feel familiar but aren't real. The cities are filled with people you
swear you've seen before. Perhaps the details are tailored to you personally
based on the corpus of photos you've posted online.

~~~
hellbanner
This is happening.
[https://www.youtube.com/watch?v=1Ea57XERywM&index=6&list=PLc...](https://www.youtube.com/watch?v=1Ea57XERywM&index=6&list=PLcZ-fMpNVsjUQ_eKlGrOH_OOgjxKzT84w)

"Narrative Dungeon Design"

------
blaze33
The paper was published in October and is named: Progressive Growing of GANs
for Improved Quality, Stability, and Variation
[http://research.nvidia.com/publication/2017-10_Progressive-G...](http://research.nvidia.com/publication/2017-10_Progressive-Growing-of)

~~~
visarga
And the one-hour movie of celebrity faces generated by the Progressive GAN
(ProGAN) is here.

[https://www.youtube.com/watch?v=36lE9tV9vm0](https://www.youtube.com/watch?v=36lE9tV9vm0)

The most amazing AI video I have ever seen, actually. I spent hours staring at
it; it works great as a background for many pieces of music. You can think of
it as the AI version of the burning-log video.

------
saycheese
>> “QUESTION: Look at the two photos below and see if you can figure out which
person is real.”

>> “ANSWER: Sorry! This was a trick question. Both images were generated by
computers.”

Not really a trick question: even if you know they're both fake, the question
forces you to pick one, so the only way to answer it is to be wrong.

~~~
pluma
Yeah, I thought both looked a tiny bit off. I think it has to do with the
reflection in the eyes which is a tiny bit inconsistent, among other things.

~~~
disantlor
maybe so (they fooled me) but you were already prepped to scrutinize them. To
the point others have made, we’ll soon need to be constantly prepared to
assume fakery.

the technology of fakery is rising to meet the “everything is fake news”
moment

~~~
logfromblammo
I immediately picked the right image, because I saw whisker stubble on the
left, and I already knew that image-generation AIs seem to have a thing for
painting whisker stubble all over anything even remotely resembling a male
face.

Surprise! Guess I should have considered the possibility of a trick question.

------
m3kw9
So "cat and mouse" is a layman's term for an adversarial network?

~~~
fjsolwmv
No, it's the NYT's made-up term. "Arms race" is the traditional term.

------
ducampopinus
Great overview. I especially appreciate that they linked straight to the paper
instead of a popsci/buzzfeed regurgitation of the results.

------
fjsolwmv
The faces are pretty good but the ears and craniums are awful, presumably
because of a dependency on neighboring pixels and getting confused by diverse
background images. Why ruin their work by including the garbage parts in the
presentation/claims? And why not learn foreground and background separately,
and mask them together?

------
matte_black
I would love this as a service for generating fake users.

------
hellbanner
See also:
[https://news.ycombinator.com/item?id=16040463](https://news.ycombinator.com/item?id=16040463)
\-- fake porn faceswap generated by AI

------
lerie82
Excellent images and the more I think about what this could be used for the
creepier it gets. One day, truly, software will be very dangerous.

~~~
ByThyGrace
And it makes you wonder: what is the current legal framework preventing this
kind of tool from being misused? How does it differ across countries/unions?

How can such a thing be enforced to begin with?

Are companies/labs/universities/individuals themselves the only thing standing
between fair play and massive misuse of realistically generated media?

------
EGreg
_Mr. Hwang believes the technology will evolve into a kind of A.I. arms race
pitting those trying to deceive against those trying to identify the
deception._

That's like a chess game. We have seen AlphaGo and other MCTS implementations
take the "trying to detect the deception" into account.

By the time the image is generated, it would have already been factored in.

~~~
fjsolwmv
Where does AlphaGo try to detect deception? What is deception in perfect
information games?

~~~
floofyreal
[https://www.youtube.com/watch?v=XaQu7kkQBPc](https://www.youtube.com/watch?v=XaQu7kkQBPc)

Imagine an automated system for danger recognition at, for example, an
airport. These kinds of deception attacks could cause problems for such
systems. Imagine if suddenly 10, 20, or 100 airports around the globe all
falsely recognized weapons, bombs, or other dangerous items. I can imagine the
panic and the huge news headlines badmouthing AI.

People don't trust AI. These kinds of errors could only delay proper
integration, which in many ways could enhance the way we live.

------
booleandilemma
I’d be really worried if I was a photo model.

~~~
bcaulfield
We've made good looking people obsolete.

------
oliv__
These are pretty damn good, but it seems to me like the program is sort of
over-optimizing the pictures: in my mind, the pictures from the 5th to the 7th
day are the most realistic.

After that, it feels to me like the "realness" slowly degrades.

------
hokkos
Great PR, but none of the faces look human: the weird position of the nose,
the strange curly things around the hair. There is still work to do to trick
the most fundamental tool of the brain, recognizing a fellow human face.

------
olivermarks
“Believe nothing you hear, and only one half that you see.” ― Edgar Allan Poe

------
33W
I found the Obama video at the end very interesting. It would be a neat next
step to map non-Obama audio to the generated video. For example, pull audio
from an Obama impersonator.

~~~
visarga
> pull audio from an Obama impersonator

We can impersonate voices with neural nets. We can clone timbre and style, and
this tech is being used commercially by Baidu at the very least (keyword: Deep
Voice 3).

------
bobajeff
Watching the progress of this system reminds me of that scene at the beginning
of The Thing where The Thing almost, but not quite, mimics one of the humans.

------
IgorPartola
Very interesting. Is this kind of tech something that mere mortals can play
with?

