
Generative Adversarial Networks – The Story So Far - iyaja
https://blog.floydhub.com/gans-story-so-far/
======
iyaja
Hi everyone. I just published a new blog post which talks about the evolution
of GANs over the last few years. You can check it out here.

I think it's fascinating to see sample images generated from these models side
by side. It really does give a sense of how fast this field has progressed. In
just five years, we've gone from blurry, grayscale pixel arrays that vaguely
resemble human faces to thispersondoesnotexist, which can easily fool most
people at first glance.

Apart from image samples, I've also included links to papers, code, and other
learning resources for each model. So this article could be an excellent place
to start if you're a beginner looking to catch up with the latest GAN
research.

I hope you enjoy it!

~~~
amelius
Cool! But one thing I'd like to see discussed is to what extent the images in
various publications have been cherry-picked.

~~~
Scaevolus
Cherry picking is no longer necessary with recent advancements. The images on
[https://thispersondoesnotexist.com/](https://thispersondoesnotexist.com/) are
random and have a few artifacts (particularly the backgrounds and hair), but
if you weren't looking for it you'd be unlikely to notice anything.

------
hjk05
I feel like the reporting on these things using just the generated images is
completely beside the point. Showing an image of a face and saying “can you
believe this is fake, isn’t it amazing?” just has me going meh. At the very
minimum, show the “completely fake” generated image next to the closest image
from the training sets used. Otherwise, how do I know you haven’t just built
an over-engineered solution that picks a random number between 0 and the
number of images you had available for training?

~~~
astazangasta
The nature of the Generator is that it is seeded with random inputs and
trained to fool the adversarial classifier, i.e. it never sees the "true"
data.

~~~
rytill
This isn’t completely accurate. The generator sees the training data in the
same way supervised learning might, because the discriminator sees the data,
and the generator shares gradients with the discriminator.

Your point stands, though: it's obviously not just overfitting, and no
scientist would publish a result that was merely overfitted face generation.
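To make the gradient-sharing point concrete, here's a toy 1-D sketch in numpy (all shapes, names, and numbers are illustrative, not from the article): the generator's only input is noise, but its gradient is backpropagated through the discriminator, so the discriminator's weight `w` shows up in the generator's update. That's the indirect channel through which the generator "sees" the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D GAN pieces: discriminator D(x) = sigmoid(w*x + c),
# generator G(z) = a*z + b. Parameters chosen arbitrarily.
w, c = 1.0, 0.0          # discriminator parameters
a, b = 1.0, 0.0          # generator parameters

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

z = rng.normal(size=64)          # noise: the ONLY input G ever receives
fake = a * z + b                 # G(z)
d_fake = sigmoid(w * fake + c)   # D(G(z))

# Non-saturating generator loss: L = -mean(log D(G(z))).
# Backprop by hand: the gradient w.r.t. the fake samples passes
# THROUGH the discriminator, so it contains w -- the generator
# never touches real samples, only this borrowed signal.
dL_dfake = -(1.0 - d_fake) * w / len(z)
grad_a = np.sum(dL_dfake * z)    # chain rule: d(fake)/da = z
grad_b = np.sum(dL_dfake)        # chain rule: d(fake)/db = 1

lr = 0.5
a -= lr * grad_a
b -= lr * grad_b
```

Note that `grad_a` and `grad_b` are zero wherever `w` is zero: with an untrained discriminator, the generator gets no learning signal at all, which is the flip side of the same coupling.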

------
mellosouls
Thank you for this useful summary; I think you should consider a discussion of
prior work which may undermine the "revolutionary idea over a pint of beer"
narrative, but might encourage independent researchers to press on beyond the
idea phase.

Wikipedia GAN history (and talk) indicates it's not quite as clear cut as your
framing, and this answer below on that subject from somebody who blogged the
idea several years previously demonstrates there is often a hinterland of
discussion that gives rise to the (genuinely) independent ideas of Goodfellow
et al.

In this case the central idea doesn't seem to have been completely original to
Goodfellow, though the credit is his for fully pursuing it to implementation
in the current model.

[https://stats.stackexchange.com/a/301280](https://stats.stackexchange.com/a/301280)

Note: the linked answer, while in the context of the well-known Schmidhuber
can of worms, is actually the more obscure and very polite challenge from
Niemitalo (who is mentioned in the Wikipedia history). The point stands
though, regardless of actor.

------
hooloovoo_zoo
I wonder how these GANs can be improved to have better symmetry. All these
faces seem to have mismatched eyes/ears which must be coming from the locality
of CNNs.

~~~
gwern
Self-attention seems to help a lot with that. (My usual example is that with
anime faces, GANs which don't use either self-attention or progressive growing
frequently have a failure mode of mismatched eye color: red+blue, for
example.) Self-attention is expensive, so even BigGAN uses it sparingly,
typically only once, at the 64px level. But there's work on making self-
attention a lot cheaper, and approaches like the Sparse Transformer (which OA
recently used for MuseNet to let the Transformer scale to 30k-long sequences)
are promising.
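The locality point is easy to see in code. Here's a minimal single-head self-attention layer over a flattened feature map in numpy (the 8x8x16 shapes are made up for illustration; real GANs insert something like this at one resolution): unlike a conv kernel, every output position mixes information from every other position, so the left eye can "look at" the right eye.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 feature map with 16 channels, flattened to
# 64 spatial positions.
H = W = 8
C = 16
x = rng.normal(size=(H * W, C))

# Random projection weights, scaled for stability.
Wq = rng.normal(size=(C, C)) / np.sqrt(C)
Wk = rng.normal(size=(C, C)) / np.sqrt(C)
Wv = rng.normal(size=(C, C)) / np.sqrt(C)

q, k, v = x @ Wq, x @ Wk, x @ Wv

# (64, 64) score matrix: every position scored against every other.
# This quadratic cost is exactly why self-attention is expensive and
# why BigGAN only uses it at one resolution.
scores = q @ k.T / np.sqrt(C)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)  # row-wise softmax

out = attn @ v  # each output row is a mix of ALL 64 positions
```

Sparse-attention approaches keep the same `attn @ v` mixing but restrict which of the 64x64 entries are computed, trading full global context for a much cheaper pattern.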

------
houqp
The pace of improvement is pretty impressive.

------
galaxyLogic
This basically means you can take any politician or citizen and create a video
in which they say or do bad (but fake) things. Age of Fake News has truly
arrived. Is there some research on how to distinguish fake generated videos
from real ones?

~~~
mkl
The whole idea of Generative Adversarial Networks is that you're
simultaneously training a network that can generate fake things, and another
network that can distinguish fake from real. The two networks are each other's
adversary, the generator trying to fool the discriminator, and the
discriminator trying to catch the generator.

The article explains this very accessibly.

~~~
Jedi72
It also explains that training is basically done when the discriminator is
down to roughly 50% (i.e. random-chance) accuracy at telling real from fake.
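That stopping heuristic is just an accuracy check on held-out batches. A sketch in numpy (the variable names, batch sizes, and 5% tolerance are my own, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated discriminator verdicts on a held-out batch:
# True = "looks real". Rates near 50% stand in for a converged D.
real_said_real = rng.random(500) < 0.52   # correct on real samples
fake_said_real = rng.random(500) < 0.49   # fooled on fake samples

correct = real_said_real.sum() + (~fake_said_real).sum()
accuracy = correct / 1000

# Near chance level => the discriminator no longer provides a
# useful training signal, so training is stopped.
converged = abs(accuracy - 0.5) < 0.05
```

As mkl notes below, "done" really means "out of signal": at 50% accuracy the discriminator's gradients carry no information, not that the generator is provably perfect.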

~~~
mkl
True. I'm not sure "done" is quite the right word. I think the problem is at
that point you no longer have a useful training signal, so you have to stop.

