Generative Adversarial Networks – The Story So Far (floydhub.com)
213 points by iyaja 3 months ago | 25 comments

Hi everyone. I just published a new blog post which talks about the evolution of GANs over the last few years. You can check it out here.

I think it's fascinating to see sample images generated from these models side by side. It really does give a sense of how fast this field has progressed. In just five years, we've gone from blurry, grayscale pixel arrays that vaguely resemble human faces to thispersondoesnotexist, which can easily fool most people at first glance.

Apart from image samples, I've also included links to papers, code, and other learning resources for each model. So this article could be an excellent place to start if you're a beginner looking to catch up with the latest GAN research.

I hope you enjoy it!

Cool! But one thing I'd like to see discussed is to what extent the images in various publications have been cherry-picked.

Cherry picking is no longer necessary with recent advancements. The images on https://thispersondoesnotexist.com/ are random and have a few artifacts (particularly the backgrounds and hair), but if you weren't looking for it you'd be unlikely to notice anything.

I'm curious about that too, and I'd like to know if there's been much work on GANs that can generate videos.

This was a very enjoyable read, thank you! You do a great job in making these concepts understandable.

The self-attention mechanisms caught my eye. Going to look into implementing something like that for a toy dataset. Thanks for the inspiration.

I know people have been having trouble adapting these kinds of generative techniques to text. Do you know of anyone making interesting progress there?

I put your question in talktotransformer.com and got this response:

  I know people have been having trouble adapting these kinds of
  generative techniques to text. Do you know of anyone making
  interesting progress there?
  It is difficult to make progress in the field of generative methods
  with text alone - it takes effort and creativity to get a generative
  system working. A big part of our research focuses on generating
  sequences which correspond to handwritten data, and to improving on
  that method we have developed generative techniques which allow us to
  generate a large range of novel sequences. Our work is still small,
  but we are not stopping there, in fact our next major research project
  is to generate novel sequences for novel languages.
  What I see is that we are still in an early stage when it comes to
  the technology used in generative methods to create new words, but I
  suspect this is due to a combination of factors. First and foremost is
  the fact that the techniques we've developed to generate novel
  sequences are highly specialized in a particular kind of context -
  we are not going to create random numbers or sequences because that
  just doesn't work. Generating a word, for example, uses very specific
  computational principles and can only be done if you are aware of the
  context in which it is being generated (or "determined" as the
  linguists would say). Even so, the general principle has been around
  so long, that one could quite easily create several different methods
  to create.

GPT-2 is, however, not adversarial at all, and that might be part of why it rambles and lacks consistency or much of a 'point'.

+1. Also, the author of this article wrote a terrific article on GPT-2 as well. I definitely recommend checking it out if you're interested in the latest breakthroughs in text generation: https://blog.floydhub.com/gpt2/



GANs simply try to replicate a set of features - you can think of this as images or text. Variations in the GAN designs will be present, but the general principles are the same.

This is really helpful, I've been wanting to deep dive on GANs and this has pushed me to do it.

I like the mix of images and explanations.


I feel like the reporting on these things using just the generated images is completely off point. Showing an image of a face and saying “can you believe this is fake, isn’t it amazing?” just has me going meh. At the very minimum, show the “completely fake” generated image next to the closest image from the training sets used. Otherwise, how do I know you haven’t just built an over-engineered solution that picks a random number between 0 and the number of images you had available for training?

The nature of the Generator is that it is seeded with random inputs and trains so it is able to fool the adversarial classifier. I.e. it never sees the "true" data.

This isn’t completely accurate. The generator sees the training data in the same way supervised learning might, because the discriminator sees the data, and the generator shares gradients with the discriminator.

Your point stands though, that it’s obviously not overfitting, and no scientist would publish a result that was just overfitting face generation.
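To make that gradient flow concrete, here is a toy 1-D GAN in plain Python (every name, number, and learning rate here is illustrative, not from the article): the generator's parameter updates arrive only through the discriminator's weight `w`, and the real data appears only in the discriminator's update.

```python
import math
import random

random.seed(0)

def sigmoid(t):
    # Clamp the input to avoid math.exp overflow in this toy setup.
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, t))))

# Toy models: generator G(z) = a*z + b, discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 1.0, 0.0          # discriminator parameters
lr = 0.05

for step in range(2000):
    x_real = 3.0 + random.gauss(0, 0.1)   # "real" data clusters around 3.0
    z = random.gauss(0, 1)                # random seed input to the generator
    x_fake = a * z + b

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    s_r = sigmoid(w * x_real + c)
    s_f = sigmoid(w * x_fake + c)
    dw = (s_r - 1.0) * x_real + s_f * x_fake
    dc = (s_r - 1.0) + s_f
    w -= lr * dw
    c -= lr * dc

    # Generator update (non-saturating loss -log D(fake)): the gradient
    # reaches (a, b) only THROUGH the discriminator's weight w.
    # The generator never touches x_real directly.
    s_f = sigmoid(w * (a * z + b) + c)
    da = w * (s_f - 1.0) * z
    db = w * (s_f - 1.0)
    a -= lr * da
    b -= lr * db

print(round(b, 1))  # the generator's offset typically drifts toward the real mean
```

The real samples appear only in `dw`/`dc`; the generator's gradients `da`/`db` are just the chain rule applied through the discriminator, which is the sense in which the generator "sees" the data indirectly.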

Thank you for this useful summary; I think you should consider a discussion of prior work which may undermine the "revolutionary idea over a pint of beer" narrative, but might encourage independent researchers to press on beyond the idea phase.

Wikipedia GAN history (and talk) indicates it's not quite as clear cut as your framing, and this answer below on that subject from somebody who blogged the idea several years previously demonstrates there is often a hinterland of discussion that gives rise to the (genuinely) independent ideas of Goodfellow et al.

In this case the central idea doesn't seem to have been completely original to Goodfellow, though the credit is his for fully pursuing it to implementation in the current model.


Note: the linked answer, while in the context of the well-known Schmidhuber can of worms, is actually the more obscure and very polite challenge from Niemitalo (who is mentioned in the Wikipedia history). The point stands though, regardless of actor.

I wonder how these GANs can be improved to have better symmetry. All these faces seem to have mismatched eyes/ears which must be coming from the locality of CNNs.

Self-attention seems to help a lot with that. (My usual example is that with anime faces, GANs which don't use either self-attention or progressive growing frequently have a failure mode of mismatched eye color: red+blue, for example.) Self-attention is expensive, so even BigGAN uses it sparingly, typically only once, like at the 64px level, but there's work on making it a lot cheaper: approaches like the Sparse Transformer (which OpenAI recently used for MuseNet to let the Transformer scale to 30k-long sequences) are promising.
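For anyone wanting to see why self-attention fixes cross-image mismatches: the sketch below is generic scaled dot-product self-attention in plain Python (toy dimensions, identity projections, all names illustrative). Every position attends to every other position, so information can travel across the whole feature map in one step, unlike a convolution, whose receptive field is local.

```python
import math

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    x: list of n feature vectors (lists of floats).
    wq, wk, wv: square projection matrices (lists of rows).
    Returns one output vector per position, each a softmax-weighted
    mix of the value vectors at ALL positions.
    """
    matvec = lambda m, v: [sum(r[i] * v[i] for i in range(len(v))) for r in m]
    q = [matvec(wq, t) for t in x]
    k = [matvec(wk, t) for t in x]
    v = [matvec(wv, t) for t in x]
    d = len(x[0])
    out = []
    for qi in q:
        # Attention scores of this query against every key, scaled by sqrt(d).
        scores = [sum(p * r for p, r in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)
        e = [math.exp(s - m) for s in scores]        # numerically stable softmax
        z = sum(e)
        weights = [ei / z for ei in e]
        out.append([sum(wgt * vj[i] for wgt, vj in zip(weights, v))
                    for i in range(d)])
    return out

# Two-position toy "image" with identity projections: each output mixes
# BOTH positions, i.e. the receptive field is global.
I = [[1.0, 0.0], [0.0, 1.0]]
y = self_attention([[1.0, 0.0], [0.0, 1.0]], I, I, I)
```

In a GAN generator, one such layer lets the feature vector at one eye attend to the other eye's features directly, which is exactly the kind of long-range consistency constraint a stack of local convolutions struggles to enforce.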

The pace of improvement is pretty impressive

This basically means you can take any politician or citizen and create a video in which they say or do bad (but fake) things. Age of Fake News has truly arrived. Is there some research on how to distinguish fake generated videos from real ones?

The whole idea of Generative Adversarial Networks is that you're simultaneously training a network that can generate fake things, and another network that can distinguish fake from real. The two networks are each other's adversary, the generator trying to fool the discriminator, and the discriminator trying to catch the generator.

The article explains this very accessibly.

It also explains that the training is basically done when the discriminator has a roughly 50% (random) ability to discern between real and fake.

True. I'm not sure "done" is quite the right word. I think the problem is that at that point you no longer have a useful training signal, so you have to stop.

You can simply use the discriminator network to discard faulty results in a pipeline, and it's already useful.
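That filtering step can be sketched in a few lines. The "generator" and "discriminator" below are illustrative stand-ins (a Gaussian sampler and a hand-written realism score), not trained networks; in a real pipeline they would be the two halves of the GAN.

```python
import math
import random

random.seed(1)

def generate():
    # Stand-in for the generator: draw a candidate sample.
    return random.gauss(3.0, 1.0)

def discriminator_score(x):
    # Stand-in for the discriminator: a score in (0, 1], higher = more "real".
    return math.exp(-(x - 3.0) ** 2)

THRESHOLD = 0.5  # illustrative cutoff; tune per application
kept = [x for x in (generate() for _ in range(1000))
        if discriminator_score(x) >= THRESHOLD]
```

Everything below the threshold is discarded, so the surviving samples are concentrated in the region the discriminator considers realistic; in exchange you pay for the rejected samples' generation cost.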

It has been possible to produce very good Photoshop fakes like these for a very long time now. It has had basically no effect on politics, and everyone knows that this technology exists and treats all the pictures they see accordingly (checking their sources, etc.). It will just be the same with video.

> everyone knows that this technology exists and treats all the pictures they see accordingly

I don't know if everybody knows that. A lot of people believe Hillary ran a child-sex operation and should be locked up. They chant it. Trump has lied 10,000 times and still a lot of people believe him. Many of them would surely believe a fake video as well.

