Hi everyone. I just published a new blog post about the evolution of GANs over the last few years. You can check it out here.
I think it's fascinating to see sample images generated from these models side by side. It really gives a sense of how fast this field has progressed. In just five years, we've gone from blurry, grayscale pixel arrays that vaguely resemble human faces to thispersondoesnotexist, which can easily fool most people at first glance.
Apart from image samples, I've also included links to papers, code, and other learning resources for each model. So this article could be an excellent place to start if you're a beginner looking to catch up on the latest GAN research.
I hope you enjoy it!
Cherry-picking is no longer necessary with recent advancements. The images on https://thispersondoesnotexist.com/ are random and have a few artifacts (particularly in the backgrounds and hair), but if you weren't looking for them, you'd be unlikely to notice anything.
I put your question into talktotransformer.com and got this response:
> I know people have been having trouble adapting these kinds of generative techniques to text. Do you know of anyone making interesting progress there?

> It is difficult to make progress in the field of generative methods with text alone - it takes effort and creativity to get a generative system working. A big part of our research focuses on generating sequences which correspond to handwritten data, and to improving on that method we have developed generative techniques which allow us to generate a large range of novel sequences. Our work is still small, but we are not stopping there, in fact our next major research project is to generate novel sequences for novel languages.

> What I see is that we are still in an early stage when it comes to the technology used in generative methods to create new words, but I suspect this is due to a combination of factors. First and foremost is the fact that the techniques we've developed to generate novel sequences are highly specialized in a particular kind of context - we are not going to create random numbers or sequences because that just doesn't work. Generating a word, for example, uses very specific computational principles and can only be done if you are aware of the context in which it is being generated (or "determined" as the linguists would say). Even so, the general principle has been around so long, that one could quite easily create several different methods to create.
+1. Also, the author of this article wrote a terrific article on GPT-2 as well. I definitely recommend checking it out if you're interested in the latest breakthroughs in text generation: https://blog.floydhub.com/gpt2/
GANs simply try to replicate a set of features - you can think of these as images or text. The designs vary, but the general principles are the same.
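To make that concrete, here's a minimal sketch of the adversarial training step that all of these designs share. It's a generic GAN step, not any specific paper's recipe; `G`, `D`, and the two optimizers are assumed to exist, with `D` returning one logit per sample:

```python
import torch
import torch.nn as nn

# Generic GAN training step, assuming a generator G, a discriminator D
# (outputting one logit per sample), their optimizers, and a batch of
# real samples `real` (images, or embedded text - the principle is the same).
def gan_step(G, D, real, opt_G, opt_D, latent_dim=128):
    bce = nn.BCEWithLogitsLoss()
    batch = real.size(0)
    z = torch.randn(batch, latent_dim)

    # Train D: real samples get label 1, generated samples label 0.
    fake = G(z).detach()  # detach so this step doesn't update G
    loss_D = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake), torch.zeros(batch, 1))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Train G: try to make D output 1 on freshly generated samples.
    loss_G = bce(D(G(z)), torch.ones(batch, 1))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```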
In case you haven't read it yet: I used StyleGAN to interpolate between a few popular characters from HBO's Game of Thrones series.
Here's something interesting to note: All the results (images and animations) were generated from Nvidia's StyleGAN that was pretrained on the FFHQ dataset, with absolutely no fine-tuning.
Instead, to make StyleGAN work for Game of Thrones characters, I used another model that maps images onto StyleGAN's latent space. I gave it images of Jon, Daenerys, Jaime, etc. and got latent vectors that, when fed through StyleGAN, recreate the original images.
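For anyone curious what such a mapping can look like: here's a minimal, hypothetical sketch of the simplest approach, directly optimizing a latent vector until the generator reproduces the target image. The encoder model I actually used is a separate thing; `G` and the plain pixel loss here are assumptions (real projections usually add a perceptual loss such as LPIPS):

```python
import torch

# Hypothetical sketch: optimization-based projection into the latent space
# of a pretrained generator G. `target` is the image to invert.
def project(G, target, steps=1000, lr=0.01):
    w = torch.zeros(1, 512, requires_grad=True)  # (1, 512) matches StyleGAN's W space
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        loss = ((G(w) - target) ** 2).mean()  # pixel-space reconstruction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```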
With the latent vectors for the images in hand, it's really easy to modify them in all the ways described in the StyleGAN paper (style mixing, interpolations, etc.), as well as through simple arithmetic in the latent space (such as shifting the latent vector in the "smiling" direction).
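Here's a rough sketch of what those manipulations look like once you have the latents. The names (`w_jon`, `smile_direction`) and the attribute direction itself are illustrative assumptions, not code from the post:

```python
import torch

# Linear interpolation between two recovered latents, e.g. w_jon and
# w_daenerys from the projection step above. t=0 gives the first face,
# t=1 the second.
def interpolate(w_a, w_b, steps=10):
    return [(1 - t) * w_a + t * w_b for t in torch.linspace(0, 1, steps)]

# Latent arithmetic: shift a latent along a (hypothetical, precomputed)
# unit vector pointing in the "smiling" direction.
def add_smile(w, smile_direction, strength=2.0):
    return w + strength * smile_direction

# Each latent then goes through the pretrained generator to get pixels,
# e.g.: frames = [G(w) for w in interpolate(w_jon, w_daenerys)]
```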
As a bonus, since there's no StyleGAN training involved, all the steps that I just mentioned can be executed extremely fast.
Hi everyone. I'm the author. There's one more thing I wanted to add: a good reason to try some sort of hyperparameter search, even if you think it's a complete waste of time and compute, is reproducibility.
This probably applies more to open-source academic contributions, where you're trying to help your fellow practitioners recreate and use your models, as opposed to a corporate setting, where reproducibility would be the equivalent of getting fired.
Recently, I was trying to train a ResNet to beat the top Stanford DAWNBench entry (spoiler alert: I did, but by less than a second). Initially, I blindly hand-tuned the learning rate, batch size, etc. without even reading the original model's guidelines.
After actually going through a blog post written by David C. Page (the guy with the top DAWNBench entry), I saw that he had tried varying the hyperparameters himself, and that the defaults in the code were what he'd found to be optimal.
That saved me a lot of time and let me focus on other things like what hardware to use.
I think the lesson here is that if more researchers performed and published the results of some basic hyperparameter optimization, it would save the world a whole lot of epochs.
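As an illustration, here's the kind of basic search I mean. It's a generic random-search sketch, and `train_and_evaluate` is a hypothetical stand-in for a real training run:

```python
import random

# Hypothetical stand-in for a full training run that returns a
# validation metric; in practice this is the expensive part.
def train_and_evaluate(lr, batch_size):
    return random.random()  # replace with your actual training loop

# Random search: often more sample-efficient than grid search when
# only a few hyperparameters actually matter.
best = None
for trial in range(20):
    lr = 10 ** random.uniform(-4, -1)                # log-uniform learning rate
    batch_size = random.choice([64, 128, 256, 512])
    acc = train_and_evaluate(lr, batch_size)
    print(f"trial={trial} lr={lr:.5f} batch_size={batch_size} acc={acc:.4f}")
    if best is None or acc > best[0]:
        best = (acc, lr, batch_size)

print("best (metric, lr, batch_size):", best)  # publish these alongside the code
```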
I enjoyed the article, and I know that writing these takes a nontrivial amount of time. So I think it would be wise to run them through a spell checker before publishing; it's an investment of less than a minute that pays off every time someone reads the piece.
> The heavier the ball, the quicker it falls. But if it’s too heavy, it can get stuck or overshoot the target.
This explanation of momentum is somewhere between misleading and wrong. Momentum is about inertia, i.e., resistance to sudden changes in speed and direction.
A misleading analogy: "the heavier the feather, the quicker it falls" is obviously true; steel feathers are useless as feathers. The same is true for balls (except in the special case of a perfect vacuum); it's just that drag and air currents don't influence balls all that much at low speeds.
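For reference, here's a minimal sketch of the classical "heavy ball" (Polyak) momentum update, which is where the ball analogy comes from; the names are illustrative:

```python
# Classical "heavy ball" (Polyak) momentum update. The velocity
# accumulates past gradients, so a large beta resists sudden changes
# in direction - the inertia the parent comment is pointing at.
def momentum_step(param, grad, velocity, lr=0.01, beta=0.9):
    velocity = beta * velocity - lr * grad  # inertia term plus new "force"
    param = param + velocity                # move by the accumulated velocity
    return param, velocity
```

With a large beta, the velocity term dominates and the update direction changes slowly, which is what the analogy should actually be conveying.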