I am not sure "unstable" is the word I would use. Sure, even after training for days the GAN produces not-so-realistic images, but the rate at which the images *change* gradually decreases over the training period, and they get more "realistic".
>How do you solve the 'where to stop' problem if the output is so unstable?
Looking at the discriminator loss would be a good start for that.
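To make that concrete, here is a rough sketch of the kind of heuristic I mean (the function name and thresholds are made up, and it assumes the discriminator uses binary cross-entropy, where a discriminator that can no longer tell real from fake outputs ~0.5 and its loss hovers around ln 2 ≈ 0.693):

```python
import math

def near_equilibrium(d_losses, window=50, tol=0.05):
    """Hypothetical stopping check: assuming a BCE-trained discriminator,
    its loss settles near ln 2 (~0.693) when it can no longer distinguish
    real from generated samples. Returns True when the average loss over
    the last `window` steps stays within `tol` of that equilibrium value."""
    if len(d_losses) < window:
        return False
    avg = sum(d_losses[-window:]) / window
    return abs(avg - math.log(2)) < tol

# Early in training the discriminator easily wins, so its loss is far
# below ln 2 and the check fails:
early = near_equilibrium([0.1] * 100)   # False
# Once the loss plateaus near ln 2, training has roughly equilibrated,
# and that is a sensible point to stop and pick a checkpoint:
late = near_equilibrium([0.70] * 100)   # True
```

It won't pick "the best" image for you, but it tells you when further training is mostly just wandering around the equilibrium rather than improving.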
It's not the quality I was referring to. Look at the main image sequence. The images from 0 to, say, Day 5 show the kind of progressive refinement I expected: the network is improving its image over time. Each image is a refinement of the previous.
But compare the images from Day 5 to the end. Eye colour is changing and then changing back. As is the background. And the hair colour. The position of the parting. Whether the mouth is closed or showing teeth. Day 16 is not an intermediate point between Day 9 and Day 18.
If it runs for another couple of days, would we get another version like Day 16?
Ah, I understand what you are saying. The instability could be explained by which batches happened to be sampled during those training days and by the generator's noise input. Training a GAN is not very straightforward, and even minor changes in batch sampling can produce vastly different generated images.
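You can see the batch-sampling sensitivity even in a toy setting. Here is a small sketch (nothing to do with the actual GAN in the article; the setup is entirely made up): two SGD runs on identical data with an identical initial weight, where the *only* difference is the order in which samples are drawn. The final weights still come out different:

```python
import random

def sgd_run(seed, steps=500, lr=0.5):
    """Toy 1-D model trained by SGD on fixed noisy data. The data and
    the initial weight are identical across runs; only the sampling
    order (controlled by `seed`) differs."""
    rng = random.Random(seed)
    # Deterministic pseudo-noisy targets around y = 2x.
    data = [(x, 2.0 * x + 0.1 * ((x * 7919) % 13 - 6)) for x in range(1, 21)]
    w = 0.0
    for _ in range(steps):
        x, y = rng.choice(data)            # batch sampling: the only difference
        grad = 2 * (w * x - y) * x
        w -= lr * grad / (1 + x * x)       # normalized step for stability
    return w

# Both runs land in the same neighbourhood (the true slope is ~2.0),
# but the exact weight depends on the sampling order.
wa, wb = sgd_run(seed=0), sgd_run(seed=1)
```

Scale that up to millions of parameters and an adversarial objective, and it's not surprising that the Day 16 checkpoint isn't an interpolation between Day 9 and Day 18.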