
Large Scale GAN Training for High Fidelity Natural Image Synthesis - ericjang
https://openreview.net/pdf?id=B1xsqj09Fm
======
ericjang
This is a paper submitted to the ICLR 2019 conference. To preserve the
integrity of the double-blind submission process, please do not "dox" the
identities of the authors. I am not an author of this paper.

Previous works in this area (and there are many) have tackled this exact
problem of natural image generation. The specifics of the methods presented in
the paper are of considerable interest to the ML/AI research community, but
what I'd like to highlight here are the _results_.

One might suppose that a neural network could never make images consistent
with reality, because there are so many hidden latent variables & strong causal
factors that lead to the generation of coherent real-world images: the laws of
light transport and camera perspective, smooth texture variation on objects,
bilateral symmetry in organisms, which objects are "possible" and which are
not.

Often, to generate images that are even plausible under our reality, we have
to make use of explicit physically-based rendering techniques: simulating the
transport of light and accurately modeling a lot of high-fidelity geometry. We
have to hard-code so many physical equations and rules into these systems, and
even then we still have a hard time rendering everyday things (candlelight,
soap bubbles, food).

Prior to this paper, I certainly had my doubts that neural networks could ever
capture enough implicit "knowledge" about the world to synthesize an image
(not in the training set, mind you) that could convince a human it was real.
Machine Learning is the study of generalization and we know of no useful
guarantees on generalization for finite-size training sets and
overparameterized models.

To my knowledge, this is the first paper to generate high-fidelity _natural_
images with no apparent visual artifacts (blurriness, weird textures that
could not exist in the real world). The laws of light transport (NP-hard to
compute) appear to be convincingly preserved, and I am blown away.

------
MrQuincle
An interesting aspect is the sampling from a different distribution on testing
versus training.

The "truncation trick" samples from a truncated Normal distribution rather
than N(0,sigma) for the latent variables. (The values above a particular
threshold are just sampled again until they are below that threshold.) I don't
completely get this. What's going on there? Is there a mode in the network
that defines the prototypical dog and are other dogs defined by nonzero values
in the latent variable space? Then this seem to show that the layer exhibits
an non-intuitive decomposition of the task at hand. I would expect a zero
vector to correspond to an "abstract dog" and have all nonzero parameters
contribute in an attribute like fashion. This seems to be more prototype-like,
similar to old-fashioned vector quantization.
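
For concreteness, here's a minimal numpy sketch of the resampling as I
understand it (the threshold of 0.5 and the per-coordinate magnitude test are
my own illustrative choices, not necessarily the paper's exact recipe):

    import numpy as np

    def truncated_z(batch_size, dim, threshold=0.5, seed=0):
        # Sample latents from a standard normal, then resample any
        # coordinate whose magnitude exceeds `threshold` until every
        # coordinate falls inside the range (the "truncation trick").
        # threshold=0.5 is just an example value.
        rng = np.random.default_rng(seed)
        z = rng.standard_normal((batch_size, dim))
        mask = np.abs(z) > threshold
        while mask.any():
            z[mask] = rng.standard_normal(mask.sum())
            mask = np.abs(z) > threshold
        return z

    z = truncated_z(4, 128)  # 4 latents of dim 128, all within [-0.5, 0.5]

Shrinking the threshold concentrates samples near the zero vector, which is
why the trick trades diversity for fidelity.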

The censored normal max(N(0, sigma), 0) is interesting. It reminds me of
nonnegative matrix factorization. Check the paper in Nature or just a nice
blog post: https://yliapis.github.io/Non-Negative-Matrix-Factorization/.
By using a nonnegative constraint the representation becomes additive (part-
based) and sparse. That's quite different from prototype-like methods.
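
As a toy sketch of my own (not from the paper): clipping the negative half of
the normal to zero is what makes the censored-normal latents sparse and
nonnegative, in contrast to the truncated normal above.

    import numpy as np

    def censored_normal_z(batch_size, dim, sigma=1.0, seed=0):
        # "Censored Normal": draw N(0, sigma) and clip negatives to zero,
        # so roughly half the coordinates end up exactly zero
        # (sparse, nonnegative, additive flavour).
        rng = np.random.default_rng(seed)
        return np.maximum(rng.normal(0.0, sigma, size=(batch_size, dim)), 0.0)

    z = censored_normal_z(4, 128)
    print((z == 0).mean())  # roughly 0.5 of the coordinates are zeroed out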

I'm experimenting with more complex priors myself, and in my experience it's
difficult to find priors that blow everything out of the water. Very nice
appendix E. :-)

