Examples of generative language modeling ignored by this paper (not even mentioned):
* Unsupervised Transformer - https://blog.openai.com/language-unsupervised/
* ELMo - https://arxiv.org/abs/1802.05365
* ULMFit - https://arxiv.org/abs/1801.06146
Examples of generative image modeling ignored by this paper (some are mentioned only in passing):
* Glow - https://blog.openai.com/glow/
* RealNVP - https://arxiv.org/abs/1605.08803
* NICE - https://arxiv.org/abs/1410.8516
* Pixel CNN - https://arxiv.org/abs/1606.05328 (and its cousin PixelRNN - https://arxiv.org/abs/1601.06759)
* Classically, a generative model is one that describes a statistical process generating a whole observation (possibly plus some hidden part), and thus gives you scores that are comparable across different observations. An example would be (unidirectional) language models, where the total probabilities of different possible sentences can be compared directly.
* Classically, discriminative models also give you a probability over outputs, but only conditional on some observed input. As an example, a CNN transforms an image into logits that express the likelihood of different classes for that image. Hence, CNNs (used in this fashion) are a form of discriminative modeling.
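A minimal sketch of this contrast, using a made-up toy unigram table and made-up logits (nothing here comes from the paper): the generative model assigns a joint probability to a whole sentence, so scores for different sentences are comparable, while the discriminative softmax is only a probability over classes conditional on one fixed input.

```python
import math

# Hypothetical toy unigram "language model" (generative in the classical
# sense): assigns a joint probability to a whole sentence, so scores for
# different sentences can be compared against each other.
UNIGRAM = {"the": 0.3, "cat": 0.1, "sat": 0.05, "dog": 0.2, "ran": 0.08}

def sentence_logprob(words):
    # chain rule with an independence assumption: P(w1..wn) = prod P(wi)
    return sum(math.log(UNIGRAM[w]) for w in words)

# Comparable scores across different observations:
# "the dog ran" comes out more probable than "the cat sat" under this toy model.
print(sentence_logprob(["the", "cat", "sat"]))
print(sentence_logprob(["the", "dog", "ran"]))

# Discriminative model: only P(class | input), e.g. a softmax over the
# logits a CNN produces for ONE image. These probabilities are conditional
# on that image and are not comparable across different inputs.
def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

print(softmax([2.0, 0.5, -1.0]))
```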
If we map this to their "probabilistic families", Generative models (in the old sense) would be Descriptive in their taxonomy, while Discriminative stays Discriminative.
The closest counterpart to their "Generative" models would be generative models (in the old sense) that assume a common distribution over hidden variables plus a process that generates the actual output. However, this only loosely fits how GANs are used today: you don't really assume a distribution over the hidden seed for a given output, and each seed yields a single output (i.e., a deterministic function from seed to output instead of a distribution).
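To make the seed-to-output point concrete, here is a minimal sketch (the linear "generator" and its weights are made up for illustration, standing in for a trained network): all the randomness lives in the seed z drawn from N(0, I); the generator itself is a deterministic map, with no distribution over outputs for a fixed seed.

```python
import random

# Fixed (pretend-trained) weights mapping a 2-d seed to a 2-d output.
W = [[0.5, -0.2],
     [0.1, 0.8]]

def generator(z):
    # Deterministic map: the same seed z always yields the same output.
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # seed drawn from N(0, I)

# No distribution over outputs is modeled once z is fixed:
assert generator(z) == generator(z)
print(generator(z))
```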
Neural networks are deterministic functions, whereas MRFs, Gibbs sampling, and all the pre-2012 machine learning goodies work with probability distributions. It's not really helpful to use the term "probabilistic families" here when the most important parts of what they describe are non-probabilistic.
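The contrast can be sketched in a few lines (the coupling strength, chain length, and layer weights below are all illustrative choices, not from the paper): a Gibbs sampler for a tiny Ising-chain MRF resamples each spin from its conditional distribution, while a neural-net layer is just a fixed function with no randomness at inference time.

```python
import math
import random

J = 0.5          # illustrative coupling between neighboring spins
N = 5            # chain length
rng = random.Random(42)

def gibbs_step(spins):
    # Resample each spin from P(s_i | neighbors) for the Ising chain
    # with energy -J * sum_i s_i * s_{i+1}: P(s_i=+1 | rest) = sigmoid(2*field).
    for i in range(N):
        field = 0.0
        if i > 0:
            field += J * spins[i - 1]
        if i < N - 1:
            field += J * spins[i + 1]
        p_up = 1.0 / (1.0 + math.exp(-2.0 * field))
        spins[i] = 1 if rng.random() < p_up else -1
    return spins

spins = [rng.choice([-1, 1]) for _ in range(N)]
for _ in range(100):
    gibbs_step(spins)   # stochastic: each sweep draws a new sample

# Deterministic counterpart: a fixed function of its input, nothing sampled.
def relu_layer(x, w=0.7, b=-0.1):
    return [max(0.0, w * xi + b) for xi in x]

print(spins, relu_layer([1.0, -1.0, 0.5]))
```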
Can you clear up what you mean by a hidden seed? My understanding of GANs was that the generator learns the distribution of the input data without ever directly observing it.
I understand Discriminative and Generative models and their use in inferring a quantity from a given signal.
However, I don't quite get what a Descriptive model does. What is it used for? It doesn't sound like inference is the goal.