One thing people might not realize (I'm not sure how obvious it is) is that these renders depend strongly on the statistics of the training data used for the ConvNet. You're seeing a lot of dog faces because ImageNet has a large number of dog classes (roughly 120 of the 1000 classes are dog breeds), so the ConvNet allocates a lot of its capacity to their fine-grained features.
If you train ConvNets on other data you will get very different hallucinations. It might be interesting to train (or even fine-tune) the networks on different data and see how the results vary. For example, different medical datasets, or datasets made entirely of faces (e.g. Labeled Faces in the Wild), galaxies, etc.
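To make the dependence on training data concrete, here is a toy sketch in plain Python. It is not the actual rendering code: the "networks" are just hand-written lists of filter weights (`net_a` and `net_b` are hypothetical stand-ins for nets trained on different data), and the objective, maximizing the squared activations of one ReLU layer by gradient ascent on the input, is my assumption about the general idea. The point is only that the same starting input drifts toward whatever the filters respond to most.

```python
import math

def activations(filters, x):
    """ReLU response of each filter (a weight vector) to the input x."""
    return [max(0.0, sum(w * v for w, v in zip(f, x))) for f in filters]

def dream(filters, x, steps=200, lr=0.1):
    """Gradient ascent on the input to maximize 0.5 * sum(activation^2)."""
    x = list(x)
    for _ in range(steps):
        acts = activations(filters, x)
        # d/dx of 0.5 * sum_k a_k^2 is sum_k a_k * w_k (zero where ReLU is off)
        grad = [sum(a * f[i] for a, f in zip(acts, filters)) for i in range(len(x))]
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        x = [v + lr * g / norm for v, g in zip(x, grad)]
        # keep the "image" bounded by projecting back onto the unit ball
        size = math.sqrt(sum(v * v for v in x))
        if size > 1.0:
            x = [v / size for v in x]
    return x

# Hypothetical stand-ins for two trained nets: most of net_a's filters respond
# to input dimension 0, most of net_b's respond to dimension 1.
net_a = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
net_b = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [1.0, 0.0]]

start = [0.3, 0.3]
dream_a = dream(net_a, start)
dream_b = dream(net_b, start)
print(dream_a, dream_b)
```

Starting from the same neutral input, `dream_a` ends up dominated by dimension 0 and `dream_b` by dimension 1, which is the two-dimensional version of "lots of dog classes, lots of dog faces".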
It's also possible to take Image Captioning models and use the same idea to hallucinate images that are very likely for some specific sentence. There are a lot of fun ideas to play with.