
One thing people might not realize (I'm not sure how obvious it is) is that these renders depend strongly on the statistics of the training data used for the ConvNet. In particular, you're seeing a lot of dog faces because there are a lot of dog classes in the ImageNet dataset (well over a hundred of the 1000 classes are dog breeds), so the ConvNet allocates a lot of its capacity to worrying about their fine-grained features.

In particular, if you train ConvNets on other data you will get very different hallucinations. It might be interesting to train (or even fine-tune) the networks on different data and see how the results vary. For example, different medical datasets, or datasets made entirely of faces (e.g. Faces in the Wild data), galaxies, etc.
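The "hallucination" being discussed is, at its core, gradient ascent on the input pixels: pick a class score, backpropagate it through the trained network, and nudge the image in the direction that increases it. Below is a minimal self-contained sketch of that loop using a toy linear score in place of a real ConvNet (the actual pipeline backpropagates through a trained net, e.g. in Caffe); all names and numbers here are illustrative, not the authors' code:

```python
import numpy as np

# Toy stand-in for a ConvNet class score: a differentiable function of the
# "image" x. With score(x) = sum(w * x), the gradient w.r.t. x is simply w,
# so the ascent step is exact and easy to follow.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))        # hypothetical "class template"
x = np.zeros((8, 8))               # start from a blank image

def score(img):
    return float(np.sum(w * img))  # stand-in for the class logit

lr = 0.1
for _ in range(100):
    grad = w                       # d(score)/dx for the linear score
    x += lr * grad                 # gradient ASCENT on the input pixels
    x = np.clip(x, -1.0, 1.0)      # keep "pixels" in a valid range

# x now looks like the pattern that maximizes the toy class score;
# with a real net, x would instead drift toward e.g. dog-face textures.
```

With a real network the gradient comes from backprop rather than a closed form, and the renders typically add regularizers (image blurring, jitter, norm penalties) so the result looks like natural texture instead of adversarial noise.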

It's also possible to take Image Captioning models and use the same idea to hallucinate images that are very likely for some specific sentence. There are a lot of fun ideas to play with.

So how much computational effort would it take to train with a different set of images, to reach the same level of training as this existing data?

Would it be possible on a simple commercial computer?
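A back-of-envelope estimate helps frame the question. All numbers below are assumptions for illustration (dataset size, epoch budget, and single-GPU throughput vary widely), not benchmarks:

```python
# Rough training-time estimate for full training on an ImageNet-scale
# dataset with one consumer GPU. Every number here is an assumption.
images = 1_200_000        # assumed dataset size (ImageNet-scale)
epochs = 60               # assumed full-training epoch budget
imgs_per_sec = 200        # hypothetical single-GPU throughput
seconds = images * epochs / imgs_per_sec
days = seconds / 86400
print(f"{days:.1f} days")  # days of continuous single-GPU compute
```

Fine-tuning an already-trained network on a new dataset, as suggested above, needs far fewer epochs and a much smaller dataset, so it can be orders of magnitude cheaper than this full-training estimate.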

OpenCL support is coming, but it's not yet as performant as CUDA support for Caffe.

Grab a couple of video cards and have fun!

Might finally be putting some bitcoin GPUs to use.
