Here's a nice paper. But yes, almost any CNN works.

VGG-16 or VGG-19 seem to be used the most.

Yeah, but if I understood it correctly, the VGG-16 that comes with Keras is trained to output a probability distribution over 1000 interpretable labels. I was hoping for something that, like word2vec, embeds image data in a low-ish-dimensional space.
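
One way to get that out of a classifier CNN is to cut off the classification head and keep the pooled convolutional features as the embedding. Here is a minimal sketch, assuming the Keras VGG-16 application model; `build_embedder` and `embed` are just illustrative helper names, and global average pooling is one choice among several (it gives a 512-dimensional vector per image).

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input


def build_embedder(weights="imagenet"):
    # include_top=False drops the fully connected layers and the 1000-way softmax.
    # pooling="avg" averages the last conv feature map into one 512-d vector.
    return VGG16(weights=weights, include_top=False, pooling="avg")


def embed(model, images):
    # images: array of shape (n, height, width, 3), RGB, values in 0..255.
    # np.array makes a copy, so the caller's data is not modified.
    batch = preprocess_input(np.array(images, dtype="float32"))
    return model.predict(batch, verbose=0)
```

Calling `embed(build_embedder(), images)` should return an array of shape `(n, 512)`. If that is still too many dimensions, you could run something like PCA on the vectors afterwards, though how low you can go probably depends on the task.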

