Yes. You would need a large training set with these labels, but it would be pretty straightforward to train. You would probably want a tagging model rather than a single-label classifier, because there could be multiple objects of interest in the same image. If you get me the training data, I could train a model for you pretty quickly.
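To make the tagging-vs-classifier distinction concrete, here's a rough sketch of just the output step. It assumes a network has already produced one raw score (logit) per label; the label names, the scores, and the 0.5 threshold are all made up for illustration. A classifier runs softmax and picks exactly one label; a tagger runs an independent sigmoid per label and keeps every label that clears the threshold.

```python
import math

# Hypothetical label set, purely for illustration.
LABELS = ["dog", "cat", "person"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Squash one raw score into an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(logits):
    """Single-label: the labels compete and exactly one wins."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best]


def tag(logits, threshold=0.5):
    """Multi-label: each label is scored on its own, so any number can fire."""
    return [label for label, x in zip(LABELS, logits) if sigmoid(x) >= threshold]


# Made-up scores for an image that contains both a dog and a cat.
logits = [2.0, 1.5, -3.0]
print(classify(logits))  # only one label comes back
print(tag(logits))       # every label above the threshold comes back
```

With these made-up scores the classifier can only report the dog, while the tagger reports both the dog and the cat. In practice the threshold would be tuned per label on held-out data rather than fixed at 0.5.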
And it would be an... interesting job to tag the training set. Although for higher-level content, I suppose lots of porn videos have very specific category tags, which could make an interesting data set to play with. Uh, to analyze.
What's the current ML pet method for multi-label image classification? It seems like you could string together a bunch of individual classifiers, e.g. "Scene contains dog", "Scene contains cat", but is there an efficient (and effective) way of doing it in one go? Does it significantly increase the complexity of the network? I would imagine a cat detector would be far simpler than a cat-and/or-dog detector.