Hacker News

You don't want your embedding model to be trained only on ImageNet, even if it's a small embedding model: the dataset is not very diverse, and training with a cross-entropy loss over 1000 classes biases the learned representations. For a small embedding model it's much better to use a DINOv2 distilled model like ViT-S/14 (21M parameters), which was trained and distilled on 142M images without supervision, so there is no label bias in the training objective.
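To make the "embedding model" use case concrete: embeddings from a model like DINOv2 ViT-S/14 (which outputs 384-dimensional vectors) are typically L2-normalized and compared by cosine similarity for retrieval. A minimal sketch with random stand-in vectors (the actual vectors would come from the model; the names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 384-dim embeddings (ViT-S/14's output size), using
# random stand-ins instead of real model outputs for illustration.
gallery = rng.normal(size=(5, 384))
query = gallery[2] + 0.01 * rng.normal(size=384)  # near-duplicate of item 2

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Cosine similarity is the dot product of unit-normalized vectors.
sims = l2_normalize(gallery) @ l2_normalize(query)
best = int(np.argmax(sims))
print(best)  # → 2 (the near-duplicate is retrieved)
```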

Btw, the split batch normalization is very interesting. I wonder whether this split normalization could also be applied to other types of normalization, like layer normalization in transformer models.
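Assuming "split" here means computing independent statistics per sub-batch rather than one set over the whole batch, the core idea can be sketched in a few lines (a toy version only; a real layer would also track running statistics and learn scale/shift parameters):

```python
import numpy as np

def split_batch_norm(x, num_splits=2, eps=1e-5):
    """Toy split batch norm: normalize each contiguous sub-batch of x
    (shape [batch, features]) with its own mean and variance, instead
    of one set of statistics over the full batch."""
    splits = np.array_split(x, num_splits, axis=0)
    out = [(s - s.mean(axis=0)) / np.sqrt(s.var(axis=0) + eps) for s in splits]
    return np.concatenate(out, axis=0)

x = np.random.default_rng(1).normal(loc=3.0, scale=2.0, size=(8, 4))
y = split_batch_norm(x, num_splits=2)
# Each half of the batch is normalized independently to ~zero mean, unit variance.
print(np.allclose(y[:4].mean(axis=0), 0.0, atol=1e-6))  # → True
```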



