Convolutional Neural Networks for Visual Recognition (cs231n.github.io)
71 points by yu3zhou4 22 days ago | 4 comments



One page says "CS231n: Deep Learning for Computer Vision" and another says "CS231n: Convolutional Neural Networks for Visual Recognition". Did they rename the course recently to acknowledge other methods such as ViT?


CNNs are certainly still worth learning. It's still unclear whether ViTs are better, and there's enough material for a full course on CNNs and a separate course on vision transformers.


Agreed. ViTs are better if you're looking to go multimodal or use attention-specific mechanisms such as cross-attention. If not, there's evidence that ViTs are no better than convnets, either for small networks or at scale (https://frankzliu.com/blog/vision-transformers-are-overrated).


ViTs have also proven more effective at zero-shot generalization tasks, because they capture global context and relationships across the whole input, which CNNs struggle with.

https://arxiv.org/abs/2304.02643
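
To make the "global context" point concrete, here is a rough back-of-the-envelope sketch. It only counts the theoretical receptive field of a plain stack of identical conv layers and compares it with the number of patch tokens a single self-attention layer sees. The 224-pixel input, 3x3 kernels, and 16-pixel patches are illustrative assumptions, and the helper names are made up for this example. It says nothing about zero-shot accuracy by itself, and effective receptive fields in trained networks are typically smaller than the theoretical ones.

```python
def conv_receptive_field(num_layers, kernel_size=3, stride=1):
    """Theoretical receptive field (pixels along one axis) of a stack of
    identical conv layers, with no dilation and no pooling in between."""
    rf, jump = 1, 1  # jump = spacing of this layer's outputs in input pixels
    for _ in range(num_layers):
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


def layers_to_cover(image_size, kernel_size=3, stride=1):
    """Smallest number of stacked conv layers whose theoretical receptive
    field spans the full image width."""
    n = 0
    while conv_receptive_field(n, kernel_size, stride) < image_size:
        n += 1
    return n


def vit_tokens(image_size, patch_size):
    """Number of patch tokens for a square image split into square patches.
    In one self-attention layer every token can attend to all of them."""
    return (image_size // patch_size) ** 2


if __name__ == "__main__":
    size = 224  # assumed input resolution
    print("3x3 convs, stride 1:", layers_to_cover(size), "layers")
    print("3x3 convs, stride 2:", layers_to_cover(size, stride=2), "layers")
    print("ViT, 16px patches:", vit_tokens(size, 16), "tokens, all visible in 1 layer")
```

Real convnets reach a wide receptive field much faster than the stride-1 case suggests, since they downsample with strides and pooling, so treat this as an illustration of the mechanism rather than a comparison of actual architectures.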



