
Generally speaking, text-only models manage to learn a huge amount about the visual world. So when you then train the model on video, it might have less left to learn. Video is also generally less abstract than text. But I am sure we can still extract useful learning from videos; it's probably expensive, but we'll have to do it at some point.


