Generally speaking, text-only models manage to learn a huge amount about the visual world. So when you then train the model on video, it might have less left to learn. Video is also generally less abstract than text. But I am sure we can still extract useful learning from videos; it's probably expensive, but we'll have to do it at some point.


