Ask HN: GPT-Vision or Llava for Videos
1 point by vanguardanon on Jan 20, 2024
I'm looking for a model that takes a video as input and outputs a caption describing what is happening in it. I've searched Hugging Face and elsewhere, but the only candidate I've found is Microsoft's X-CLIP, and that only does video classification: it scores a video against text labels you supply, but it doesn't generate a caption of its own.
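GPT-4 Vision and LLaVA take still images rather than video files, so one common workaround (an assumption here, not something the post describes) is to sample a handful of evenly spaced frames and send them to the model as a sequence of images with a prompt asking what happens across them. A minimal sketch of the frame-selection step follows; the function name `sample_frame_indices` is hypothetical, and decoding the frames and calling the model are left out.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return up to `num_samples` frame indices spread evenly across a video.

    Hypothetical helper: the video is split into `num_samples` equal-length
    segments and the middle frame of each segment is chosen, so the very
    first and last frames are not over-represented.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the video actually has.
    num_samples = min(num_samples, total_frames)
    segment = total_frames / num_samples
    # Midpoint of segment i, truncated to an integer frame index.
    return [int(segment * i + segment / 2) for i in range(num_samples)]


# Example: a 10-second clip at 30 fps (300 frames) reduced to 8 frames.
print(sample_frame_indices(300, 8))  # [18, 56, 93, 131, 168, 206, 243, 281]
```

The chosen frames would then be decoded (for example with OpenCV's `cv2.VideoCapture`) and passed to the vision model. Sampling segment midpoints keeps the frames evenly spread, but fast events that fall between samples can still be missed.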

