
Great way to build labeled training data.

User-submitted videos (with audio for speech-to-text), user-drawn bounding boxes (we might not need these soon), and user feedback for RLHF.

The submitted videos are likely to be diverse, challenging (otherwise the human might just do it), and representative of actual customer problems.



Doesn't even need to be user-guided. Use videos that have audio. You could have one AI that generates a transcript from the audio and video, and another that watches the video on mute and tries to read the lips. The AI that had access to the audio would then provide the feedback.
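
A rough sketch of what that feedback signal could look like, assuming the audio model's transcript is treated as the reference and the lip reader's output is scored against it with word error rate. The function names are made up for illustration, and nothing here says how either model is actually built or trained.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # No reference words: perfect only if the hypothesis is empty too.
        return 0.0 if not hyp else 1.0

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def lip_reading_reward(teacher_transcript: str, student_transcript: str) -> float:
    """Hypothetical reward in [0, 1]; 1.0 means the lip reader matched the teacher.

    WER can exceed 1.0 when the hypothesis is much longer than the
    reference, so the result is clamped at 0.
    """
    return max(0.0, 1.0 - word_error_rate(teacher_transcript, student_transcript))
```

The teacher transcript will have its own errors, so this only measures agreement with the audio model, not correctness.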


I am thinking of the millions of hours of TV news. Presenters are almost always going to be in the same position in the frame, and the footage may already have high-quality transcripts.



