
Aside from the potential copyright problem, it should also be noted that subtitles in general are not transcripts of dialogue. Subtitlers often have to shorten sentences so that viewers have time to read each line before the next subtitles appear on screen.

There shouldn't be any issues with copyright, as long as you aren't redistributing the original work. Otherwise all neural networks would be illegal, since most training data is copyrighted.

As for errors in the subtitles, the data would still be good enough. As long as the machine learning model can deal with uncertainty, it would largely ignore the inaccurate examples and learn from the ones that are correct. It might even learn to abbreviate sentences itself!
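One common way to make a model "ignore" noisy examples in this sense is loss truncation (sometimes called the small-loss trick): within each batch, discard the examples with the highest loss, on the assumption that those are the most likely to be mislabeled, such as a subtitle that paraphrases rather than transcribes the audio. This is a minimal sketch, assuming per-example losses are already available; the function names and the 20% drop fraction are hypothetical illustration choices, not anything proposed in this thread.

```python
import math
from typing import List, Sequence


def per_example_nll(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> List[float]:
    """Negative log-likelihood of each target label under the model's predicted distribution."""
    # Clamp to avoid log(0) when the model assigns zero probability to the label.
    return [-math.log(max(p[t], 1e-12)) for p, t in zip(probs, targets)]


def truncated_mean_loss(losses: Sequence[float], drop_frac: float = 0.2) -> float:
    """Mean of the per-example losses after discarding the highest `drop_frac` fraction.

    Hypothetical helper: if this were the training objective, the dropped
    (highest-loss, likely mislabeled) examples would contribute no gradient.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= drop_frac < 1.0:
        raise ValueError("drop_frac must be in [0, 1)")
    n_drop = int(len(losses) * drop_frac)
    # Always keep at least one example so the mean is defined.
    kept = sorted(losses)[: max(1, len(losses) - n_drop)]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Four examples the model fits reasonably well, plus one with a very high
    # loss (e.g. an abbreviated subtitle whose text disagrees with the audio).
    losses = [0.5, 1.0, 1.5, 2.0, 9.0]
    print(truncated_mean_loss(losses, drop_frac=0.2))  # 1.25: the 9.0 outlier is dropped
    print(sum(losses) / len(losses))                   # 2.8: the plain mean is pulled up by it
```

The trade-off is that a hard but correctly labeled example also has a high loss, so a large `drop_frac` can throw away useful data along with the noise.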
