
Models trained on DVD audio are considered derived works. You certainly couldn't release such a model under the GPL.

You also have to solve the (very difficult) subtitle alignment problem before you could begin training.

I'm a neural net trained substantially on copyrighted books, music, tv, and movies. Does that mean I'm a derived work, and consequently all works I create are derived works as well?

I'm not saying you're wrong, necessarily. But if copyright is vague enough to allow that interpretation, that shows how incoherent, contradictory, broken, and ultimately nonsensical copyright is.

What about the countless audiobooks available on archive.org [1]? Sure, you may be limited to just books in the public domain, but that's still plenty of books.

[1]: https://archive.org/details/audio_bookspoetry


It's not like you could take the neural net weights aggregated from thousands of movies and retrieve any form of entertainment from them. Is a derived work anything at all based on an original, or just something in a similar field, i.e. entertainment->entertainment?

My own personal definition is whether the derivative work could survive if the first work did not exist, not the purpose for which it was intended to be consumed. Not sure about the legal definition.

Legally, there is a huge gradient between length(work), sha(work), train(transcription(work)), transcription(work), thumbnail(work), etc. Your personal definition of "derived" sounds a lot like the mathematical definition, which isn't amazingly useful in a copyright context.
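To make that gradient concrete, here is a rough sketch of the cheap end of it. The function names mirror the ones above; the toy `thumbnail` (keep every n-th byte) and the sample text are assumptions for illustration only, and nothing here says where the legal line falls.

```python
import hashlib


def length(work: bytes) -> int:
    # Retains almost nothing of the original: just its size.
    return len(work)


def sha(work: bytes) -> str:
    # Fixed-size digest; the original cannot be reconstructed from it.
    return hashlib.sha256(work).hexdigest()


def thumbnail(work: bytes, keep_every: int = 4) -> bytes:
    # Crude stand-in for downsampling: keep every n-th byte.
    # Much of the original is still recognizable in the output.
    return work[::keep_every]


# Hypothetical sample "work", purely for illustration.
work = b"It was the best of times, it was the worst of times"

for f in (length, sha, thumbnail):
    print(f.__name__, "->", f(work))
```

All three are "derived" in the mathematical sense, yet they differ hugely in how much of the original they carry.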

> Not sure about the legal definition.

Then perhaps stating "You certainly couldn't release such a model under the GPL" with such certainty isn't a great idea?

That actually depends on whether the audio is under copyright.

And even so, that is no reason why we as open source collaborators cannot create a million or a billion or so samples of "Hello" in foo language as a training corpus for all to use.

> Models trained on DVD audio are considered derived works

[citation needed]

There are tons of freely available models based on copyrighted works. Are you sure this is true?

But in order to use the movies for training you would need to buy thousands and thousands of DVDs.
