OpenAI destroyed a trove of books used to train AI models (businessinsider.com)
9 points by redbell 11 days ago | 6 comments





This title is wrong. They destroyed training data sets containing the text of a number of books. The books didn't get destroyed; the copies did.

Huge difference.

Also, I don't understand the need for such a misleading headline.

"OpenAI tried to cover up usage of training material" would be sensational enough?


I'm guessing an editor wanted to punch up the headline without understanding the difference.

I'm a know-nothing when it comes to LLM internals or copyright law, but presumably, if I, as a human, read a work of literature (say JRR Tolkien's collected works) and generated something like Wheel of Time (which is literally what Robert Jordan did; he has even credited Tolkien for his influence and stated that the initial chapters were modelled on Tolkien's Shire), then everything is admittedly kosher.

Then why is it wrong if an LLM does the same?

I am assuming, of course, that OpenAI licensed or bought the literature legally, as a prolific human bookworm would have done.

As long as the LLM generates output that is distinct and different enough from the original work it was trained on, shouldn't this be entirely fine from a legal perspective?


I think the issue is that this is more like Joseph Smith taking bits he had memorized from the King James Bible and including them close to verbatim in the Book of Mormon. (See https://en.wikipedia.org/wiki/Origin_of_the_Book_of_Mormon#K...)

Is this Books3?


