
> Moreover, this discussion isn't just about licensing MP3s; it's about managing common knowledge—that which is accessible to everyone, including competitors in different jurisdictions—for the creation of an industrial revolution.

Great, why don't these companies train their models on that "common knowledge" then and leave copyrighted work alone? Would be a great option wouldn't it? Solves the whole problem in one go.

Personally, I think these companies are just stealing copyrighted works, and should be sued for that. It's against the law, it's pretty simple.

And the whole China argument... Sorry but it makes no sense. They could beat us in AI by doing illegal things, so we should do the same illegal things to not let them win?

China has a lot of slave labour too. By the same reasoning, shouldn't we introduce slavery so they don't beat us at production?




I'm not saying there won't be licensing schemes in the future that give more direct access to the data. For example: free means you find the works online or scrape them; if you pay, you get direct, real-time access to the dataset for training, and you know it is clean, without "watermarks" (even textual ones), etc.

But the comparison with slave labor has nothing to do with this, because here you are not violating a person's body to train an AI. It is more like a duplication (actually a reinterpretation) of digital information that can be copied an unlimited number of times without damage. If I read a copyrighted PDF 3 times, that does not cause 3x the damage of reading it once.

I see it as more similar to when the first search engines were born: if they had had to ask each site for permission to read, rework, and provide an "external" search service, everything would have died immediately (at least in Europe and the US).

Another example is allowing emerging countries to maintain lighter copyright laws to facilitate their growth. (Yes, they violate copyright, and yes, perhaps a small percentage of Zambians would buy the media at Western prices, but it is better to let it go and look at the greater good than to enforce copyright guarantees everywhere...) The Americans interpret this legislative concept better; the Europeans, if they don't wake up, will be left behind for a long time.

And I repeat: passing a dataset containing copyrighted material into matrices to build an AI is very different from slavery.


> Personally, I think these companies are just stealing copyrighted works, and should be sued for that. It's against the law, it's pretty simple.

Is it copyright infringement to count how many times each letter appears in a book?

I don't know that it is, and if it's not, then there is at least some line you can draw where mechanically reading and learning from a copyrighted work is not copyright infringement.

Whether training a transformer model falls on the legal side of that line remains to be seen, but I don't think it's as clear cut as you make it out to be.



