"someone who did not know they were infringing and had no reason to believe that they were."
Can this be said of Microsoft? They explicitly chose not to include private repositories belonging to their paying customers, likely because they knew those customers would sue them if proprietary code was used as training data.
Apple seems to have chosen not to allow GPL software in the App Store for very similar reasons. Their terms of service require a permission that is incompatible with the terms of the GPL, and knowing that GPL software tends to have multiple rights holders, Apple chose not to allow GPL software at all.
And lastly, authors have requested that their works be removed from the training data. That is part of the lawsuit. Can Microsoft then still claim that they did not know they were infringing?
The comment I was responding to was about the case where person X uploads code to GitHub, and that code contains code from person Y whose license to X does not give X permission to grant GitHub the rights that GitHub requires from the uploader, and so GitHub's use of Y's code is without copyright permission.
I believe GitHub would likely be seen as an innocent infringer in that case.
Would that still be the case if Microsoft knows that such infringement is likely to occur? Microsoft has been in the software industry for 50 years, has an app store like Apple, and has distributed software from millions of different rights holders. Can they argue in good faith that they had no idea that software often has multiple rights holders, and thus that a single person who uploads software to GitHub is unlikely to be its sole copyright owner?
I doubt Microsoft would make that argument. It is more likely they will argue fair use, but by not using closed repositories owned by paying customers, they seem to show that they themselves have doubts about the legal status of using other people's copyrighted work for Copilot.
> It is more likely they will argue fair use, but by not using closed repositories owned by paying customers, it seems to show that they themselves have doubt about the legal status of using other peoples copyrighted work for copilot.
Or they're worried about leaking secrets, which is a different matter entirely. The amount of copying needed to leak secrets is far lower than the amount needed to commit copyright infringement.
If Copilot is trained on Microsoft's code and accidentally regurgitates a comment, "// for 2024 Xbox", it has done one but not the other.
When Copilot was released, people got it to print out account names and passwords that had ended up in the training data. Microsoft should at minimum have sanitized the training data so that it did not include such information. There is also likely personal information stored in some of those public repositories.
Copyright infringement doesn't have a fixed size threshold. It depends on context and on what kind of information is copied. Leaking secrets demonstrates that Copilot has not actually learned how to code (as many people like to claim), but is simply an algorithm for copying code. If it had learned to code like a human, it wouldn't divulge secrets.