The comment I was responding to was about the case where person X uploads code to GitHub, and that code contains code from person Y whose license to X does not give X permission to grant GitHub the rights that GitHub requires from the uploader, and so GitHub's use of Y's code is without copyright permission.
I believe GitHub would likely be seen as an innocent infringer in that case.
Would that still be the case if Microsoft knew that such infringement is likely to occur? Microsoft has been in the software industry for nearly 50 years, has, like Apple, an app store, and has distributed software from millions of different rights owners. Can they argue in good faith that they had no idea that software often has multiple rights owners, and thus that a single person who uploads software to GitHub is unlikely to be its sole copyright owner?
I doubt Microsoft would make that argument. It is more likely they will argue fair use, but the fact that they do not use closed repositories owned by paying customers seems to show that they themselves have doubts about the legal status of using other people's copyrighted work for Copilot.
> It is more likely they will argue fair use, but by not using closed repositories owned by paying customers, it seems to show that they themselves have doubt about the legal status of using other people's copyrighted work for copilot.
Or they're worried about leaking secrets, which is a different matter entirely. The amount of copying needed to leak secrets is far lower than the amount needed to commit copyright infringement.
If Copilot is trained on Microsoft's code and accidentally regurgitates a comment, "// for 2024 Xbox", it has done one but not the other.
When Copilot was released, there were people who got it to print out account names and passwords that had been put into the training data. At a minimum, Microsoft should have sanitized the training data so it would not include such information. There is also likely personal information stored in some of those open repositories.
Copyright infringement doesn't have a fixed size threshold. It depends on the context and on what kind of information is copied. That Copilot divulges secrets demonstrates that it has not actually learned how to code (as many people like to claim), but is simply an algorithm for copying code. If it had learned to code like a human, it wouldn't divulge secrets.