Practically every open source license requires attribution. If Copilot has a licensing issue, training a model only on repositories that share the same license won't fix it, except for the extremely rare licenses that do not require attribution.
Could they handle this by generating a collective attribution file that covers every (permissively licensed) repository that Copilot learned from?
Of course this would be massive, so for practical reasons the attribution file that Copilot generates in the local repository would have to just link to the full file, but I don't think that would be an issue in and of itself.
Maybe? It might depend on the license. I doubt the courts would be amused, though.
Almost certainly a link would not suffice; basically every license requires that the attribution be included directly with the modified material. Links can rot, can be inaccessible if you don't have internet access, can change out from underneath you, etc.
Makes sense. Maybe something like git-lfs or git-annex would be enough to address the linking issue, but the bigger concern seems to be whether a court would accept this as valid attribution. In a sense it reminds me of the Lavabit stunt with the printed key.
I think a judge could be persuaded that a list of every known human does not constitute a valid attribution of the actual author, even though their name is on the list. The purpose of an attribution is to acknowledge the creator of the work, and such a list fails at that.
Makes sense. That's probably the best interpretation here. Any other decision would make attribution lists optional in general for all practical purposes.