I wonder if, in order to deal with attribution, the system could simply build a multi-megabyte file with "this code is derived from:" followed by all the authors the system could gather from the training data set.
No, but it's a big determinant in how you get sued. Several lawyers have given the advice that the best way to avoid a lawsuit is don't be an asshole. The second best way is to spend a bunch of money on an attorney.
it's not
if they've trained on MIT/Apache 2.0/... then they're just as liable as people that have trained on GPL
they would be limited to training on licenses that don't require attribution (0BSD, CC0, public domain, etc; even 2-clause BSD requires keeping the copyright notice)
which I suspect limits the size of the training set so much that the output would be useless
Codium here is unintentionally making an argument that undermines legal confidence in their own product
interesting choice!