Hacker News
Ask HN: Is GitHub Copilot / IntelliCode Legal?
5 points by tentacleuno on Feb 6, 2022 | 3 comments
Two years ago, an issue titled "Licensing issues" was opened in Microsoft's IntelliCode GitHub repository[0]. It received a response from a Microsoft employee. Eventually, the argument was made that the model is a derivative work, as it is derived from thousands (?) of open-source projects. From what I understand, this seems to be true.

However, here's the fun part: Microsoft is training its AI models on these open-source projects. Would the terms of their licenses still apply here?

Further, would you say the law hasn't caught up with this use of open-source projects yet?

I am also curious about the legality of GitHub Copilot, since it seems to do largely the same thing from an AI standpoint.

[0]: https://github.com/MicrosoftDocs/intellicode/issues/201

EDIT: IntelliCode, not IntelliSense!




It's definitely a gray area, because the AI models are essentially compression engines: they encode the code samples/data into the weights of the matrices that represent the ML model and then "uncompress" it to serve queries. I think it would be easy to argue that a compressed data set, no matter how illegible, needs to conform to the same license as the data set it encodes, but I don't think any lawyer is smart enough to make that case. So this will probably remain a very convenient loophole for large companies with enough compute: mangle the data set beyond recognition by encoding whatever data/code they want into some neural network, sidestep the licensing restrictions, and then sell the result as AI.

For why these things are essentially mangled compression engines, take a look at "Hopfield Networks is All You Need": https://arxiv.org/abs/2008.02217. It shows that the attention mechanism in modern transformer networks (which is what Copilot is using) is equivalent to the update rule of a modern Hopfield network, so a transformer can be viewed as a bunch of Hopfield networks, which are essentially memory modules, connected in some complicated topology to encode some data set.
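To make the "memory module" idea concrete, here is a minimal toy sketch of the retrieval rule from that paper, state <- X softmax(beta * X^T state), in plain Python. The function names, the stored patterns, and the beta value are all made up for illustration; this is not how Copilot or any real transformer is implemented, only a small demo of associative recall from a corrupted query.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_retrieve(patterns, query, beta=2.0, steps=3):
    # Hypothetical helper: repeatedly apply
    #   state <- sum_i softmax(beta * <pattern_i, state>)_i * pattern_i
    # which has the same form as an attention lookup over the stored patterns.
    state = list(query)
    for _ in range(steps):
        scores = [beta * sum(a * b for a, b in zip(p, state)) for p in patterns]
        weights = softmax(scores)
        state = [
            sum(w * p[d] for w, p in zip(weights, patterns))
            for d in range(len(state))
        ]
    return state

# Three made-up "memories" stored in the network.
stored = [
    [1.0, 1.0, 1.0, -1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0],
    [1.0, -1.0, -1.0, 1.0, 1.0, -1.0],
]

# The first memory with its last entry corrupted.
noisy = [1.0, 1.0, 1.0, -1.0, -1.0, 1.0]

recalled = hopfield_retrieve(stored, noisy)
# Round each entry back to +1/-1 to compare against the stored memories.
nearest = [1.0 if v > 0 else -1.0 for v in recalled]
print(nearest == stored[0])
```

The point of the toy is only that a corrupted query pulls the stored pattern back out, which is the sense in which I'm calling these things memory/compression engines. Whether that analogy carries any legal weight is a separate question.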


Microsoft has skilled lawyers who think Copilot is legal. Same goes for Google's AlphaCode.

The Software Freedom Conservancy has skilled lawyers who think Copilot/etc. isn't legal: https://sfconservancy.org/blog/2022/feb/03/github-copilot-co...

Until there are court cases that set precedent, nobody will know for sure.


My impression is that this goes to show how poor the GPL and other copyleft licenses are. Basically, a person is never allowed to look at GPL code if there is any possibility they will ever write commercial software.

GitHub Copilot will most likely not be used to compete directly with GPL-licensed software, but the argument of the copyleft organization is that even using snippets of it is not allowed. So basically the GPL is really a poor idea of open source, imo: the code is open for people to view, but you can't really use it.

If you even read a GPL snippet, you have to erase it from your mind when you're working on commercial software.



