There is a difference between a person learning and a commercial product learnin...

adlpz · on May 8, 2023

To be fair, when a programmer learns from publicly available but not public-domain code, and then applies the ideas, patterns, idioms and common implementations in their daily job as a software developer, the result is very much a "commercial product" (the dev company, the programmer themselves if a freelancer) learning from someone else's work and ignoring all the licenses.

The only leap here is the fact that the programmer has outsourced the learning to a tool that does it for them, which they then use to do their job, just as before.

loveparade · on May 8, 2023

No, the difference is that OpenAI has a huge competitive advantage due to direct partnership with Github, which is owned by Microsoft. In fact, it's even worse. With OpenAI making money from GPT, Github has even less incentive to make data easily available to others because that would allow for competition to come in. I wouldn't be surprised if Github starts locking down their APIs in the near future to prevent competitors from getting data for their models.

Nobody is arguing against uploading code. It's about Github/Microsoft specifically.

adlpz · on May 8, 2023

I agree there's a difference in the ease of access, a competitive advantage, sure. And I get that people writing public-source (however licensed) software don't want to make it easier for them (as in, Microsoft) to make money off of "learning" (of the machine type) from it. That's fair.

However, at first glance, it still feels to me like an unavoidable reality that if you publish source code, it'll eventually be ingested by Copilot or whatever comes next.

I mean, for the rest of the content all the new fancy LLMs have been trained with, there wasn't a Github equivalent. They just used massive scraped dumps of text from wherever they could find them, which most definitely included trillions of lines of very much copyrighted text.

In short: not only do I not really see an issue with Copilot-like AIs learning from publicly available code (as I described in the GP comment), but I also think that if you publish code anywhere at all, it's inevitable that it'll end up in Copilot, regardless of where you host it. If you want to make it more expensive for Microsoft to scrape it, sure, go ahead, but I don't think it matters in the long run.

bamboozled · on May 8, 2023

"However, at first glance, it still feels to me like an unavoidable reality that if you publish source code, it'll eventually be ingested by Copilot or whatever comes next."

I’d be quite careful with this view.

By your logic, it should be OK to take the Linux kernel, copy it, build it, then sell it and give nothing back to the community that built it, and then just blame the authors for uploading it to the internet?