Transformed output _may_ fall under fair use. However - Copilot directly recites...

nonfamous · on July 8, 2021

> However - Copilot directly recites code.

You make that statement as an absolute, but in the interests of clarity, all evidence so far shows that it directly recites code very rarely indeed. Even the Quake example had to be prompted by the specific variable names used in the original code.

In practice, the output code is heavily influenced by your own context — the comments you include, the variable names you use, even the name of the file you are editing — and with use it’s obvious that the code is almost certainly not a direct recitation of any existing code.

shakna · on July 8, 2021

> all evidence so far shows that it directly recites code very rarely indeed.

_Once_ is enough for it to be infringing. The law is not very forgiving when you try and handwave it away.

mthoms · on July 8, 2021

You sound quite sure that the outlying instances of direct copying wouldn't be covered by the Fair Use copyright exemption. Any particular reason for that?

I tend to think it would be covered (provided they were relatively small snippets and not entire functions).

jkaplowitz · on July 8, 2021

I'm not the person you're replying to, but one strong reason is that the global reach and standardization of copyright law is far broader than the global reach and standardization of the fair use exception. A single non-US country in which GitHub Copilot is used in a way that would be infringing without the US fair use exception, and outside the scope of any such exception in that law, would be enough to cause GitHub/MS a legal hassle. There could well be more than one such country.

mthoms · on July 9, 2021

Oh, absolutely.

I'm not American, but like others around here — I was just restricting the discussion to American law for simplicity's sake.

jkaplowitz · on July 9, 2021

Fair, but GitHub/MS (same company now) can't afford to ignore other countries' law in their internal evaluations of whether globally* available products like Copilot are legal.

* Minus a few countries/regions targeted by US sanctions, I assume, though they've gradually broadened their services in sanctioned countries with the necessary licenses from OFAC.

shakna · on July 9, 2021

Precedent. Google v. Oracle found 9 lines of an "obvious" implementation to be infringing.

mthoms · on July 9, 2021

Right, but would 3-4 lines in the middle of a 50-line function also be infringing? What about 2 lines?

I don't know the answer. I was only surprised that the commenter seemed dead sure that any and all copying (no matter how small) would be infringing.

That just doesn't correlate with my understanding of how Fair Use works: The "amount" of the infringement is one (of several) factors in determining if something falls under Fair Use:

> The third factor assesses the amount and substantiality of the copyrighted work that has been used. In general, the less that is used in relation to the whole, the more likely the use will be considered fair.

From https://en.wikipedia.org/wiki/Fair_use

svaha1728 · on July 8, 2021

So if a foreign company pilfers the source code to Windows, can they add it to a training set and then 'prompt' the machine learning algorithm to spit out a new 'copyright free' Windows, just by transforming the variable names?

rkeene2 · on July 8, 2021

I think that's my question regarding this whole thing:

If it's so clearly fair use, why not train it on all Microsoft code, regardless of license (in addition to GitHub.com)? Would Microsoft employees be fine with Copilot re-creating "from memory" portions of Windows to use in WINE?

nonfamous · on July 8, 2021

Well no, because only GitHub has access to the training set. But more importantly this misunderstands how Copilot even works -- even if Windows was in the training set, you couldn't get Copilot to reproduce it. It only generates a few lines of code at a time, and even then it's almost certainly entirely novel code.

Now, if you knew the code you wanted Copilot to generate you could certainly type it character by character and you might save yourself a few keystrokes with the TAB key, but it's going to be much MUCH easier to simply copy the whole codebase as files, and now you're right back where you started.

svaha1728 · on July 9, 2021

GPT-3 is still Microsoft licensed, but a similar model can be put together with the freely available GPT-2 and source code -- especially if your intent is copyright transfer.

As Francois Chollet points out in this talk, ultimately deep neural network models are locality-sensitive hash tables, so the examples of people pulling out source code reflect an inherent shortcoming of deep learning models in general. Give the right 'key' and you can 'recall' the value you are looking for.

https://www.youtube.com/watch?v=J0p_thJJnoo
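As a rough illustration of the "right key recalls the value" idea above, here is a toy nearest-key lookup table. It is only an analogy and makes no claim about how Copilot or GPT models are implemented: the `FuzzyTable` class, the Jaccard similarity measure, and the placeholder values are all assumptions made for this sketch.

```python
def tokens(text):
    """Split a prompt into a set of lowercase whitespace-separated tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class FuzzyTable:
    """Hypothetical lookup table that answers with the value of the closest key."""

    def __init__(self):
        self._items = []

    def store(self, key, value):
        self._items.append((tokens(key), value))

    def recall(self, prompt):
        # A prompt does not need to match a stored key exactly;
        # the most similar key wins.
        query = tokens(prompt)
        return max(self._items, key=lambda item: jaccard(query, item[0]))[1]


table = FuzzyTable()
table.store("float q_rsqrt float number", "SNIPPET_A")  # placeholder value
table.store("def read_config path", "SNIPPET_B")        # placeholder value
recalled = table.recall("float q_rsqrt number x")
```

A prompt that shares most of its tokens with a stored key pulls back that key's value unchanged, which is the behaviour the comment describes for prompts that reuse the original variable names.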

ThrowawayR2 · on July 8, 2021

> "However - Copilot directly recites code."

Sounds like that wouldn't be difficult to fix? Transform the code to an intermediate representation (https://en.wikipedia.org/wiki/Intermediate_representation) as a pre-processing stage, which ditches any non-essential structure of the code and eliminates comments, variable names, etc., before running the learning algorithms on it. Et voila, much like a human learning something and reimplementing it, only essential code is generated without any possibility of accidentally regurgitating verbatim snippets of the source data.
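A minimal sketch of that kind of pre-processing, assuming Python source and using the standard-library `ast` module (Python 3.9+ for `ast.unparse`) as a stand-in for a real intermediate representation. The `normalize` function and the `v0`, `v1`, ... placeholder scheme are made up for this illustration. The sketch only shows the normalization step; it says nothing about whether a model trained on such output would avoid reproducing the structure of its training data.

```python
import ast
import builtins


class _Normalizer(ast.NodeTransformer):
    """Rename user-chosen identifiers to positional placeholders (v0, v1, ...)."""

    def __init__(self):
        self._names = {}

    def _placeholder(self, name):
        if hasattr(builtins, name):  # leave print, len, etc. untouched
            return name
        return self._names.setdefault(name, f"v{len(self._names)}")

    def visit_FunctionDef(self, node):
        node.name = self._placeholder(node.name)
        first = node.body[0] if node.body else None
        # Drop a leading docstring, if there is one.
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            node.body = node.body[1:] or [ast.Pass()]
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node


def normalize(source):
    """Return source with comments, docstrings and chosen names stripped."""
    # ast.parse already discards comments; the transformer handles the rest.
    tree = _Normalizer().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


original = '''
def half_of(number):
    # scale the input
    half = number * 0.5
    return half
'''
renamed = '''
def f(x):
    """Return half of x."""
    y = x * 0.5  # a different comment
    return y
'''
same_shape = normalize(original) == normalize(renamed)
```

Two functions that differ only in names, comments and docstrings normalize to the same text. Attribute names and keyword arguments are left alone here, so this is far from a complete normalizer.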

salawat · on July 8, 2021

At that point, can we all just agree IP is the stupidest concept to ever be layered on top of math (which programming is) and move on with non-copyrightable code?

jcheng · on July 8, 2021

Only if you agree that copyleft licenses are also stupid; without copyright, there's no way to prevent companies from making closed-source forks of code you wrote and intended to stay open.

ThrowawayR2 · on July 9, 2021

The whole point of copyleft was as a stepping stone to get to RMS's four freedoms (https://www.gnu.org/philosophy/free-sw.en.html) which effectively eliminates copyright for software.

jcheng · on July 10, 2021

Freedom 1: “Access to the source code is a precondition”

With no copyright/copyleft, how do you enforce the rule that derived works must provide access to the source code? I’ve never heard that copyleft was a stepping stone—rather, it’s the stick that fully realizes the four freedoms.

salawat · on July 9, 2021

Correct. Copyleft is idiocy as well. You don't really need to pay for a proprietary fork of a tool when no one can keep you out of the free one, and the proprietary stuff diffuses into the free option.

oefnak · on July 8, 2021

Yes, sure. Without copyright there's no need for copyleft left, right?

jcheng · on July 9, 2021

No...? Not unless that closed-source project's source code is leaked?

BlueTemplar · on July 9, 2021

You don't care about attribution and other moral rights?

(I guess these are going to depend a LOT on the jurisdiction that you're in?)

salawat · on July 9, 2021

I care, but in the long run, I care more about our descendants not having tools locked out of their hands. Facilitated information asymmetry is the root of far too many evils.

Where is your ego when you're dead and gone? Where could we be if the majority of human advancement weren't tightly clutched as trade secrets?

As someone who has done paid software engineering (yes, you can feel free to call me a hack or sell out if you wish), I've come to find that the salary I've pulled over the years has not gone to me... But keeping a roof over those I love, helping other people's projects grow, giving people a shot, etc.

My time on the other hand, gets dumped into implementing the same handful of processes doing the same damn thing, but different this time, because you can't just bloody make "Here ya go, here's your Enterprise-in-a-box".

I'd like more people to be able to solve novel problems rather than having to retread the same path over and over. Some degree of that will always have to be done to keep the skills fresh in the population, but we could do way better at marshaling that split, and I'm convinced part of what necessitates it is the creation of artificial barriers through things like enforced implementation monopolization. Yes, it ensures a minimum level of novelty and variance across populations, but it also does a terrible job of conserving the finite human capacity for truly novel thought.

It may make societies that function based on greed and economic/fiscal measures work, but I'm not convinced other incentive structures won't keep the rolling stone of innovation from accruing moss.

BlueTemplar · on July 10, 2021

I don't understand what you're talking about. I'm talking about the non-commercial parts of the monopoly rights that are copyrights and patents. The non-commercial parts arguably aren't going to restrict users much, and the commercial parts are temporary by design.

(Copyright has IMHO gone overboard with its duration; we should scale it back to the original 14 years, renewable once, just like patents. But copyright doesn't apply to processes anyway, so arguably it shouldn't apply to software that can't claim to have any artistic merit.)