Of course not. Reading some copyrighted code can have you entirely excluded from some jobs - you can't become a wine contributor if it can be shown you ever read Windows source code and most likely conversely.
Likewise, you can't ever write GPL VST 2 audio plug-ins if you ever had access to the official Steinberg VST2 SDK. Etc etc...
Did people forget why black box reverse engineering of software ever came to be?
That's not a law. That's a cautionary decision made by those companies or projects to make it more difficult for competitors to argue that code was copied.
Those projects could hire people familiar with competitor code and assign them to competing projects if they wanted. The contributors could, in theory, write new code without using proprietary knowledge from their other companies. In practice, that's actually really difficult to do and even more difficult to prove in court, so companies choose the safe option and avoid hiring anyone with that knowledge altogether.
Now the question is whether or not GitHub's AI can be argued to have proprietary knowledge contained within. If your goal is to avoid any possibility that any court could argue that GitHub copilot funneled proprietary code (accessible to GitHub copilot) into your project, then you'd want to forbid contributors from using CoPilot.
If humans did that, it would be hard to argue they didn't outright copy the source.
When a machine does it, does it matter if the machine literally copied it from sources, or first transformed it into an isomorphic model in its "head" before regurgitating it back?
If yes, why doesn't parsing the source into an AST and then rendering it back also insulate you from having to abide by a copyright?
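To make the AST point concrete, here is a minimal sketch using Python's standard `ast` module (`ast.unparse` needs Python 3.9+). The `clamp` function is a made-up stand-in for "someone else's source", not code from any real project. The round trip changes the text (the comment disappears, quoting is normalized) while the structure stays identical, which is the whole question: the surface differs, but nothing new was created.

```python
import ast

# Hypothetical "original" source; any small function would do.
original = '''
def clamp(value, low, high):
    # reject inverted ranges before clamping
    if low > high:
        raise ValueError("low > high")
    return max(low, min(value, high))
'''


def round_trip(source: str) -> str:
    """Parse source into an AST, then render the AST back to text."""
    return ast.unparse(ast.parse(source))


rendered = round_trip(original)

# The rendered text is not byte-identical to the original
# (the comment is gone, string quoting is normalized)...
text_differs = rendered.strip() != original.strip()

# ...but both texts parse to exactly the same tree.
same_structure = ast.dump(ast.parse(original)) == ast.dump(ast.parse(rendered))
```

Both `text_differs` and `same_structure` come out true here, so a purely textual comparison would call this "different code" even though it is a mechanical transformation of the input.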
You've hit the nail on the head here. If this is okay, then neural nets are simply machines for laundering IP. We don't worry about people memorizing proprietary source code and "accidentally" using it because it's virtually impossible for a human to do that without realizing it. But it's trivial for a neural net to do it, so comparisons to humans applying their knowledge are flawed.
I'm not sure why it's different, but that's a common concern with music. For example: https://www.reddit.com/r/WeAreTheMusicMakers/comments/4v8u8d...
I mean, if you used CoPilot on one computer, stared at it intensely for 1 hour, closed that computer, and then typed out code in the other computer that you were contributing from, you technically didn't use it for the contribution, you just used CoPilot for your education only.
Intellectual property is itself a flawed concept in many ways. It's like asking someone to do physics research but forbidding them from using anything that Einstein wrote.
Patent terms should have shrunk along with product lifecycles, and copyright should last a similar period; maybe 10-14 years.
My personal opinion.
Does it have flaws and can it be improved upon? Sure. I think society underweights what improvements to the patent system in particular could do. But such ideas are so niche they are hardly even written down, let alone debated at large. Society has bigger issues on its mind.
Like any evolved system IP law encounters new challenges over time and will be expected to evolve again, which it will surely do. A simple fix for Copilot is surely to just exclude all non-Apache2/BSD/MIT licensed code. Although there might technically still be advertising clause related issues, in practice hardly anyone cares enough to go to court over that.
No category of intellectual property covers thoughts, so the question has no relevance to the preceding statement.
If you just used it for inspiration, that's fine; if the way it was coded is a result of technical constraints, that's fine too; if the code is generic it's not distinctive enough to acquire copyright in the first place.
and they made those decisions based on the need to be able to argue in court that code was not copied.
>then you'd want to forbid contributors from using CoPilot
Right, the whole argument about whether Copilot spits out a ten-line function verbatim is not really the problem. The problem is that a human programmer still needs to run Copilot, and they will be the one shown in the codebase as the author of the code. (They could of course add a comment saying 'I got this bit from Copilot', but that might be cumbersome and would hardly work as proof anyway.) Although I suppose the concern would be not just proprietary code but any code with an incompatible license.
> and they made those decisions based on the need to be able to argue in court that code was not copied.
Yeah, but only to make it easier for them to argue it; the letter of the law doesn't require it. You could argue that "Sure, I read Windows source code once -- but that was years ago and I can't remember shit of it, so anything I wrote now is my own invention." That might be harder to get the court to accept as a fact, but it's not a prima facie legal impossibility.
Cautionary decision =/= actual law.
Okay, so it's not law, it's just a policy compelled by preceding legal judgements. Case law, perhaps.
In general, you're absolutely allowed to learn programming techniques from anywhere. You can contribute software almost anywhere even if you've read Windows source code. Re-using everything you've learned, in your own creative creation, is part of fair use.
Your example is the very specific scenario where you're attempting to replicate an entire program, API, etc., to identical specifications. That's obviously not fair use. You're not dealing with little bits and pieces, you're dealing with an entire finished product.
No - Google's 9 lines of sorting algorithm (iirc) copied from Oracle's implementation were not considered fair use in the Google / Oracle debacle.
Likewise SCO claimed that 80 copied lines (in the entirety of the Linux source code) were a copyright violation, although we never got a legal answer on that.
The Supreme Court decided Google v. Oracle was fair use. It was 3 months ago:
That's the highest form of precedent, the question has now been effectively settled (unless Congress ever changes the law).
Edit: added a dummy hash to end of URL so HN parses it correctly (thanks @thewakalix below)
> With respect to Oracle’s claim for relief for copyright infringement, judgment is entered in favor of Google and against Oracle except as follows: the rangeCheck code in TimSort.java and ComparableTimSort.java, and the eight decompiled files (seven “Impl.java” files and one “ACL” file), as to which judgment for Oracle and against Google is entered in the amount of zero dollars (as per the parties’ stipulation).
But I'm happy about all the new GPL programs created by Copilot
At least from a copyright point of view.
TL;DR: Being in the right and having an easy defense in a lawsuit are not the same.
BUT separating them makes defending any lawsuit brought against them under copyright or patent law much easier. It also prevents any employee from "copying GPL (or similar) code verbatim from memory"(1) (or, even worse, from the clipboard). Sure, the employee "should" not do it, but by separating them you can be more sure they don't, which in turn makes it easier to defend in court, especially wrt. "independent creation".
There are also patent law shenanigans.
(1): Which is what GitHub Copilot is sometimes doing IMHO.
Perhaps Copilot behaves very differently from my own model, but I strongly suspect that the examples that have been going around Twitter are outliers. GitHub's study agrees: https://docs.github.com/en/github/copilot/research-recitatio... (though of course this should be replicated independently).
I'm not really sure why we should consider Copilot legally different from a fancy pen – if you use it to write infringing code then that's infringement by the user, not the pen. This leaves the practical question of how often it will do so, and my impression is that it's not often.
That's the very reason why AI technologies can be useful in augmenting human intelligence; they see problems in a different light, can find alternate solutions, and generally just don't think like we do. There are many paths to a correct result and they needn't be isomorphic. Think of how a mathematical theorem may be proved in multiple ways, but the core logical implication of the proof within the larger context is still the same.
A human brain with an unlimited supply of pencils and paper, then.
This is wrong, this is not what Turing completeness is. It applies to computational models, not hardware.
You could argue that a stack is missing in my simplified model of the human brain, which would be correct. I used the simple model in allusion to the Chinese room thought experiment which doesn't require anything more than a dictionary.
In computability theory, several closely related terms are used to describe the computational power of a computational system (such as an abstract machine or programming language)
But that doesn't mean it's the only thing it does or even that it does it frequently. It's like calling a human a parrot because he completed a line from a famous poem when the previous speaker left it unfinished.
The same argument was brought up with GPT too and has been long debunked. The authors (and others) checked samples against the training corpus and it only rarely copies unless you prod it to.
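For what it's worth, a check like that does not need to be fancy. Below is a rough sketch of one way to measure verbatim overlap: the fraction of a sample's word n-grams that also appear somewhere in the corpus. This is an illustration only, not the method the GPT or Copilot authors actually used; the function names, the whitespace tokenization, and the default `n` are all assumptions.

```python
def ngrams(tokens, n):
    """Return the set of all n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copied_fraction(sample: str, corpus: list[str], n: int = 5) -> float:
    """Fraction of the sample's n-grams that also occur somewhere in the corpus.

    Tokenization is plain whitespace splitting, which is an assumption made
    for brevity; a real study would use a proper tokenizer.
    """
    sample_grams = ngrams(sample.split(), n)
    if not sample_grams:
        # Sample is shorter than n tokens, so there is nothing to compare.
        return 0.0
    corpus_grams = set()
    for doc in corpus:
        corpus_grams |= ngrams(doc.split(), n)
    return len(sample_grams & corpus_grams) / len(sample_grams)
```

A value near 1.0 means the sample is mostly recitation; a value near 0.0 means almost none of its n-token runs appear verbatim in the corpus. It says nothing about paraphrase or renamed variables, which is a separate question.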
There also are cases where this happens unintentionally, but those are not the norm.
These aren't your crazy uncle's Markov chain chatbots. They're sophisticated Bayesian models trained to approximate the functions that produced the content used in training.
GitHub could make a blacklist and tell Copilot never to suggest that code. Problem solved. You use one of the other 9 suggestions.
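A blacklist like that could be sketched roughly as follows. To be clear, this is a guess at the idea, not how Copilot works: `fingerprint` and `filter_suggestions` are hypothetical names, and hashing the whitespace-stripped text is just one simple choice of normalization.

```python
import hashlib


def fingerprint(code: str) -> str:
    """Hash the code with all whitespace removed, so that reformatting
    alone does not get a snippet past the list."""
    normalized = "".join(code.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_suggestions(suggestions, blocked_fingerprints):
    """Keep only the suggestions whose fingerprint is not on the blacklist."""
    return [s for s in suggestions if fingerprint(s) not in blocked_fingerprints]


# Usage: block one snippet, then filter a batch of candidate suggestions.
blocked = {fingerprint("int x = 1;")}
kept = filter_suggestions(["int  x = 1;", "int y = 2;"], blocked)
```

Here `kept` contains only `"int y = 2;"`, because the re-spaced copy of the blocked snippet hashes to the same fingerprint. Renaming a single variable would still slip through, so an exact-match list only catches verbatim recitation.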
Semi-related, the GNU/Linux copypasta is now more familiar to some than the GNU project in general - this is a shame to me as I view the copypasta to be mocking people who worked very hard to achieve what GNU has achieved asking for some credit.
What provision of copyright law are you referring to? Are you conflating copyright law with arbitrary organizational policies?
> Reading some copyrighted code can have you entirely excluded from some jobs
And they're right. It's because of corporate policies. They never said it was because of a law - you imagined that out of nothing.
@jcelerier flatly contradicted the statement that copyright doesn’t prevent you from reading something.
You’re right that @jcelerier didn’t say their example was law; that’s because the example is a straw man in the context of what @lacker wrote.
And, who says improving or clarifying a comment is poor form? What is the edit button for, and why is it available once replies have been posted?
I think you added
> Which “it” are you referring to?...
Because I have a tab open and can see the old one!
Edit - I’m adding another point as an edit to show another way to communicate. Would any of your points have been lost had you done something similar?
No that’s not true. I did not edit my posts after reading their reply, and the false accusation was that I changed my comment after it was replied to.
I didn’t challenge whether the question was in good faith, but I’ll just note that the relevant discussion of copyright got dropped in favor of an ad-hominem attack.
My question of which “it” was being referred to is a legitimate question that I believe clarified the intent of my comment, and I added it to make clear I was talking about what @lacker said, not what @jcelerier wrote.
> Edit - I’m adding another point as an edit to show another way to communicate. Would any of your points have been lost had you done something similar?
This doesn’t answer my question of why an edit should not be made before I see any replies, nor of why any edit is “poor form” and according to whom. I made my edit immediately. I’m well aware of the practice of calling out edits with a note, I’ve done it many times. I don’t feel the need to call out every typo or clarification with an explicit note, especially when edited very soon after the original comment.
Replies exist before you read them.
> Of course not. Reading some copyrighted code can make you entirely excluded from some jobs - you can't become a wine contributor if it can be shown you ever read Windows source code and most likely conversely.
You can of course read the code. The consequences are thus increased limitations, like you say.
What you mention is not an absolute restriction from reading copyrighted material. You perhaps have to cease other activities as a result.
You've extrapolated "some organizations don't allow you to contribute if you've learned from the code of their direct competitor" to "You're not allowed to learn from copyrighted code", which is absurd.
If that's the case, it should be easy to kill a project like wine - just send every core contributor an email containing some Windows code.
The result would be that WINE gets the chance to redo that snippet of code in a totally new and different way, while MS is forced to show part of its private code, which would also expose it to patent trolls.
Would be a win-win situation for Wine and a lose-lose situation for MS.
...and the "if" is the important part. This is why Imaginary Property is so absurd.
I think OP has a point here, personally.