Ask HN: Is GitHub Copilot the death of open source?
12 points by w4ffl35 on June 27, 2022 | 49 comments
Were licenses violated (intentionally or otherwise) in the released Copilot model? I have seen lots of comments in which people would answer "yes" to this question. What is the official stance? Does Github do anything to mitigate license abuse? How is Github giving attribution where due?

Currently this feels as though after many years Microsoft / Github stabbed its users in the back by violating our licenses.

If open source licenses were violated only for Github to sell our laundered code back to us for $100 a year, what is the point in contributing to open source software? This feels like a massive scam.

Many years ago I became concerned that the various websites that constantly ask for free code contributions and solutions (HackerRank, Stack Overflow, etc.) might be building AI based on that code. Of course my coworkers laughed (coincidentally, one went on to work at GitHub; I wonder if he remembers our conversation), and here we are.


additionally all this


Edit 2:

What if I make a GPT-3 model trained only on famous sci-fi and fantasy authors and then get it to pump out some semi-ok fiction and then charge other people for it for $10 a month? Is that legal?

I have two thoughts on this:

* Copilot isn't nearly as useful as some are making it out to be - I've been seeing threads on Reddit talking about how Copilot will mean devs will now be out of a job, and a bunch of other hot takes about what this means for the industry. I personally think it won't be all that revolutionary: very useful, but not going to disrupt anything.

* The "Software vs Snippets" argument: Say you have an OSS project that is MIT licensed. Copilot learns from some of your code and offers suggestions based on what you wrote to another developer for say a helper function or a script. Technically yes, your License was violated and your code was stolen. But when you think about it on a more practical level it's not that big of a deal. The point I'm making is Software is all about composing a bunch of snippets into something of value. A dev ripping off my build script or key extractor helper function isn't the same as them ripping off my entire app. One is just fine, the other is not in my eyes.

the big deal part is that it's a multi billion dollar corporation making money off of a derivative work while blatantly disregarding copyright.

I agree with the snippets take. Copilot functions like a super-powered search: instead of a developer looking at code on GitHub and taking a snippet from it, Copilot just presents that snippet to the developer much more quickly.

The problem is not about what it generates, it's about how it got the data to generate it.

what is the difference? the output is the output

Why would someone selling your code make it pointless to contribute to open source software? All the permissive open source licenses allow other people to sell your code anyway! I'm not saying what Microsoft did was right at all, but I don't think it massively changes the dynamics of permissive (or even a decent bit of copyleft) open source development.

You contribute to open source because you want open source software to be better for you and for others, and Microsoft's use of it doesn't really affect that equation much (to me, at least).

If, on the other hand, someone was contributing to copyleft software because they dislike proprietary software & the corporations behind it, I could see this affecting their decision making, but even in this mindset, what's the other option? If your goal is FOSS and user-modifiable code, not contributing and/or not open-sourcing your code is worse than Microsoft taking bits of it! To me, the best strategy from this perspective is to remove all your code from GitHub, put it up on SourceHut or something self-hosted, and tell others to do the same (and don't buy Copilot, and tell others!)

> you want open source software to be better for you and for others

tell that to M$ who is making $100 per user for that free labor.

I don't understand this retort: how does M$ making $100 per user for free labor in any way affect users of your contributed FOSS code?

My use of KDenLive, Blender, Gimp, or OBS isn't affected at all by the existence of scamware rip-off versions on the internet.

why bother having copyrights at all?

Again, I'm not saying Microsoft is right - I think they're wrong - but Open Source would do just fine without any copyrights. Indeed, some great open source software is fully in the public domain (SQLite, for instance!) with all copyright explicitly waived. The lack of copyright doesn't affect me using SQLite at all, and I am very grateful for its existence. I generally license my code under very permissive licenses, and would go full public-domain myself if not for worry about it being ineffective in different jurisdictions, also wanting to give patent licenses, etc.

I take your point - NASA's GitHub code is public domain, for example. But we still need to respect existing copyrights, and it's very disheartening (but not unexpected) to see Microsoft and GitHub fail in that regard.

Source code copyrights are the foundation that open-source licenses stand on, in the legal reality we occupy. In other worlds, there are probably better alternatives. But if Microsoft and GitHub would object to their own copyrighted code being used to train "FSFpilot" (and I would bet they would) then they can't use my code for their project either, not without following the license under which I gave them access to it.


> All the permissive open source licenses allow other people to sell your code anyway!

...with attribution. Which Copilot does not provide.

Perhaps my experience with Copilot is different, but I find that the code suggestions tend to model established patterns and best practices (sometimes) across algorithms, languages, and frameworks. I haven't seen any suggested code that surprises me with its novelty or uniqueness. Many times, it's just interpreting my own code for patterns and saving me from some multi-cursor-fu.

The biggest value of Copilot for me is the time saved by not having to search Google and/or documentation. It feels like I'm working with a relatively knowledgeable pair programmer -- which is nice for a solo dev. The value compounds for me since I frequently switch between frameworks with pauses in-between.

Since there's concern across the open source community, maybe Microsoft should reconsider the business model and spin off Copilot as a non-profit. The proceeds can be used to fund the staff and servers required to run the service, and profits can be shared with open source developers on GitHub.

The problem is how they got the data to generate anything in the first place.

They trained on code with restrictive licenses (according to reports from multiple users), on free open source code, etc., and now Microsoft, the multi-billion-dollar corporation that acquired your warm and fuzzy home, is selling that code back to its (oftentimes already paying) customers for $100 per year.

they are taking more than they are giving, benefiting from free labor. classic corporate move.

People have been stealing open source code in much more meaningful ways for years. I'm more annoyed with, and think we'd be better off directing our ire at, the countless embedded device vendors and commercial software vendors who ship GPL binaries with no source, rather than splitting hairs over Copilot.

I find Copilot distasteful and frustrating, but I don't think the misappropriation of code it performs is particularly meaningful. I think it's worth pushing back on, surely, as a matter of principle.

But will it "end open source," or do I think about it when I make contributions to open source code or provide solutions to others? Absolutely not.

Many software engineers contribute to OSS for reasons that have little to do with money. They publish code in the open for free in the expectation that other engineers would look at it and learn from it, and while at that appreciate the effort and expertise of the original author.

Copilot takes this away, in that a megacorp now uses the above contributions to provide a paid solution that essentially removes the need for people to even know you exist while allowing them to use the results of your work.

I don't think it's the death of open source, but it's definitely a mini-death of GitHub. Personally, I'm inclined to switch to SourceHut, which so far seems to be trustworthy, no-bullshit, and with a clear business model.

People could copy-paste code without following licenses before too.

It might cause some people to lose faith in license enforcement, it might cause some people to lose faith in GitHub, but I don't see a scenario where it "kills" Open Source.

This is a massive corporation (Microsoft) showing blatant disregard for licenses, which is very different from small-scale individual abuse.

The abuse is mostly academic: What they're doing is _technically_ against copyright law. In practice, someone who wrote a serious JavaScript project might have some utility functions "stolen" by Copilot, but it isn't as if it's going to emit forty files of cohesive and well-documented classes. Preact isn't going to suddenly appear in a proprietary codebase of IBM. Maybe their cloneElement function will - which is technically copyrighted and licensed, but it isn't why those collaborators decided to spend their time on Open Source.

well, none of us stand to gain from that technicality being violated, only M$ does, and the law does thrive on technicalities.

The law thrives on technicalities, Open Source isn't doomed by them. Lawyers might make money from this, Microsoft might make money from this, but Open Source won't die from this.

if this is legal then what prevents us from scraping github, gitlab etc. and making the same product?

edit: my guess is github / gitlab etc terms of service would probably prevent this.

I didn't say legal, but yes: you can build your own product that does that. It is a hard product to build, but yes. Copilot actually has several competitors, some of them have existed for a while, but they have worse marketing than GitHub because everyone knows GitHub :)

Does use of Copilot absolve the user from dealing with copyright and license infringements?

An author of infringed source code could always make a claim.

I can't imagine that it does.

That's a problem for releasing some things made with Copilot to the public.

And it's a gray-zone for closed-source software, for which any infringement could be hidden.

I don’t care if a contribution I make is regurgitated by a model. It doesn’t affect my willingness or reason to contribute, because it doesn’t somehow make my contribution any worse. I contribute somewhere, and what I contribute to gets better.

The copyright-vs-copilot thing is legally interesting, but I really can’t see who would stop contributing to projects because of it? My amateur armchair lawyer guess is also that using a nontrivial piece of copyrighted code without permission won’t be defendable by “copilot wrote that, I didn’t”, so copilot or not, code won’t magically just be washed of copyrights.

They claim it's fair use or something like that.

I am not sure either way tbh. It really depends on whether it reproduces code segments verbatim or just fragments. To use writing as a metaphor, does it reproduce sentences or entire paragraphs? The latter seems iffy, the former seems ok to me.

The fact that they trained on those code bases means they ingested the code. IMO they used libraries and didn't give attribution.

AI doesn't actually "learn", it makes inferences based on the reinterpretation of data fed to it by a human. their model is a derivative work.

I would think this point was already settled in the Google vs Oracle case.


> The inquiry into the “the purpose and character” of the use turns in large measure on whether the copying at issue was “transformative,” i.e., whether it “adds something new, with a further purpose or different character.” Campbell, 510 U. S., at 579. Google’s limited copying of the API is a transformative use. Google copied only what was needed to allow programmers to work in a different computing environment without discarding a portion of a familiar programming language. Google’s purpose was to create a different task-related system for a different computing environment (smartphones) and to create a platform—the Android platform—that would help achieve and popularize that objective. The record demonstrates numerous ways in which reimplementing an interface can further the development of computer programs. Google’s purpose was therefore consistent with that creative progress that is the basic constitutional objective of copyright itself. Pp. 24–28.

Creating a model out of training data is transformative. If we apply the Google assessment to Co-Pilot, then Co-pilot is non-infringing as it is only outputting jumbled snippets of source code. The expectation would be that the programmer will decide what to edit and what to compile. The only contention I could see is if Co-pilot outputs whole source code verbatim.

A year ago, Co-pilot was in beta and wasn't production-ready. I don't think it should be held against Microsoft.

what makes you think the product no longer trains on code it shouldn't be touching?

Training on open-source code isn't a problem in and of itself. As I said before, the training is likely transformative. Co-pilot operates as little more than a statistical recommendation engine.

The biggest issue is the final output. I don't know if Microsoft did anything in particular to solve the issue of verbatim outputs. But whatever the case, I'm of the view that since Co-pilot is now production-ready, Microsoft may now be held legally responsible should such problems arise.

Well I guess we have to invoke a strong copyright, that protects every piece of code, so that no one can steal any of it in any way. And then you'll have your death of open source. It's code, there shouldn't be any copyright to it.

Interesting that someone would define code being copied more broadly than it is intended to as the "death of open source". I'm not saying that licence violations are a good thing, but this is a weird take.

HN commenters cannot seem to decide if Copilot is not even worth $10 because what it can do is so basic and trivial or if it is somehow the "end of open source" because it can do so much.

The latter reaction seems hyperbolic. I have not used copilot yet so cannot comment on whether it is worth $10. It seems fairly useful but from what I have seen in demos it feels like fair use.

Open source code is open. It has likely been used to train or influence all sorts of static analysis, formatting and vulnerability tools. Are those also a problem?

Hopefully some court challenges will happen and we can get an answer as to the legality. Whenever things like this come up though, I wonder why people contribute to open source in the first place if they are going to be so bothered by their code being used by others. We all know this, or ought to, when we start contributing. There are companies that are going to make money from our contributions.

My knee jerk reaction is outrage, but the more I think about it, the more I realise the only realistic audience for a tool like this is those who are learning or just not very competent yet. If I'm happy to waste my time answering people on Stack overflow, then why does it outrage me so much when a machine takes my knowledge and uses it to help beginners on my behalf? If co-pilot was a human that read the entirety of github and then helped people code for $10/month I wouldn't have any qualms, I'd probably commend them. The reality is copilot isn't capable of creating any real IP, it's just building off the very same foundations of open-source that I have also built my career off. I'm conflicted.

That said, I do believe it's outrageous that there is no way to opt out. I know that's very impractical given the resources required to train the model, but presumably at some point it will be re-trained, and there should be the ability to opt out of future training cycles.

The longer the product exists, the less it will exist only for the entry level developer. It's naive to assume this is its final form.

I guess we should save our outrage ("the end of open source!") for some further developments in the space then?

Seems like we haven't even reached the slope yet, much less slipped on it.

i'm an outrage angel investor

I work in the ML field, so my understanding of the product is not based on naivety. It's an NLP model, so all it's doing is learning what "words" (or in this case syntax, function calls etc.) go together well. It has zero understanding of what each function call does and hence is incapable of doing anything beyond a glorified, contextually aware copy & paste. It certainly wouldn't be able to create new IP without heavy influence from the developer. Writing the actual code is the least difficult part of software development to anyone above entry level.
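
To make "learning what words go together" concrete, here is a toy sketch: a bigram counter that suggests the most frequent next token. This is only an illustration and far simpler than any production code model; the function names `train_bigrams` and `suggest` and the three-line corpus are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count which token follows which across whitespace-tokenized snippets."""
    follows = defaultdict(Counter)
    for snippet in corpus:
        tokens = snippet.split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows


def suggest(follows, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]


# Hypothetical training snippets, invented for this example.
corpus = [
    "for i in range ( n ) :",
    "for item in items :",
    "for i in range ( 10 ) :",
]
model = train_bigrams(corpus)
print(suggest(model, "in"))  # "range" follows "in" twice, "items" once
```

The counter has no idea what `range` does; it only knows that `range` tends to come after `in` in what it was shown.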

these are great points. perhaps the question i should have asked was:

is this the death of open source licenses?

Other than the SCO Unix/Linux suit, I cannot recall OSS license discussions around a few lines of code. It usually seems to be about direct usage of OSS libraries or applications. That is why I cannot get too worked up about Copilot yet. Just practically speaking, it seems like the scope of what it would do is pretty minor.

So my answer to this question would be No.

The problem I have with it isn't the few lines of code it generates, it's the ingestion of the millions of repos from millions of devs, creating an AI model from that free code, and then charging for the product. I'm already a paying customer; I should get free access to this thing that was built on free code and free labor and that gives absolutely no attribution to the original authors.

my side concern (which may be overblown) is that this is the first step to completely replacing programmers and consolidating finances and power even further.


additionally, I'm concerned about the "Death" of open source as in: if Microsoft can violate these licenses, what trust remains in the licenses? If people put restrictive licenses on their open source libraries, and now they know these can be violated, will they continue to support open source? And if so, what is the point of the license to begin with?

IANAL, but at this time, I do not see how any of the licenses have been violated. I guess we will learn more if someone steps up to challenge that. That seems to be the inflection point in your argument that makes it hard to discuss where you want to take this.

Going back to an earlier point, we do not know all of the analysis that has been done, whether academic or commercial, by examining open source code. Whether it is on issues of code quality or style, velocity, fix rates etc. Are all of those things also violating the license if that analysis leads to commercial success? What of GitHub itself and how it has improved or based features on what it learns from open source projects and how they use the service?

At this time, from what I have seen so far, I fail to see what is so different or special about Copilot. It did not illegally obtain the source code it used to train the model. Whether using the code to train the model is itself illegal is another question, but if it is, then what other uses of the source code are illegal?

This is the price for free github.

All your codes are belong to us.

I feel like a broken record, but "Opensource is not Github" and "Github isn't opensource."

It's time people remember open source existed before Github and it will exist after it. If you don't agree with Github's direction and decision, please use something else.

I have been incredibly happy without Github for the last two years and never plan on going back.

Correct, but M$ is challenging all open source licenses by violating them and charging for the product.
