I am really confused by HN's response to Copilot. It seems like before the Twitter thread on it went viral, the only people who cared about programmers copying (verbatim!) short snippets of code like this would be lawyers and executives. Suddenly everyone is coming out of the woodwork as copyright maximalists?
I know HN loves a good "well actually" and Microsoft is always suspect, but let's leave the idea of code laundering to the Oracle lawyers. Let hackers continue to play and solve interesting problems.
Copilot should be inspiring people to figure out how to do better than it, not making hackers get up in arms trying to slap it down.
If you're asking about the moral reaction here, I think it depends on how one views Copilot. Does Copilot create basically original code that just happens to include a few small snippets? Or does it actually generate a large portion of lightly changed code when it's not spitting out verbatim copies? I mean, if you tell Copilot, "make me a Qt-compatible, cross-platform windowing library" and it spits out a slightly modified version of the Qt source code, and someone starts distributing that with a very cheap commercial license, that would be a problem for the Qt company, which licenses its code commercially or under the GPL (and since Qt is a library, the GPL forces users to also release their code under the GPL if they release it, so it's a big restriction). So in the worst case scenario, you get something ethically dubious as well as legally dubious.
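To make the "lightly changed vs. verbatim" question concrete: one crude way to measure how close two snippets are is to normalize away whitespace and compare token shingles. This is just a minimal sketch with made-up example snippets, not how Copilot or any real clone detector works:

```python
import re


def shingles(code: str, k: int = 5) -> set:
    """Tokenize code and return the set of k-token shingles.

    Comments and whitespace are stripped (with a rough regex, not a
    real lexer), so trivial reformatting can't hide a copy.
    """
    code = re.sub(r"(//|#).*", "", code)          # drop line comments
    tokens = re.findall(r"\w+|[^\w\s]", code)     # words or single symbols
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of two snippets' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "for (int i = 0; i < n; i++) { sum += a[i]; }"
reformatted = "for (int i=0;i<n;i++){sum+=a[i];}"  # same code, reformatted
unrelated = "def greet(name): return 'hello ' + name"

print(similarity(original, reformatted))  # high: whitespace doesn't matter
print(similarity(original, unrelated))    # near zero
```

The point of the exercise: a "slightly modified" copy scores nearly as high as a verbatim one under this kind of measure, which is why "it's not byte-identical" isn't much of a defense.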
> Copilot should be inspiring people to figure out how to do better than it, not making hackers get up in arms trying to slap it down.
Why can't we do both? I mean, I am quite interested in AI and its progress, and I also think it's important to note the way that AI "launders" a lot of things (launders bias, launders source code, etc). AI scanning of job applications has all sorts of unfortunate effects, etc. etc. But my critique of the applications doesn't make me uninterested in the theory; they're two different things.
A naive developer thinks that they are the source code they write (you're not), and their source code leaking to the world makes them worthless. (Which isn't true, but feeling that invalidated explains a lot of the fear. Which, welcome to the club, programmers. Automation's here for your job too.)
Still, some of the moral outrage here has to do with it coming from Github, and thus Microsoft. Software startup Kite has largely gone under the radar so far, but they launched this back in 2016. Github's late to the game. But look at the difference (and similarities) in responses to their product launch posts here.
> A naive developer thinks that they are the source code they write (you're not), and their source code leaking to the world makes them worthless.
Maybe Github isn't violating the licenses of the programmers who host on them. Maybe Copilot doesn't just spit out code that belongs to other people. Those are matters of interpretation and debate.
But if Github was doing this with Copilot, virtually every open source programmer would have a reason to be upset. Open source programmers don't give their code out for free; they license it. This is a legal position, not a feeling. "Intellectual property" may be a pox on the world, but asking open source developers to abandon their licenses to ... closed source developers is legitimately a violation.
And before the spitting-out-source-code problem appeared, I recall quite a few positive responses to Copilot. Lots of people still seem excited. And yeah, people are looking at the downside given Microsoft's long abusive history, but hey, MS did those things.
You've answered your own question. They went under the radar and nobody cared about them. They're not the multibillion-dollar company that sued Mike Rowe and keeps ReactOS developers awake at night.
Try doing any type of deal (fundraising, M&A) where you can't point to the provenance of your application's code. This isn't good for programmers, programmers WANT clean and knowable copyrights. This is good for lawyers, who'll now have another way to extract thousands of $$ from companies to launder their code.
If you do get sued, the Copilot page is written in a way that would make Github legally responsible for it, not you. "Just like with a compiler, the output of your use of GitHub Copilot belongs to you."
Yeah, right... This isn't going to fly in court any more than if the Pirate Bay page was written in a way that says that it's solely responsible for what you do with the magnet links that they share.
On many ML posts, you get arguments about IP, and there's a long history of IP wars on this forum, especially when licensing comes up. Then you add the popular Big Tech Is Evil arguments you see. I think it's a variety of factors coming together for people to be upset about someone else profiting from their own work in ways they didn't mean to allow.
I expect that we'll need new copyright law to protect creators from this kind of thing (specifically, to give creators an option to make their work public without allowing arbitrary ML to be trained on it). Otherwise the formula for ML based fair use is "$$$ + my things = your things" which is always a recipe for tension.
I think the real issue is less about the "copying short snippets" and more about how it was done, i.e. zero transparency, default opt-in without any regard to licensing (with no way to opt out??) and, last but not least, planning to charge money for it.
I've always cared but never talked about it. Someone copy and pasting code from a source that is clearly forbidden (free software, reverse engineered code, leaked source code, etc) isn't an interesting thing to talk about. It's obviously wrong.
Also people rarely do it; I've caught maybe a couple instances of it in my career and I never really thought too much about them again. This tool helps make it a lot easier and more common. I have a feeling other people chiming in are also in the camp of "Oh, this is going to be a thing now, huh?"
I also can't help but think that my negative opinion of it isn't solely based on this provenance issue. While it's cool, it seems questionable how practical it is. If the value were clearer, I think I could stomach the risk a bit better.
Firstly it's important to remember that HN is not a single person with a single opinion, but many people with conflicting opinions. Personally I'm just interested in the copyright discussion for the sake of it because I find it interesting. Though, I imagine there's also an amount of feelings of unfairness.
As a mature, skilled engineer, you wouldn’t mind sharing your knowledge—but you’d really prefer to do this on your own terms.
First, you might choose to distribute your code under a copyleft license to advance the OSS ecosystem. Second, the older you get and the more experience you accumulate, the harder, paradoxically, it is to find a job or advance your career in this industry. So, to keep tech companies at least somewhat motivated to hire you, you may choose to make some of the source available but reserve all rights to it.
You’re fine making the source of your tool or library open for anyone to pass through the lens of their own consciousness and learn from, but not to use as is for their own benefit.
Now with GitHub Copilot you suddenly see the results of the labour you’ve previously made public (under the above assumptions) being passed through some black box, magically stripped of your license’s protections, and used to provide ready-made solutions to everyone from kids cheating on college tests to well-paid senior engineers simply lacking your expertise.
I hope it’s easy to spot how the engineer’s interests in the above example are not necessarily aligned with GitHub’s, how this may be perceived as an unfair move disadvantaging veteran rank-and-file software engineers while benefitting corporate elites and investors, and how it consequently has the potential to disincentivize source code sharing and deal a blow to the OSS ecosystem as a whole.
Perhaps people on HN start sensing that successors of Github Copilot will take their programming job. Rightly so.
Personally, I think that in the age of AI programming any notion of code licensing should be abolished. There is no copyright for genes in nature or memes in culture; similarly, there shouldn't be copyright for code.
> Perhaps people on HN start sensing that successors of Github Copilot will take their programming job. Rightly so.
I still think we're a long way from that. Copilot will help write code quicker, but it's not doing anything you couldn't do with a Google search and copy/paste. Once developers move beyond the jr. level, writing code tends to become the least of their worries.
Writing the code is easy, understanding how that code will affect the rest of the system is hard.
Based on the responses I've seen, people have it in their heads that Copilot is a system where you describe what kind of software you want and it finds it on Github and slaps your own license on it.
Depends on your definition of "a long way". Some of the GPT3 based code generation demos (which, explicitly, are just that - demos - we aren't shown the limitations of the system during the demo) say that's closer than I think.
> Perhaps people on HN start sensing that successors of Github Copilot will take their programming job. Rightly so.
I feel like this comment misunderstands what a software developer is doing. Copilot isn't going to understand the underlying problem to be solved. It's not going to know about the specific domain and what makes sense and what doesn't.
We're not going to see developers replaced in our lifetime. For that you need actual intelligence - which is very different from the monkey see monkey do AI of today.
The thing is that understanding the domain and thinking out a fairly efficient or elegant solution is something a lot of industry specialists and scientists can do, and it's only part of programming. Another part is dealing with all the language syntax and specialist lego bits/glue code, and that's something domain specialists tend to be less good at and not enjoy spending time on; it's its own craft.
Having a semi-intelligent monkey that can fetch obvious things off the shelf, build very basic control structures, and do the boring little housekeeping tasks is bad for the craft of programming but very good for the good-enough-solution situation. I can see it having the same impact as cheap and widely available digital cameras; anyone can be a kinda decent photographer now, but if you want to be a professional you're probably going to have to work a lot harder to stand out, whether that's by development of craft, development of narrow technical expertise and fancy equipment, or development of excellent business skills.
The funny thing with "good enough" solutions is that at some point it becomes unmanageable. I've basically spent a good part of my career cleaning up these solutions to make way for scalable, maintainable solutions that don't introduce security holes.
Photography is a good analogy - with everyone having fancy cameras you might think a photographer is no longer necessary. But there are still photographers about - they see things that the average person doesn't. The camera doesn't tell them what type of photos to take, what composition the photo should have or what poses a model should strike.
You have excellently described the job of business analysts and system architects, but this is not the job of 90% of programmers today, including senior-level. Part of this is already done by other people and doesn't require specific programming skills, hence, at the very least, programmers will lose their privileged position. Another part of it is actually too hard for most people who are currently employed as programmers to do on a decent level (such as meaningfully hacking on Linux kernel).
Memes are absolutely copyrightable, heard of Grumpy Cat?
New genetic sequences are patentable, not copyrightable, but that's because of the process involved in creating new genetic sequences more than the genes themselves.
Sure naturally occurring genes aren't patentable, but it's not like we have code growing on trees. So that's a terrible comparison.
The problem with Copilot is that, so far, it doesn't seem to be much of an AI and more of a copy-bot. If you are just copying code, you quickly run into copyright issues with your sources. A true AI trained on open source software would be something different.
Patents on genes actually are a thing. So that example is pretty false. Whether they should be a thing is a separate question, but right now the discovery of a gene and its usefulness can be patented, and this is done for medical patents.
You don't have to be a copyright maximalist to worry about a company taking snippets of code that used to be under an open license and using them in a closed-source app.
In addition, this is extremely hard to enforce. I think the amount of code running in closed systems that does not exactly respect the original license is shocking. What was the last case you know where this was a "scandal"?
It only happens at boss level when tech giants litigate IP issues.
I don't know about HN in general but my impression has been that anyone copying random code off the internet or adding dependencies without understanding the license (e.g. just blindly adding AGPL code) would be very much frowned upon in any remotely professional setting because a basic understanding of copyright and open source licensing is expected of even junior developers.
"Hackers" "playing" and ignoring copyright is fine, but Copilot isn't promoted as a toy, it's promoted as a tool for professional software development. And in that framing it is about as dangerous as an untrained intern with access to the production server.
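That "basic understanding of licensing" is something you can partly automate. Here's a minimal sketch (Python standard library only) that flags installed dependencies whose package metadata declares a copyleft license classifier. The marker list is illustrative, not exhaustive, and this is no substitute for a real compliance audit:

```python
from importlib.metadata import distributions

# Fragments of PyPI trove classifiers that indicate copyleft terms.
# This list is an illustrative assumption, not a complete catalogue.
COPYLEFT_MARKERS = ("GNU Affero", "GNU General Public License")


def flag_copyleft():
    """Yield (package name, classifier) pairs for installed packages
    whose metadata declares a copyleft license classifier."""
    for dist in distributions():
        for clf in dist.metadata.get_all("Classifier") or []:
            if clf.startswith("License ::") and any(m in clf for m in COPYLEFT_MARKERS):
                yield dist.metadata["Name"], clf


for name, clf in flag_copyleft():
    print(f"{name}: {clf}")
```

Even a crude check like this catches the "blindly added an AGPL dependency" case before it ships; the irony with Copilot is that its output carries no metadata at all to check.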
I'm more surprised that people don't care about the telemetry aspect. It's an extension that sends your code to an MS service, and MS promises access is on a need-to-know basis.
I don't care if MS copies my hobby projects exactly, but I'm not sure my employer (a defense contractor) would even be allowed to use a tool like this.
I think it looks cool though. I will probably try it out if it is ever available for free and works for the languages I use.
It's quite possible to do this on-prem and even on-device. TabNine, a very similar system with a smaller model (based on GPT-2 rather than 3), has existed for years and works on-device.
Is it really confusing? It's a rich company using the fruits of our labor, provided free TO OTHER DEVELOPERS. I have never okayed "use my code to train AIs that nobody else could". It's backhanded and unfair.
This isn't true at all. There are stories concerning code stealing that regularly lead the front page on HN and rouse a pretty intense reaction from the community. Saying that HNers have never before cared about this issue seems pretty inaccurate or disingenuous.
Copilot violates the assumptions many people made when they open sourced their code. Moving from manual to automated use feels like a privacy violation because it dramatically changes the amount of effort it takes to leverage the work in an unintended context.
> Copilot should be inspiring people to figure out how to do better than it, not making hackers get up in arms trying to slap it down.
One of the (many) problems is that GitHub/Microsoft already benefit from runaway network effects so it’s difficult to “do better”. Where will you get all of that training code if not off GitHub?
The real answer to this is to yank your projects from GitHub now while you search for alternatives.
Even if you do that, what's to stop them from using open source software from all over the web and not just what's on GitHub? The only way to stop them then is to go closed source.
I mean stop them at a larger level by threatening their success as an organization. If developers stop publishing to GitHub they have bigger problems than training ML models.
Whether or not this move is “legal”, it should serve as a wake up call that GH is not actually a service we should be empowering. This incident is just one example of why that’s a bad idea.
Copyright defends us from some of the abuse by large corporations in the form of the GPL.
Want Linux to run on your thing? You must publish driver source then or you're violating copyright law. This was less a big deal before device vendors ratcheted the pathological behavior up to 11 with smartphones and that's why far more people seem to react far more strongly now.