For those who might not follow the link: “[ai code] is presumed to be tainted (i.e. of unclear copyright, not fitting NetBSD's licensing goals) and cannot be committed to NetBSD”.
So more of a provenance issue w.r.t. licensing than an ideological one.
I thought it'd just be logistics. This is highly specialized OS code, and I barely trust Copilot to do more than the simplest autocompletes. No way 99.9% of AI-generated code would pass a proper peer review.
Interesting stance to take. I do wonder how they'll verify it, though. There is no quantifiable way to identify whether code is AI-written, other than the general 'vibe'.
I'm not sure I would call it an "interesting stance", as it's barely a stance and not at all interesting, IMO. It's what they think is legally required to accept code given their license.
If there's a chance the provenance of the code is tainted, the only option that doesn't invite immense problems later is to reject it.
Yep, the world isn't always black and white. If I were to use Copilot to autocomplete code I would've written myself (which is VERY often the case), I don't think it'd fall under this policy unless I made it overly obvious as some kind of act of rebellion.
Their issue is "unclear copyright, not fitting NetBSD's licensing goals". Non-AI code with incompatible copyright/licensing is already banned even though there's no way to verify its provenance, so there's no change there.
Honesty boxes work with honest players. If they have to reject good code until they can trust the submitter, I'd be fine with that. Trust is human, and takes years, decades even, to build.
Supply chain attacks are knocking on the door. Maybe fewer people should work on OS kernels, not more?
This is part of the contract all new committers are required to sign. Becoming a NetBSD committer requires membership of the NetBSD Foundation and there's an understanding that trust is expected.
And, although I suspect the original comment was made sarcastically, not at all at odds with the reasoning why they would reject submissions with AI generated code.
I think text alone is ultimately too malleable to reveal whether it's AI-generated or not. You can easily tell LLMs to write in a different style, or even give them examples, and they'll happily oblige. Detection tools are mostly useful for filtering out the laziest users who haven't tried playing around with the models enough.
If it's good and it works well, then the human using the LLM did a good job, and they need not credit their LLM of choice, just as one would not credit their rubber duck.
This line of thinking raises the question: why not just review code and reject what's bad, rather than blanket-ban any code with LLM involvement? That way you retain the productivity benefits of LLMs and can still reject flawed code.
We also need lawsuits painful enough that companies stop trying to get away with the automated plagiarism, because that's how it's being used, and they know it.
Now, should we [care about getting sued]? Take the number of [copilot users] in the field, A, multiply by the probable rate of [a user getting sued], B, multiply by the average out-of-court settlement, C. A times B times C equals X. If X is less than the [opportunity cost of missing out on user training data], we don't [care].
I'm only aware of these particular projects, but I imagine there are others. Anyone know of any FOSS projects that have explicitly said they are okay with AI-generated code?
Licensing, already a contentious subject within the open source community, just got another layer of confusion. I agree with their current stance, but it's something that needs to be addressed as LLM coding helpers become the norm.
It's a good stance to avoid being sued somewhere down the line. I don't think it's really about taking an ethical position on the matter. I could be wrong though, I've never met the fine folks at NetBSD.
Like others have pointed out, it's nearly impossible to police, and to what degree? Boilerplate code is boilerplate code, for example, and whether you were assisted in writing it or typed every byte through your fingers is irrelevant, as it's an obvious continuation.
At this stage, you just have to assume that all publicly available code, regardless of the license, has been gobbled up by LLM training data, maybe multiple times over; the cat is out of the bag. You might still want to protect your IP by being explicit about keeping your hands clear of it all.
They point to copyright/licensing concerns, which I do understand considering the way datasets for most currently popular LLMs have been sourced, i.e. large-scale crawling of any accessible source. In that light, most smart code completion offerings should be fine since they historically did not rely on as broad and widely crawled a data set and were generally offered with clear licenses attached. Also, the number of lines these provided historically was lower, which appears to be part of their concern.
In the same light, I feel that LLMs that have been solely and verifiably trained on M̶I̶T̶ MIT-0 or 0BSD should be fine, and this may just be a temporary decision until those become more commonplace.
In general this will be hard to enforce, but I do understand why they took this stance given the current flock of LLMs and their training data. Drawing a line in the sand on licensing, even if they cannot enforce it with absolute certainty, is understandable and feels appropriate for them. LLMs have now been in common use for coding tasks for multiple years, so this does not seem to be a knee-jerk reaction.
Thanks. The number-of-lines aspect goes back to what GNU considers "legally significant" [0]. GNU is of course not the be-all and end-all in this regard, but it felt like a decent reference point when making the post.
> code generated by a large language model or similar technology (e.g. ChatGPT, GitHub Copilot)
so I suspect it's actually pretty unambiguous; I've never seen an LLM that could plausibly be mistaken for any other technology. (Though if this is merely my ignorance, please suggest an example.)
I can understand why they made this policy but it feels a bit like the “Red flag laws” requiring motorcars to have a person with a flag walking in front.
There will be a time in the not-so-distant future when people look back at this kind of policy as silly.
i doubt it. it's easy to draw an analogy to early cars, or any kind of tech revolution, and say: "ahah, these folks are just like the Luddites, pbththth"
this is fundamentally different. this is a machine, a mathematical equation, that converts enormous amounts of history and facts into a facsimile of creation, thought, expression. it does it convincingly in very small doses, but it doesn't do it all that well unless your task is to regurgitate something relatively mechanical.
a car was something new. a mechanical horse. it wasn't some distillation and averaging of every extant horse into some "perfect" mutant horse. it was genuinely new. it also wasn't a mirage or a hallucination.
AI is not a creative force, rather a reductive one. it's at best a tool. so if you want to copyright something or you want to contribute to NetBSD, sure, that's fine. but copy-pasting someone else's work? no matter the underlying process, it's still wrong.
Hey now, the Luddites were not anti technology! They were highly skilled laborers who wanted to combine their expertise with the autoloom to increase overall throughput and split those gains with the factory owners.
The factory owners said, "lol, nah" and fired them all to hire cheaper labor to run the autolooms so they could keep the increase in profitability to themselves.
The Luddites were rightfully pissed and wrecked shit. The factory owners, had they been less greedy, could have kept the Luddites on board.
Small note: Zermatt (the town you are probably talking about) is not the only car-free one in Switzerland.
There are quite a few others[0], 10 towns in total, according to this website.
"Not-so-distant" indeed. IMHO this sounds just like banning people from using spell checkers when writing documentation. Legal paranoia aside, it's completely untenable in even the short term.
I've found Copilot to at least be useful as an autocompletion tool. I write a method with a name that forms a pattern and then start writing a method which is just copypasta with a few things changed, and Copilot is often smart enough that hitting <tab> really will save me some typing. I don't see how that's bad, or how anyone in the world is going to be able to detect that I'm using it.
[You could argue that it is bad because the pattern may change, but not having to type it out lets me think about that instead of the typing -- and a super common bug for me is copypasta-ing something X times and forgetting to fix a detail in one of them.]
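For illustration, here's a tiny hypothetical sketch (names invented, not from any real codebase) of the kind of repetitive pattern and the copy-paste bug being described:

```c
#include <stdio.h>

/* A family of near-identical accessors: an autocompleter can usually
 * predict the next one from the previous two. */
struct point { int x, y, z; };

int point_get_x(const struct point *p) { return p->x; }
int point_get_y(const struct point *p) { return p->y; }
/* The classic copy-paste bug: the body was duplicated from
 * point_get_y, but one detail was never updated. */
int point_get_z(const struct point *p) { return p->y; /* BUG: should be p->z */ }

int main(void) {
    struct point p = { 1, 2, 3 };
    printf("%d %d %d\n", point_get_x(&p), point_get_y(&p), point_get_z(&p));
    return 0;
}
```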
Good. If you don't understand the code you're submitting, you shouldn't be submitting it. If you do, then you're unlikely to submit code that looks like it's AI-generated anyway.
The stated reason is licensing, but after seeing some of the crap code that AI-crutch-coders have tried to pass as worthy in some other projects I've seen, it's good to see more pushback against drowning in mediocrity.
This is a wise short-term solution to prevent potential legal exposure. Long-term, I think we will need to figure out how to deal with IP in AI-generated code.
Instead of a blanket ban, it would be better to formulate detailed guidelines for AI-assisted programming, clarifying under what conditions such tools are allowed or prohibited. Developers could be required to conduct rigorous manual review of AI-generated code and add special annotations when submitting, rather than broadly labeling it as "defective". Unconditionally rejecting AI-generated code may forfeit the benefits of technological progress.
Their stance isn't that AI-generated code is "defective", it's that its copyright and licensing status is unclear, and they don't want to be in legal hot water some years down the line because they can't be sure of the provenance of the code they commit.
I will be honest -- I have done little work with ChatGPT, Copilot, or other [ai code] generation tools.
I can understand why they have done this, though.
I mean, any question you ask is going to be logged, as well as the output/reply. I do wonder if there are some legalities to this decision, not just the quality of code being (simply) copied and pasted into projects.
For example, let's say I create a game that's 90% code from ChatGPT, and I sell the game on Steam for some pennies. Could the company behind ChatGPT claim some kind of ownership, even if I modified the generated code most of the time? After all, they can see the questions I have asked and the results generated. I am sure they could figure out the project it's for and investigate, right?
Obviously free/open-source projects are a little different... but there could be other legal factors unless they specify "we do not accept ai generated code" or similar.
It is certainly something I would put in place for my own public projects.
They're free to sabotage themselves and miss out on this new wave of productivity, of course...
I wouldn't want to ever work without Copilot again. So frustrating to not have my magic autocomplete when I have to make some changes in a basic editor.
I really don't understand all the people flexing their Copilot subscriptions. It's kind of a huge self-own... Congrats, the software you write is banal enough to be AI-generated.
It's like all the kids who use it to write their papers. Great, you didn't get caught; your instructor agrees that you are the world's most average and uninspired writer.
They flex their Copilot subscriptions, but they are not flexing how they've done Advent of Code in Common Lisp. They flex how they can chat with a book, but not what they've learned from it. They flex how they can generate pretty pictures, but they can't say what those pictures mean to them.
It seems to do pretty well with zsh, Go, Python, and Rust when I use it, even for more complex tasks. But "context-aware universal snippets/boilerplate suggestion" already sounds pretty valuable, doesn't it?
If the rate of change in the NetBSD codebase was 1/10th of what it is now, I'd be content. The idea that LOC or speed of change is a good measure for a 40+ year old operating system doesn't work for me.
You should get committer flag in NetBSD because of how LITTLE new work you bring to the table, and because of how MUCH code you REMOVE.
Then let me remove POSIX semaphores and replace them all with futexes. It'd be great to not need a file descriptor every time one of my threads wants to wait on a mutex or condition variable.
If that's your first target for removal, I'd wager a large amount of money that you're [unfamiliar](https://man.netbsd.org/compat_freebsd.8) with the NetBSD code base.
I only really have experience talking directly to the kernel ABIs, i.e. the stuff in netbsd/sys/kern/syscalls.master. I'm using sem_open / sem_wait / etc. as my futex replacement. What ABI should I be using? https://github.com/jart/cosmopolitan/blob/master/third_party...
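For context, a minimal sketch of that pattern, assuming nothing beyond standard POSIX (the semaphore name is invented for illustration): a named semaphore used as a bare wait/wake primitive, which per the comments above ties up a file descriptor per semaphore on NetBSD, where a futex-style call would wait on a plain memory word instead.

```c
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>
#include <stdio.h>

int main(void) {
    /* Named POSIX semaphore used as a wait/wake primitive, initial count 0.
     * The name is hypothetical. */
    sem_t *s = sem_open("/hypothetical_wait", O_CREAT, 0600, 0);
    if (s == SEM_FAILED) {
        perror("sem_open");
        return 1;
    }
    sem_post(s);   /* "wake": normally done by another thread */
    sem_wait(s);   /* "wait": blocks until a post is available */
    sem_close(s);
    sem_unlink("/hypothetical_wait");
    return 0;
}
```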
It can be useful just for experimentation, too. You don't have to commit all the code it writes.
Most of the time it just feels like an autocomplete that is actually useful, it's scary good at writing the rest of the line exactly the way I was going to.
Anyway, your argument seems much more reasonable than the original post's -- they're just faffing about over copyright, which is completely pointless since Microsoft already declared they'll take care of any copyright lawsuits related to Copilot.
Microsoft declared they will assume liability for commercial customers who can prove they complied with all relevant requirements. You just showed why a broad policy is safer.
If Copilot significantly increases your productivity, you're either:
- statistically, a below average programmer, or
- a really, really slow typist.
Writing code is the easy part. Copilot, and transformers in general, can only generate "statistically average-ish" output. A statistically average-ish coder is also pretty bad at coding (this stands to reason, or else programmers wouldn't complain about how bad every other programmer's code is). So if Copilot makes you more productive, that's a bit of a red flag.
If, on the other hand, you're a really slow typist, then unfortunately you're out of luck, because nobody has figured out a superior input method to pressing buttons. I think in that case Copilot is an OK solution, but you'll find yourself correcting most of the code it writes anyway.
It's not really clear that LLMs would be particularly useful at generating the type of code someone working on NetBSD needs. They don't seem to excel at writing memory-safe(ish) low-level code or at solving novel problems.
(And for boilerplate there isn't any fundamental difference, copyright-wise, between copy-pasting it from ChatGPT or from anywhere else, since it's practically identical and unverifiable.)
If you think that what sets you apart from other developers is an AI autocompleting some of your code for you, then I'm not sure what your value proposition as a developer is. Typing the code is the easy part.
I have no problem with that. But let's not kid ourselves that 'automating' a bit of typing elevates a developer to some godlike level. A good chunk of regular developers point and click their way through their day, use dogshit editors without knowing a keyboard shortcut beyond save/copy/paste/undo, and work with tech and codebases that they actively have to fight to be productive. They still exist despite all of the other productivity boons out there.
I'm not saying copilots aren't useful. I'm saying that this "you'll be left behind" rhetoric smacks of crypto-esque hype.
Or “boilerplate generators”, I’m working on a laravel project and I think 90% of the code written so far is either generated or copy-pasted (from other files, from the css framework examples, from the docs). The trick is to know what to generate, what to copy, and keep it minimal. And that’s done by pretty much reading the entire docs and some of the framework’s code to understand how the pieces fit together. And what you need to replace/add on top to get your application.
I’m not tethering myself to a distant datacenter just to be able to write some code.
indeed. vi. not vim. not neovim. not vscode or emacs or anything complicated. i can use cat when vi doesn't work. why? because everyone can learn an editor of choice, and that just happens to be the one my fingers know. not because it's somehow better or something.
"while you sit on your ivory pedestal of superiority. Your choice."
ha! if only! i actually sit in a dim room with remnants of caffeine and pizza and wonderful half-broken computers and brand new ideas and code and interesting challenges and discoveries. and friends.