NetBSD bans all commits of AI-generated code (mastodon.sdf.org)
122 points by LeoPanthera on May 16, 2024 | 87 comments


For those that might not follow the link: “[ai code] is presumed to be tainted (i.e. of unclear copyright, not fitting NetBSD's licensing goals) and cannot be committed to NetBSD”.

So more of a provenance issue w.r.t licensing than an ideological one.


I thought it'd just be logistics. This is highly specialized OS code and I barely trust Copilot to do more than the most simple autocompletes. No way 99.9% of AI gen would pass a proper peer review.



Interesting stance to take. I do wonder how they'll verify it, though. There is no quantifiable way to identify if code is AI written, other than the general 'vibe'.

Will be interesting to see.


I'm not sure I would call it an "interesting stance", as it's barely a stance and not at all interesting IMO. It's what they think is legally required to accept code given their license.

If there's a chance the provenance of the code is tainted, the only option that doesn't invite immense problems later is to reject it.


Yep, the world isn't always black and white. If I were to use Copilot to autocomplete the code I would've written myself (which is VERY often the case), I don't think it'd fall under this policy unless I made it overtly obvious as some kind of act of rebellion.


The old ‘You made this?…I made this’ meme comes to mind https://a.pinatafarm.com/440x959/e59eef9670/you-made-thisi-m...


I think the policy means what it says.


Their issue is "unclear copyright, not fitting NetBSD's licensing goals". Non-AI code with incompatible copyright/licensing is already banned even though there's no way to verify its provenance, so there's no change there.


Honesty boxes work with honest players. If they have to reject good code until they can trust the submitter, I'd be fine with that. Trust is human, and takes years, decades even, to build.

Supply chain attacks are knocking on the door. Maybe fewer people should work on OS kernels, not more?


This is part of the contract all new committers are required to sign. Becoming a NetBSD committer requires membership of the NetBSD Foundation and there's an understanding that trust is expected.


Naturally, they'll use AI to detect it.


If this is not a joke, it might actually be a viable idea. It works pretty well for English text, for example:

https://arxiv.org/abs/2305.15047


And, although I suspect the original comment was made sarcastically, it's not at all at odds with their reasoning for rejecting submissions with AI-generated code.


I think ultimately text by itself isn't enough to tell whether it's AI-generated or not. You can easily tell LLMs to write in a different style or even give examples, and they'll happily oblige. Those tools are mostly useful for filtering out the laziest users who haven't tried playing around with the models enough.


I am interested to see what Google has developed with SynthID to be able to watermark text.


if it's good and it works well, then the human using the LLM did a good job, and need not credit their LLM of choice as one would not credit their rubber duck.


This line of thought raises the question: why not just review code and reject what's bad, rather than blanket-ban any code with LLM involvement? That way you retain the productivity benefits of LLMs, and can still reject flawed code.


Their issue is copyright, not code correctness/quality.


You have a rubber duck that writes code?



That was the first thing I thought. I can take AI-generated code and make it look like my own... that's easy.


you could even train an LLM on examples of AI-generated code you've edited to look like your own, and use that to rewrite AI-written code


You can't verify it. And if you can, you can train on that. And if it's good? Good code is good code.


This is a smart and appropriate move by NetBSD.

We also need lawsuits painful enough that companies stop trying to get away with the automated plagiarism, because that's how it's being used, and they know it.


fitting this fight club quote into context:

Now, should we [care about getting sued]? Take the number of [copilot users] in the field, A, multiply by the probable rate of [a user getting sued], B, multiply by the average out-of-court settlement, C. A times B times C equals X. If X is less than the [opportunity cost of missing out on user training data], we don't [care].


Gentoo did much the same: https://news.ycombinator.com/item?id=40038372 / https://wiki.gentoo.org/wiki/Project:Council/AI_policy

I'm only aware of these particular projects, but I imagine there are others. Anyone know of any FOSS projects that have explicitly said they are okay with AI-generated code?



Licensing, which is already a contentious subject within the open source community, just got another layer of confusion. I agree with their stance here for now, but it's something that needs to be addressed as LLM coding helpers become the norm.


It's a good stance to avoid being sued somewhere down the line. I don't think it's really about taking an ethical position on the matter. I could be wrong though, I've never met the fine folks at NetBSD.

Like others have pointed out, it's nearly impossible to police, and to what degree would you even try? Boilerplate code is boilerplate code, for example; whether you were assisted in writing it or wrote every byte through your fingers is irrelevant, as it's an obvious continuation.

At this stage, you just have to assume that all publicly available code, regardless of the license, has been gobbled up by LLM training data, maybe multiple times over, and the cat is out of the bag. You might just want to protect your IP by being clear about keeping your hands off of it all.


This is a very interesting policy. Where to draw the line, though? I.e. how advanced can an auto-complete be in order for it to be considered "AI"?


They point to copyright/licensing concerns, which I do understand considering the way datasets for most currently popular LLMs have been sourced, i.e. large-scale crawling of any accessible source. In that light, most smart code completion offerings should be fine since they historically did not rely on as broad and widely crawled a data set and were generally offered with clear licenses attached. Also, the number of lines these provided historically was lower, which appears to be part of their concern.

In the same light, I feel that LLMs that have been solely and verifiably trained on M̶I̶T̶ MIT-0 or 0BSD licensed code should be fine, and this may just be a temporary decision until those become more commonplace.

In general, though, this will be hard to enforce, but I do understand why they took this stance considering the current flock of LLMs and their training data. Drawing a line in the sand on licensing, even if they cannot truly ensure this with absolute certainty, is understandable and feels appropriate for them, especially considering LLMs have now been in common use for coding tasks for multiple years, so this does not seem to be a knee-jerk reaction.


Good points mostly.

Why did you say the number of lines appeared to be part of their concern?

The MIT license requires attribution. You meant MIT-0 maybe?

The old requirements relied on people being honest also.


Thanks. The number of lines aspect goes back to what GNU considers "legally significant" [0]. They are of course not the be-all and end-all in this regard, but felt like a decent reference point when making the post.

> You meant MIT-0 maybe?

Absolutely, my mistake.

[0] https://www.gnu.org/prep/maintain/html_node/Legally-Signific...


Given that it's driven by copyright, the limit could be taken to be 15 lines, in line with the FSF's copyright guidance.

https://www.gnu.org/prep/maintain/html_node/Legally-Signific...


The post says

> code generated by a large language model or similar technology (e.g. ChatGPT, GitHub Copilot)

so I suspect it's actually pretty unambiguous; I've never seen an LLM that was plausibly mistakable for any other technology. (Though if this is merely my ignorance, please suggest an example)


I can understand why they made this policy but it feels a bit like the “Red flag laws” requiring motorcars to have a person with a flag walking in front.

There will be a time in not-so-distant future when people look back at this kind of policy as silly.


i doubt it. it's easy to draw an analogy to early cars, or any kind of tech revolution, and say: "ahah, these folks are just like the Luddites, pbththth"

this is fundamentally different. this is a machine, a mathematical equation, that converts enormous amounts of history and facts into a facsimile of creation, thought, expression. it does it convincingly in very small doses, but it doesn't do it all that well, unless your task is to regurgitate something relatively mechanical.

a car was something new. a mechanical horse. it wasn't some distillation and averaging of every extant horse into some "perfect" mutant horse. it was genuinely new. it also wasn't a mirage or a hallucination.

AI is not a creative force, but a reductive one. it's at best a tool. so if you want to copyright something or you want to contribute to NetBSD, sure, that's fine. but copy-paste of someone else's work? no matter the underlying process, it's still wrong.


Hey now, the Luddites were not anti technology! They were highly skilled laborers who wanted to combine their expertise with the autoloom to increase overall throughput and split those gains with the factory owners.

The factory owners said, "lol, nah" and fired them all to hire cheaper labor to run the autolooms so they could keep the increase in profitability to themselves.

The Luddites were rightfully pissed and wrecked shit. The factory owners, had they been less greedy, could have kept the Luddites on board.


It’s a tool. Just like a car. Not some poetic philosophy construct. Don’t overthink it.


but it's a tool for stealing others' IP, not for noisily scooting from place to place

it's not Luddite behavior to say "no, we don't want to do that"


HN: Aaron Swartz is a hero and all knowledge should be free!

Also HN: AI is evil because it's stealing copyrighted works!


me: Aaron Swartz is a dick and shouldn't have deprived us of himself and vice versa. he'd have been a free man 10 years ago. (not judging, just sad)

also me: AI is evil because it tricks people into giving up their creativity, which isn't the boring part, it's the real part

also also me: wtf? why bring that up?


> “Red flag laws” requiring motorcars to have a person with a flag walking in front

With the number of people getting killed by careless drivers, we should bring those back, at least in cities.


> There will be a time in not-so-distant future when people look back at this kind of policy as silly.

Only if copyright laws are abolished, or stop being enforced completely.


Laws are eventually consistent with reality. There aren't many places that ban cars, except for tiny islands like Sark and that town in Switzerland.


Small note: Zermatt (the town you are probably talking about) is not the only car-free one in Switzerland. There are quite a few others[0], 10 towns in total, according to this website.

[0] https://urbanaccessregulations.eu/countries-mainmenu-147/swi...


For the present though, copyright and ownership when it comes to generated content is very much unsettled.


"Not-so-distant" indeed. IMHO this sounds just like banning people from using spell checkers when writing documentation. Legal paranoia aside, it's completely untenable in even the short term.


I'd assume any AI-generated code would simply add the following to each file in the repo: "// (c) Microsoft 1977-2014 All Rights Reserved."


This seems like the only reasonable choice while law is not yet settled.


I've found Copilot to at least be useful as an autocompletion tool. I write a method with a name that forms a pattern, then start writing a method which is just copypasta with a few things changed, and Copilot is often sufficiently smart that hitting <tab> really will save me some typing. Don't see how that's bad, or how anyone in the world is going to be able to detect that I'm using it.

[You could argue that it is bad because the pattern may change, but not having to type it out allows me to think about that instead of the typing -- and a super common bug for me to write is where I copypasta something X times and forget to fix a detail in one of them]
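
For illustration, the failure mode I mean looks something like this contrived C sketch (struct and names made up):

    /* Three near-identical accessors pasted from one template. The last
       one still returns the field from its source line -- exactly the
       copypasta bug described above. */
    struct point { int x, y, z; };

    int get_x(const struct point *p) { return p->x; }
    int get_y(const struct point *p) { return p->y; }
    int get_z(const struct point *p) { return p->y; }  /* bug: copied from get_y, never fixed */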


Good. If you don't understand the code you're submitting, you shouldn't be submitting it. If you do, then you're unlikely to submit code that looks like it's AI-generated anyway.

The stated reason is licensing, but after seeing some of the crap code that AI-crutch-coders have tried to pass as worthy in some other projects I've seen, it's good to see more pushback against drowning in mediocrity.


Hacktoberfest all the year!


This is a wise short-term solution to prevent potential legal exposure. Long-term, I think we will need to figure out how to deal with IP in AI-generated code.


Instead of a blanket ban, it would be better to formulate detailed guidelines for AI-assisted programming, clarifying under what conditions such tools are allowed or prohibited. Developers could be required to conduct rigorous manual review of AI-generated code and make special annotations when submitting, rather than broadly labeling it as "defective". Unconditionally rejecting AI-generated code may miss out on the benefits of technological progress.


They said tainted. Not defective. And they explained it meant unclear copyright. The guidelines you want will be written by legislatures and courts.


Their stance isn't that AI-generated code is "defective", it's that its copyright and licensing status is unclear, and they don't want to be in legal hot water some years down the line because they can't be sure of the provenance of the code they commit.


I will be honest -- I have barely worked with or used ChatGPT, Copilot, or other [ai code] generation tools.

I can understand why they have done this, though.

I mean, any question you ask is going to be logged, as well as the output/reply. I do wonder if there are some legalities to this decision, not just the quality of code being (simply) copied and pasted into projects.

For example, let's say I create a game that's 90% code from ChatGPT, and I sell the game on Steam for some pennies. Could the company behind ChatGPT claim some kind of ownership... even if I modified the generated code most of the time? After all, they can see the questions I have asked and the results generated. I am sure they could figure out the project it's for and investigate, right?

Obviously free/open source projects are a little different... but there could be other legal factors unless they specify "we do not accept AI-generated code" or similar.

It is certainly something I would put in place for my own public projects.


They're free to sabotage themselves and miss out on this new wave of productivity, of course...

I wouldn't want to ever work without Copilot again. So frustrating to not have my magic autocomplete when I have to make some changes in a basic editor.


I really don't understand all the people flexing their Copilot subscriptions. It's kind of a huge self-own... Congrats, the software you write is banal enough to be AI-generated.

It's like all the kids who use it to write their papers. Great, you didn't get caught; your instructor agrees that you are the world's most average and uninspired writer.


They flex their copilot subscriptions, but they are not flexing how they’ve done advent of code with common lisp. They flex how they can chat with a book, but not what they’ve learned from it. They flex how they can generate pretty pictures, but they can’t say what it means to them.


This sounds like you don't understand what Copilot is. It doesn't write software from scratch. It's just a fancy autocomplete.


I've used it, it's absolute trash for anything but boilerplate yavascript.


It seems to do pretty well with zsh, go, python, and rust when I use it. Even for more complex tasks - but "context-aware universal snippets/boilerplate suggestion" already sounds pretty valuable, doesn't it?


If the rate of change in the NetBSD codebase was 1/10th of what it is now, I'd be content. The idea that LOC or speed of change is a good measure for a 40+ year old operating system doesn't work for me.

You should get committer flag in NetBSD because of how LITTLE new work you bring to the table, and because of how MUCH code you REMOVE.


Then let me remove posix semaphores and replace it all with futexes. It'd be great to not need a file descriptor every time one of my threads wants to wait on a mutex or condition variable.


If that's your first target for removal, I'd wager a large amount of money that you're unfamiliar (https://man.netbsd.org/compat_freebsd.8) with the NetBSD code base.


I only really have experience talking directly to the kernel abis, i.e. the stuff in netbsd/sys/kern/syscalls.master. I'm using sem_open / sem_wait / etc. as my futex replacement. What abi should I be using? https://github.com/jart/cosmopolitan/blob/master/third_party...
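
Roughly, the pattern looks like this minimal sketch (the semaphore name here is invented for illustration; the real code is in the cosmopolitan link above):

    #include <fcntl.h>      /* O_CREAT */
    #include <semaphore.h>
    #include <stdio.h>

    int main(void) {
        /* each waiter needs a kernel semaphore, which (per the complaint
           above) costs a file descriptor on NetBSD */
        sem_t *sem = sem_open("/demo-waiter", O_CREAT, 0600, 0);
        if (sem == SEM_FAILED) { perror("sem_open"); return 1; }

        sem_post(sem);  /* waker side: stands in for a futex wake */
        sem_wait(sem);  /* waiter side: stands in for a futex wait */

        sem_close(sem);
        sem_unlink("/demo-waiter");
        return 0;
    }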


It can be useful just for experimentation, too. You don't have to commit all the code it writes.

Most of the time it just feels like an autocomplete that is actually useful, it's scary good at writing the rest of the line exactly the way I was going to.

Anyway, your argument seems much more reasonable than the original post - they're just faffing about copyright, which is completely pointless since Microsoft already declared they'll take care of any copyright lawsuits related to Copilot.


> Microsoft already declared they'll take care of any copyright lawsuits related to Copilot.

How very kind of them!

edit: it's almost like you, as a user, get a taste of having so much money you're legally untouchable!


> it just feels like an autocomplete that is actually useful

And a lot of us have been saying these LLMs are nothing but advanced autocompletes... and maybe that isn't a bad thing?


Microsoft declared they will assume liability for commercial customers who can prove they complied with all relevant requirements. You just showed why a broad policy is safer.


If Copilot significantly increases your productivity, you're either:

- statistically, a below-average programmer, or

- a really, really slow typer.

Writing code is the easy part. Copilot, and transformers in general, are only able to generate "statistically average ish" output. A statistically average ish coder is also pretty bad at coding (this stands to reason, or else programmers wouldn't complain about how bad every other programmer's code is). So if Copilot makes you more productive, that's a bit of a red flag.

If, on the other hand, you're a really slow typer, then unfortunately you're out of luck, because nobody has figured out a superior input method to pressing buttons. I think in this case, Copilot is an OK solution, but you'll find yourself correcting most of the code it writes anyways.


It's not really that clear that LLMs would be particularly useful at generating the type of code someone working on NetBSD would find useful. They don't really seem to excel at writing memory safe(ish) low-level code or solving any novel problems

(and for boilerplate there isn't any fundamental difference between copy-pasting it from ChatGPT or anywhere else copyright wise since it's practically identical and unverifiable)


A little too early to praise the benefits.

Maybe we just get more of the same errors in more code because they all copy from the same faulty AI.


/s ?


I'm guessing you're still writing every character of code yourself in emacs or notepad or something like that?

Yes, copilots are really helpful, even if you do know what you're doing.

Or just wait to get left behind, while you sit on your ivory pedestal of superiority. Your choice.


If you think that what sets you apart from other developers is an AI autocompleting some of your code for you, then I'm not sure what your value proposition as a developer is. Typing the code is the easy part.


It's also the boring part, so we might as well automate it, right?


I have no problem with that. But let's not kid ourselves that 'automating' a bit of typing elevates a developer to some godlike level. A good chunk of regular developers point and click their way through their day, use dogshit editors without knowing a keyboard shortcut beyond save/copy/paste/undo, and work with tech and codebases that they actively have to fight to be productive. They still exist despite all of the other productivity boons out there.

I'm not saying copilots aren't useful. I'm saying that this "you'll be left behind" rhetoric smacks of crypto esque hype.


it's already automated with a gizmo called "snippets".


Or “boilerplate generators”, I’m working on a laravel project and I think 90% of the code written so far is either generated or copy-pasted (from other files, from the css framework examples, from the docs). The trick is to know what to generate, what to copy, and keep it minimal. And that’s done by pretty much reading the entire docs and some of the framework’s code to understand how the pieces fit together. And what you need to replace/add on top to get your application.

I’m not tethering myself to a distant datacenter just to be able to write some code.


Yeah, not having to be attached 24-by-7 to some (pay-a-subscription) technology to just do a simple job is a great point.


I always find this ‘left behind’ thing funny - as if whipping out a credit card for an ai tool is a skill only gained by early adopters.


indeed. vi. not vim. not neovim. not vscode or emacs or anything complicated. i can use cat when vi doesn't work. why? because everyone can learn an editor of choice, and that just happens to be the one my fingers know. not because it's somehow better or something.

"while you sit on your ivory pedestal of superiority. Your choice."

ha! if only! i actually sit in a dim room with remnants of caffeine and pizza and wonderful half-broken computers and brand new ideas and code and interesting challenges and discoveries. and friends.

i did choose.

correctly.


This is pretty dumb.



