But the thing is that we explicitly allow humans to learn and develop their own skills by learning from other humans, yet we have our own taboos around directly copying people's work without permission and passing it off as your own. The debate is that Copilot isn't a human; it's a machine that outputs copied work on a statistical basis.

Humans are allowed to be unoriginal, uncreative, boring, mediocre, and all sorts of things. But they're not copying whole cloth the way Copilot is.




> But they're not copying whole cloth the way Copilot is.

Stack Overflow content is CC BY-SA 4.0, yet I'd bet most corporate codebases include tons of code snippets without a link or citation to the original answer.


Don't you have to work pretty hard to get Copilot to reproduce snippets verbatim? My understanding is that, while it's possible to make Copilot reproduce snippets so long as they appear in a large number of files, this basically never happens under normal usage.


This argument doesn't work.

I don't know whether Copilot is giving me infringing content or not. I'm always at risk that my question is one of the ones that trigger infringing replies.


Whenever I've used Copilot it never seems to copy whole sections of code. Can you provide examples of this?

From what I've seen, it produces fairly generic boilerplate that has been modified based on the rest of the code in my repo, so that it works with the other functions and even incorporates other pieces of my code in the style I'm using. The boilerplate aspect makes sense because this would be the most common sequence of tokens that it observed during training. It's somewhat miraculous that it can incorporate code on the fly from my repo. I've never seen anything that looks like a direct copy-paste from elsewhere, though. If you have a different observation, I'd love to see it.


Behold: https://twitter.com/StefanKarpinski/status/14109710611816816...

Probably helps that this is from a codebase that's been forked quite a bit.


Yeah, I wanted an example from a real project, not a one-file demo. The high fork count, and probably also the code's existence in thousands of other projects, likely results in this behaviour if you have no surrounding context.

This is also easily solved by checking the box in Copilot that says not to produce any code matching public code.


Can I circumvent the new OGL revocation by training an AI on 100 copies of the D&D rulebook, and using its output?


You can't even code search in forked repos so maybe forks were excluded (besides commits on top of the fork)?


Most forks probably happened before GitHub existed.



