
Humans are just compression with extra steps by that logic.

There's a fairly simple technical fix for Codex/Copilot anyway: stick a search engine on the back end, index the training data, and don't output anything the search engine finds.
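Roughly what I have in mind, as a minimal sketch rather than anything Codex or Copilot is known to do. `TrainingIndex`, `filter_output`, the whitespace tokenizer, and the n-gram length are all made-up names and choices for illustration:

```python
from collections import defaultdict


def tokens(source):
    # Crude tokenizer (assumption): split on whitespace only.
    return source.split()


class TrainingIndex:
    """Hypothetical inverted index over token n-grams of the training data."""

    def __init__(self, n=8):
        self.n = n
        # Maps each n-gram to the ids of the documents that contain it.
        self.postings = defaultdict(set)

    def add(self, doc_id, source):
        toks = tokens(source)
        for i in range(len(toks) - self.n + 1):
            self.postings[tuple(toks[i:i + self.n])].add(doc_id)

    def sources_of(self, candidate):
        """Return ids of indexed documents sharing any n-gram with candidate."""
        toks = tokens(candidate)
        found = set()
        for i in range(len(toks) - self.n + 1):
            found |= self.postings.get(tuple(toks[i:i + self.n]), set())
        return found


def filter_output(index, candidate):
    """Return candidate unchanged, or None if the index has seen part of it."""
    return None if index.sources_of(candidate) else candidate


index = TrainingIndex(n=4)
index.add("repo/a.py", "def add(a, b): return a + b  # sum")
blocked = filter_output(index, "return a + b")
allowed = filter_output(index, "print('hello world') # greet the user now")
```

Anything shorter than `n` tokens always passes, so `n` is the knob that trades missed copies against blocking trivially common snippets.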



If I were to memorize my employer's IP then reproduce it (almost) verbatim and give it to a competitor, then I would be setting myself up for a world of legal hurt.

So yes, it is like how human memory is compression with extra steps.


I don't think that would work very well, because there are not infinitely many ways to succinctly solve most programming problems. In fact, the majority of solutions will look exactly the same.

The real solution is very, very simple. Only use opt-in training data. Don't acquire codebases from people who didn't agree to it.


Opt-in is complicated.

If I own a repository on GitHub and I have received contributions from other people, or included a .h file from mpv (something I have done), do I still have the right to click the opt-in button? I didn't ask the other contributors.

But GitHub is in a position to scan my code, see if there are copy-pasted bits, and disable the opt-in button in that case.

Except they act in bad faith so they wouldn't do that.


> I don't think that would work very well, because there are not infinitely many ways to succinctly solve most programming problems. In fact, the majority of solutions will look exactly the same.

Algorithms can't be patented or copyrighted, as they are pure mathematics. If an implementation of an algorithm has no creative content because it is succinct, then it likely doesn't deserve copyright.


That feature already exists; you can turn it on here:

https://github.com/settings/copilot

More info:

We built a filter to help detect and suppress the rare instances where a GitHub Copilot suggestion contains code that resembles public code on GitHub. You have the choice to turn that filter on or off during setup. With the filter on, GitHub Copilot checks code suggestions with its surrounding code for matches or near matches (ignoring whitespace) against public code on GitHub of about 150 characters. If there is a match, the suggestion will not be shown to you. In addition, we have announced that we are building a feature that will provide a reference for suggestions that resemble public code on GitHub so that you can make a more informed decision about whether and how to use that code, as well as explore and learn how that code is used in other projects.

https://github.com/features/copilot#what-can-i-do-to-reduce-...
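To make the description above concrete, here is a rough sketch of a whitespace-insensitive match over windows of about 150 characters. This is my guess at the general shape, not GitHub's implementation: `normalize`, `build_window_set`, and `should_suppress` are hypothetical names, it only ignores whitespace (no other "near match" handling), and a real system would presumably store fingerprints rather than raw windows.

```python
WINDOW = 150  # "about 150 characters", per the quoted description


def normalize(code):
    # Drop all whitespace so formatting differences cannot hide a match.
    return "".join(code.split())


def build_window_set(public_files, window=WINDOW):
    """Collect every whitespace-free window of `window` chars from public code."""
    seen = set()
    for text in public_files:
        flat = normalize(text)
        for i in range(len(flat) - window + 1):
            seen.add(flat[i:i + window])
    return seen


def should_suppress(surrounding, suggestion, seen, window=WINDOW):
    """True if a window that overlaps the suggestion appears in public code."""
    before = normalize(surrounding)
    flat = before + normalize(suggestion)
    # Start late enough that each window contains at least one
    # character of the suggestion, not only the surrounding code.
    start = max(0, len(before) - window + 1)
    return any(
        flat[i:i + window] in seen
        for i in range(start, len(flat) - window + 1)
    )


# Tiny window so the example is easy to follow by hand.
seen = build_window_set(["for i in range(10):\n    total += i\n"], window=10)
copied = should_suppress("for i in range(10):\n", "total  +=  i", seen, window=10)
fresh = should_suppress("for i in range(10):\n", "print(i)", seen, window=10)
```

Note that anything shorter than the window can never match, which is the same trade-off the 150-character figure implies: short, common idioms get through, and only longer runs are checked.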


Until this is turned on by default, it's not sufficient.



