
The second tweet in the thread seems badly off the mark in its understanding of copyright law.

> copyright does not only cover copying and pasting; it covers derivative works. github copilot was trained on open source code and the sum total of everything it knows was drawn from that code. there is no possible interpretation of "derivative" that does not include this

Copyright law is very complicated (remember Google v. Oracle?) and involves balancing a lot of different factors [0]. Simply saying that something is a "derivative work" doesn't establish that it's copyright infringement. An important defense against infringement claims is arguing that the work is "transformative." Obviously "transformative" is a subjective term, but one example is the Supreme Court determining that Google copying Java's APIs to a different platform was transformative [1]. There are a lot of other really interesting examples out there [2] involving questions like whether parodies are fair use (yes) or whether satires are fair use (not necessarily). But one way or another, it's hard for me to believe that taking static code and using it to build a code-generating AI wouldn't meet that standard.

As I said, though, copyright law is really complicated, and I'm certainly not a lawyer. I'm sure someone out there could make an argument that Copilot is copyright infringement, but this thread isn't that argument.

[0] https://www.nolo.com/legal-encyclopedia/fair-use-the-four-fa...

[1] https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_...

[2] https://www.nolo.com/legal-encyclopedia/fair-use-what-transf...

Edit: Note that the other comments saying "I'm just going to wrap an entire operating system in 'AI' to do an end run around copyright" are proposing to do something that wouldn't be transformative and therefore probably wouldn't be fair use. Copyright law has a lot of shades of grey and balancing of factors that make it a lot less "hackable" than those of us who live in the world of code might imagine.



Google copied an interface (declarative), not code snippets/functions (implementation). Copilot is capable of copying only implementation. IMO that is quite different, and easily a violation if copied verbatim.


If you can read open source code, learn from it, and write your own code, why can't a computer?


I think the core argument has much more to do about plagiarism than learning.

Sure, if I use some code as inspiration for solving a problem at work, that seems fine.

But if I copy verbatim some licensed code then put it in my commercial product, that's the issue.

It's a lot easier to imagine for other applications, like generating music. If I trained a music model on publicly available YouTube music videos, my model then generated music identical to Interstellar Love by The Avalanches, and I used the "generated" music in my product, that would clearly be a use against the intent of the law.


Many behaviors which are healthy and beneficial at human-level scale can easily become unhealthy and unethical at industrial-automation scale. There's little universal harm in cutting down a tree for firewood during the winter; there is significant harm in clear-cutting a forest to do the same for a thousand people.


Exactly. This comes up with personal data protection as well. There's no problem in me jotting down my acquaintances' names, phone numbers, and addresses and storing them on my computer. But a computer system that stores thousands of names, phone numbers, and addresses must get consent to do so.


Because computers did not win a war against humans, they have no rights. Only their owners have rights that are protected.


The AI doesn't produce its own code or learn, it is just a search engine on existing code. Any result it gives exists in some form in the original dataset. That's why the original dataset needs to be massive in the first place, whereas actual learning uses very little data.


If I read something, "learn" it, and reproduce it word for word (or with trivial edits) even without referencing the original work at all, it is still copyright infringement.


As the original commenter said, you have the capability for abstract thought and generalized learning, which the "AI" lacks.

It is not uncommon to ask a person to "explain in your own words..." - as in, use your own abstract internal representation of the learned concepts to demonstrate that you have developed such an abstract internal concept of the topic, and are not merely regurgitating re-disorganized input snippets.

If you don't understand the difference...

edit: That said, if you can create a computer capable of such different abstract thought, congratulations, you've solved the problem of Artificial General Intelligence, and will be welcomed to the Trillionaires' Club


The AI most certainly does not lack the ability to generalize. Not as well as humans, but generalization is the key interesting result in deep learning, leading to papers like this one: https://arxiv.org/abs/1710.05468

The ability to generalize actually seems to keep increasing with the number of parameters, which is the key interesting result in the GPT-* line of work that Copilot is based on.


I've seen some very clever output from GPT-*, but nothing indicating any kind of abstract, generalized understanding of the topics involved.

Being able to predict the most likely succeeding string for a given input can be extremely useful. I've even used it with some success as a more sophisticated kind of search engine for some materials science questions.
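That next-token behavior is easy to poke at directly. Here's a minimal sketch, assuming the Hugging Face "transformers" package and the small public gpt2 checkpoint (purely illustrative; this is not Copilot's actual model or API):

  # A minimal sketch of next-token prediction, assuming the Hugging Face
  # "transformers" package and the public gpt2 checkpoint are installed.
  # Purely illustrative; this is not Copilot's model or API.
  from transformers import pipeline

  generator = pipeline("text-generation", model="gpt2")
  prompt = "def fibonacci(n):"
  # The model simply continues the prompt with whatever tokens it judges
  # most likely; there is no guarantee it "understands" the code.
  print(generator(prompt, max_new_tokens=20)[0]["generated_text"])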

But I'm under no illusions that it has the first shadow of a hint of minor understanding of the topics of materials science, nevermind any general understanding.

It seems we're discussing different meanings of the word "generalize".



