thesuperbigfrog · on Nov 17, 2022

>> As programmers we take pride in being DRY. Copilot is helping us not reinvent the same concept 1000 times.

That's what libraries are for.

Copilot is just copy / paste of the code it was trained on.

When the code it was trained on is later discovered to have CVEs, will it automatically patch the pasted code?

With a library, you can update to the patched version. Copilot has no such feature.

lolinder · on Nov 17, 2022

> Copilot is just copy / paste of the code it was trained on.

Every time I hear someone say this, I hear "I've never really tried Copilot, but I have an opinion because I saw something on Twitter."

Given the function name for a test and 1-2 examples of tests you've written, Copilot will write the complete test for you, including building complex data structures for the expected value. It correctly uses complex internal APIs that aren't even hosted on GitHub, much less publicly.

Given nothing but an `@Test` annotation, it will actually generate complete tests that cover cases you haven't yet covered.

There are all kinds of possible attacks on Copilot. If you had said it can copy/paste its training data I wouldn't have argued, but "it just copy/pastes the code it was trained on" is demonstrably false, and anyone who's really tried it will tell you the same thing.

EDIT: There's also this fun Copilot use I stumbled across, which I dare you to find in the training data:

    /**
    Given this text:
 
    Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.

    Fill in a JSON structure with my name, how much money I had, and where I'm going:
    */

    {
        "name": "Ishmael",
        "money": 0,
        "destination": "the watery part of the world"
    }

visarga · on Nov 17, 2022

It can even read an invoice, you can ask it "what is the due date?" It's a system that solves due date and Ishmael questions out of the box. And everything in-between.

thesuperbigfrog · on Nov 18, 2022

>> It can even read an invoice, you can ask it "what is the due date?" It's a system that solves due date and Ishmael questions out of the box. And everything in-between.

That's cool.

But emitting copyrighted code without attribution and in violation of the code's license is still copyright infringement.

If I created a robot assistant that cleaned your house, did the shopping, and occasionally stole things from the store, it would still be breaking the law.

visarga · on Nov 18, 2022

> occasionally stole things from the store

It's fascinating to see how stretchy the word "steals" is nowadays. You can make anything be theft. Copying open online content and sharing it? Theft. Learning from data and generating? Also theft. Stealing from a physical store? You guessed it.

thesuperbigfrog · on Nov 19, 2022

>> It's fascinating to see how stretchy the word "steals" is nowadays. You can make anything be theft

Theft has a definite legal meaning. So does copyright infringement.

The court can decide if it is copyright infringement or fair use:

https://githubcopilotlitigation.com/pdf/1-0-github_complaint...

throwaway675309 · on Nov 18, 2022

While I do enjoy everybody acting as armchair lawyers, until we get an actual legal ruling the general consensus seems to be that it is sufficiently transformative to be considered fair use.

thesuperbigfrog · on Nov 17, 2022

>> If you had said it can copy/paste its training data I wouldn't have argued, but "it just copy/pastes the code it was trained on" is demonstrably false, and anyone who's really tried it will tell you the same thing.

So if "it could commit copyright infringement, but does not always do so" is good enough for your company's legal review team, then go for it.

visarga · on Nov 18, 2022

Has anyone tried to see how similar their manually written code is to other code out there? I bet small snippets 1-2 lines long are easy to find. It would be funny to realise that we're more "regurgitative" than Copilot by mere happenstance.
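
For anyone who wants to try this, here is a rough sketch of one way to measure it. It is not something run in this thread. It assumes you have your own file as a string and some reference corpus as a list of strings. The names `normalize`, `line_windows`, and `overlap_ratio` are made up for illustration, and matching on whitespace-normalized 2-line windows is just one possible definition of "similar".

```python
from __future__ import annotations


def normalize(line: str) -> str:
    """Collapse whitespace so indentation differences don't hide a match."""
    return " ".join(line.split())


def line_windows(source: str, size: int = 2) -> set[tuple[str, ...]]:
    """Return every run of `size` consecutive non-blank lines in `source`."""
    lines = [normalize(raw) for raw in source.splitlines() if raw.strip()]
    return {tuple(lines[i:i + size]) for i in range(len(lines) - size + 1)}


def overlap_ratio(mine: str, corpus: list[str], size: int = 2) -> float:
    """Fraction of my `size`-line windows that also appear in `corpus`."""
    my_windows = line_windows(mine, size)
    if not my_windows:
        return 0.0
    corpus_windows: set[tuple[str, ...]] = set()
    for doc in corpus:
        corpus_windows |= line_windows(doc, size)
    return len(my_windows & corpus_windows) / len(my_windows)
```

A high ratio at `size=2` would support the "small snippets are easy to find" guess. Raising `size` shows how quickly the overlap falls off for longer runs, which is the more interesting number for the verbatim-copy question.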

thesuperbigfrog · on Nov 19, 2022

Will the court believe that Copilot created an exact copy of Tim Davis's code "by mere happenstance"?

https://twitter.com/DocSparse/status/1581461734665367554