I don't fundamentally disagree with you, but what you are saying doesn't hold wa...

carom · on June 1, 2023

There is quite a bit of precedent for "making copies of digital things is copyright infringement". Look at lawsuits from the Napster era. [1]

What makes the use improper? Licenses. Terms of service. Mostly licenses, though. For example, all the images on Flickr that were uploaded under Creative Commons licenses (e.g. non-commercial) have now been used in a commercial capacity by a company to create and sell a product.

Similarly, code on GitHub is published under specific licenses with specific terms. Copilot is a derivative work of that code, the license terms of that code (e.g. GPL, non-commercial) should extend to the new function that was derived from it.

The reason I mention competition with the original is the fair use test (USA). When courts decide whether something is fair use, they weigh several factors. Two important ones are whether the use is commercial and whether it substitutes for the original. When art models output something in the style of a living artist, it is essentially a direct substitute for that person.

Sure, I can make a shirt with Spider Man on it and give it to my brother, but if a company were to use what I made or I tried to sell it, I would expect a cease and desist from Disney.

Training the model may very well be a copyright issue. The images have been copied, they are being used. Whether that falls under fair use will likely be decided case by case in court. I do not believe closed commercial models like Copilot or DALL-E will pass a fair use test.

There is a lot of money involved here, though, so we will probably have to wait years for answers.

1. https://www.theguardian.com/technology/2012/sep/11/minnesota...

bastawhiz · on June 1, 2023

> to create and sell a product.

This is not model training.

> Copilot is a derivative work of that code, the license terms of that code (e.g. GPL, non-commercial) should extend to the new function that was derived from it.

But the act of training Copilot is not itself problematic. In fact, if GitHub had never done anything with Copilot, the physical act of training the model would not be problematic at all. And that's what's at issue here. How Copilot is used is orthogonal to the article.

> Sure, I can make a shirt with Spider Man on it and give it to my brother, but if a company were to use what I made or I tried to sell it, I would expect a cease and desist from Disney.

Yes. And training the model isn't the part where you sell it. It's the part where you make it.

> Training the model may very well be a copyright issue. The images have been copied, they are being used.

What do you think "being used" means here? If I work for a company and download a bunch of text and save it to a flash drive, have I violated copyright? Of course not. If I put that data in a spreadsheet, is it copyright infringement? Of course not. If I use Excel formulas on that text, is it infringement? Still no.

And so how can you claim in any way that the creation of a model is anything more than aggregating freely available information?

I don't disagree with you about the use of a model. But training the model is just taking some information and running code against it. That's what's important here.