Really? You have had multiple discussions about fair use and yet you weren't aware ...

lelanthran · on Feb 16, 2023

> Here it is, since you were not aware: "the amount and substantiality of the portion taken".

Yes. If you use only 1% of a work, then you are not using a substantial or large amount of the work and it is considered fair use.

But training doesn't use 1% of the work, it uses the entire work. No one is using 1/100th of an individual image to train, nor are they using 1/100th of a codebase to train, etc.

They're using entire individual works, and all those factors that are applicable are evaluated collectively, not in isolation.

Besides, all those factors become irrelevant if "...On the other hand, it is as clear, that if he thus cites the most important parts of the work, with a view, not to criticize, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy." (https://en.wikipedia.org/wiki/Fair_use)

It's hard to claim that the owners of ChatGPT and similar are not trying to supersede the works it is fed as input. They state as much everywhere.

> Instead, what I am saying, is that if there is a model, trained on millions and millions of images, the output of the model is fair use, because it is not taking significantly from your individual work.

Whether the output from the model is fair use or not is irrelevant to whether the input falls under fair use.

I must say your take is certainly novel, and no, I haven't seen anyone try to make that claim before; each time I have asked I have gotten a different answer.

I think a better case to be made is that ChatGPT is transformative, which would make it fair use.

If you read through the entire Wikipedia article I linked above you'll see that:

1. All the factors are evaluated collectively, in relation to each other, not individually.

2. The burden of proof lies with the defendant, not the claimant. IOW, the court starts off with "prove that the use is fair", and not "prove that the use is not fair". From Wikipedia: "This means that in litigation on copyright infringement, the defendant bears the burden of raising and proving that the use was fair and not an infringement."

In short, when the license says "not to be used as training data or learning data for any machine model", and it is ignored, the defendant is already in violation. If sent a cease and desist with a request for royalties, the defendant is already presumed to be in violation, and will have to prove fair use, which will (in order of factors) mean that they have to give the answer that favours fair use to all of the following questions in court:

1. Is the output product being used for commercial purposes and/or profit?

2. Is the input work a freely available fact, or of a nature such that it's in the public interest to reproduce it?

3. Is the proportion of the input work that is used insignificant (typically less than 1/100th)?

4. Does the output work harm the market for the input work?

The owners of ChatGPT are unable to give an answer that favours fair use to any of the above.

stale2002 · on Feb 16, 2023

Gotcha, so then can you give me a date/time limit on when I am allowed to make fun of you, if zero people lose court cases on this?

I am more than happy to put this on my calendar here.

I just need an exact date, on when I can come back to your comments, and make fun of you for being completely wrong, when nobody loses any court cases on this topic.

Give me a date, and please describe specifically the exact words I am allowed to use to describe someone who would make such a mistake.

And if you refuse to give an exact date, I will assume that it is both 6 months and exactly 1 year from now, and I will check in with you on exactly those dates to see if you will admit that you were wrong (spoilers... you won't!)

lelanthran · on Feb 16, 2023

> Gotcha, so then can you give me a date/time limit on when I am allowed to make fun of you, if zero people lose court cases on this?

Well, people have already lost fair-use defenses because they failed on ONE of the four factors. Some cases were lost due to commercialisation, some were lost because too much of the original work was used, some were lost because of monetary or distribution harm to the original author.

So, when you say "on this" you mean "commercial mass harvesting of copyrighted works to produce a new work"?

> I just need an exact date,

The onus is on the AI owners to prove fair use, and you want a date when that defense will lose?

Just how new are you to copyright and law? Who knows when court cases end? We cannot tell in advance when cases (hearings) may actually start (it can take up to two years, sometimes), or when they will actually end (another two years?).

How about this instead - we wait for the first judgement that rules on a fair use defense for training machine models?

We set a specific wager. I propose "Fair use is not a significant defense against usage of works to train machine models". That's binary - there are no shades of grey there.

I'm betting on that statement being true, you're betting against that statement being true.

Loser has to post in one of HN or r/programming a link to the first post in this thread, along with a small and short exercise in humility, admitting, "Yes, I was wrong about this call that I made in a public forum".

It's a friendly wager. If you are willing, I'd put it up on my site somewhere (or a Google spreadsheet, which is better) so you and I can both update it regularly with suits-in-progress and suits-completed, excluding appeals (otherwise this wager will take multiple decades to settle).

Happy? DM me (or email me - my HN username at gmail) and we can both save this link to our emails :-)