I don't think it's that clear cut. The functional parts probably aren't copyrightable, only the stylistic ones. It's going to be a mix of courts applying laws in new ways that hasn't been done before and fact specific questions about what actually persisted through the LLM if it goes to court.
I'd be fascinated to see what happens if it does. Both in the analyses that we'd get of what the LLM did to the codebase and on the legal decisions on what the copyrightable creative elements in code actually are.
If I was the author though... there would be no way that I would be volunteering to be a test case like this. Also seems just rude for no reason.
It probably would have been less bad if he had chosen MPL-2.0 or LGPL-2.1-or-later. But he chose MIT, which cuts at the core of the intent of licensing the project with a share-alike license.
Tell me, can I create a copyrighted video that's not GPL licensed using ffmpeg?
Now tell me how creating a rust library using the git test suite is different?
But for the sake of argument: The test suite itself is copyrighted. To the extent the resulting work is a derivative of the test suite it is possibly infringing. For example you might example that the agent would derive variable names, function names, structure sequence and organization of the code from the test suite. It might even copy comments wholesale. Those are copyrightable things. (Which is of course just the first step in analyzing if it is infringement, there would be interesting fair use, de-minimis copying, etc arguments following a conclusion that any of those were copyrighted. A product produced this way definitely could be infringing given the right facts though).
yeah fair - the "The canonical Git source code we're targeting to replicate the functionality of is in the git/ subdirectory." part makes this hard to argue against.
> To the extent the resulting work is a derivative of the test suite it is possibly infringing
It's this bit that I have a problem with. If I run the test, it fails and reports a failure. Now I write code and run the test again. What is the theory there that code that I wrote infringes.
Doesn't infringe upon copyright period, because there's no creative element in that work.
Imagine a more substantial example though. Perhaps you have a test that checks that some file written in a binary format is correct, and gives names (creative elements) to each field of the format that it prints when you mess up the field, and has comments describing why the bytes are laid out like they are (the comments being copyrightable even if the facts they describe aren't), and the LLM copies those field names and comments verbatim... Now it's quite likely that the LLMs work is a derivative of the test suite.
For that assertion in particular I believe I'm practically parroting a ruling by the district court in Oracle vs Google about some extremely simple Java functions that Oracle claimed Google copied. Though I can't say I checked to make sure I'm remembering right.
You're recalling it right, but there's a nice quote from Judge Alsup in that case that talks about this exact situation:
> “So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification...”
Here given that this is rust and the original expression is C, the implementations cannot be the same by definition.
I'd challenge you here to think about this in terms of the legal aspects rather than reaching specifically for similarities as similar is often meaningless in the law or contracts when specific acts are codified rather than generalized ones.
I'd say what we're talking about here is probably a fair bit different to modding a game in most aspects.
I haven't followed any relevant cases but I would be surprised if there's any serious dispute that the common methods of modding games generally create derivative works. I think the dispute would be downstream of that as to whether or not the mods are covered by fair use.
If you did it in a loop until the test passed, maybe?
Your result is essentially impossible without the original. With ffmpeg, your result does not depend on ffmpeg specifically - you can use any video creation tool.
Repetition isn't really a factor in deciding whether something is infringing or not - check the copyright law in your jurisdiction. Here if you look specifically at what an LLM's sampling stage is doing, it's choosing non infringing tokens (i.e. rust source code) over infringing ones (i.e. C source code). So it's making an intentional choice to do something similar rather than creating something that has the same expression. That doesn't seem like it's copyright infringement to me.
A GPL tool that processes data doesn't virally transfer the license to its output. Copyrighted ffmpeg code isn't incorporated into the video output. The LLM didn't just conjure up equivalent behavior to git without ingesting the code and transforming it as new output. There is no other behavioral description that would reproduce all needed functionality.
It would be reasonable to point an LLM at these and use them with a basic knowledge of git to produce a rust version of git in a non-infringing manner.
If you did this manually it would take a long time.
Substitutibility probably doesn't apply here in the way you're implying and if it did it would likely be hampered by the 9th circuits findings about transformation in sony v connectix. Arguments here likely would look at rust not having a stable ABI, and hence not being inherently substitutable as a libray (grit-lib), less clear as an executable (grit-cli) on that side
basics of copyright law - the fundamental thing being protected is the expression... is a rust program's expression the same expression as a c program? I'd say generally not.
The test suite could test aspects of the architecture/design of the codebase that are not necessary for interoperability and constitute novel expression of a piece of software in a way that is not at all language specific.
By definition a test suite is about testing interoperability with the test suite. An HTTP test suite should likely test for whether response code 418 is implemented a particular way and while humorous it would still be an interop test no?
Because compilers and LLMs do different things, and what is done matters, so you can't reason by stepping from one to the other.
Compilers don't axiomatically yield derivative works, they simply in practice do because for non-trivial programs they preserve copyrightable elements of the work in the output.
Well compilers are a mechanical transformation and if that were sufficient to free you of IP law then IP law wouldn't work.
An LLM is also a computer program which takes input and produces output related in some way to that input. However I don't think most people would view it as a "mere" mechanical transformation. One could tautologically argue that an LLM blends the user input with the training inputs which is a sort of transformation and further that the LLM itself is a computer program thus it is mechanical in nature. However it should be immediately obvious that such an overly literal interpretation is in danger of subsuming human work as well. Where the boundary lies is an unanswered question.
Related, compilers can pose a problem depending on what the output includes. For example common lisp compilers that aren't under a permissive license are a minefield because regardless of what anyone might say the image that gets output includes (approximately) the full language implementation verbatim in addition to the user's program.
functional parts not being copyrightable means that you can't claim a program is a copyright violation based on the fact it does the exact same thing based on compatibility reasons (you can copy what the program does). E.g. git stores refs in .git/refs, so does grit, that's not a violation. You still can't copy the program.
Yes... and now we get to the fact specific question of "did they copy the program". Or actually the answer to that is plainly "no" - they made something similar from it - and didn't run ctrl-c ctrl-v in an unlicensed manner, but "did they copy the relevant facets of the program into the new similar thing".
No. You're allowed to make a similar tool, the functional elements are not copyrightable. There's a long history, predating LLMs by many decades, of doing this in the software industry.
My use of the word "similar" does not imply here that I think it's obvious that they are "similar" in any copyrightable elements - whether they are or not is one of the interesting questions I think this case would have to resolve.
Incidentally you're also allowed to make similar creative elements so long as they aren't copies and you did so independently... which could actually come up in a case like this (imagine the LLM produced a similar function to some function in the original... but the original wasn't in the context window at the time. Not at all unlikely with code where there often is only one or two natural ways to write something).
I suspect that the issue is more likely that the LLM code doesn't have an author and hence some parts of it can't be licenses, it's less likely that it's infringing on git's copyright for various reasons. (I am not a lawyer, but I do read copyright law for funsies).
> It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements. This can include situations where a human-authored work is perceptible in an AI output, or a human makes creative arrangements or modifications of the output, but not the mere provision of prompts.
What makes you think that's what the article says that it did? There's a lot of specific nuance and it doesn't say that anywhere. In fact it speaks of making a test suite pass only. This is the classic cleanroom bios from specs approach but no need to extract it as the test is available to run and there's nothing in the GPL that suggests that running a test suite infects software that you run it on.
You've read books and they are in your brains corpus. You only infringe copyright if you reproduce the same actual words from the books in your memory (and then do infringing acts defined by copyright laws with that output).
Here that's not happening. The code being produced by the LLM is Rust, not C.