"Finally, to receive copyright protection, a work must be the result of at least some creative effort on the part of its author."
So the question is: who is the author, and if the work was machine-generated, did producing it take creativity?
Building the algorithm took a lot of creativity, but did producing the output that is being copyrighted?
I mean I would come down on the side of yes, but it makes for an interesting case.
In the US, this question has been settled since at least the 1990s (e.g., in the context of videogames). The output of algorithms is, in general, copyrightable, although there are some rather common-sense exceptions.
The question isn't whether you can copyright the output of an algorithm. The more salient question, in my mind, is whether the output of ML algorithms belongs to the owner of the algorithm or to the owner of the training set.
Not all that different from a pop song made by editing together licensed samples, no?
In that case, the song is certainly a derivative work of the samples, and so the producer of the song needs to get derivative-works-allowed licensing from the samples’ authors (which is what you must necessarily get when buying samples from a sample library, for them to be of any use at all). The produced song is then its own work with its own copyright. Sometimes, larger samples (like reused vocal performances) require payment in, essentially, “equity”—a percentage of the song’s royalties is transferred as royalties to the sample’s author. But in most cases, the sample is purchased for a flat fee, and there is no ongoing relationship between the revenue of the song and the revenue of the sample.
Is anything different if you replace “song” with “news article” and “samples” with “training set”?
We can have the output for the cost of the energy, or we can perpetually (AIs never die!) pay tax to a wealthy capitalist and have the same output; why is the latter better?
Yes, we reward AI makers by giving them copyright protection over their work; we don't - and shouldn't, in my personal opinion - reward machines. Why would we? What's the benefit in human terms? There's no moral hazard in turning a machine on when we need more of the creative works it is programmed to make, or off when we don't.
Copyright protections that serve the wealthy owners of AIs, while those AIs simultaneously undercut creative people by producing simulated culture (cheaper than actual culture), would not serve the demos.
One possible legal theory: because the algorithm was trained on a text corpus upon which the algorithm's owner has no legal claim.
In this particular case, I don't think that theory would hold much water.
However, consider, e.g., a model that produces encyclopedia entries and is trained on a half dozen existing encyclopedias. IMO, if that model is using techniques similar to SoTA and isn't producing utter garbage, then the owner of that model should have a very difficult time claiming that the output of their model is anything more than a sophisticated round-about way of copy/pasting from existing encyclopedias.
Even in that case, though, the output is still covered by copyright. It's just that the owner of the training set -- not the owner of the algorithm -- is the one with the valid claim to copyright.
The same can be said about human writers: they learn to write from thousands of "training examples" - the articles and books they read throughout their lives.
Or rather: who knows? Maybe. But at least today, a SoTA model generating a quality encyclopedia is certainly not doing what human writers do, and is effectively copy/pasting.
Maybe in 50 years -- or in 10 years with a major breakthrough on the level of general relativity -- that statement might be true. But it's certainly not true of today's deep NLP systems.
It's the exact same set of facts, obviously reported originally by a single individual, then rearranged, reworded, and republished by hundreds of "reporters"/"bloggers", (sometimes) with an attribution of origin.