It also seems like a self-evident ruling. I don't see much difference, except a matter of degree, between computer-generated prose in this case and something like LaTeX. Page composition (graphic design) is generally covered by copyright. In fact, just about every kind of creative content has some form of technological intermediary. An "AI" algorithm, or even something more basic, performs touch-up work on a photo. The photographer didn't do it; they just hit the button, just like whoever set up the system for Tencent. So I just don't see how courts could hold that copyright wouldn't apply in a case like this when so much content depends on computational generation of some sort.
Our society is built on the assumption that attackers are inefficient or dumb.
On the other hand, if Tencent owned the corpus, that isn't really an issue. Similarly, there have been automated finance articles for more than a decade using knowledge extraction algorithms against things like earnings reports, and copyright of those reports has not been an issue. Admittedly, that may only be because those releasing the reports do so in part to get the word out, and so they want reporting to be done on them. Even if they had a copyright claim that it wasn't fair use, they may not have the incentive to enforce it.
Regardless, this all opens up some fascinating discussions of the agency of AI, what constitutes true AI vs. a simple program or algorithm, assignment of who actually would own the copyrights of computer generated content... I think it's going to take some time for the law to catch up to technology on these topics.
Bots don’t care about those incentives. Once built, that’s all they do.
Thanks for such an insightful idea! I'm not 100% sure which side I come down on, but it's very thought-provoking, the sort of discussion that draws me into HN.
I'd say that depends. If something is created by AI randomly, the owners of the AI didn't create it. It's all about the level of autonomy. If the AI simply follows instructions and is a dumb tool, then yeah. But if it does stuff on its own and isn't a conveyor of the owner's creative intent, then the owners of the AI didn't make it, and they shouldn't own the result.
And besides, if you claim it's doing it as "part of the job" and that's why owners of the AI should get it, then give the AI job related rights first as well ;)
If I use the shell to generate a very long random number, I'd expect the output to belong to me.
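To make that concrete, here's a minimal Python sketch of the same idea (using the stdlib `secrets` module rather than a literal shell one-liner); the point is only that the "creative" input is a single invocation:

```python
import secrets

# Generate a very long random number (128 random bytes = 1024 bits),
# analogous to a shell one-liner reading from /dev/urandom.
number = int.from_bytes(secrets.token_bytes(128), "big")
print(number)
```

The operator contributes nothing beyond running the command, which is exactly the scenario the copyright question turns on.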
Secondly, no, it doesn't depend on who runs the AI, because running the AI isn't the same as creating the result. The one who ran it didn't provide creative input. How is that different from telling someone else "go create"? You didn't create the result either.
Another point - copyright by definition is given as an incentive to create. I don't see much need for an incentive to press a button and do nothing after that.
My understanding is that's not true, see: https://en.wikipedia.org/wiki/Illegal_number. I agree it's stupid but not up to me.
> Also, consider a generative algorithm, that creates every possible combination. Do you expect to potentially own everything that wasn't yet created? Because at some point, that algorithm will get to it.
Well yes, given infinite time, you could write a program to generate every possible sentence. If I saved those sentences on an infinitely large hard drive, yes, I would expect to own the rights to each one.
Note, that doesn't mean I'd be able to sue anyone else who ever speaks. The color of bits matters.
IMO, the alternative of assigning copyright to the algorithm's creator gets really messy. If an author uses GPT-2 for writing inspiration, and at some point, they copy out a GPT-2 paragraph verbatim, does the author not own that paragraph? Same question for music which incorporates AI-generated samples (or just randomly generated samples).
No, you should not expect it. Because the process wasn't creative. And there is no incentive needed for it either, because it's automated. Check again the definition of copyright, and why it's given in the first place. That should be the way to analyze, whether it's applicable or not.
It's no different from saying that a monkey with a camera can have copyright, or that someone giving a monkey a camera gets it if the monkey takes a picture.
I still think the law has much gray area in it, though. In between the monkey and a human photographer with a finished, edited photo, there are many shades of gray involving any number of technology-mediated transformations of the work. AI-driven methods of sharpening a photo, for example. Why should the photographer own the resulting AI-adjusted photo's copyright? Certainly they own the original, but it was the AI that made the end result. Not unlike a writer making a corpus of text for Tencent's algorithm to learn how to write its own articles, I think.
I'm asking the question, because I honestly don't know: Where would/should the line be drawn?
Do any of us have any clue how Chinese copyright law works? It's probably not the same as American copyright law. What are the requirements set out by Chinese law to be copyrightable?
> The TRIPS Agreement [which the USA is part of] requires that copyright protection extends to databases and other compilations if they constitute intellectual creation by virtue of the selection or arrangement of their contents, even if some or all of the contents do not themselves constitute materials protected by copyright.
So I don't quite understand whether data sets like OpenStreetMap or Google Maps are copyrightable in the USA. (Note that this concerns the underlying data, not the graphics like map design or street/satellite pictures.)
1. Copyright is generally held in "works of authorship", which is to say, the work of an author. Whether or not a computer program (AI or otherwise) is an author is the first question in this case.
2. Copyright law provides protection against unauthorised copying of a work. Independent creation is not a copyright violation, for all those who've suggested composing all possible (or, for the more efficiently-minded, probable) works of a given length. If another party independently creates a similar work (in whole or part), there is no copyright violation.
3. Copyright persists only in expression and not in the meaning or function of a work. This is in particular contrast to patent and trade secrets law.
The Chinese ruling, as described, fails multiple tests and would not qualify under present general copyright law. Though the possibility of the law changing given changing uses and practices does exist. I don't expect in the near term that this case will have much significance.
The ... interesting dynamics ... posed by increasing use of AI in creating content -- various systems creating de novo faces, images, audio, or video, as well as text -- do give some pause. What are the implications of creating such content via AI where the content itself is entirely outside the scope of copyright law?
1. US-centric, though generally applying to WIPO / Berne rules. Not legal advice.
2. 17 USC 102(a) https://www.law.cornell.edu/uscode/text/17/102
So they simply do not apply, even broadly, to a case that was processed in China, especially not at the level of deep US-centric scrutiny you applied to it.
That said, I don't see any of the "multiple tests" this ruling fails. It has simply posited that verbatim copying of an article published on one website to another website without prior agreement is still copyright infringement, regardless of the fact that the article itself was generated by software/AI. Nothing more, nothing less.
I find it hard to imagine that a court in any other country would rule any differently.
The actual legal code conforming to treaty requirements is a matter for countries to write and adopt, but generally that's occurred.
So, disagreement in part with your categorisation of WIPO/Berne as "guidelines", which I feel grossly understates their status.
China is a member of WIPO: https://www.wipo.int/members/en/
The Berne Convention of 1971 (I'm fairly certain this is superseded in at least part) does not appear to include authorship or originality in its properties of covered works:
The expression "literary and artistic works" shall include every production in the literary, scientific and artistic domain...
Though article 3 provides that protections apply to "authors":
Now, as an employee, my IP belongs to my employer, but I am paid for this.
A robot couldn't enter into a contract without 1) being paid (consideration) or 2) the intention to create a contract. Therefore if robots are to assign copyright they are going to need training about contracts.
In the US, there are three factors that govern whether or not someone is an employee. These are:
> Control by the employer over the work. For example, the employer determines how the work is done, has the work done at the employer’s location, and provides equipment or other means to create the work.
> Control by employer over the employee. For example, the employer controls the employee’s schedule in creating the work, has the right to have the employee perform other assignments, determines the method of payment, or has the right to hire the employee’s assistants.
> Status and conduct of employer. For example, the employer is in business to produce such works, provides the employee with benefits, or withholds tax from the employee’s payment.
Looking at those factors, it is really hard to argue that an AI is not an employee, as far as US law is concerned.
An AI is just a computer program. It’s an intangible asset. Any IP it generates was the work of the AI programmer, or perhaps also the operator who provides it with input and instructions on how to process it.
It's not currently understood that compiling code gives anyone a copyright claim over the resulting compiler output, or makes that output a derivative work, which is GP's point.
Actually, there's an argument that, even if GCC and its support libraries did not have the compiler exception clause, then there still would be no GPL "infection" of compiled code. License to use a software program implicitly carries with it a right to make any necessary copies that are functionally required, even if the copies would otherwise be infringing. So the compiler inserting copies of itself into your program would enable you to distribute those copies with your program without any further agreement (i.e., licenses) required, so long as you had legitimate license to the compiler itself.
I would consider the robot to be a kind of pencil or typewriter.
A robot duck that can sit on a nest, make duck sounds, and get food for the babies is still a robot, not a duck. Even punting on the whole consciousness debate, I think this example is easier because we generally think of ducks as simply living to reproduce, and if a robot isn't organically reproducing and passing genetic material on to its offspring, it's missing a huge part of what it is to be a duck.
Human level AI just got a lot easier to create.
A series of if/else blocks is not a "robot", or a legal person in any sense, any more than a paintbrush or a saxophone is.
Look at all the AI-driven visual art we're seeing happen. Copyright goes to the artist who set up the system and then selected the work, because that's the actual creative activity.
Creators have always used tools. The tool isn't what gets copyright, no matter how elaborate the tools. Once the tools have consciousness, drive, and self-determination, we might think otherwise. But that's not where we are today.
12 trillion dollars to the first person who creates an AI that creates every permutation of basic writing.
Can we please just throw out all copyright/patent laws and start from scratch with reasonable terms? (like 5 years, or maaaaybe life of the creator if they keep filing paperwork to confirm they want it.)
I could easily have a website where any URL you went to was a valid article. For example, example.com/my-article-text-is-here would map to "my article text is here". But the point of creation isn't when you randomly enter the URL; the point of creation is when some selection of tokens is chosen for the actual article, and realistically the combinatorial explosion of human language has us covered.
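Such a site is trivial to build, which is the point. A minimal sketch of the mapping (the function name and hyphens-for-spaces convention are my own illustration):

```python
def article_for_path(path: str) -> str:
    # Hypothetical mapping: the URL slug *is* the article text,
    # with hyphens standing in for spaces.
    slug = path.strip("/")
    return slug.replace("-", " ")

print(article_for_path("/my-article-text-is-here"))
# -> "my article text is here"
```

Every possible article "exists" on this site before anyone types its URL, which is why the typing can't plausibly be the moment of authorship.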
I’ll admit I am actually not sure if those claims are real. How did you hear about them?
...which contains all texts up to 400 characters.
Suing her under that premise would be almost certainly unsuccessful.
Keeping workshop notes and early revisions, getting works notarised, defensive publication, et cetera -- these are all means people use to guard against false allegations of tortious infringement.
"Finally, to receive copyright protection, a work must be the result of at least some creative effort on the part of its author."
So the question is, who is the author and if it was machine generated, did it take creativity?
The algorithm took a lot of creativity, but did the output that is being copyrighted?
I mean I would come down on the side of yes, but it makes for an interesting case.
In the US, this question has been settled since at least the 1990s (e.g., in the context of videogames). The output of algorithms is, in general, copyrightable, although there are some rather common-sense exceptions.
The question isn't whether you can copyright the output of an algorithm. The more salient question, in my mind, is whether the output of ML algorithms belongs to the owner's algorithm or to the owner of the training set.
Not all that different from a pop song made by editing together licensed samples, no?
In that case, the song is certainly a derivative work of the samples, and so the producer of the song needs to get derivative-works-allowed licensing from the samples’ authors (which is what you must necessarily get when buying samples from a sample library, for them to be of any use at all.) The produced song is then its own work with its own copyright. Sometimes, larger samples (like reused vocal performances) require payment in, essentially, “equity”—a percentage of the song’s royalties are transferred as royalties to the sample. But in most cases, the sample is purchased for a flat fee, and there is no ongoing relationship between the revenue of the song and the revenue of the sample.
Is anything different if you replace “song” with “news article” and “samples” with “training set”?
We can have the output for the cost of the energy, or we can perpetually (AIs never die!) pay tax to a wealthy capitalist and have the same output; why is the latter better?
Yes, we reward AI makers by giving them copyright protection over their work, we don't - and shouldn't in my personal opinion - reward machines. Why would we, what's the benefit in human terms? There's no moral hazard in turning a machine on and off when we need creative works that the machine is programmed to make or don't need more of such works.
Copyright protections that serve the wealthy owners of AIs whilst they simultaneously undercut creative people producing simulated culture (cheaper than actual culture) would not serve the demos.
One possible legal theory: because the algorithm was trained on a text corpus upon which the algorithm's owner has no legal claim.
In this particular case, I don't think that theory would hold much water.
However, consider, e.g., a model that produces encyclopedia entries and is trained on a half dozen existing encyclopedias. IMO, if that model is using techniques similar to SoTA and isn't producing utter garbage, then the owner of that model should have a very difficult time claiming that the output of their model is anything more than a sophisticated round-about way of copy/pasting from existing encyclopedias.
But still, in that case, the output is still covered by copyright. It's just that the owner of the training set -- not the owner of the algorithm -- is the one with the valid claim to copyright.
The same can be said about human writers: they learn to write based on thousands of "training examples" -- the articles and books they read throughout their lives.
Or rather: who knows? Maybe. But certainly, at least today, a SoTA model generating a quality encyclopedia is not doing what human writers do, and is effectively copy/pasting.
Maybe in 50 years -- or in 10, with a major breakthrough on the level of general relativity -- that statement might be true. But it's certainly not true of today's deep NLP systems.
The exact same set of facts, obviously reported originally by a single individual, then rearranged, reworded, and republished by hundreds of "reporters"/"bloggers", (sometimes) with attribution of the origin.
Incidentally, I think this can actually be a very low standard, depending on context. We tend to think of the Turing test as being performed by academics, in a lab, where everyone is Very Serious. But if you put real humans on, e.g., Omegle (with no video), they typically type with poor grammar and say "random"-sounding things that quite plausibly could be said by an AI. Additionally, the preponderance of spam scammers demonstrates that many people are quite gullible, unable to differentiate between a Nigerian scammer and a legitimate representative of their bank. Given this, I think we have already passed the Turing test, not by bringing AI up to the level of humans, but by bringing humans down to the level of AI.
Adherence to arbitrary conventions in situations where they’re unnecessary is, on the other hand, a poor measure of grammar ability.
Do you really believe that most people who write “u” in an SMS are unaware that it’s written “you” according to formal conventions?
Just publish every letter combination possible somewhere.
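For what it's worth, the generator itself is a few lines; the problem is purely combinatorial scale. A sketch (function name and lowercase-only alphabet are my simplifying assumptions):

```python
from itertools import product
import string

def all_texts(max_len: int, alphabet: str = string.ascii_lowercase):
    """Yield every string over `alphabet` of length 1..max_len.

    There are 26**n strings of length n, so this is only feasible
    for tiny max_len -- length 10 alone is ~1.4e14 strings.
    """
    for n in range(1, max_len + 1):
        for combo in product(alphabet, repeat=n):
            yield "".join(combo)
```

Even storing, let alone "publishing", the output for any interesting length is physically impossible, which is part of why this reductio doesn't actually threaten copyright in practice.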