Shenzhen court rules AI-written article has copyright (ecns.cn)
90 points by blacktulip 9 days ago | 87 comments

I didn't see anything that specified who owns the copyright. Tencent, I presume?

It also seems like a self-evident ruling. I don't see much difference, except as a matter of degree, between computer-generated prose in this case and something like LaTeX. Page composition (graphic design) is generally covered by copyright [0]. In fact, just about every piece of creative content has some form of technological intermediary. An "AI" algorithm, or even something more basic, performs touch-up work on a photo. The photographer didn't do it; they just hit the button, just like whoever set up the system for Tencent. So I just don't see how courts could hold that copyright wouldn't apply in a case like this when so much content depends on computational generation of some sort.

[0] https://www.commarts.com/columns/is-it-true-that-copyright-d...

The issue is that the AI can generate all viable versions of the article by front-running someone else’s style.

Our society is built on the assumption that attackers are inefficient or dumb.

If the AI is generating content using a corpus which 1) has copyright attached (say Matt Levine's content for Bloomberg) and 2) isn't owned by the person/corporation that developed the AI, then you're absolutely right and the waters are muddied. An argument could be made that the end product is transformative enough to be its own thing, eligible for copyright, but the use and, presumably, copying of the original content into a corpus would seem to be a violation of that copyright.

On the other hand, if Tencent owned the corpus, that isn't really an issue. Similarly, there have been automated finance articles for more than a decade using knowledge extraction algorithms against things like earnings reports, and copyright of those reports has not been an issue. Admittedly, that may only be because those releasing the reports do so in part to get the word out, and so they want reporting to be done on them. Even if they had a copyright claim that it wasn't fair use, they may not have the incentive to enforce it.

Regardless, this all opens up some fascinating discussions of the agency of AI, what constitutes true AI vs. a simple program or algorithm, assignment of who actually would own the copyrights of computer generated content... I think it's going to take some time for the law to catch up to technology on these topics.

Copyright’s stated purpose was to promote the Progress of useful arts and science. That’s why the monopoly was secured to the authors.

Bots don’t care about those incentives. Once built, that’s all they do.

Yes, but the people writing and running the bots still care about those incentives. So it makes sense that they should own the copyright to the articles they generate with their bot.

I disagree — because the goal was to promote a laborious activity. Once we reach a whole new level, we no longer need to give the same incentives. It’s clear that people would be doing this even without copyright.

Now THAT is the first reasonable argument I've heard against copyright for such things. Once it becomes easy, trivial, and for most purposes "free" to hit a button and make a best-selling novel, a temporary monopoly on that work wouldn't seem to benefit the goal of incentivising future work. Of course, the person/people that built the system may still need some type of temporary monopoly to incentivize refinements, increased quality, genre variation, etc., maybe even the ability to tailor a novel to a specific individual's tastes for them alone... but I'm not sure "copyright" would then be the best tool for this. I think there would need to be something new and, given that I think copyright terms are already far too long, much shorter than traditional copyright.

Thanks for such an insightful idea! I'm not 100% sure which side I come down on, but it's very thought provoking, the sort of discussion that draws me in to HN.

When a person creates X as part of their job, the company owns the IP. When an AI creates X as part of its job, the company owns the IP. When a person creates X on their own, they own the IP. When an AI creates X on its own... we aren't there yet.

> When an AI creates X as part of its job, the company owns the IP

I'd say that depends. If something is created by AI randomly, the owners of the AI didn't create it. It's all about the level of autonomy. If the AI simply follows instructions and is a dummy tool - then yeah. But if it does stuff on its own and isn't a conveyor of the owners' creative intent - then the owners of the AI didn't make it, and they shouldn't own the result.

And besides, if you claim it's doing it as "part of the job" and that's why owners of the AI should get it, then give the AI job related rights first as well ;)

Shouldn't it depend on who ran the AI?

If I use the shell to generate a very long random number, I'd expect the output to belong to me.

First of all, a number can't belong to anyone. Also, consider a generative algorithm, that creates every possible combination. Do you expect to potentially own everything that wasn't yet created? Because at some point, that algorithm will get to it.

Secondly, no, it doesn't depend on who ran the AI. Because running the AI isn't called creating the result. The one who ran it didn't provide creative input. How is it different from telling someone else "go create"? You didn't create the result either.

Another point - copyright by definition is given as an incentive to create. I don't see much need for an incentive to press a button and do nothing after that.

> First of all, a number can't belong to anyone.

My understanding is that's not true, see: https://en.wikipedia.org/wiki/Illegal_number. I agree it's stupid but not up to me.

> Also, consider a generative algorithm, that creates every possible combination. Do you expect to potentially own everything that wasn't yet created? Because at some point, that algorithm will get to it.

Well yes, given infinite time, you could write a program to generate every possible sentence. If I saved those sentences on an infinitely large hard drive, yes, I would expect to own the rights to each one.

Note, that doesn't mean I'd be able to sue anyone else who ever speaks. The color of bits [0] matters.

IMO, the alternative of assigning copyright to the algorithm's creator gets really messy. If an author uses GPT-2 for writing inspiration[1], and at some point, they copy out a GPT-2 paragraph verbatim, does the author not own that paragraph? Same question for music which incorporates AI-generated samples (or just randomly generated samples).


[0]: https://ansuz.sooke.bc.ca/entry/23

[1]: https://www.vox.com/future-perfect/2019/8/30/20840194/ai-art...

> Well yes, given infinite time, you could write a program to generate every possible sentence. If I saved those sentences on an infinitely large hard drive, yes, I would expect to own the rights to each one.

No, you should not expect it. Because the process wasn't creative. And there is no incentive needed for it either, because it's automated. Check again the definition of copyright, and why it's given in the first place. That should be the way to analyze, whether it's applicable or not.

It must be the AI itself; otherwise it makes no sense. Though it makes no sense even for the AI (at least at present), since it's far from human-comparable.

It's no different from saying that a monkey with a camera can have copyright, or that someone giving a monkey a camera gets it if the monkey takes a picture.

And in fact animal art, in the US, has been deemed ineligible for copyright [0]

I still think the law has much gray area in it though. In between Monkey and Human Photographer with a finished, edited photo, there are many shades of gray with any number of technology-mediated transformations of the work. AI-driven methods of sharpening a photo, for example [1]. Why should the photographer own the resulting AI-adjusted photo's copyright? Certainly they own the original, but it was AI that made the end result. Not unlike a writer making a corpus of text for Tencent's algorithm to learn how to write its own articles, I think.

I'm asking the question, because I honestly don't know: Where would/should the line be drawn?

[0] https://www.swansonlawmn.com/blog/2014/08/23/artwork-created...

[1] https://topazlabs.com/let-ai-sharpen-your-photos/

I suppose at some level of autonomy. A dummy tool is clearly conveying author's intent. Autonomous AI clearly breaks that authorship chain. So where exactly is an interesting question. I'd say it also has to tie into incentive to create which is the basis of copyright.

Chinese court rules...

Do any of us have any clue how Chinese copyright law works? It's probably not the same as American copyright law. What are the requirements set out by Chinese law to be copyrightable?

I don't know, but since they seem to be a member of the Berne Convention, presumably it doesn't differ greatly from that of the US?

The Berne Convention isn't that precise. For example, consider that collections of information are copyrightable in most of Europe and not copyrightable in the US.

That's distinct from copyright, and even named a "sui generis right" (meaning "of its own kind").

While at the same time:

> The TRIPS Agreement [which the USA is part of] requires that copyright protection extends to databases and other compilations if they constitute intellectual creation by virtue of the selection or arrangement of their contents, even if some or all of the contents do not themselves constitute materials protected by copyright.


So I don't quite understand whether data sets like OpenStreetMap or Google Maps are copyrightable in the USA. (Note that this concerns the underlying data, not the graphics like map design or street/satellite pictures.)

Also, we don't have fair use here in Australia. My partner loves reminding me that the development of a search engine like Google is illegal here in Australia.

The easiest way to escape AI hype-thinking is to replace "AI" with "computer program" in your mind whenever you see "AI" in an article.

A few relevant points regarding copyright law.[1]

1. Copyright is generally held in "works of authorship", which is to say, the work of an author. Whether or not a computer program (AI or otherwise) is an author is the first question in this case.[2]

2. Copyright law provides protection against unauthorised copying of a work. Independent creation is not a copyright violation, for all those who've suggested composing all possible (or for the more efficiently-minded, probable) works of a given length. If another party independently creates a similar work (in whole or part), there is no copyright violation.

3. Copyright persists only in expression and not in the meaning or function of a work. This is in particular contrast to patent and trade secrets law.

The Chinese ruling, as described, fails multiple tests and would not qualify under present general copyright law. Though the possibility of the law changing given changing uses and practices does exist. I don't expect in the near term that this case will have much significance.

The ... interesting dynamics ... posed by the increasing use of AI in creating content (various systems creating de novo faces, images, audio, or video, as well as text, as examples) do give some pause. What are the implications of creating such content via AI where the content itself is entirely outside the scope of copyright law?



1. US-centric, though generally applying to WIPO / Berne rules. Not legal advice.

2. 17 USC 102(a) https://www.law.cornell.edu/uscode/text/17/102

WIPO / Berne rules are guidelines. Not even in Western Europe, which aims for "feature parity" with US copyright laws, are things the same (case in point: treatment of copyright and patent ownership of things invented while employed, yet not at work or in relation to work).

So they simply do not apply even broadly to a case that was processed in China, especially not at the level of deep US-centric scrutiny you applied to it.

That said, I don't see any of the "multiple tests" this ruling fails. It has simply posited that verbatim copying of the article published on one website to another website without prior agreement is still copyright infringement, regardless of the fact that the article itself was generated by software/AI. Nothing more, nothing less.

I find it hard to imagine that a court in any other country would rule any differently.

Both Berne and WIPO have produced multiple treaties, which go well beyond the notion of "guidelines", and are ratified by signatories within their respective national frameworks (e.g., ratified by the US Senate, as with the Berne Convention Implementation Act of 1988).

The actual legal code conforming to treaty requirements is a matter for countries to write and adopt, but generally that's occurred.

So, disagreement in part with your categorisation of WIPO/Berne as "guidelines", which I feel grossly understates their status.

China is a member of WIPO: https://www.wipo.int/members/en/

The Berne Convention of 1971 (I'm fairly certain this is superseded at least in part) does not appear to include authorship or originality in its properties of covered works:

The expression "literary and artistic works" shall include every production in the literary, scientific and artistic domain...


Though article 3 provides that protections apply to "authors":


There are some subtleties. Models like GPT2 are trained using many copyrighted documents whose authors could claim it is derivative of their work.

Shouldn't the copyright therefore belong to the robot? The company is therefore not party to the IP infringement case.

Now as an employee my IP belongs to my employer, but I am paid for this.

A robot couldn't enter into a contract without 1) being paid (consideration) or 2) the intention to create a contract. Therefore if robots are to assign copyright they are going to need training about contracts.

I don't know about Chinese law, but the US concept is "work made for hire". This is satisfied by one of two alternatives, the first being "a work prepared by an employee within the scope of his or her employment." If you believe that an AI has sufficient personhood to be capable of authorship, then it is not unreasonable to presume that it is being employed by whomever is running it.

In the US, there are three factors that govern whether or not someone is an employee. These are:

> Control by the employer over the work. For example, the employer determines how the work is done, has the work done at the employer’s location, and provides equipment or other means to create the work.

> Control by employer over the employee. For example, the employer controls the employee’s schedule in creating the work, has the right to have the employee perform other assignments, determines the method of payment, or has the right to hire the employee’s assistants.

> Status and conduct of employer. For example, the employer is in business to produce such works, provides the employee with benefits, or withholds tax from the employee’s payment.

Looking at those factors, it is really hard to argue that an AI is not an employee, as far as US law is concerned.

This line of reasoning just seems weird and unnecessary to me.

An AI is just a computer program. It’s an intangible asset. Any IP it generates was the work of the AI programmer, or perhaps also the operator who provides it with input and instructions on how to process it.

We don't generally consider compiler output to be under the copyright of the author of the compiler.

GCC has a license exception specifically to allow compiling programs with it without "infecting" the resulting output with the compiler's GPL. https://www.gnu.org/licenses/gcc-exception-3.1-faq.html

That's more to deal with the fact that compilers insert calls to various support routines to implement certain functionality--for example, there are division routines to support architectures without a division operator.

It's not currently understood that the result of compiling code implies a claim to copyright or derivative works of any resulting compiler output, which is GP's point.

Actually, there's an argument that, even if GCC and its support libraries did not have the compiler exception clause, then there still would be no GPL "infection" of compiled code. License to use a software program implicitly carries with it a right to make any necessary copies that are functionally required, even if the copies would otherwise be infringing. So the compiler inserting copies of itself into your program would enable you to distribute those copies with your program without any further agreement (i.e., licenses) required, so long as you had legitimate license to the compiler itself.

I’m not sure that’s entirely analogous. Open source and commercially licensed compilers would have licensing agreements governing how they can be used. But I imagine if a person made their own compiler, and didn’t license it to anybody, but somebody somehow gained access to it and used it to compile their code, that the resulting output would in fact be infringing the IP rights of the compiler’s author. I’m sure you could also publicly release a compiler with a license that assigned copyright of the output to the compiler’s author (though I doubt you could convince anybody to use it).

Famously, the early PC assembler a86/d86 claimed to be able to identify unlicensed use from byte patterns.


This is incorrect, "work for hire" only helps determine who owns the copyright on something, not whether that thing is actually copyrightable.

GP is suggesting that the copyright should belong with the robot, I'm pointing out that work for hire would mean that it does not belong with the robot.

Surely it would fall foul of anti-slavery legislation though? Unless the Robot is paid?

Perhaps more analogous to employing a minor (without pay). But who is the guardian of the robot who can approve their employment?

> Shouldn't the copyright therefore belong to the robot?

I would consider the robot to be a kind of pencil or typewriter.

While it's fun to think of AI as humanoid, it's not. It's just a new kind of software.

A robot duck that can sit on a nest, make duck sounds and get food for the babies is still a robot, not a duck. Even punting on the whole consciousness debate, I think this example is easier because we generally think of ducks as simply living to reproduce, and if a robot isn't organically reproducing and passing genetic material on to its offspring, it's missing a huge part of what it is to be a duck.

Neuter a duck and it's a robot, got it.

Human level AI just got a lot easier to create.

>Shouldn't the copyright therefore belong to the robot? The company is therefore not party to the ip infringement case.

A series of if/else blocks is not a "robot", or a legal person in any sense, any more than a paintbrush or a saxophone is.

You’re being reductionist. At a certain point humans could also be argued as being composed of if/else blocks.

Only if the robot has its own corporate entity. Otherwise, like any machine, it's not a person under the law.

Then how could it produce a copyrightable work? The Berne convention only applies to 'Nationals and residents' of a country. Therefore if a court has held that the robot produced copyrighted material, then the robot must be a person.

Because under the law, it didn't produce the work.

Look at all the AI-driven visual art we're seeing happen. Copyright goes to the artist who set up the system and then selected the work, because that's the actual creative activity.

Creators have always used tools. The tool isn't what gets copyright, no matter how elaborate the tools. Once the tools have consciousness, drive, and self-determination, we might think otherwise. But that's not where we are today.

Oh god.

12 trillion dollars to the first person who creates an AI that creates every permutation of basic writing.

Can we please just throw out all copyright/patent laws and start from scratch with reasonable terms? (like 5 years, or maaaaybe life of the creator if they keep filing paperwork to confirm they want it.)

Eh. That's essentially just rand() with a fat disk. The real task is choosing which permutations to select and publish.
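To make the "fat disk" point concrete, here is a minimal sketch of such a generator. It walks the strings in order rather than calling rand(), and the three-symbol alphabet is purely an illustrative assumption; nothing here involves any selection or judgment, which is the point being made above.

```python
from itertools import count, islice, product

# Tiny illustrative alphabet (an assumption); a real one would be far larger.
ALPHABET = "ab "

def all_texts(alphabet=ALPHABET):
    """Yield every finite string over `alphabet`, shortest first."""
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

# Take only the first few; the generator itself never terminates.
first = list(islice(all_texts(), 5))
print(first)
```

The enumeration is mechanical and needs no input beyond the alphabet, so whatever creative act exists has to lie in picking which outputs to keep.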

I could easily have a website where any URL you went to was a valid article. For example, example.com/my-article-text-is-here would map to "my article text is here", but the point of creation isn't when you randomly enter the URL; the point of creation is when some selection of tokens is chosen for the actual article, and realistically the combinatorial explosion of human language has us covered.
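A rough sketch of that URL trick, assuming a hypothetical helper named slug_to_article and the dash-separated path format from the example above. The site stores nothing; the "article" is just the path echoed back with spaces.

```python
def slug_to_article(path: str) -> str:
    """Turn a URL path such as '/my-article-text-is-here' into article text.

    Hypothetical helper for illustration: it takes the last path segment
    and replaces dashes with spaces. No text is stored or generated.
    """
    slug = path.strip("/").split("/")[-1]
    return " ".join(part for part in slug.split("-") if part)

print(slug_to_article("/my-article-text-is-here"))
```

Every token in the output was chosen by whoever typed the URL, which is why the server cannot plausibly be called the author.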

You have a point. But with “the right” AI (presuming humans can cause it to be made), you would instead patent the essential building blocks. Like patenting the arrangements of atoms that make up proteins instead of patenting every single possible arrangement.


Well played. I’ll admit I am actually not sure if those claims are real. How did you hear about them?

There's also Library of Babel: https://libraryofbabel.info/

...which contains all texts up to 400 characters.

Looking forward to the day AI is used to generate billions of books and then compare them to every new release by a human. Then the patent/copyright trolls come out and sue human authors for copyright infringement because their awful AI book that is 50% gibberish has all the same elements and character names as the human author's book

Those billions of books probably violate the copyright of other books themselves.

I don't think this is the caveat you think it is.

That's not how copyright works. In your example, the human author did not copy anything, and is thus legally in the clear.

In practice it is my understanding that this is how copyright works in the US. There doesn't need to be any intentional infringement - just the fact that it infringes is enough. If someone came up with "Harry Potter" as a main character name on their own they still wouldn't be able to publish a novel with that title.

So are you saying that J.K. Rowling could be sued by the copyright holders of the 1969 film "Carry On Camping"?


FWIW there's no copyright in names.

Suing her under that premise would be almost certainly unsuccessful.

This is copyright, not patent or registered mark. Copyright, by definition, restricts redistribution and copying, not independent recreation.

How does one prove independent recreation? It seems virtually impossible.

Copyright infringement is a tort, and the usual standard of proof is "balance of probabilities". You don't have to prove it; the plaintiff has to show that, on balance, the most likely truth is that the defendant copied.

Keeping workshop notes, early revisions; getting works notarised; defensive publication; et cetera -- these are all means people use to guard against false allegations of tortious infringement.

Simple: You do not presume guilt, hence independent recreation is the default assumption and it is copying/redistribution which needs to be proven.

A billion is probably a very small number when compared to the cardinality of the set of all possible/reasonable books.

Why wouldn't this be the case? In what sane legal system would one not own the work they published just because it was generated in part or full by an AI? If company B lifts company A's AI and starts generating articles with it, the suit would then be about the code itself, just like with websites, video games, etc.


"Finally, to receive copyright protection, a work must be the result of at least some creative effort on the part of its author."

So the question is, who is the author and if it was machine generated, did it take creativity? The algorithm took a lot of creativity, but did the output that is being copyrighted? I mean I would come down on the side of yes, but it makes for an interesting case.

> if it was machine generated, did it take creativity?... but did the output that is being copyrighted?

In the US, this question has been settled since at least the 1990s (e.g., in the context of videogames). The output of algorithms is, in general, copyrightable, although there are some rather common-sense exceptions.

The question isn't whether you can copyright the output of an algorithm. The more salient question, in my mind, is whether the output of ML algorithms belongs to the algorithm's owner or to the owner of the training set.

> It's just that the owner of the training set -- not the owner of the algorithm -- is the one with the valid claim to copyright.

Not all that different from a pop song made by editing together licensed samples, no?

In that case, the song is certainly a derivative work of the samples, and so the producer of the song needs to get derivative-works-allowed licensing from the samples’ authors (which is what you must necessarily get when buying samples from a sample library, for them to be of any use at all.) The produced song is then its own work with its own copyright. Sometimes, larger samples (like reused vocal performances) require payment in, essentially, “equity”—a percentage of the song’s royalties are transferred as royalties to the sample. But in most cases, the sample is purchased for a flat fee, and there is no ongoing relationship between the revenue of the song and the revenue of the sample.

Is anything different if you replace “song” with “news article” and “samples” with “training set”?

Copyright isn't a natural right, and giving rights to computational algorithms isn't a normal legal act - how does it benefit society to do that? Is the deal good for the populace as a whole?

We can have the output for the cost of the energy, or we can perpetually (AIs never die!) pay tax to a wealthy capitalist and have the same output; why is the latter better?

Hmm, yeah I can see the argument there. I guess it boils down to whether one believes in a transitive property of creativity. I think it applies until the B in "A -creates-> B -creates-> C" is deemed to have certain rights, which is going to be the really interesting question with all this. AI will be like a child prodigy with exploitative parents

AIs don't need to "eat", therefore they don't need copyright protection. If someone duplicates the work the AI produces, it doesn't jeopardise that AI's livelihood, as it doesn't have a livelihood. Copyright is a bargain intended to enlarge the public domain and reward creative people for the creative works they make.

Yes, we reward AI makers by giving them copyright protection over their work, we don't - and shouldn't in my personal opinion - reward machines. Why would we, what's the benefit in human terms? There's no moral hazard in turning a machine on and off when we need creative works that the machine is programmed to make or don't need more of such works.

Copyright protections that serve the wealthy owners of AIs whilst they simultaneously undercut creative people producing simulated culture (cheaper than actual culture) would not serve the demos.

The creator of the AI still needs to eat. You're suggesting that AI developers should effectively have none of the existing legal protections for software and other creative works. Also, the "bargain" clearly applies to AI applications. Why/how would anyone start a business like https://brandmark.io/ if the generated logos have no legal protection?

> Why wouldn't this be the case?

One possible legal theory: because the algorithm was trained on a text corpus upon which the algorithm's owner has no legal claim.

In this particular case, I don't think that theory would hold much water.

However, consider, e.g., a model that produces encyclopedia entries and is trained on a half dozen existing encyclopedias. IMO, if that model is using techniques similar to SoTA and isn't producing utter garbage, then the owner of that model should have a very difficult time claiming that the output of their model is anything more than a sophisticated round-about way of copy/pasting from existing encyclopedias.

But still, in that case, the output is still covered by copyright. It's just that the owner of the training set -- not the owner of the algorithm -- is the one with the valid claim to copyright.

>> One possible legal theory: because the algorithm was trained on a text corpus upon which the algorithm's owner has no legal claim.

The same can be said about human writers: they learn to write based on thousands of "training examples" - the articles and books they read throughout their lives.

Not at all.

Or rather, Who knows? Maybe. But certainly, at least today, a SoTA model generating a quality encyclopedia certainly is not doing what human writers do, and is certainly effectively copy/pasting.

Maybe in 50 years -- or 10 years with a major breakthrough on the level of general relativity -- that statement might be true. But it's certainly not true of today's deep NLP systems.

A better example is the "copy and paste" news articles that saturate feeds every day.

The exact same set of facts, obviously reported originally by a single individual, then rearranged, reworded, and republished by hundreds of "reporters"/"bloggers", (sometimes) with an attribution of origin.

That would be a problem, but would be a data licensing issue, which is distinct. It's more analogous to "Blurred Lines" infringing on "Got To Give It Up" or w/e.

That’s your definite Turing test. Stamped, sealed, delivered.

The Turing test as defined by Turing is interactive and conversational, not about an AI producing a particular artifact.

The original Turing test had the AI attempting to imitate a woman, which demonstrates not only conversational ability but also an understanding of complex ideas like gender.

Incidentally, I think this can actually be a very low standard, depending on context. We tend to think of the Turing test as being performed by academics, in a lab, where everyone is Very Serious. But if you put real humans on, e.g., Omegle (with no video), they typically type with poor grammar and say "random" sounding things that quite plausibly could be said by an AI. Additionally, the preponderance of spam scammers demonstrates that many people are quite gullible, unable to differentiate between a Nigerian scammer and a legitimate representative of their bank. Given this, I think we already passed the Turing test, not by bringing AI up to the level of humans, but by bringing humans down to the level of AI.

That's a fairly depressing way of looking at it. https://xkcd.com/1414

That XKCD comic makes a good point: grammar ability is a poor measure of intelligence. Perhaps I was wrong to think of it as "bringing humans down".

Grammar ability is a fine measure of intelligence.

Adherence to arbitrary conventions in situations where they’re unnecessary is, on the other hand, a poor measure of grammar ability.

Do you really believe that most people who write “u” in an SMS are unaware that it’s written “you” according to formal conventions?

Could you not just brute force write any article possible and own the copyright on any possible article ever written?

Just publish every letter combination possible somewhere.

1. How much storage would that take? 2. Anytime you think you've found a clever "gotcha" that nullifies an entire area of legal doctrine, smack yourself in the face and then repeat "the laws are made and interpreted by people, not machines" one hundred times.
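For a back-of-the-envelope answer to the storage question, here is a sketch under two stated assumptions: a 27-symbol alphabet (26 letters plus space) and one byte per symbol. The function name storage_bytes is made up for illustration. It counts only strings of exactly one length, so it undercounts the full "every combination" archive.

```python
def storage_bytes(alphabet_size: int, length: int) -> int:
    """Bytes needed to store every string of exactly `length` symbols,
    assuming one byte per symbol (an illustrative assumption)."""
    return alphabet_size ** length * length

# 27 symbols = 26 letters plus space (assumed alphabet).
for n in (5, 10, 20):
    print(n, storage_bytes(27, n))
```

Under these assumptions, every 20-character string already needs more than 10**29 bytes, and an actual article is thousands of characters long.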

If they can own articles, then they may be able to own patents. I see some contradictions in current views on what AI can or cannot own.
