What does copyright say about generative models? (oreilly.com)
65 points by BerislavLopac on Dec 15, 2022 | 83 comments



> What was originally intended to protect artists has turned into a rent-seeking game in which artists who can afford lawyers monetize the creativity of artists who can’t.

It's rather a rent-seeking industry; the vast majority of artists benefit only marginally from copyright; an original intent, to give composers an income who would otherwise be out of the monetary loop, is long forgotten; instead, early on it was all about protecting the publishers' business by restricting copies; ironically, composers (or musicians in general) today earn best, on average, when they work as a clerk for a copyright collecting society; I don't think patent and copyright law can be fixed; they can only make it even more complicated and unwieldy, so that it gets even further away from composers and authors, and instead plays into the hands of trolls or monopolists.

> fixing copyright law to accommodate works used to train AI systems, and developing AI systems that respect the rights of the people who made the works on which their models were trained

And then also charge each student of art for studying original works with the intent to create new works based on what they have learned to make a living? This idea can be extended in any direction and quickly leads to a system that is rather against an open society where people benefit from each other.


ML isn't the same thing as human students, it's just computational statistics with some impressive results.

Consider it a black box where copyrighted work goes in and similar work pops out. Conflating what is essentially a function approximator with human beings when it comes to law is a stretch.


> a black box where copyrighted work goes in and similar work pops out

If a generated work is similar to an already existing copyrighted work, it is not relevant whether the work was generated by a DNN; the author of the original work can file a claim anyway. On the other hand a work generated by a DNN by itself is not copyrightable, since a human author is required by copyright law. The use of training data for the DNN is fair use according to jurisdiction in comparable cases.


And in fact- I think something that would be really helpful is if these models shipped with another model that detected how similar an output image is to a specific image in the training set

You could get this score alongside the results if you wanted to generate images but know which ones probably shouldn't be used commercially, or you could choose to filter your results by some similarity threshold
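A rough sketch of what such a score could look like, assuming each image has already been turned into an embedding vector by some model (that step is hypothetical and not shown; the vectors, function names and threshold below are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_training_match(output_vec, training_vecs):
    """Return (index, score) of the training vector most similar to output_vec."""
    best_index, best_score = -1, -1.0
    for i, train_vec in enumerate(training_vecs):
        score = cosine_similarity(output_vec, train_vec)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score

def filter_outputs(output_vecs, training_vecs, threshold):
    """Return indices of outputs whose closest training match scores below threshold."""
    kept = []
    for i, vec in enumerate(output_vecs):
        _, score = nearest_training_match(vec, training_vecs)
        if score < threshold:
            kept.append(i)
    return kept

# Toy embeddings (assumed, not real image features).
training = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
outputs = [[1.0, 0.0, 0.01],   # nearly identical to training[0]
           [1.0, 1.0, 1.0]]    # not close to any single training vector

safe = filter_outputs(outputs, training, threshold=0.9)
print(safe)
```

Whether a high embedding similarity says anything about legal similarity is a separate question; this only shows the "score plus threshold" mechanics.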


> The use of training data for the DNN is fair use according to jurisdiction in comparable cases.

That's more of an open question, given that the lawsuits concerning exactly this have neither made it before a judge nor concluded.

There are some very sound arguments against this particular case being considered fair use, and they haven't seen the inside of a courtroom to be tested.


There are enough similar cases for all aspects that are touched here, so that an assessment is quite clear. But of course, every court case has a certain element of surprise.


> On the other hand a work generated by a DNN by itself is not copyrightable

I don't really understand the logic here. I have heard there is also some legal precedent for this claim. But surely using Photoshop to create an image doesn't mean that you didn't create the image, but Photoshop did? So why should this apply to tools that use neural networks?


This is simply the law; as we know, laws or jurisprudence are not always logical, at least not from every perspective. From my point of view, it makes sense to limit copyright to the minimum necessary. And in the case of computer-generated art, it will be difficult to judge anyway whether it actually comes from the artist who published it, or whether it - or parts of it - have merely been generated by a DNN.


> ML isn't the same thing as human students

Are you sure about that? https://www.smbc-comics.com/comic/themes ;)


The system sure seems to have created a ton of music. I know many people who have benefited from being able to actually earn money from their work and would strongly disagree with you. I respect your opinion, but that is all you have written here, your opinion, not some great truth about copyright.


It's not just an opinion; I'm a musician myself and had many production contracts in the past; I also studied law and am well familiar with the business and the legal issues.

There are indeed some people who were able to make a good living from the money they received from the copyright collecting society, though they are a small minority; all musicians I know who are members of a copyright collecting society get a few hundred Swiss francs a year and don't live from their music.

Some years ago I analyzed the yearly business report of SUISA, the Swiss copyright collecting society responsible for musicians, and found that of the 120'000'000 CHF paid by the society only one sixth was given to Swiss composers ("Urheber"), and that the probability that any of those composers would get enough money from SUISA to make a decent living is less than 0.1%; for comparison, the probability of being seriously injured or killed in a traffic accident in Switzerland is > 0.2%; the vast majority (> 97%) gets somewhere between nothing and less than 1900 CHF.

In comparison the administrative cost to run SUISA in the same year was 26 million CHF (compared to the 20 million CHF paid to musicians), and the publishers received 54 million CHF. SUISA paid ~19 million CHF to their 166 employees; in 2006 357'000 CHF was paid to the director alone.
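For anyone who wants to check the proportions, here is a quick back-of-the-envelope calculation using only the figures quoted above (the figures themselves are taken as given from this comment, not independently verified; the variable names are just for illustration):

```python
# All amounts in CHF, as quoted in the comment above.
total_paid = 120_000_000        # total paid out by the society
composer_share = total_paid // 6  # "one sixth" to Swiss composers
admin_cost = 26_000_000         # administrative cost in the same year
publisher_share = 54_000_000    # paid to publishers
staff_pay = 19_000_000          # approximate total paid to employees
employees = 166

# Administration compared with what composers received.
admin_to_composer_ratio = admin_cost / composer_share

# Publishers' share of the total payout.
publisher_fraction = publisher_share / total_paid

# Average pay per employee implied by the two staff figures.
avg_staff_pay = staff_pay / employees

print(f"composers: {composer_share:,} CHF")
print(f"admin / composers: {admin_to_composer_ratio:.2f}")
print(f"publishers' fraction of payout: {publisher_fraction:.0%}")
print(f"average staff pay: {avg_staff_pay:,.0f} CHF")
```

On these numbers, running the society cost about 1.3 times what the composers received, and the implied average staff pay is roughly 114'000 CHF.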


Is it the system, or is it the access to technology and education? Tools like the internet allow pretty much anyone to learn so much about how to make music and not only that but being able to publish your music very easily as well as forming communities, meeting other artists, etc.

I'm not sure if it's copyright that has really allowed for music to flourish, I think that's a claim that requires a bit more research behind it.


You present a false assertion: that owning the copyright is a predicate to receiving money for creative works.


It is.

You sell your music for $1. I sell your music (but without copyright, it's my music, really) for $0.50 in all the same locations as you do. I'll win any race to the bottom, because my cost in time is absolutely less than yours in the effort to sell my music.


I'd like to present a counterexample: SNKRX[1]. It costs 3€ and is under the MIT license, you can find it on github[2].

It sold >100,000 units[3], and AFAIK no one's massively undercut the author. I guess he's kind of undercutting himself by making it available on github, but I bet most people would rather pay 3€ and have nice steam features than download and compile a game off a site they likely don't know.

And yes, I do realize he's still the copyright holder for SNKRX, but he's not really enforcing it with a permissive license like MIT.

[1]: https://store.steampowered.com/app/915310/SNKRX/

[2]: https://github.com/a327ex/SNKRX

[3]: https://www.a327ex.com/posts/2022/


>I guess he's kind of undercutting himself by making it available on github, but I bet most people would rather pay 3€ and have nice steam features than download and compile a game off a site they likely don't know

Well, if you don't have copyright, anybody could also publish the game on Steam or in any other shop with the same "nice steam features", and get the money instead of him. It would just be up to the audience's charity ("let's give our money to the original author instead of buying the cheaper, same-featured alternative release").


What is stopping someone from publishing SNKRX to a site like itch.io and selling it for 1€ instead? And if nothing is stopping it, why hasn't someone done it yet (that I know of).


>What is stopping someone from publishing SNKRX to a site like itch.io and selling it for 1€ instead?

Nothing, that's my whole point.

>And if nothing is stopping it, why hasn't someone done it yet (that I know of).

Not everything that can happen has happened (or will happen).

But many programs with a similar open status were indeed taken and sold this way. And sure, it's legal and a particular creator might be fine if that happened to his program. But that wasn't what we were discussing in this subthread, but the suitability of this model for people wanting to make a living, e.g. the proposition that: "I guess he's kind of undercutting himself by making it available on github, but I bet most people would rather pay 3€ and have nice steam features than download and compile a game off a site they likely don't know".

Well, first, the undercutters might put their version on a nice site like Steam as well. And that's just one case (or e.g. Steam discourages them, even though what they'd be doing is legal); there are also games/apps that are sold on the author's website or on the iOS/Android app stores - and for those, anybody undercutting them with their own build/copy would trivially have an equally convenient/prestigious delivery channel.

Now, there are some cases where the original creator would fare better, but not because of the basic mechanics of the model, but because of other factors: e.g. a creator could have a nice following and be well loved in the community, to the point that most users would go out of their way to buy their version for support, and not the third party ones, even if the latter are undercutting it.

For the average program and the average creator without fans though, if they are concerned about their own sales, that's more of a "put it out in the open and pray you don't get undercut" proposition...


The MIT license explicitly lets someone do exactly that.

> including without limitation the rights to [...] sell copies of the Software.


I know, that's my whole point.


Cool! I'll change a few strings and sell it under a public license on Steam and itch.io and the Epic Game Store for 1€, under a slightly different name. Maybe SNKRF. Probably see if I can compile it for mobile and throw it up for free on the app stores with tons of ads.

The OC can spend their time on the updates, and I'll spend my time on marketing the new version. I'd probably want to report the original as a clone, because I can and it could help boost my own version's rankings. If the OC wants to fight that, it's his time, so cool.

And should the OC actually succeed in getting that clone off the store, the other 10 I've put up while they're busy fighting against the first clone can take its place easily enough.

After all, without IP - without copyright - it's mine to sell however I want to. Nobody can stop me, nobody can even really chastise me because nothing was taken and nothing was lost.

That copyright you're dismissing is worth a lot. It states that it is, indeed, the owners' work. It raises potential legal ownership issues with the compiled work (which isn't under the same license as the code) and store listings. Even if he's not actively using it, it's still busy protecting his work.


Well, first, let's note that you can do all of those things now, legally. It's MIT licensed. If your argument is that without a strong intellectual property discipline in place, this would inevitably happen, your burden is to explain why this isn't happening.

Some potential arguments you might use are that it does happen, and will happen eventually to this project, that it's simply a matter of time. Another is that it currently isn't worth it for the amount of work you might put into doing this for this project, but as soon as it is worth something, it will be. Which implies in this case, the cost of all of this activity would not be covered by the return, even if you didn't have to pay the creator.

There's almost certainly a market efficiency in scaling up this packaging activity -- so that the original creator could give up some of his copyrights to pay someone to do the packaging, as long as that packager is doing it for lots and lots of other projects. That's how IP really works.

But in a world of explosive creative access -- where making copies, and creating works is extremely popular and hypercompetitive, the chances are that copyright as a sellable, alienable right isn't going to be the thing that pulls in the money for the artists. It's a thing that maybe you speculatively sell to that large-scale packager, that promoter, in a buyer's market, for a vanishingly small amount. Which makes the contrast between a world without copyright, and a world with, far less diametrically opposed. IP gets you a pittance, but for the vast majority of artists, it's not nearly enough. It may actually be kind of cruel to incentivize artists to produce with an endlessly drawn out false promise of IP riches.

Could we do better? Well, one thing is that we could stop what you're describing without an alienable right to copy, more like a system to prevent the kind of fraud or misrepresentation you describe. There are components of this in our existing creative incentive laws, but they're actually some of the least "property"-like attributes, and do not revolve around making copies -- moral authorship, some elements of trademark. But it involves a lot of work to try and reform our outlook to come out with policies that genuinely reward artists and enable innovation. I don't see it arising from those praising current intellectual property models, and arguing for their expansion.


>Well, first, let's note that you can do all of those things now, legally. It's MIT licensed. If your argument is that without a strong intellectual property discipline in place, this would inevitably happen, your burden is to explain why this isn't happening.

Because you cherry-picked an example. This (separate publishing profiting from the same work) has been happening time and again to musicians and authors - even to those under copyright protection -- the only difference is that those have some legal recourse, and have been shutting down those alternate editions.


Don't most artists make the majority of their money from live performances, not recording contracts?


No. Most artists by quantity don't do live performances. And live performances are a lot of work for not a lot of money, unless you're Taylor Swift or one of her peers. Ticket sales end up getting split about 20 different ways.

For example, Miracle of Sound doesn't do live performances, because his music is all electronic and created by one person.

Plus, I can take my music and do "live" performances of it too. I can also hire folks (say, DJ's) to do these "live" performances on my behalf.


Live shows were traditionally a loss leader for album sales for the majority of artists (to get their name out). Some big names with big followings that charge a premium do make profitable tours, but those are the exception.


>The system sure seems to have created a ton of music

Because of it, or in spite of it?


Copyright vs. AI

I'll grab some popcorn. This could be one of the all time epic battles.

Earlier someone posted, and then deleted, an Ask HN: "Are artists fighting AI Art repeating Metallica versus Napster?"

This is actually an entertaining question, because it brings in the power of the entertainments business.

Where is Napster today? Didn't Metallica win that one? It might not be a great comparison, but what if RIAA, MPAA, Sony and the game industry decide that "generative AI" occupies the same threat space as "piracy"?

It was of course Metallica and an army of ten thousand lawyers and goonies from a vast, wealthy, moribund industry that actually did manage to block the road of progress and frighten the genie back into the bottle.

In fact, the power of the film and music industry to shape technology has been so immense, you have to wonder whether they could do it again over "AI".

Right now I think the entertainments industry is shitting itself over LLM technology, but is split over whether it can gain enough control to allow it on its own terms, or mobilise to fight it. We haven't yet reached the stage of commodity proliferation. That will be the watershed.


To the artists it didn't really matter whether they were bitten by the cats (the record labels) or by the dogs (the pirates), they ended up losing out anyway. This is because artists are less motivated than either pirates (who have nothing better to do) or record label executives and their cronies (who are protecting their income stream). Artists as a rule are happy to be making enough money to be able to continue to make art. And that makes them the vulnerable party in these transactions. Witness the countless accounts of abuse by record companies of bands and individual artists, including bookkeeping tricks, contractual abuse and so on.


The artists didn't exactly win Metallica vs. Napster either.

The end result was a compromise that still funneled some degree of money into the labels, with the artists getting an equally raw deal as before proportionally speaking--but now with a micro share of your $10/mo to Spotify or wherever instead of their share of $10+ for a single album.

I'm not saying the prior business model was sustainable (at least ethically) but at the end of the day, "if you can't beat them, join them" is still one hell of a compromise to make.


Yes I think you're right. The technology was tamed and brought to heel. It starts out looking "disruptive". What will that look like when Big Media figure out how to take legal control of generative AI and become effective arbiters of all that can be cheaply, mechanically created?


Was it truly "brought to heel"? P2P file sharing is still very much around, and many people don't see it as inappropriate - especially among those under 40.


They took the cake from the pirates and gave it to the privateers. The majority of the musicians get nothing and the cats (labels) and the tech companies get paid.


> I'll grab some popcorn. This could be one of the all time epic battles.

No, it will be boring, the outcome is clear: those with the deepest pockets will win this battle.

Just look at Disney who had copyright law changed back in the previous century.


> those with the deepest pockets

But you merely beg the question. Which pocket is the deeper? The left pocket of a network of industries that stand to dominate the world via AI? Or the right pocket of the same interests who will be destroyed by it?


I can see a symbiotic relationship between large rightsholders and AI-related industries developing, perhaps where they license their corpus to be trained on, and maybe even share rights/profits/etc with AI companies when it comes to commercial use of the AI's output.

In such a world, human creators would be paid a pittance by large rightsholders to create, and then those rightsholders would license those creations to be trained on, and then the human would be cut out of the loop for subsequent creations. In this world, creators wouldn't be much different than the data labelers that currently tag ML datasets. They'd probably be paid the same, as well.


I see this going one of two ways:

* States take a laissez faire approach to regulating these networks, we get an open source community of people developing them publicly for anyone to run on their local machines, run small clusters etc.

* States take a heavy handed approach and prevent anyone distributing a model trained on copyrighted works without agreement from the copyright holder, or paying some form of tax that gets redistributed to them. We end up with an oligopoly of ad-supported, data-collecting, heavily censored SAAS models which we cannot download, modify, or run locally.


Copyright as we know it is, simply put: entirely broken.

There is no world imaginable where the concept of owning ideas should be protected. Imagine if chefs had to license recipes…

The arts communities need to take cues from the scientific community where citations are the cultural norm. And as a society we need to figure out how to protect artists by protecting and celebrating the expression of ideas. Not the ideas themselves.


> There is no world imaginable where the concept of owning ideas should be protected.

Copyright does not apply to ideas, it applies to specific expressions/implementations of those ideas, which is what it seeks to incentivize. The fact that many companies (particularly Disney) sued people who only copied the ideas and not the expressions shows a failure of the legal system, not copyright.


Why, then, is one artist even able to bring a case against another artist for composing a song with a similar riff as one they published? It seems it's not so simple. Perhaps I should have been more specific to say that copyright law as it exists in society today is rather broken so as not to disparage the original intent of copyright as a concept. But, I feel like we're just talking semantics, really. The point remains that the implementation is broken, copyright as we know it.


You can bring such a case, but cases I'm aware of that were brought over two songs having similar riffs were cases that were lost.


"Blurred Lines" versus "Got to Give it Up"?

> On March 10, 2015, a jury found Thicke and Williams, but not T.I., liable for copyright infringement. The unanimous jury awarded Gaye's family US$7.4 million in damages for copyright infringement and credited Marvin Gaye as a songwriter for "Blurred Lines". In July 2015, the judge rejected a new trial and the verdict was lowered from US$7.4 million to US$5.3 million

https://en.m.wikipedia.org/wiki/Blurred_Lines


I was thinking of the Ed Sheeran and Marvin Gaye lawsuit.


Because riffs aren't ideas, they're expressions of ideas. This seems pretty obvious? The concept of a riff is not copyrighted, but a specific one apparently can be. Obviously copyright overreach is a problem today, but you're attacking the wrong problem. Musical works can be plagiarized by other musicians and it does happen, sometimes by accident (due to modern technology and attribution problems), sometimes on purpose.

The developer of a notable mobile game had to pull some music from their title a couple years ago because the composer they hired blatantly plagiarized some other works, for example.


> but you're attacking the wrong problem

No, I am not.

I do not believe it is possible to accidentally plagiarize something (it cannot be by definition). I do not wish to live in a society where you can face legal consequences for picking up a musical instrument and simply expressing what's on your mind. Or as another commenter put it, dropping bricks on a piano. It's insane but it's rather familiar, sadly.

It's in fact you who are falling victim to the common misconception that copyrighting the performance of a riff by recording it grants you rights to attack others' performance of the same riff. And that if I perform the same riff you performed I have somehow plagiarized it. To use your words, if you copyrighted the expression of one idea, then how can you possibly attack someone else's expression, that is not your expression, of the same idea, hmm?

I am lamenting the fact that it's even possible to become so confused with the current state of copyright law (and juries do). And that it's even possible to bring a case against someone in the first place (it is). That is a problem and that's what I'm attacking.

As a thought experiment, consider: https://libraryofbabel.info. Should the author be entitled to copyright over every combination of characters imaginable simply because they wrote them down?


"Accidental plagiarism" is cases where you heard it before and are imitating it without intent. Intent does matter, but copying is copying!

This is not a "common misconception". It's how the system works! I'm not saying it should work that way, but that's how it works.

A series of notes is not an idea, it's a composition. That's just how it is. You can have the idea for a composition, perhaps even for a note progression you want to work into it in your head, but once you've made a composition, it is no longer just an "idea". Obviously there has to be a line somewhere, you shouldn't be able to copyright a sequence of 3 notes, but that's a question of law. Are you a lawyer? If so, I'll defer to you on this.

> Should the author be entitled to copyright over every combination of characters imaginable simply because they wrote them down?

Writing down a series of characters is kind of the definition of authoring a work, yeah. So I would expect they have the copyright if they wrote it down, unless someone else wrote them down first. Should it work that way? Maybe not.


> A series of notes is not an idea, it's a composition.

You're missing the nuance. You should watch Adam Neely's latest video on music and copyright.

To elaborate slightly, there's the idea of the series of notes, and there's the actual composition of the series of notes, and there's a performance by some artist of the series of notes.

1. The idea of the series of notes is not copyrightable. If it was somehow an invention it might actually be patentable, but intellectual property is not exactly what we're discussing.

2. The composition of the series of notes, written down or fixed to some media in some form, is copyrightable. And the author of the composition owns the copyright as soon as they generate the work.

3. The performance of the composition, if recorded, is copyright of the person making the recording.

Now here's my commentary.

On 1: some people think that the idea of the series of notes is what you copyright when you write the notes down. This is false. But society is incredibly confused on this topic, as the existence of multiple copyright cases regarding this topic demonstrates.

This may be the fault of outdated legal doctrine, influence from large publishing conglomerates, etc., and in reality probably all of the above. But that's beside the point.

On 2: you compose some music and print it out. This is your work. I am not entitled to take that sheet of paper and run it through a copy machine. I must obtain your permission. Even if I buy that sheet of paper from you, I am not entitled to make copies of it unless I have sufficiently licensed the work from you for that purpose.

I am allowed, however, to create an identical composition. I am even allowed to create an identical composition after having seen yours or heard a performance of it. The law doesn't say I can't plagiarize. Society generally does not value plagiarized works, though, because they lack any sort of creativity.

I am further allowed to sell the piece of paper you sold me. You don't own the item for which you hold the copyright after the first sale. This is how used book stores work and it's getting fucked up as we make things digital. I am also allowed to rent the piece of paper out, sharing your work with someone else. This is how libraries work. I own the paper.

On 3: the record of a performance is the property of the entity that recorded it. I'm sure there are plenty of clauses regarding who is allowed to record performances for popular artists and you probably waive certain rights when you purchase tickets and enter into an agreement with the venue, but that's not exactly copyright law either.

I am allowed to listen to F.U.N.'s Carry On in concert and then belt it out in the car on the way home, even if there are passengers and my windows are rolled down and I'm driving through a crowded public street. I can cover the performance of a copyrighted work. I am not generally considered to be plagiarizing because I am creating a derivative work by performing my own rendition of the song.

Now there are formal agencies that try to govern the performance rights of a composition and try to rent seek on all aspects of the copyrighted work and govern how the work may or may not be used in derivatives. And they are successful in doing so. I don't have the legal training to know how much of this is legal vs how much of it is simply "good" manners, but I do know that, to keep the machine oiled, artists generally acquire mechanical use licenses from other artists before covering one of "their" songs.

That's the extent of how the "system" is specified to work today, as I understand it.

> Obviously there has to be a line somewhere, you shouldn't be able to copyright a sequence of 3 notes, but that's a question of law.

The law deliberately does not specify where this line is. Probably because it's impossible to do so.

So the things I'm fussing over:

a) interested parties construing the idea of copyright to include the, well, idea of the copyrighted work, not just the actual manifestation. This is poisonous and the fact that people are confused is an indication to me that the current system is not clear and specific enough and needs work. As a society we cannot thrive when people have been convinced that a sequence of 8 notes is inexpressible and cannot be repeated simply because someone else did it, or simply thought of it, first. This goes against everything beautiful about creating art in the first place and it is beyond silly that we are wasting legal resources entertaining the folly of large record labels and publishing organizations.

b) it stems from (a), the idea that a copyright holder is entitled to full creative control over the use of a fairly acquired copyrighted work. Even if we concede the unadulterated original spirit of copyright is perfectly fine and the "system works", it does not extend any sort of authority to the copyright holder to govern how their work should be derived, expressed, interpreted, etc. It is insane that the industry norm right now is such that artists feel they need to obtain a special license of some copyrighted work in order to create a derivative work that incorporates it, or even just to perform their own rendition of it. I don't personally find any reason to fundamentally or axiomatically bestow this sort of creative authority upon people who simply come up with ideas and write them down.

c) finally, an attack on copyright itself. It's a dated concept from an age where copying was hard and doing it was a meaningful function in society. We have generated all text strings and melodies that ever were and can ever possibly be. Our tools have advanced to a point where someone can actually write everything imaginable and claim copyright. Our culture has advanced beyond sheets of paper and the media we use regularly makes copies of data all the time in order to even operate. We have developed algorithms that remix content and generate new content that is at times indistinguishable from the original. Is it a copy? That's the question of the article.

I suspect we all agree that humans who pursue creative endeavors are valuable to society and should be supported. I am simply saying that copyright is a very poor implementation of how our current society might do that, not least because it doesn't actually end up supporting the content creators very well in the first place and has since been co-opted by self-serving publishing companies for profit. But also because it doesn't fit with our current understanding of a digital society nor does it mesh with the technology we've developed (the web, computers, and content generating algorithms). I don't really wish to see society stifle the possibilities we unlock with "AI" generated content because of dated notions that we need to protect original creators' ability to dictate the copying of their work.


And you end up with $7.4M in damages and 50% of songwriting credits for a song whose melodies, chord progressions and rhythms are completely different (Blurred Lines and Got to Give It Up) because the song has a "similar feel."


> And as a society we need to figure out how to protect artists by protecting and celebrating the expression of ideas.

GitHub Copilot would (until they manually changed it) spit out the famous fast inverse square root function word for word - comments and everything, which is more than just the idea (why would you want the comments with them swearing?)


I'll admit, writing something down is where it gets very murky.

On one side, you have music and musicians, where the written form is somewhat of a meaningless tool used to produce creative expression, which is the performance of sound on an instrument.

On the other hand you have authors (and recently, software engineers), where the creative expression as most understand it, is in the choice of arrangement of words (or statements).

You can't really resolve the two. And so I lean towards a stance that it is fraught to try and build a flourishing creative society on the idea that simply writing an idea down (or saving a file of data) means you own it and can legally bring a case against someone else who happens to write the same idea down, be it music or a book.

And then there's https://libraryofbabel.info, which contains all possible combinations of characters that ever have been, or ever will be, written. Should that be copyrightable?

Anyway, I am of the opinion that Copilot does infringe on copyright as we know it. But I'm also of the opinion that simply giving individuals universal claim to any written text they produce is problematic. And to answer your question, maybe I do want the comments. Maybe they communicate something that the code cannot, swearing be damned.

I'd prefer a culture of citations for written/stored/saved works. And a culture of celebrating performance of the arts, not storage of them.


It's interesting watching AI turn into RMS's "The Right to Read". When things were just a human endeavor alone there was a lot of pushback to prevent ownership of culture and words, but the AI battle looks like it's going to erode that.


> Does copyright law protect “in the style of”? I don’t think anyone knows.

I think this is the crux of the issue. I think it would not. IMO the spirit of copyright law protects certain essential elements of a work from being reproduced identically or nearly identically (1), but not a composition that is reminiscent of a general style of some other work, and certainly not some collective body of work. To me the source of the training data should not matter, in the same way that it does not matter if an artist studies a body of work to create pieces in the style of X. Does it concern copyright law that an artificial rather than biological neural network has been trained on a body of work? Would a copyright judgment ever be made on the basis that an artist studied too much of another person's work, and not simply the similarities between the piece they created and another specific work?

It is against copyright law to release either a computer generated lo-fi cover or a human generated violin cover of Stairway to Heaven. On the other hand you could go and buy an amp, guitar, pedals, etc. to perfectly recreate a classic 'Jimmy Page sound', and whatever you compose would be fine as long as the song does not borrow specific melodies found in Led Zeppelin songs. See any Greta Van Fleet song. Moreover, that GVF reminds everyone of Led Zeppelin is no coincidence. From Wikipedia: "Greta Van Fleet is often compared to Led Zeppelin. Jake said he went through a year of really intensely studying what Jimmy Page did to the point I knew how he thought."

At the end of the day I think there is no basis to claim a copyright over a general "style", and that's a good thing.

(1) https://www.briffa.com/blog/classic-copyright-cases-ice-ice-...


>> Does copyright law protect “in the style of”

Well, pastiches are a thing, and I've read and watched a few. Never heard about anyone getting into trouble for them.


> Copilot itself is a commercial product that is built a body of training data, even though it is completely different from that data. It’s clearly “transformative.”

Is it, though?

The article wrestles with the notion of the gap between "idea" and "expression." To me, I wonder if this is the same gap. The training data is equivalent to the "idea," and the output of using that training data in a particular way is the "expression."

In this view, the result of your training isn't transformative, and it might not even be something you can claim copyright over. What is it other than a particular arrangement of facts that have been fed into it? Merely adding weights in a high-dimensional space does not seem "transformative."

This article feels like it's wrestling with the wrong side of the problem.


> Copilot itself is a commercial product that is built a body of training data, even though it is completely different from that data. It’s clearly “transformative.”

I don't buy this, given that NNs can and do encode data from their training sets in the network itself, similar to compression, and they can regurgitate that code when given the right incantations. They can also encode "the heart of the work" when it comes to algorithms from copyrighted work being recycled in the NN's output.

And I don't buy that it's transformative, either, given that it's taking someone's code as input and just outputting more code. The purpose is literally the same, building software for profit. It's not like the code is being used in an art project, it's being used for the same purposes that the copyrighted works were created. This is not like making a collage out of magazine cut outs, it's like taking "People" magazine's content and rearranging it and selling it as "Persons" magazine for profit.

When it comes to fair use, transformative derivative work is not the only factor considered. Another factor is the purpose and character of use, especially whether or not the derivative work is used for commercial purposes. Code regurgitated by these models is used for the same thing as the code they were trained on: building software commercially.

Another factor is "the effect of the use upon the potential market for or value of the copyrighted work". It's beyond obvious that laundering copyrighted code for commercial use affects the potential market for and value of the copyrighted work.


I feel sympathy for the artists, and who knows what jobs AI will leave!

But I don't think the artists I'm seeing on Twitter understand that, even in their best case, where training on a copyrighted image is considered total infringement, this genie still isn't going back in the bottle.

Models are getting better and training more efficient all the time.

Soon you'll be able to train a model entirely on public domain content and have great generative art output.

It might be harder to express the style you want than just sharing a name of an artist in that style, but that won't hold things back.

With this worldview the discussion about whether training on something is infringing its copyright fast becomes irrelevant, for better or worse.


Absolutely true. It won't be long before models trained on purely public domain works will be able to oneshot style transfer from a piece of art fed to it.

We need IP law reform and UBI/strong social safety net, the problem isn't automation, it's that we allow people to own the automation and charge rent on it.


This seems to be rather US-centric. Here's the EU version: https://copyrightblog.kluweriplaw.com/2021/02/17/tdm-excepti...

Singapore version: https://www.twobirds.com/en/insights/2021/singapore/coming-u...

Other countries have already made copyright exceptions for AI training.


I'm not sure I buy this line of reasoning about "inputs" rather than "outputs". If there is some prohibition about using an image as an input, regardless of whether any vestige of that image exists in an output, doesn't that equally prohibit using an image to train a neural network that just says whether an image is a cat or a dog? Or how about a network that just tries to denoise a photo from your camera?


Ends up worse than that: 8a707ed2cb8e1aa2e54c1b68d78e76b51d05f0a23c8a8b9d2f7f04aca01b3a96

This is the SHA256 of Captain America - The Winter Soldier. Pulled from my Blu-Ray copy of it. It can only be produced (in the current universe) if you have access to the complete data from the Australia Blu-Ray release of that movie.

So to create that number, every single frame and pixel of the original movie was included in some way.

Is the number itself copyright? After all, every byte of its input definitely was - there's no way to produce it without those bytes.
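
For anyone who wants to see how little of the input survives, here is a minimal sketch of producing such a number with Python's standard hashlib. The function name and the in-memory stand-in for a disc image are made up for illustration; this is not the tool used to produce the digest above.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    # Read fixed-size chunks so a multi-gigabyte file never has to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Hypothetical stand-in for a large file: a few bytes held in memory.
original = io.BytesIO(b"every frame and pixel of the movie")
altered = io.BytesIO(b"every frame and pixel of the movie!")

print(sha256_of_stream(original))
print(sha256_of_stream(altered))
```

For a real file you would pass open(path, "rb") instead of the BytesIO object. Every input byte feeds into the digest, yet the result is always 64 hex characters (32 bytes) no matter how large the input is, and it cannot be run backwards to recover any frame.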


> Is the number itself copyright?

For a work to be copyrightable (including derivative works), there needs to be a quantum of creativity. Since there's no creativity in the process of constructing a cryptographic hash, there's nothing copyrightable there.


A good rule of thumb is if the derivative work is drawing sales away from the original work. Nobody is buying Captain America - The Winter Soldier to calculate the SHA256 so the movie studio sustains no losses from it being shared, so it shouldn’t count as copyright infringement.


> A good rule of thumb is if the derivative work is drawing sales away from the original work.

That's a horrible rule of thumb. The canonical example of a derivative work is adaptation--the film of the book, for example--which frequently increases sales of the original work.

Drawing sales away from the original work is one of the factors used in fair use, and frequently ends up being the most important one.


You already get protection for characters and stories, so it's not like AI will destroy publishing, and it's not like existing content is not protected already.

Copyright doesn't protect skill, as it doesn't protect algorithms, because these are tools to create and not finished products.


Isn't this just the riff sampling thing? So depending on the output you could be infringing.

Like how Vanilla Ice "stole" Queen's bass line.

https://en.wikipedia.org/wiki/Sampling_(music)


Which is ridiculous. You can't copyright 8 notes of music. This is the festering disease we need to rid ourselves of.


Robin Thicke wasn't exactly a sympathy-inspiring celebrity, but the Blurred Lines case was another egregious example.

As someone who grew up in the 80s and 90s, I also really wish we'd gotten the explosion of almost completely sample-based music we were starting to see with bands like the Beastie Boys, the KLF, and Pop Will Eat Itself. The Biz Markie sampling case basically shitcanned an entire nascent subgenre.


This comment made me want to listen to "Can I Kick It?" by A Tribe Called Quest.

https://www.youtube.com/watch?v=D-uV8TGjaGU

Which sampled the guitar melody from Lou Reed's "Walk On The Wild Side."

https://www.youtube.com/watch?v=oG6fayQBm9w

And the "Rebel Fanfare" from the Star Wars Theme composed by John Williams.

https://youtu.be/_D0ZQPqeJkk?t=278

Which sampled this part of "The Sorcerer's Apprentice" by Paul Dukas. (Okay, not quite but it's easy to guess the inspiration.)

https://youtu.be/wneUNq_Ndbw?t=495


I didn’t realize until I thought back over music history and the Biz Markie case that hip hop’s East Coast “jazz and chill” phase that ATCQ, De La Soul, and other Native Tongues artists brought to the table probably resulted directly from samples going away.

Without samples and sequencing being the primary production technique, suddenly real instrumentation and organic musicality became the thing. Hip hop started building on covered riffs and musical passages more than straight up samples (I’m guessing the licensing was easier).

That also gave us Check Your Head from Beasties, as the follow up to the pioneer sample-laden album Paul’s Boutique, where they famously played their own instruments again and became even more popular. The Native Tongues scene also gave us Prince Paul/Dr. Octagon, which gave us Deltron 3030 and Dan the Automator, which gave us Gorillaz, which…

There really was a lot I suspect would’ve gone quite differently without that court case.


Damien Riehl and Noah Rubin have all the melodies on a hard drive. (More info in this Adam Neely video: https://www.youtube.com/watch?v=sfXn_ecH5Rw and the TED talk linked in a nearby comment.) So depending on a court's kinda arbitrary definition of creativity on any given day, all of them may or may not be copyrighted already.

An infringement being accidental also doesn't seem to stop copyright holders from successfully suing people. I think this means you could technically be successfully sued for dropping some bricks on a piano, since no matter where they land, the action constitutes a public performance of a copyrighted work. Fun stuff.


Which is most certainly not the original intention of copyright, if we are to even give the concept credence. I'm of the opinion that the entire system of western copyright law is absolutely broken. You definitely couldn't infringe on copyright by dropping bricks on a piano in a world with a flourishing creative arts scene, anyway. Something went terribly wrong.

PS: I also second the Adam Neely video. I was actually wondering if someone would link it (:


Thanks for your note about our project! To learn more about the ~471 billion melodies we created — then copyrighted — then put in the public domain (to protect songwriter "you stole my melody" defendants), more information is here:

http://allthemusic.info/faqs/


Well, now I'm disappointed. I was going to point to this TED talk (https://www.ted.com/talks/damien_riehl_copyrighting_all_the_...) but it appears to have been removed as the video no longer loads for me. However, this (https://www.ted.com/talks/damien_riehl_why_all_melodies_shou...) appears to be an abridged version of the same talk.

The disappointing bit: I say it's an abridged version because I distinctly remember him talking about how he actually claimed copyright for all 8-note melodies under a permissive license, which I can't find in the one that actually loads. He "brute forced" every 8-note melody with a program, saved them to a disk and claimed copyright under (I believe) the MIT license. (I can see legal issues with doing that, of course, so it's not hard to imagine possibly why the video was replaced with a different one.)
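
To give a sense of what "brute forcing" melodies means, here is a minimal sketch using itertools. The pitch set (one octave of C major as MIDI note numbers) and the function names are assumptions for illustration only, not the project's actual parameters or code, and this toy configuration does not reproduce the project's own melody count.

```python
from itertools import islice, product

# Hypothetical pitch set: one octave of C major as MIDI note numbers.
PITCHES = (60, 62, 64, 65, 67, 69, 71, 72)


def all_melodies(pitches, length):
    """Lazily yield every sequence of `length` notes drawn from `pitches`."""
    return product(pitches, repeat=length)


def melody_count(pitches, length):
    """Number of distinct sequences: len(pitches) ** length."""
    return len(pitches) ** length


# Peek at the first few melodies without generating the rest.
first_three = list(islice(all_melodies(PITCHES, 8), 3))

print(melody_count(PITCHES, 8))  # 8 ** 8 = 16777216
for melody in first_three:
    print(melody)
```

The count grows exponentially with melody length and pitch-set size, which is why the enumeration is lazy. Nothing in the loop involves a creative choice, which is exactly what makes the copyright claim over its output legally interesting.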


What's the threshold you can copyright? 9 notes? 16? Why is that the threshold and not 8?

I may not agree that it should be 8, but I don't see any rigorous or well-reasoned model here to explain why "you can copyright 8 notes" is a festering disease when it seems like people are not reasoning through why something should or shouldn't be copyrightable.


Existing laws were written before this technology existed.

Copyright isn't a law of nature, but a tool to balance needs of creators and everyone else. With this tech the power has definitely shifted, so the existing law may not be adequate any more.


why does society and technology have to constantly adapt to ancillary legal requirements, while the laws themselves rarely adapt (e.g. never expire)?


Creating a new technology and using it doesn't require the consensus of an entire society. Changing the law does, and so takes forever by comparison.


A good piece that expresses a balanced point of view on the question, even if I'm not convinced by the conclusion and the solution proposed.


I suspect we'll see "trade dress" lawsuits, because copyright doesn't apply. Probably from Disney.


It should say that the notion of intellectual property is nonsensical.


>But how much of a song or a painting can you reproduce?

The reason why fair use is vague is specifically to confuse people who ask these kinds of questions. The Supreme Court needed a tool that artists could use to legally smack down people who republish fragments of other people's work, but didn't want to abolish the 1st Amendment in the process. So basically judges have the final say as to whether or not something is novel creativity or in debt to the original. Any hard-and-fast rule beyond "binding precedent applies" is effectively copyright abolition by degrees.

>We lost most of Elizabethan theater because there was no copyright. [..] Without some kind of protection, authors had no interest in publishing at all, let alone publishing accurate texts.

This is a dated example, if only because creative works leave a lot more evidence now than they used to. People today will act to preserve art against the artist's own wishes and at great personal risk.

>and it’s easy to suspect that the actual payments will be similar to the royalties musicians get from streaming services: microcents per use

Given the amount of data these systems need (read: more than humanity can provide) I'd say microcents is arguably too high. Remember that you can't actually derive a clear chain of value between one particular training set entry and one particular execution of the model. It's all chucked into a blender that runs on almost-linear algebra and calculus. At best you can detect if parts of the image resemble specific training set examples[0] and pay people slightly more if the model regurgitates training set data.

Let's also keep in mind that a good chunk of the licensing system is based on being able to say no to specific users, or write very tailor-made licensing agreements for specific works or conditions. That's still going to be threatened, even if we can pay sub-Spotify-tier royalties every time a model trains itself on your work.

>It is easy to imagine an AI system that has been trained on the (many) Open Source and Creative Commons licenses.

Working on it: https://github.com/kmeisthax/PD-Diffusion

The thing is, we already have a good database of reusable, public-domain, no-attribution-necessary images; it's called Wikimedia Commons. I really can't fathom why OpenAI didn't start there, other than just an assumption that they were entitled to larger datasets or a feeling that they could get established before anyone sued.

Even then, OpenAI already tried this with computer code and they're getting sued for it anyway, because they never bothered with attribution in the case of training set regurgitation.

[0] This is possible because part of the prompt guidance process involves a thing called CLIP which can do both image and text classification in the same coordinate system.
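
A rough sketch of the resemblance check described in [0], assuming the embeddings have already been computed by some CLIP-like model. The vectors, work names, and the 0.95 threshold below are all made up for illustration; a real system would tune the threshold and use an approximate nearest-neighbour index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_training_examples(output_vec, training_vecs, threshold=0.95):
    """Return (name, similarity) pairs at or above `threshold`, best match first."""
    scored = [
        (name, cosine_similarity(output_vec, vec))
        for name, vec in training_vecs.items()
    ]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions.
training = {
    "work_a": [1.0, 0.0, 0.0],
    "work_b": [0.0, 1.0, 0.0],
    "work_c": [0.9, 0.1, 0.0],
}
generated = [1.0, 0.05, 0.0]

for name, score in closest_training_examples(generated, training):
    print(name, round(score, 4))
```

This only tells you that an output sits close to a training example in embedding space. It says nothing about how much any single training image contributed to the weights, which is the attribution problem described above.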


Just an FYI but your link 404's. I assume it is a private repo.


Aww dang, I can't believe I forgot to publish it



