An unwilling illustrator found herself turned into an AI model (waxy.org)
726 points by ghuntley 86 days ago | 751 comments



I'm thinking a little bit of empathy doesn't hurt. Reason from Hollie's point of view. She didn't ask for this and was working on cool stuff:

https://holliemengert.com/

Next, somebody grabs her work (copyrighted by the clients she works for), without permission. Then goes on to try and create an AI version of her style. When confronted, the guy's like: "meh, ah well".

Doesn't matter if it's legal or not, it's careless and plain rude. Meanwhile, Hollie is quite cool-headed and reasonable about it. Not aggressive, not threatening to sue, just expressing civilized dislike, which is as reasonable as it gets.

Next, she gets to see her name on the orange site, reading things like "style is bad and too generic", a long string of cold-hearted legal arguments, and "get out of the way of progress".

How wonderful. Maybe consider that there's a human being on the other end? Here she is:

https://www.youtube.com/watch?v=XWiwZLJVwi4

A kind and creative soul, who apparently is now worth 2 hours of GPU time.

I too believe AI art is inevitable and cannot be stopped at this point. Doesn't mean we have to be so ruthless about it.


There's nothing inevitable about it. Laws exist to protect people.

It's a bit like saying we can't stop music piracy, now that Napster exists.

Remember the naive rallying cry among those who thought everyone should have the right to all music, without any compensation for the artist?

"Information wants to be free!" (https://www.theguardian.com/music/2013/feb/24/napster-music-...)

Napster was a peer-to-peer file sharing application. It originally launched on June 1, 1999, with an emphasis on digital audio file distribution. ... It ceased operations in 2001 after losing a wave of lawsuits and filed for bankruptcy in June 2002.

Use of the output of systems like Copilot or Stable Diffusion becomes a violation of copyright.

The weight tensors are illegal to possess, just like it's illegal to possess leaked Intel source code. The weights are like distilled intellectual property. You're distributing an enormous body of other people's work, to enable derivative work without attribution? Huge harm to society, make it illegal.

If you use the art in your product, on your website, etc., you risk legal action. Just like if I publish your album on my website. Illegal.

The companies that train these systems can't distribute them without risking legal action. So they won't do it. It's expensive to train these models. When it's illegal, the criminals will have to pay for the GPU time.

It will always exist in the black-market underground, but the civilized world makes it illegal.

That's where this is going, I hope. Best case scenario.


>It's a bit like saying we can't stop music piracy, now that Napster exists.

We effectively didn't, though, at least as far as artists are concerned. Streaming revenue is abysmal for artists. https://www.latimes.com/entertainment-arts/music/story/2021-...

Piracy made music acquisition too convenient for the consumers, so an alternative had to be created - but this alternative really only helps the labels, and not the people actually making the music.

It's not clear to me that the streaming world is better for artists than the Napster one. At least anyone wanting to legally listen to music then would buy albums, rather than just having a spotify subscription. Not that royalties on physical CDs were great, but my understanding is they did work out better for most artists than we see with streaming royalties.

I don't know what a potential analogy would be here with stable diffusion or dall-e or whatever, but I don't know that people were able to immediately identify the potential downsides with "winning" against piracy, either.


> We effectively didn't, though, at least as far as artists are concerned. Streaming revenue is abysmal for artists.

But that's not Napster's fault. Spotify pays a lot of money for each play of a song; of that, the artist only sees a tiny percentage, due to music middlemen trying to relive the 90s.


Sure.

And that's why I buy music off Bandcamp whenever I can, and thankfully most of the music I listen to is on smaller labels, so usually even more money goes to artists.

I'm just saying that the solutions that pop up once you "win" are not necessarily ones that provide a win for the people you are trying to protect.


Spotify pays a lot of money for playing a song? Out of a $10/month fee, that clearly can't be true.

If P2P were somehow impossible, we would still be buying DRMed tracks for $1 each on iTunes. Spotify would, at best, cost $80/month.


> Spotify pays a lot of money for playing a song? Out of a $10/month fee, that clearly can't be true.

Spotify takes in €9.5 billion and pays out 70% of that as fees to record companies, who take their slice and give the artist next to nothing.

I would consider that a lot of money, even if you disagree.


I distribute my music through CDBaby, and looking at transaction history I've been getting $3.65 per thousand streams. That's not nothing, and is much higher than I'd get from radio.

Spotify is taking in a lot of money and paying 70% to labels, which adds up to a lot of money for artists depending on their agreement with their label/distributor. But the per stream rate is still very low because there are trillions of songs streamed annually.
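
To make that concrete, the per-stream arithmetic (a rough sketch; the $3.65 figure is just what my own statements show and will vary by distributor and agreement):

    # Per-stream payout implied by ~$3.65 per thousand streams.
    rate_per_thousand = 3.65                  # USD per 1,000 streams
    per_stream = rate_per_thousand / 1000
    print(f"{per_stream:.5f} USD/stream")     # ~0.00365, about a third of a cent

    streams = 250_000                         # hypothetical track
    print(f"{streams * per_stream:.2f} USD")  # ~912.50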


I'm sorry, I don't want to sound dense but I'm not clear what your point is.

Are you saying that CD Baby is a better distribution technique than standard labels because you get good margins? I didn't know CD Baby until I just looked them up but they appear to be a distributor, so your $3.65 metric is still being paid by Spotify/Amazon/Apple. Please correct me if I'm wrong but that is much higher than the normal published numbers by 10-100x.

Is this an RIAA moment where labels are trying to make other people look like jerks rather than accepting what they do, or are people using the "per 1000 streams" metric poorly because it will always look worse on successful platforms?


> that is much higher than the normal published numbers by 10-100x.

$3.65 per thousand is at the low end of what I see elsewhere. For example https://twostorymelody.com/spotify-pay-per-stream/ has $3-$5 per thousand.

> I'm not clear what your point is.

I think the distribution of streaming revenues is generally reasonably fair, and people who say things like "Spotify pays artists nothing" are confused about either (a) how much money there is to divide up or (b) where it is going.


You are right on those numbers; I miscounted when I looked them up.

I'm glad that there is someone creative on here that can educate and confirm.


The point is that a third of a cent per play isn't a lot. Argue against that claim if you like, but engage with it instead of using weird rhetoric to avoid it.


I don't think that was the point that anyone was making.

In fact Jefftk has stated the exact opposite of your position.


Anything is a lot of money if you offer no basis for comparison. A million dollars seems like a lot until you say that it's what you paid to build a downtown skyscraper.


The math here makes a flawed presumption: that you play a song only once after buying it from iTunes.

I obviously don't know your listening habits, so for you that may be the case. But people will listen to a single song far more often than once. Otherwise there would have to have been 77,946,027 new users on spotify[1] last month, all playing Ed Sheeran once. Clearly nonsense.

If you play every $1 iTunes song eight times on spotify, the costs (and therefore fees) will be on par: $10/month.

[1] https://open.spotify.com/artist/6eUKZXaKkcviH0Ku9w2n3V


70 percent of Spotify revenue goes to the artists (content owners, to be precise); I doubt that was the case when you bought a CD (I have found numbers closer to 40%). It is not abysmal; revenue has never seemed better.


> 70 percent of Spotify revenue goes to the artists (content owners, to be precise)

That's a pretty important difference. If it is going to a record company that pays a pittance to artists, no wonder "streaming revenue is bad".


From the article I linked:

>The actual recording artists? “They’re keeping anywhere between 5% and a quarter.”

That is a far cry from 70%


That is not Spotify's fault. It is a deal made by artists that splits revenue between them and the producer, record label, etc.

70% of Spotify's revenue is to be split that way compared to ~40% of revenue of CD sales.


It doesn't matter if it's Spotify's fault - I'm not saying they are the evil empire. I am saying that streaming is how we "beat" piracy, and it was not a panacea for the people it was supposed to protect - unless we consider the labels the people it was supposed to protect.

You're also comparing apples to oranges on revenue. 40% of the revenue of a $10-$18 CD sale is a lot different than 70% of a $10/mo subscription being split out over however many artists someone might listen to on spotify.

https://www.nytimes.com/2021/05/07/arts/music/streaming-musi...

Lots of artists talk about how they simply can't make a living off streaming royalties - artists that were able to do so in the era of album sales. Obviously any sort of comfortable living requires merch sales and touring.


Comparing artists that could make record sales earlier to all current streaming artists is comparing apples to oranges.

I agree that Spotify's revenue split is not perfect (it is in fact worse than what you described), but it is still much fairer than record sales. I'd consider just having a tape/CD sold in physical shops (i.e. not only at concerts) a success on its own in the 90s.

Now every artist can publish their work on Spotify and start to earn money, possibly getting noticed through it. It is much more feasible now not to be part of a record label.


>unless we consider the labels the people it was supposed to protect

You've hit the nail on the head there. The same record companies that got up to very dodgy copy protection schemes [1].

[1] https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootk...


But even taking those %s at face value: it's 40% of a much larger amount of money (a whole CD versus some streams).


Treating 1999 revenue as 100% CD sales and 2021 revenue as 100% streaming, we get something like this (in inflation-adjusted $bn):

(1999) 40% of 23.7 = 9.48

(2021) 70% of 14.9 = 10.43

So while gaming got much more popular during that time, music still brings a lot of revenue to the artists, thanks to streaming.

(1) https://www.statista.com/chart/17244/us-music-revenue-by-for...
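
Spelled out as a quick check (same figures and splits as above):

    # Estimated rights-holder revenue, inflation-adjusted $bn.
    cd_1999 = 0.40 * 23.7          # 40% of $23.7bn -> 9.48
    streaming_2021 = 0.70 * 14.9   # 70% of $14.9bn -> 10.43
    print(round(cd_1999, 2), round(streaming_2021, 2))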


> The weights are like distilled intellectual property. You're distributing an enormous body of other people's work, to enable derivative work without attribution? Huge harm to society, make it illegal.

The thing is that you're distributing only the instructions for making other people's work. There are art books and articles explaining the styles of certain artists and what techniques they use to achieve them; you could probably recreate "the Mona Lisa but in the style of Marc Chagall with cool glasses on" with real paint if you had previously stared at both the Mona Lisa and Marc's art for hours at a time. Are you infringing upon either of their copyrights by combining them? Probably not. But if you just recreated the Mona Lisa after having stared at it for hours, and it turned out nearly identical, then it would be. So where is the line?


Well, think: if you took the left side of the Mona Lisa and combined it with Van Gogh's Starry Night, would that be OK? Of course not.

But if you took 100 paintings from 100 artists and clipped all into small jigsaw pieces then recombined them randomly to create a new picture, that would probably not be considered "derived art".

What matters is how much creativity you put into the process. Does the creative aesthetic of the work derive from your efforts, or from the existing author's existing art?

But if I took 100 paintings from a single artist and combined them all into new works of art, that would probably be copyright infringement in my view, and is what seems to be happening here.

Consider the history of trademark lawsuits. You can be sued if you create something that somehow resembles an existing trademark, say use the exact same color as Coca-Cola for something similar.

So I think the guiding principle is or should be whether what you create can be confused with the work of some other highly original artist. It doesn't matter if you painted it all, if it looks similar enough that people could confuse it with the original artist's work, you are infringing.


>But if I took 100 paintings from a single artist and combined them all into new works of art, that would probably be copyright infringement in my view

What makes you say that? A work is considered derivative with regards to another work, not to an author. If we take your jigsaw example and accept that the final result would not be derivative of any of the works that contributed each individual piece, and then pretend as if in actuality all the sources were from the same artist, what would change that would suddenly make the result derivative from some or all of the original works?


You are probably right there, I was just assuming that courts would consider it a factor if all pieces came from the same author.

As I understand it copyright infringement is not just a pure "crime" in itself. It is about the financial harm caused. I think the word they use is "tort". It is always about violating somebody else's right(s).

Oracle sues Google for Java copyright infringement. It is not just about "Hey here's a copyright infringement ... put Google in jail".

It is about "We lost a billion dollars, because of your infringement". So Oracle claims not just a single copyright infringement of a single work, but billion dollars worth of infringement. It is not black and white, it is quantitative. How much there is of it determines the seriousness of the violation.

I'm not a lawyer of course, so don't take my advice.


That's kind of hilarious, because ALL artists copy/imitate other artists' styles during their learning process before settling into a style all of their own.

The number of people who learned to draw by redrawing/imitating Disney stuff is countless.

The thing people aren't seeing with AI art is that it's the same as mass manufacturing, compare: buying a mass produced knife vs buying a handmade artisanal knife. I think exactly the same thing applies; generating machine made art in a given style vs buying/commissioning an artist.

I think taking someone's work to train an AI is fine, as long as you obtained legal access to the material in the first place. There is no copyright for art styles; if there was, we would have no artists, because even the artist in question would've started out by imitating other artists' styles.

As an update after taking a closer look at the article rather than the discussion: her art style is 100% inspired by Disney (and a few others) and there is nothing wrong with that.


It seems very strange to use the existing rules of copyright as a defense of the use of this new technology.

The concept of copyright was created in response to the development of the printing press. It was a reaction to a disruptive technology. It was possible to laboriously copy written works before the printing press existed, but the new technology made it incomparably cheaper and faster to do so, and societies reacted by creating new protections for content creators.

We are now at the threshold of a new disruptive technology that is likely to bring about profound economic changes in the arts. It makes no sense to me to take the old rules and try to use them to justify this disruptive technology, when the old rules were initially created in response to a different disruptive technology.

It seems uncontroversial that this new generative technology is built on the backs of human artists. It only functions by drawing from their works. Is it so inconceivable that we might need a totally new set of protections for those human artists?


It is true that generative ai technology is often trained on human artists' work. But how is that different from human artists taking inspiration/learning and adapting the style of other human artists? I suppose the argument is that humans should get special treatment in the copyright domain?

I wonder if it is possible to get a machine to learn a style without input. Likely a room full of typewriter monkeys searching for Shakespeare scenario, but a human would still be involved in the loop to "confirm" the desired style - which is technically a creative decision in itself.

Which I guess shows the true nature: machines could generate stuff for machines without any external input. But we built them, so we've tasked machines to generate stuff for humans. And therein lies the answer I guess.

I 100% believe machines can be creative. Creativity isn't something unique to humans or to living things. For me it's a concept.


>It is true that generative ai technology is often trained on human artists' work. But how is that different from human artists taking inspiration/learning and adapting the style of other human artists?

It's different in the same way that making a copy of a book by hand, where it might take weeks or months to make a single copy, is different from making a copy with a printing press in a few minutes. It was the technological development of the latter process which led to the concept of copyright being created in the first place.

There is a fundamental difference between a human being taking years to acquire artistic skill, then using that artistic skill to create individual works inspired by other artists, vs. using a generative AI system to "learn" a particular artist's style in minutes or hours, then create infinite iterations of that style nearly instantly.

There's a tendency for people in tech to search out broad, overarching, universal principles that can be applied to all behavior. But sometimes, simply being able to do something tens of thousands of times faster or tens of thousands of times more cheaply is enough of a difference to require new rules, new moral frameworks, new modes of thinking.

"The computer is just doing what a human could do" simply isn't a compelling enough argument, any more than "the printing press is just doing what a scribe could do" would be.


You raise some interesting points.

> The concept of copyright was created in response to the development of the printing press. It was a reaction to a disruptive technology.

Absolutely, one of the major factors was that it allowed individuals to benefit directly off someone else's work without having made substantial changes. The protection was intended for the original work itself and derivatives too close to the original content.

> We are now at the threshold of a new disruptive technology that is likely to bring about profound economic changes in the arts.

This already happened with photography taking over portraits and tracing; the response wasn't to outright ban it, or really prevent it either. When technology made photography more accessible, to the point it was going to be disruptive to professionals in the field, the response again wasn't to ban or prevent it. This is despite the fact that it literally destroyed a significant number of jobs to achieve conveniences that we now all enjoy.

I feel like the AI issue is a parallel to the above situation. People are now given better tools to generate/create art themselves, and as long as it isn't blatant copies, or derivatives too close to the original content, it should probably have similar rules, in my opinion.

> It only functions by drawing from their works.

You can train AI models by taking photos and then vectorizing/toonifying/paintifying them, depending on what you're aiming for, with various widely available non-AI filters. Stylistic ideas are possible to implement in these filters; I have some experience, having done so while making plugins for processing my photos. So human-made art isn't even a strict requirement for generation. Even in the case where you ban AI from learning from human-made art, there are ways to still train the AI models to achieve a similar result.

There is another problem that hasn't been discussed: enforcement. It is going to be a very interesting problem, considering that international borders for information/data are virtually non-existent now, and it's becoming relatively difficult to even distinguish whether a piece was generated by an AI or by a person. The economic changes are likely coming regardless, from my point of view. Either people will use it illegally if it's banned, or legally if it isn't -- I just do not see this changing either way.


> Taking someone's work to train an AI is fine, as long as you obtained legal access to the material

That is the big question here: what kind of legal access does Copilot etc. have to the training materials? When they use the training material they must copy it to their computers. According to most open source licenses, they then also have to retain the copyright notice wherever they copy it. But it seems that Copilot skips that part: it copies everything else, but not the copyright notice.


You can trademark a style; you can't copyright it. IANAL, but that is what my corporate IP compliance training tells me. As long as I am regurgitating non-legal advice: I suspect half Mona Lisa, half Starry Night might be considered a transformative work. If a single human artist painted both perfectly onto the same canvas it could be construed as a statement about changes in the culture between the two contexts, so if you do it with Photoshop, it might very well get ruled the same way.

As for the morality of it? I don't like the idea of copilot replacing me, but I don't think it was wrong to make it. I'll eventually have to retrain myself to retrain copilot models I suppose. Or we'll have to decide to care for each other as we all go unemployed.


If you gave Copilot the license to copy your code, part of that license is that they have to include your license and copyright notice in every derived work they make.

And Copilot, it doesn't just copy "style", it copies code.


Andy Warhol did something very similar to this with Avril Harrison's computer illustration of Venus. He just used the clone tool to add a third eye then called the result his own work. It even still had her signature on it.


>But if I took 100 paintings from a single artist and combined them all into new works of art, that would probably be copyright infringement in my view, and is what seems to be happening here.

If I look at 100 paintings by Pablo Picasso and then paint a new one in his style, did I commit copyright infringement?


That's a good question. My immediate answer would tend to be no.

But consider you produced a comic-book about Mickey Mouse where the character Mickey Mouse looked exactly like the one in the several Disney books and movies. You would probably get sued. Right?


Trying to take a strong form of OPs position, one obvious line would be the automation and mass reproduction aspects, in addition to how much unique creativity you specifically added to the process.

It gets hairier, of course, because what happens to experts in the field who are able to do that themselves and then just use this as a boosting tool? Still, I don't think copyright law has clear bright lines so much as guidance that the courts try to muddle through as best they can. Certainly one can make an argument that just stealing an artist's style like this could be considered a copyright violation, just like sampling even a few seconds of someone else's track can be in music.

Again, not saying these are a net good for society, but clearly existing copyright laws do try to take this tack. I think the things working against her are that a) it's early days, so laws haven't caught up, and b) US laws generally favor corporate interests over individuals, so she might never get any relief, even if deeper pockets start to protect themselves as this becomes a bigger problem for them.


The process for a human to copy her style in an original work would be similar, and legal. I don't think it's a good idea to prevent the automation of human-capable tasks, because it's anticompetitive: it protects an industry (albeit a small one of starving artists) at the cost of consumers.

The harms to artists are obvious and immediate, but limited and small. The benefits of letting an ML model train in the same way as a human are vague and in the future, but might be capable of massive transformative changes in the way we work. I think it's right to be careful about "protecting" a limited number of people at the cost of enormous future potential.


Enormous future potential for derivative work gets created. Enormous future potential for original work gets erased.

Why would anyone in their right mind choose to put effort into creating original art if there is "one easy trick" to get around copyright by simply turning their art into a model that can be used to churn out things they could have produced?


>Why would anyone in their right mind choose to put effort into creating original art if there is "one easy trick" to get around copyright by simply turning their art into a model that can be used to churn out things they could have produced?

Why do some artists still paint on canvas instead of using photoshop or krita, where you can easily ctrl+z any mistake, never need to mix any paint, can move layers up and down, etc. etc. etc.?

Why do some photographers still shoot anything smaller than large format with film when medium format and full frame digital cameras exist?

Why do some people still use analog synths when Native Instruments' Komplete exists?

Why do some guitarists still use amplifiers when they could use an AxeFx/Kemper/Neural DSP?

Most of those options are also more expensive, on top of being more inefficient/difficult/generally burdensome, yet people still do them.

People do a lot of things that are not necessarily the most efficient way to do something. They like the minor differences, or enjoy the process, or many other things.

I also don't see how SD and similar get around copyright. Even if training these models on copyrighted images is legal, that doesn't mean that the output they produce necessarily is. It doesn't matter how I create a depiction of Iron Man, be it SD or a paintbrush and canvas, I do not have the rights to reproduce him. And for things that can't be protected by copyright, such as style, I am not hindered by it no matter if it is created with SD or colored pencils on a sketchpad.


If you think about future business cases, my guess is if I'm in the content creation business I'd hire some artist to create inputs for my ML model to train. And I'd be the only one with access to these inputs (in the beginning). Or think about it the other way around. If I'm an artist I buy a commodity AI-art-generation-engine and feed it with my work and I can create infinite items in my own style for (digital) sale.

It'll all be about time to market and brand building. I could even see a world where the originals of the input creator would sell quite well as classic modern artworks. Imagine for a second a world where 3D assets get created this way. I'm pretty sure fans of popular games would shell out good money for originals from "the artist behind the Witcher 7 asset engine" if the trajectory of human development goes as I see it going.

Also...artists are going to create art no matter if it makes financial sense or not. In fact I'd argue that's the difference between art and design :P


> if I'm in the content creation business I'd hire some artist to create inputs for my ML model to train

That's a reasonable way to go about things. The problem is that right now the status quo is that you just take artists' work without their consent and use it to train your model.


Because they want to do it? The motivation for creating art isn't purely financial.

Plus, we humans all built our skills and works on the shoulder of giants. Artworks and cultural artifacts are never created in a vacuum. Maybe it's time to acknowledge that.


> The motivation for creating art isn't purely financial.

Yeah, but getting financial compensation can certainly help. The opportunity cost of putting bread on the table means that the output of most professional artists today would drop significantly, if they needed to pick up another profession (especially full time).

> Plus, we humans all built our skills and works on the shoulder of giants. Artworks and cultural artifacts are never created in a vacuum. Maybe it's time to acknowledge that.

You're acting as if artists don't already acknowledge and understand this. https://www.muddycolors.com/2017/12/some-thoughts-on-master-....


Financial compensation does help. But certain industries become marginalised or relegated to history given enough time. People then keep them alive because they choose to.

Where are the tears for horseback couriers? Or blacksmiths? Or thatchers?


That just proves the original poster's point, that the potential for future original work will be erased.

How often are we seeing innovative advancements in the field of horseback couriers, blacksmithing and thatching nowadays?


I guess you didn't get my point which was: those industries died apart from specialists keeping them alive today and that's just the nature of the world.

The same thing will happen to human generated creative content whereby it becomes something that people are involved in because they want to be, not because it's a necessity/it's the only way to do it.

Yes the potential for future art work done by a human today will be erased in the future when it can be performed by a machine, but that has always happened & yet somehow it's surprising to people.

An artist being indignant towards machine generated art yet using mass produced tools, eating food farmed by mechanised equipment, wearing clothing woven by automatic looms, taking a digital photo themselves instead of hiring a portrait painter, owning a car instead of a horse that supports many sub-industries, sending emails instead of letters is just hypocrisy.

Technology has always brought us forward and these new AI powered tools will assist us as the tools we produce have always assisted our species. And as always those who refuse to change will eventually be left behind.

And yes, if this was happening to the industry I'm in I would currently be going through the 5 stages of grief about it, too. But then I'd just have to change up what I'm doing to reflect the changing times. As she herself said, it still doesn't capture what she puts into her art & so there is still that avenue to pursue.


That's begging the question. I don't agree that a model is one easy trick to get around copyright, any more than paying another animator to draw in the same style would be.

In terms of creating original art, I think that in ten to twenty years artists will see models as another tool for creative expression; one that lets an individual artist be more productive but can produce a generic feel, like thin-line animation or sticking difference clouds everywhere or using a palette of pre-made drag and drop body parts.


People will still put effort into creating original art in styles that don't yet exist.


> The process for a human to copy her style in an original work would be similar

It wouldn't be similar at all. It takes years to get skills good enough to even copy stuff like that. With AI, a person who never did any art in their life can get hundreds of copies in a few hours.


The engineer stumbled onto the least sympathetic, least transformative, most obnoxious use case for the AI. He was trading on the artist's name, confusing people and even arguably devaluing her work by reproducing it in a clumsy and low-value way. Folks in the industry would do better to acknowledge, as he did, that this was wrong and establish standards so everyone knows this is not considered a proper practice.


Taking longer doesn't mean it's dissimilar.


Here's why I believe it is inevitable...

It's already out there. On people's local computers and soon their mobile devices. People are tinkering with it at warp speed. This point addresses that it technically cannot be stopped.

I don't expect it to be possible to detect that the art is AI-generated. Detection becomes even harder when a personal input image is used, along with many follow-up edits or composite works. It blends into normal image creation. The only way to prove that art is not AI-generated is to record "human art" in progress, as is sometimes done in art contests, but this isn't reasonable to legally require of every single piece of art.

You can't enforce what you can't detect.


As a society we have gone to great pains to protect the software developer's income and job: source code was given copyright and patent protection - no other industry gets both protections at once.

Now you seek to deny others such protection while taking advantage of it yourself.


Let's not kid ourselves. Those protections exist for the benefit of corporations, not software developers. If those corporations could have robots write the software and copyright that software and patent it, they absolutely would.


Ironically, the high salaries the software industry is able to pay are precisely because of the copyright protection afforded to it, which prevents the value of the software from being diluted by way of rampant copying.

This is also the reason why open source projects often struggle with funding, and why many databases (among other OSS software) are moving towards stricter licenses such as the AGPL.


I don't think so. The highest paying companies don't distribute software, they provide access to a remote service. Even if copyright didn't exist, you couldn't copy the Google executable.


> Even if copyright didn't exist, you couldn't copy the Google executable.

If copyright did not exist, you could take it as a google employee and start your own google without going to jail.


Pretty sure you could sue them using other means, such as through contract law, if they signed an appropriate contract. You really don't need copyright if you aren't broadly distributing information.


If Guy A copied your code, and I got it off him, you can sue him for violation of contract but you can't stop me, a third party.

Your ability to sue him will be limited: he can't go to jail, he can declare bankruptcy, and you have to be specific about what is protected; 'idea' is a vague term that can't be protected.


You're making my point though. The fact that Google is successful, in part, is because of the fact that you can't copy their trade secrets and methods; which is one reason no solid competitor has come up to challenge Google.

(There are infrastructure challenges as well, but this thread is about intellectual property.)


My understanding is that trade secrets are distinct from patents. For patents, you tell everyone how to do it but they're not allowed to for 20 years. For trade secrets, you don't tell anyone how to do it, but if someone else figures it out for themselves, it's fair game. Most of Google's search IP is protected as trade secrets rather than patent/copyright, I believe.


Your point is that you can't copy Google's secrets because of copyright, and therefore copyright is valuable to its employees. I'm saying that the reason you can't copy that information is different from copyright.


Many (most?) software developers believe that software shouldn't be patentable at all. Many also believe that copyright terms should be way shorter.


They only believe this because this won't (immediately) affect their material prospects.


Or maybe they genuinely believe that the content they make today shouldn't be copyrighted for the next 100 years? Lifetime of the author +70 years is a very long time.


Given a binary choice between unemployability and extending patent protection to (even) snippets of code, I am quite confident that 90+% of salaried developers today will chose the latter.


That's not the argument we are discussing - the question is whether these protections should exist at all. We are talking about denying artists all protection; it's only fair to confront developers with the same dilemma.

Whether they last 3 years or 300 is a finer point, and is only worth discussing after the necessity and legitimacy of such protections is established.


What a load of nonsense. The last 40 years have made the barrier to entry for software development lower than it's ever been.

We have applications like Unity for game development, low-code solutions that let you generate a CRUD dashboard for a database in a few clicks, etc.

As a developer with an actual degree in comp sci, I can guarantee I'd be a helluva lot better paid if everyone had to do their software development in low level C.


That wasn't to protect software developers, that was to protect companies & corporations. Twisting law and politics is what companies under capitalism have always tried to do.

Software developers aren't protected at all, we are simply just in demand for the time being. There will probably be a time in the far flung future where our jobs are phased out, too.


It doesn't matter what any of us think. The genie is out of the bottle and cannot be put back, because unlike pirating existing media, this new kind of piracy takes the whole style.

Visual arts as a career will be dead soon. Visual arts as a hobby will live on.


>Visual arts as a career will be dead soon

Put a fine of 10% of annual turnover if a company cannot prove that images used in its product/ads are human generated. Make payment to visual artists part of the process of proving it. Boom, visual arts as a career saved.

It's one thing to say we shouldn't do it, it's quite another to say we cannot.


That's a profoundly stupid idea. There are low code solutions that generate fully compilable pieces of programs and applications, so let's make sure we ban those.

Plenty of music is procedurally generated so we gotta make sure we ban those as well.


What's profoundly stupid is to assume every piece of technology is good so we should do nothing about its proliferation. Yes, we should ban low-code tools based on deep learning over "fairly used" (not) datasets and we should ban the same for music, writing, whatever. AI bros can go cry in the corner, I don't care.


Think of it in a non-monetary way, ignoring job security for a moment. Why would anyone (artists/programmers) spend their time doing something a machine can do? It would be a terrible way to spend one's life. Perhaps the datasets can be licensed (if that's the sticking point) and the AI embraced?


>Why would anyone (artists/programmers) spend their time doing something a machine can do

Why do humans play chess after 1997? Why do they play checkers after the 70s?

AI capability is destroying human enjoyment of activities because it also destroys the economic rationale for engaging in them and/or allows other humans to cheat.

The obvious conclusion of this position is that we should just all kill ourselves if strong AI ever starts to exist. No, thank you, I'd rather do everything possible to prevent it from being created.


> The obvious conclusion of this position is that we should just all kill ourselves if strong AI ever starts to exist.

Indeed that is the logical conclusion. I expect some people advocating this line of logic to follow through.


It's the difference between a bus and hiking: With a bus, you can arrive at lots of places fast but you will never experience the place as the hiker does.


With the amount of power the copyright lobby has, who knows, it might be true. However, advertising is not the only industry where visual arts will be affected. Gaming, Comics, Animation, VR, Movies, etc. will be affected as well. And regulation isn't going to keep up across all the countries.

Even if we assume the above rule is made, there is no way to prove compliance, because even if a human does the work, he might still use the help of Photoshop etc., which are currently planning to integrate such tools. The most favorable outcome I see from this is for famous/competent artists to license out their style of art to these generation companies, which then train their models on it (which sounds win-win but won't be as profitable as it is today).


>Even if we assume the above rule is made, there is no way to prove compliance, because even if a human does the work, he might still use the help of Photoshop etc., which are currently planning to integrate such tools.

It's unclear to what extent tool producers will support AI in this context at all. Also, this is a problem for safeguarding the integrity of artists' output, not for safeguarding their income.


The companies that pay artists' salaries won't be willing to secretly break the law to save money.

Microsoft won't legally be able to continue its abuse of GitHub. Copilot will be dead. OpenAI and Stability will not legally be able to profit from large-scale intellectual property theft. All these violations will end.

These are the most significant digits. The residual amateur piracy doesn't matter. It doesn't matter if random guy gets some leet neural net warez and uses it to make his desktop wallpaper.


First, you didn't address my point that you cannot detect AI output.

I already have Stability running on my PC. I generate an image with it. How do you know the output comes from Stability? Answer that, please.

Second, a worldwide draconian ban on AI image training and generation just isn't going to happen. Very few legal things are coordinated worldwide, and copyright law is incredibly low on the list.

Even training without consent can be addressed. Google trained some of their AI on Google Photos, which they made free for unlimited use so that we fools would upload billions of images, accept the terms we don't even read, and voila: AI legally trained.


Just because you can buy a knife to kill someone discreetly, doesn't mean that we should just give up on the idea of making murder illegal.

Of course "killing someone" and "copying someone's artwork" are at different levels, however, I'm sure artists would beg to differ.


I don't think there is any work on that yet, but if the model is known it should be possible to derive the probability that a particular image is the output from it.


I've produced over 5000 images on my local SD install. Currently, there are many dead giveaways if you produced the art with it, specifically around hands, feet, holding things, and pupil directions. Of course these things will get better with time, but currently there are many things that expose generative art.


It's inevitable that someone comes up with methods to detect generated images, since a lot of political (Edit: and financial) capital hinges on that. If AI image generation is inevitable, then methods to analyze images wrt. known generators are even more inevitable.


> GAN

And your point is invalid in the long term.


I've noticed there's a lot of newly created luddite accounts.

Reddit luddite brigade?


I'm sure I've still got a bunch of MP3 files from Napster on a hard drive somewhere, but yeah, that genie was put back in the bottle. This one can too.


It was put back in the bottle by Apple legitimizing the piracy business model, cutting deals with the major labels to digitize their music.

The analogue here is that AI art will be legitimized, and the artists who profit will be the ones who let their work be used as input for a cut of the profits. And nobody will be able to compete with them, and the owners of the machine will be able to set the profit rate as they choose.

... That does sound like a new stable equilibrium, actually.


And as we all know, music piracy over the Internet ended the day Napster stopped working. /s


How was it put back in the bottle? The Pirate Bay still exists. Bittorrent tools still exist. Name any song and I'm sure we can find it.


Napster was on millions of machines and now it isn't. It's like you skipped that part.


Napster, the centralized service, was stopped.

Napster, P2P sharing of music, was not.

Piracy is held at bay only by the ease & affordability of legally obtaining media and difficulty accessing the technical means to pirate.

... and well-packaged piracy solutions and modern broadband bandwidth likely sink the maximum price (the only remaining term) below the cost of production.

It took me 20 minutes to go from nothing to an entire season saved locally and streaming to a Roku. That's finding the software, installing, configuring, finding torrents, downloading, and then playing. And that's not having pirated in a decade or so.


Napster had single points of failure, and later P2P had poisoned seeds for tracking.

Stable Diffusion is math and cannot be stopped now that the toothpaste is out. You can attempt to regulate, assign draconian requirements by force of law, but ultimately these are as unenforceable as regulating that pi=3.

Ironically, what could help is NFT type tech. Signed with a private artist key, your copy is "original". Even if knockoff generative copies are produced, the digitally signed produced-bys are still authentic.
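
Stripped of the blockchain part, that "signed original" idea is just a detached signature over the image bytes. A minimal sketch, assuming the artist holds an Ed25519 key pair (using the pyca/cryptography package; the filename is made up):

    import hashlib
    from cryptography.hazmat.primitives.asymmetric import ed25519

    private_key = ed25519.Ed25519PrivateKey.generate()  # artist keeps this secret
    public_key = private_key.public_key()               # artist publishes this

    # Sign the hash of the artwork file.
    digest = hashlib.sha256(open("artwork.png", "rb").read()).digest()
    signature = private_key.sign(digest)

    # Anyone can verify; raises InvalidSignature if the bytes were altered.
    public_key.verify(signature, digest)

A knockoff model can reproduce the pixels, but it can't reproduce a signature made with the artist's private key.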


>Ironically, what could help is NFT type tech. Signed with a private artist key, your copy is "original". Even if knockoff generative copies are produced, the digitally signed produced-bys are still authentic.

That solves a completely different problem, though. I don't think anyone is saying that the problem is one of false attribution, where people are claiming generated images are the work of a particular person. What's being discussed is artists having less work because people generate art computationally rather than commission artists to do it.


Aye, and on your concern about the different problem, the toothpaste is out of the tube never to truly be returned.

We can evolve the market (in my view, into luxury goods with NFT type tech) or we can wait for artists to truly starve. I'm a proponent for solving the problem that can be solved to help folks move forward.


We can try to evolve it, sure. I don't think that's an option that will interest enough people to matter.

While it's possible that these AI tools will leave some (certainly not all) artists without work, what I think is really going to happen is that artists will harness them to do new things that were simply impossible before, or to make their work easier. Technology rarely destroys jobs; it more frequently changes their nature. Just like how at some points animators needed to know how to use 3D tools when in previous decades they didn't, in the near future graphical artists will need to know how to use AI. It's possible that where there were previously two artists working there will then be only one, but such is life. Demand for art is finite.


I worry that the ability to effortlessly conjure a "good enough" image may drown out efforts to thoughtfully create a great one.

Your comparison to computer animation is apt. Rewatching animated classics recently, I can see what we've lost now that every film is plastic.


I agree, traditional animation was better than modern CGI, but I don't think it's as simple as CGI being an inherently worse medium; rather, films are produced more cheaply. Some weeks ago a friend and I were watching and comparing some scenes of Snow White and Cinderella in English and Spanish and were stunned by the singers in both languages. How often do you hear actual opera singers in modern Disney films?

So, yes, what you say may definitely happen, but it's a trend some graphical industries have been on for decades. It's why there are so many fewer professional animators anymore. I wouldn't be surprised if some techniques of traditional animation have been lost by now.


Yeah, sure, but that's because streaming was merely more convenient than Napster. No more downloading bad songs with bad metadata. No more lugging around and hand curating an mp3 library. And for a lot of people: no more having to choose what to listen to.

In two years people will be generating novel music of any style with any vocals and vocalists they want. That'll be even more of a fit to consumers' wants. They'll never run out of music that will appeal to them.

I'm currently working in this space and it's wild the things that are possible.


True, but the post-Napster media negotiations very much priced it in.

Nobody is paying $10 per CD anymore.


Looks like Taylor Swift's new release on CD is $14 at Target right now and $12 for digital download.

They're not paying $10, they're paying more.


According to this site [1], $14 in 2022 Dollars is $7.19 in 1995 Dollars.

So, about $3 less.

[1] https://www.usinflationcalculator.com/

(Who knows if that site is reliable though, the point is inflation.)
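
The same adjustment can be approximated straight from CPI data (a sketch; the annual-average CPI-U values below are rough figures I'm assuming, so the result differs slightly from the site's):

    # Deflate a 2022 price to 1995 dollars.
    cpi_1995, cpi_2022 = 152.4, 292.7            # approximate annual averages
    print(round(14.00 * cpi_1995 / cpi_2022, 2)) # ~7.29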


> It's a bit like saying we can't stop music piracy, now that Napster exists.

Curious choice of example, because it was never stopped. It just went somewhat out of the mainstream because the industry offered pricing options, like Spotify, that were acceptable to most people, so they no longer had an incentive to resort to piracy. Not wanting to pay at all was always a minority position; most people just found it ludicrous to pay full album price for the one or two songs that they liked.

And still, if you do want song X for free you can still obtain it easily. The industry just no longer makes a fuss about it.


> There's nothing inevitable about it. Laws exist to protect people.

Amen.

I think one thing we're going to have to look at is the expectation of a separate agreement for having one's work go into a training set. Maybe equity should even be the standard here.

And informed consent associated with it. People need to know they're training something else to do their job as well as doing the job, selling the cow instead of the milk.


> There's nothing inevitable about it.

Everything you can come up with as a "solution" is really just a stopgap measure. Instead of specifying the style by name, you could specify it by example image. Instead of training the AI on her images directly, you could train a second-generation AI on images drawn in that style generated by another AI that was trained on her images. Thus your second-generation AI would be free of any copyrighted work. And of course the whole copyright thing only comes into play when people redistribute the AI. If the AI is easy to train yourself locally, even that doesn't matter anymore.

If you want to go all Butlerian Jihad on the world, you might be able to stop it. As long as AI is allowed, this ain't going away, it's only getting easier, cheaper and faster.


For all we know, this could already be happening. Every digital image produced yesterday could have been AI-generated for all anyone knows. The original artist in this story could already be using AI to create their own work. Of course, I don't actually believe that's what's happening in this case, but the fact that it could means it's probably impossible to return this genie to its lamp.


It honestly sounds like you're making a good argument for a Butlerian Jihad in that case.


HN is suddenly filled with a bunch of crazy luddites. Why don't we institute the death penalty for artists who have taken inspiration from other artists while we're at it.


I know right? I think all those FBI warnings worked maybe, and the new generation of geeks think IP is actually a moral thing instead of a corporate money-grab.

Also, Herbert wasn't against AI, I don't think. I suspect he simply recognized he couldn't comprehend a world that far in the future if AI was a part of it. Instead, he used space magic to explore his very present reality of resource wars, and went on to make a point I'm not sure I understand, about too much political order and the resulting stagnation causing self-annihilation.


I was joking, but HN at the same time is filled with people that believe regulation only stifles innovation.

So just because AI is inevitable doesn't mean that we should abandon all regulation. There would be good merit in slowing down some progress, so we can actually maintain a good transition to new industries.


I'm pretty sure if you steal a diamond from a diamond thief it is still considered theft from the original owner.

Your "indirection" hasn't ever been a valid argument.


Except you're not stealing anything and the "thief" didn't steal anything, either. You both just made a copy.


In fact, in that case, the indirection does the opposite of protect: even receiving the diamond from the original thief as a gift is illegal.


> Use of the output of systems like Copilot or Stable Diffusion becomes a violation of copyright.

That really should depend upon the output.

Many, if not most, people learn an art by imitating the style of established artists. Some will carry on with that style. Others will develop their own, though it will probably always carry elements from those they imitated. Should injecting a machine into the process automatically make it illegal?

There are going to be clear cut cases where it should be, cases where so much is imitated that it goes beyond style and into substance. Yet that means we should have a human looking at the output to determine if it is too close to a copy, rather than banning AI generated art altogether. To do so would put the creative process in peril. This is not because machine learning reflects our definition of creativity. Rather it is because it is difficult to define what human creativity itself is.

(That said, I do believe that using the artist's name as a way of promoting their own work is stepping over a line.)


> That really should depend upon the output.

Except, in copyright law it depends on the _input_.

These models would not exist if they were not first fed the source material.

Until we have systems that are not trained on a pre-existing corpus this will remain true. No matter how clever the algorithm, without the source material you have no output. Zilch. Nada.

Now, when the source material is someone else's property this means that without - someone else's property - you would have had no output.

So, when you want to use someone else's property, which you do not own, the general rule is that you a) first ask them if you may and b) pay them for the right to use their property.

In this sense it is no different than using a photocopier.

It's the copyright ownership of the material you put into the machine that will interest the judge, not the quality of the copy.

I'm really looking forward to the first court cases and predict that much hilarity will ensue!


Trained models don't have the actual images inside, they have summed up gradients. So what they are doing is far from a copy&paste job, it's more like decomposing and recomposing from basic concepts, something made clear by the "variations" mode.

Among the things the model learned are some un-copyrightable facts, such as the shapes of various objects and animals, how they relate, their colours and textures - general knowledge for us, humans. Learning this is OK because you can copyright the expression, but not the idea.

Trained models take little from each example they learn. The original model shrunk 4B images to a mere 4GB file, so 1 byte/image worth of information learned from each example, a measly pixel. The DreamBooth finetuning process only uses 20-30 images from the artist, it's more like pinning the desired style than learning to copy. Without Dreambooth its harder but not impossible to find and use a specific style.
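
To sanity-check that ratio (back-of-the-envelope, using the ~4B images / ~4GB figures above):

    images = 4_000_000_000       # ~4B training images
    model_bytes = 4 * 10**9      # ~4GB of weights
    print(model_bytes / images)  # 1.0 byte per image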

And the new images are different, combining elements from the prompt, named artists and general world knowledge inside. Can we restrict new things - not copies - from being created, except in patents? Isn't such an open ended restriction a power grab? To make an analogy: can a writer copyright a style of writing, and anything that has a similar style be banned?


> Trained models don't have the actual images inside, they have summed up gradients. So what they are doing is far from a copy&paste job, it's more like decomposing and recomposing from basic concepts, something made clear by the "variations" mode.

Doesn't matter. A JPEG of the work is just a bunch of numbers to feed into an equation; that doesn't change the fact that it's copyright infringement.


A digital photo of the Eiffel tower at night doesn't have the real Eiffel tower inside, only weights and pixels - still you don't have the rights to publish your photo of the Eiffel tower in France.


Drawings and art including the Eiffel tower are still ok, right?


>Except, in copyright law it depends on the _input_

I am not a lawyer, but my understanding of copyright law is that this is explicitly the opposite.

https://www.wipo.int/edocs/pubdocs/en/wipo_pub_909_2016.pdf

> • reproduction of the work in various forms, such as printed publications or sound recordings;
> • distribution of copies of the work;
> • public performance of the work;
> • broadcasting or other communication of the work to the public;
> • translation of the work into other languages; and
> • adaptation of the work, such as turning a novel into a screenplay

None of these rights, to me, indicate that copyright protects the input. The AI model is not reproducing any specific works, distributing copies of it, performing it in public, broadcasting it, translating it to another language, or adapting the work from one format to another.


>Now, when the source material is someone else's property this means that without - someone else's property - you would have had no output.

Exactly the same happens with artists. The only artists who can claim not to have been influenced by seeing the work of other artists lived tens of thousands of years ago. So what makes it okay to process artwork via some processes but not others, when the ultimate output may in some way copy the input anyway?


Personally, I'm on-board with protecting artists incomes, however, I think there's a middle-ground.

First, I'd like to add a fact you omitted: Napster, Limewire and the like didn't come out of nowhere. They were created because artists and their recording companies forced consumers to buy entire CDs at ever-rising, inflated prices. Granted, what they got in return from consumers after that wasn't fair either.

I don't think making AI generation illegal for everyone makes sense. That's how you get the Metallicas of the world bankrolling professional grifters to hold people's grandmothers financially hostage.

I do think it makes sense to bar AI generated products from making money if the works used to train it did not belong to that company or individual. If you create a program using CoPilot, you should not earn money. If you make a comic using Stable Diffusion, you don't deserve money. This keeps the power players in check while allowing artists paths to use these AIs if they own their own work outright. Imagine if you could train CoPilot on your own code and then use it to help you. That to me sounds like the framing for a new and responsible form of innovation.


Yeah! I think AI tools must be transparent about their inputs, period.

It feels too unfair to pioneer a new art style and simply be copied with machine precision and speed… Opting out of contributing to the training database should be a thing, but I do not know how the output could be reverse-engineered to check.


Yes, there should be opt-outs for ML training. They could take many forms: robots.txt rules, special HTML tags, HTTP headers, plain-text tags or a centralised registry (sketched below). You can take any work out of the training set without diminishing the end result. But doing so would mean being left out of the new art movement. Your name will not be conjured, your style not replicated, your artistic influence thinning out.
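
Purely as a hypothetical sketch, since no standard exists yet, an opt-out could look like any of these (the directive names and bot name are invented for illustration):

    # robots.txt-style rule for a hypothetical ML-training crawler
    User-agent: AITrainingBot
    Disallow: /portfolio/

    <!-- hypothetical per-page HTML tag -->
    <meta name="robots" content="noai, noimageai">

    # hypothetical HTTP response header
    X-Robots-Tag: noai

A centralised registry would work the other way around: trainers would be obliged to check URLs or image hashes against it before ingesting anything.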

If an artist wants her works to have the fate of the BBC archives, which removed millions of hours of radio and TV shows from the internet, then go ahead. That historic BBC content was never shared, liked, or commented on, and has had no influence since the internet became a thing. A cultural suicide to protect ancient copyrights.


Music piracy didn't stop because of the Napster shutdown. It just manifested itself in different ways. Now all you need to do is use youtube-dl to download the youtube video or soundcloud track or bandcamp album with the -x flag to extract audio. Both the software and the original media sources are legal. In fact, GitHub was forced to take a public stance on youtube-dl after a DMCA takedown request on the repo.
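
For instance (the -x / --extract-audio and --audio-format flags are part of youtube-dl's documented interface; the URL is a placeholder):

    # download a video and keep only the extracted audio track
    youtube-dl -x --audio-format mp3 "https://www.youtube.com/watch?v=VIDEO_ID"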

The biggest reason the laws can't possibly hope to stop the practice is:

> [MysteryInc152] told me the training process took about 2.5 hours on a GPU at Vast.ai, and cost less than $2.

As those costs are driven down and the software is accessible to more people, distribution of the weights will not be needed.


I think that one of the main reasons YouTube became so huge was music piracy.


Modern intellectual property law, specifically copyright, is brazenly slanted to maximize profit for American corporations at the expense of the average person, with zero consideration for the rule of law or the democratic process.

Year after year entrenched media interests lobby the US government to make IP policy more corporate friendly and those policy changes are forced on the citizens of countries around the world through strong armed free trade agreements.

We don't get to discuss these things as citizens of sovereign states; they just happen to us.

Maybe we want to live in a world with a substantially shorter copyright term, is that so wrong? Maybe that would be better for individuals and society as a whole, but we'll never know, because American companies won't risk the chance of losing money or power to find out.

How long do you think copyright should last before works revert to the public domain?

5 years? 25? 50? 100? 500? 1000?

How much is too much copyright?


>It's a bit like saying we can't stop music piracy, now that Napster exists.

You can't. Soulseek is alive and well. It started with Napster and now others are carrying the torch.


Except now the vast majority of people pay for music via Spotify.


>It's a bit like saying we can't stop music piracy, now that Napster exists.

perfect example, as piracy has _never_ been effectively stopped and _never_ will be effectively stopped by legislative means


Sure, on a small scale, but Spotify, Apple Music, Tidal and I'm sure many other platforms exist and are quite successful.

I'm a big pirate myself, with multiple terabytes of hard drives full of pirated music accrued over the years, but even I choose to use Spotify and Tidal a lot of the time, out of pure convenience.


Exactly, though: the fears of the industry at the time were met, one way or the other. Spotify and others came around and basically destroyed the album/CD model, and led to independent publishers having way more power than ever before. For the record companies, the world we're living in right now is hell. Despite spending as much as they did to kill Napster, they weren't able to stop the "inevitable."


> Sure, on a small scale, but Spotify, Apple Music, Tidal and I'm sure many other platforms exist and are quite successful.

They are not stopping the piracy.

They are providing better service than piracy. As Gabe Newell said, piracy is a service problem.

Offer something convenient and worth the price and a lot of people will pay it just fine.


I think it may fall along very similar lines to "sampling" in terms of use... the AI model obviously used copies/samples of original/copyrighted works.

I'm not saying I support the argument or that it will stand up in court, but definitely some merit to making it.


Napster distributed whole songs. If it had sampled thousands of songs and then created original compositions that sounded kind of like the "style" of those songs, what would the legalities be? That is a huge difference. I'm a professional artist who has been able to make a good living and support a family; what does this mean when someone with an algorithm and some keywords can produce good-enough work in a fraction of the time for pennies? There is a huge swath of professional artists whose livelihoods are at stake.

Is this like the stagecoach makers when automobiles were invented? Or is this like Napster stealing copyrighted material? This is new territory.


Very much the first.

(1) Ubiquitous new technology

(2) New domains the technology opens

It's even more fundamental than stagecoach --> automobile. It is more like cipher --> RSA: a fundamental change based on basic math and ubiquitous, readily available technology.

The toothpaste is out of the tube!


At this point, I don't think the law can stop it. We're looking at a technology that can easily become illegal but ubiquitous, like Napster in the heady days of flouting audio copyright.

If the entire Western copyright sphere of influence unifies on it being an illegal system, Russia and China have no incentive to ban the tech. Especially if it makes their entertainment industries more competitive with the Hollywood machine.


Not just entertainment, whoever bans this will lose out everywhere and become a backwards shithole.


- Copyright expires 70 years after an artist's death.

- Corporations / Contributors can buy images, or draw their own

- Getty Images already has the rights to 477 million images.

This is inevitable.


>It's a bit like saying we can't stop music piracy, now that Napster exists.

Trivially, you still can't. Lawbreaking when it comes to copyright is enabled at scale by computers (like everything else); so unless you manage to win the war on general-purpose computing there's nothing you can do.

Sure, streaming has taken the place of piracy (growing the pie is better than strict conservatism), and Patreon (and its offshoots) has made it possible to be paid for recurring content that's inevitably going to be pirated, but file-sharing (torrents) and alt.binaries (abusing free storage sites as a backend for streaming video) still work just as well as they ever did.

The only reason people pay for content is that they want to, provided the price isn't usurious or infinite ("not sold in your region"); those that work with that desire prosper, those that fight it fail, and that's just the way it is.


If artists were required to exclusively sign with globe-spanning conglomerates that pay them in loans and take a cut every time they teach an art class (there's no good analog to record companies taking cuts of concert revenue), you'd see a society-breaking, unjustifiable level of protection for an artist's "style."

As it is, artists don't have massive teams of lawyers and billions in assets, which makes their concerns irrelevant to the people who would normally be bribed to advocate for them.

For copilot, I'd like to see more models trained on stolen and leaked proprietary code from hacks, or an organized movement to leak code from businesses and feed it into a freely-shared model. If transformation into the model is enough to launder copyright, it ceases to be stolen code. I'm sure it would be helpful in cloning proprietary products.


How exactly do you define a single person's style vs. a genre? While any artist might specialize, as is the case here, in a single style, distill it, and create a large body of work in that specific style, do you think no one before created a similar work of art in the same style?

Naming a recognizable artist is the current "lazy" way of doing this instead of naming every possible visual style; and sure, we could ban name and surname, but should an artist own "dreamy flat pastel-colored illustrations of cities or characters with high contrast, no lines, children's illustrations" in perpetuity? Definitely not; for the style itself there are likely hundreds of artists that have done something similar before and after.


I think the advantage of using them is too great, companies that use the networks will outcompete ones that don't -- even if it were made illegal in the US it won't be everywhere. I imagine when it comes down to it, the law's going to be pragmatic. What happens to US industries if we allow this, and what happens if we don't, and my guess is that it ends up being allowed.

I'm not saying that's good or right - I really don't know how I even feel about the AI networks morally... I just think money is going to win out.


Copyright exists to incentivize authors, artists, scientists, etc. to create original works by providing a temporary monopoly.

The arguments suggesting that people shouldn't benefit from their work on an individual level, and pointing to music piracy as an example of why we shouldn't try, strike me as arguments for general inaction and fatalism. Not sure what the goal is, there...


The goal is to get these people to face reality. The fact is we are in the 21st century, the age of information. Their creations are just data, and data can be copied, processed and transmitted worldwide at negligible costs. There is no controlling it.

The goal is to make them stop trying to control it. Because their attempts to control it are ruining computers for all of us. We already have harmful stuff like DRM on every chip because of these people. Platforms are getting more locked down, our freedom as users and programmers is decreasing. They will destroy free computing as we know it if this keeps going unchecked.


That someone may see a version of reality where people are incapable of benefiting from their own work does not mean that it's by any means a settled issue or indicative of "Reality". I doubt these conversations would exist if it were. It is indeed the current year, but just because things can be metaphorically distilled into false and reductionist equivalencies does not mean it should all be free for the taking to benefit a few people who outran regulation.

Regarding the concept of control: artists were first put in a defensive position by the individuals who started using their work without their consent, and who are trying to exercise their own control over the artwork produced by others through monetizing outputs. Are only companies like Stability.AI, OpenAI, and Midjourney exclusively permitted to use and control the artwork of others, and allowed to charge for access to models which use this artwork without compensation or accreditation to the original authors? Are those artists' computers not also being ruined? Do they not deserve representation?

We need to stop demonizing the idea that someone can benefit from their work because there are some companies that have fought to extend copyright for their own benefit.

Copyright REFORM is generally a much more supportable issue than the idea that everything should be free in perpetuity...


> does not mean that it's by any means a settled issue or indicative of "Reality". I doubt these conversations would exist if it was.

It is the reality of computing. Anyone trying to deny that is going to discover that bits are bits and there is no control unless you end computer freedom. It takes tyranny such as mandating that computers only run government signed software to change this reality. This is the sort of thing that will happen if this copyright insanity continues and it will also pave the way to absurdities like regulation of cryptography.

> We need to stop demonizing the idea that someone can benefit from their work

Nobody is doing that. They can benefit from their work as much as they want. Plenty of creators are benefiting right now from patronage via platforms like Patreon. They're getting paid for the act of creating, not for sales of an artificially scarce product. Copyright is not necessary.


The reality of physics and biology is that if someone is bigger and stronger than someone else, they can beat them up and take their things. Anyone trying to deny that is going to discover there is no control unless you end the freedom of unlimited violence. It takes tyranny, such as mandating that beating people for no reason and taking whatever they have by superior physical strength results in punishment, imposed by the collective agreement of society.


You're actually comparing these copyright issues to physical violence? I don't even know what to say. They're not even in the same conceptual space.


I don't think this is freedom. As long as some company with a million times the resources that I have can train a better model, I'm only ever using the models someone with power gives me, no matter how small a device the model runs on.

Having larger models and adapting the weights is one thing but the innovation is mostly on the side of large entities.


> Copyright exists to incentivize authors

It's ideas (memes) copying themselves, making variations, evolving. Until now ideas could only jump from human to human, intermediated by various media. Now they can be unified and distilled into a model, a more efficient medium of replication for ideas, and more helpful in general because it can be adapted to new situations on the fly.


So the same argument should advocate for having no patents: the advantage of just stealing everyone's patents is too great, and not all countries enforce patents.


Patents are nonsense and are an impediment to progress. They are a government granted monopoly.


You can argue that patents are an INCENTIVE to progress, since people are INCENTIVIZED to create newer and better things knowing that they will be able to enjoy the results of their labor without copycats leeching off their work and ingenuity. I think the pharma model of short-term patents is the best; something like a 10-year competition-free period is completely fair to INCENTIVIZE people to create the iPhones and cancer cures of tomorrow.


There are so many arguments here about big-C Copyright (Disney etc.) and how it is evil and shouldn't be an argument, but what I'm seeing is that small artists and freelancers are the ones mostly getting hurt by the output at this point.

If this is about big-C Copyright, where is the Mickey Mouse DreamBooth concept? Disney property is seen as property, but the labor of some random freelancer is just seen as nothing.


https://imgur.com/a/BzOt61v 1s to google and 14s for image gen (of 4). It's already all out there.


No chance. IMO the best-case scenario is artists getting together to lobby for some sort of labeling system, like the food industry's, to label non-synthetic art for those interested in supporting bespoke human-created works. Then watch said artists get called out as fakers for using AI-assisted features like content-aware fill.


I think the tech evolution will drive a niche luxury market for authenticity.


And just how will these new laws be passed? HackerNews upvotes?


> It's a bit like saying we can't stop music piracy, now that Napster exists.

AI art is unknown territory. Comparing this to media piracy (e.g. copying music) leads to a fallacy.

Specifically: where does fair use stop? And consider: good artists copy; great artists steal. Any art historian will be able to show you how true this is across all epochs and styles (and types of art no less, i.e. including e.g. music)[1].

Anything that follows is OT for the debate at hand. It is merely to point out that, besides not applying here (AI art is derivative/remixed work, not simple copying), the notion that the P2P crackdown of the early 2000s and its legal repercussions had anything to do with how much the people who created the music in the first place got paid is a myth perpetuated by the music industry. Specifically, that part of the music industry that is not the artists.

> Remember the naive rallying cry among those who thought everyone should have the right to all music, without any compensation for the artist?

The only naivety is thinking that compensation of artists played a role in this. Piracy was never noticeable for musicians who weren't already stinking rich. And for those, while noticeable, it wasn't an issue. One may argue it was/is for people high up in the food chain of the music industry. But even that stands on feet of clay. From [2]:

> The main finding of the present study is that in 2014, the recorded music industry lost approximately €170 million of sales revenue in the EU as a consequence of the consumption of recorded music from illegal sources. This total corresponds to 5.2% of the sector’s revenues from physical and digital sales. These lost sales are estimated to result in direct employment losses of 829 jobs.

There are approx. two million people being employed by this industry in the EU[3]. Go figure.

For further reading on the funny idea that artists got compensated before P2P and didn't after there is Courtney Love's classic debunking piece on musician's revenue around the time Napster was a thing[4].

And some comparable numbers from what this means for artists trying to make a living of digital music today [5][6].

[1] My father was an art historian. My opinion is mainly based on spending every holiday of my youth looking at art from all epochs across Europe, first hand. Nolens volens I may add. I.e. I'm saying: take my word for it. :]

[2] https://euipo.europa.eu/tunnel-web/secure/webdav/guest/docum...

[3] https://www.ifpi.org/music-supports-two-million-jobs-contrib...

[4] https://www.salon.com/2000/06/14/love_7/

[5] https://www.hypebot.com/hypebot/2019/12/how-many-spotify-str...

[6] https://www.rollingstone.com/pro/features/spotify-million-ar...

Edit: typos


I know it doesn't sound nice, but harm to the artists is similar to the harm you do to hole diggers when they see you bringing an excavator to your plot instead of hiring them.

Art is not digging holes. But some of it is, and more of it will be in the future.


It's hardly comparable. The excavator does not owe its creative influence to the hole diggers. The quality of its work does not result from someone else's intellectual labor. It's 100% the digger doing the digging.


It's technology putting people out of work.

We care a lot about Copilot, we care somewhat about artists, we care little about manual workers, and less so if they are in another country.


Not quite. It still relies on their inputs. It doesn't just put them out of work, it feeds off them.

I'm not sure of what I'd call this relationship, but parasitic is close to it.


It’s a teacher-student relationship, except the teachers don’t do anything specifically for the students. Let our jobs be taken by Copilot. Are you that afraid?


> The excavator does not owe its creative influence to the hole diggers

But it does. While not as complex as AI art generation, the excavator is mimicking the hole digger. It takes a human action, generalizes it, and offers it in a more efficient manner.


As you allude to, this analogy works if you're creating art purely because there is a demand for it and you only put in the effort required by the customer.

But that is generally not the case with illustrators, and certainly not in this case.

Also creating a new model is dependent on artists (at least right now!) while excavators are not dependent on hole diggers.


I think the problem is that ethics models kind of fall apart here. You can trivially make an argument both for and against this on the grounds of ethics. Legally, it seems pretty clear nothing wrong is being done here. A human can train on a particular artist's style and lift it just fine. Which they regularly do. We just made it way, way easier now.

So sure, we can empathize with someone feeling kind of off about the situation. But at the same time it's kind of eh, that's how the world is.


> Legally it seems pretty clear nothing wrong is being done here.

It most certainly doesn't. Just because a human can eyeball an art style and copy it eventually does not translate into "I can take your copyrighted work and feed it to a machine". Ethically you may argue one way or the other, but legally you are using somebody else's work without permission.


The machine in question is the human brain.

And besides, there's nothing illegal about "using" a copyrighted work without permission (for example, if an artist wanted to use the pixel values in an image for a color palette, that's totally fine), only reproducing it, which no image generation model does.


I just think the law hasn't caught up with tech again here. This is derivative work, essentially by definition, and just because the styles are being created without "patching together existing IP" doesn't mean they are in the clear.

We can trust that a human creator who apes the style of another human creator will do so with a preponderance of flaws such that their works are distinguishable. AI doesn’t operate like that and the case can’t be made that somehow both the AI and the person spontaneously landed on a certain style like it could with two human beings. As the commenters say in the article, the AI couldn’t generate anything without the original works.

Apropos of nothing, her art style really isn’t that original. Her style itself clearly apes illustration styles of the 40s and 50s. I guess copying never goes out of style.


> AI doesn’t operate like that

Wrong. Various regularization schemes are used in AI models which essentially introduce “flaws” and noise into the process. The flaws are more optimal than what the brain does, but they are there.


This may only be the case right now but I find current AI art sits in the uncanny valley. At first glance it looks impressive but after you spend a few hours looking at the output of current algorithms you start to recognize the same quirks and shortcomings in every image. At this point if I needed art for a project I really cared about I'd still spend the money for a human artist.


> Doesn't matter if it's legal or not

See I don't understand this. Training an AI using this content is legal, but how is copying it in the first place to use not illegal? This is what I never understand about these AI cases. If I copy a Youtube video I'm breaking the law, but if I then use my illegal copy to train an AI it retroactively makes my copying legal?


I think you're using the word "copy" incorrectly. What does it mean to "copy" a Youtube video? You are literally making a copy of it on your computer as you watch it, that's what watching a video is. You're also making a "copy" of it in your brain.


Right, but you don't train a model by pointing a camera at a screen. You download the video file. You deliberately bypass the copy protection. I'm not saying it should be illegal, I'm saying it is.


Hmm, is that how it works? We're mostly talking about images in the op, not video, and her images are freely available to view / download from her website.

I'm not sure about literally downloading YouTube videos, you might be right about that.


> and her images are freely available to view / download from her website.

At least under US law, the person downloading the images has to be the same one who trains the model, because giving them to another person is copyright-violating distribution.

To be clear, I'm just sick of corporations being free and clear to do things that would get the rest of us stomped.


The LAION dataset, which SD was trained on, is just a list of URLs and textual descriptions. There's no illegal copying going on when StabilityAI trained SD. It's also not illegal for you to do the same thing.
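
Concretely, a LAION record is roughly shaped like this (field names paraphrased from the published dataset; treat this as a sketch):

    # A LAION sample is metadata pointing at an image, not the image itself
    sample = {
        "URL": "https://example.com/some-image.jpg",     # where the image lives
        "TEXT": "an illustration of a fox in a forest",  # scraped alt-text caption
        "WIDTH": 512,
        "HEIGHT": 512,
    }
    # Whoever trains a model fetches the images themselves at training time.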


> If I copy a Youtube video I'm breaking the law […]

No. Only if you redistribute your copy are you violating copyright law.


I like how her style has a storybook thing going on. She has a knack for expressions.

edit: That's because she is a storybook illustrator! I should have checked the video too.


No difference at all from how Disney, a giant corporate machine, has treated artists and art over the last few decades. This is just the next stage in the mindless machine's evolution.

Artists have had less and less influence on anything beyond mindless consumption during our generation. So no surprise where the story goes. Without influence you can't control what the machine does.


If your art style can be ripped off by an AI looking at 32 images, your art style is too pop to be considered "yours" imo.

To take that criticism further, the original artist took Disney art subjects and applied a Nickelodeon animation style to them.

Hopefully the artist can appreciate the irony of claiming that it is _her_ style being ripped off when everything in the examples shown is clearly pop art, categorically defined by predominant social influences, and not something which came from her artistic perspective.


Is there any artist whose style cannot be imitated by an AI given selected images? I'm pretty sure an AI can correctly mimic Picasso if you give it 32 blue period paintings. Does that mean Picasso has no artistic value?


No, but it probably means that Picasso is also insufficiently unique/inscrutable to expect not to have his style imitated.


I very much sympathize with her because it isn't fair, but I do find it slightly hypocritical that when looking at the work on her website, much of it is drawing other copyrighted work.

I suspect the only acceptable answer here is to disallow AI training of copyrighted material, but this only delays the models supplanting the actual artists (because people will contribute to and build up a pool of copyright-free training material), it doesn't prevent the ultimate issue of people being replaced by AI.


I believe Disney paid her for those.


But her style is still defined by and drawn from Disney, Nick, and other cartoons anyway. At the end of the day, she said that she didn't see herself in the AI images and felt distanced from them; isn't that good enough?

Her art still has value to her and her clients as long as the human in the loop still has something to offer buyers (ie being able to have a conversation/work on the design for something tailor made).

When the AI is smart enough to generate art from an ongoing conversation about a piece, making adjustments etc then she will have to draw art for herself and those that appreciate it, in the same way that there are still some blacksmiths around and people buy their works because they love that it was handmade/a piece of history.


Indeed!

And another important point that's buried is that most of the art used to train the model is actually owned by her clients, such as Disney. So she could not, even if she wanted to, give permission to use that content.

IOW, the person training the model was just fine with stealing the artwork.

While the artist seems to not be litigious, it'll be interesting to see if the major rights-holders like Disney start going after the AI model companies and/or the people that train the models, if they find output containing their properties.

This automated generation of code, text, art, etc., is really nothing more than sophisticated sampling/mashup, and when you use snippets in your work output, it should be credited and properly compensated. This is rapidly amounting to automated creation theft engines.

Worse yet, thinking ahead a bit, once they've all been trained on the available works, and all the writers, artists, & coders have been put out of work, progress will stagnate, because the "AI"s will generate nothing new, only continuously regurgitating bits and mashups of what is now old stuff.


If a human looked at her art and then made new illustrations in the same style would that be ok? If so, why is it not ok for a computer to do it?


I appreciate the argument. It made me think that we might have a “photographs steal your soul” kind of moment here with new technology.

Nonetheless, the difference is pretty clear here I think. An AI makes the artists style infinitely reproducible by all. A single artist copying an artist’s style is basically what copyright is all about, including the relative straightforwardness of enforcement through ordinary litigation.

Whether copyright is or is not being breached by the AI, there's a paradigm-shifting difference in the nature of said AI, perhaps not unlike downloadable digital music on the internet compared to physical media.


I'm not a moral philosopher, but I'd say the difference is effort.

I mean no shade on the illustrator herself, but her style itself looks derived from other styles.

Anyway, another illustrator would need to put in the effort to learn the style, and then X hours to create each piece. An AI, once trained, can churn out thousands of artworks in that style per second (with enough computing power); it makes the illustrator obsolete and, like the mass production of low-cost knockoff products, it floods the market with cheap competition and cheapens the brand / style.

Is that good or not? I don't know; again, I'm not a moral philosopher.


If a human artist sold work created by copying her style, they would definitely get called out by other artists.


>Next, somebody grabs her work (copyrighted by the clients she works for), without permission.

Isn't this what a lot of these AI models use though? Pretty much anything trained on data from the internet is going to be largely copyrighted, no?


When it's using a lot of different types of work from different creators, I think the output is more a sum of its parts; it's a little bit more of a gray area. When it's specifically trying to copy one person's style, that's very personal, and very real to the person being copied. I think it's made weirder by how low-effort the copy is. Someone learning your art style and painting in it is a bit different from just making a computer do it.

My second thought on this is that it reminds me of the attitude I saw rampant in data collection back when I was getting into tech. The casual attitude towards consuming other people's information, be it private data or, in this case, work they've labored over, has led us down the path of exploitation and profiteering. I'm sure it will be no different here.

It's all fun and games when it's free and open and we're all just making toys, but the commercialization has already begun, and these precedents will end up being exploited by companies willing to profit off things that others had too strong a moral compass to touch.


Jaron Lanier's 'Data dignity' idea really seems like the best solve here. Her work was indispensable to whatever value this algorithm produces in the future - it would make sense if she got partial ownership of some kind. It's what share of ownership she should have that we get hung up on. In some sense she's already won, she'll definitely get more traffic to her own site now, because she was part of an interesting story about the early days of artistic AI. But we intuit that for every Hollie Mengert or Metallica out there who benefits from the attention sluice, there are a number of other artists who don't get those benefits, and by definition, we don't know who those artists are.

In an ecosystem we might say 'fit data is what makes it into the next generation no matter the species'. In a 20th century economy, we might say 'the creator should benefit from her work'. But we're not in either of those. We care more about the Hollie Mengerts of the world than their impact on the future evolution of art. Or more precisely, we care more about the right incentives being present for Hollie Mengert than how those incentives play out in this individual case. But that influence on future art is also undeniably part of the incentive structure for an artist today.

This seems like a classic wicked problem - does anyone know of a group engaging with it?


Maybe not ruthless enough. Society needs to evolve past this notion that people have control over data and information just because they created it. The faster this happens, the better. Are we seriously gonna have to put up with the good old copyright industry forever? They keep destroying perfectly good technology just because it threatens their existence. I say let them disappear.


Don't worry, if the West doesn't come around to your way of thinking, China will.


If this were to happen and we created a world with no form of copyright, why would somebody spend their life in a creative industry making new things?


Most of the entertainment I have consumed in recent years has been made by amateurs that at most got donations. Besides, there can be subsidies. The current copyright law is already a subsidy, but it’s selling off the society’s natural rights instead of petty tax money.


Without copyright people can make even more things. Bring out your own Mickey Mouse - Star Wars crossover, NSFW!

Imagine the entire corpus of human creation, free to be remixed and extended. For now we hope they make a good sequel, someday...


Most of that corpus was only created because it was profitable to do so due to the protections offered by copyright law.


Intrinsic motivation to create. Also, there's no reason they can't make money some other way. Patronage and crowdfunding seem to be the answer.


They don't seem like a very good answer.

Allowing creative output to be freely used, while forcing creators to subsist on the crumbs thrown back is a two-class system. It seems unfair to those doing the work in such a system.

Copyright is far from perfect (far far...) but it is still an improvement on patronage. At least a creator has control over the use of their work.


> It seems unfair to those doing the work in such a system.

It's the only thing that makes sense in the 21st century. Copyright is unenforceable in the age of ubiquitous networked computers.

In order to enforce copyright, every computer will have to be locked down so that they only execute "compliant" software. Surely everyone browsing this site can appreciate the unfairness of that outcome. I for one do not want such a future under any circumstances. If the copyright business model is killed, so be it.

> At least a creator has control over the use of their work.

An illusion. They have no control. Their copyrights are infringed every single day. Most of the time people don't even realize they are infringing someone's copyright.


> In order to enforce copyright, every computer will have to be locked down so that they only execute "compliant" software.

This is not true. While it is one possible approach to enforcing copyright, it is not the only possible approach. Network surveillance of distribution is another possibility that has been used against p2p networks.

Copyright has never been completely enforceable. It has always been a partial solution aimed at preventing organized / profitable distribution, i.e. it is a legal fallback rather than a prevention. But a partially working solution is better than nothing.


What is a "copyright industry"? This is one human trying to make a living by making kids smile.


> her work (copyrighted by the clients she works for)

There you go.


If the artist is posting personal work, then it is covered by copyright automatically per the Berne Convention.

Copyright exists to incentivize authors, artists, scientists, etc. to create original works by providing a temporary monopoly.

If an artist works with a client, the legal stipulations are going to be different for each commissioned work, generally.

Depending on the client, they may be better able to protect an artist's work than the artist themselves, and with credit...


> temporary monopoly

What a joke. That "temporary" monopoly lasts centuries and gets extended whenever some rich company's imaginary property is about to enter the public domain. Copyright duration is functionally infinite, you will be long dead before your culture is returned to you.


Right. I doubt many people would disagree copyright REFORM is sorely lacking. Or are you suggesting that an artist is not allowed to benefit from their work because Disney has extended copyright?


Creators can benefit as much as they want. Just not through artificial scarcity. That ship sailed the second computers were invented and they need to stop trying to put that genie back in its bottle.

Either copyright remains unenforceable or computing as we know it today will be destroyed. There is no middle ground and I know which side I'm on. Computers are among the greatest inventions of humanity, they are too precious to be jeopardized because of such concerns as invalidated business models.


Well yeah. I don't know if people have noticed, but Disney has started using the classic Mickey Mouse animation at the start of all their works now, because they know their already-extended copyright is about to expire.


It should be illegal (if it isn't already) to use other people's work to train AI models. In the future, all artworks will have a license attached to them, with some fair-usage clause.


Won't this lead to issues like secretly trained models and proving artwork is not AI generated?


The empowerment of man is more important than his ego. Pay UBI, subsidize creators with the taxpayer's money, demolish all copyright.


The guy should have tried to use his tool on Disney.



I know this website is not a hivemind, but it's interesting that every time an article like this gets posted, the majority opinion seems to be that training diffusion models on copyrighted work is totally fine. In contrast, when talking about training code generation models, there are multiple comments mentioning this is not OK if licenses weren't respected.

For anyone who holds both of these opinions, why do you think it's ok to train diffusion models on copyrighted work, but not co-pilot on GPL code?


If I were to steel man both sides it'd be something like this:

1. Training an AI model on code (so far) makes it regurgitate code line-for-line (with comments!). This is like "learning to code" by just cutting and pasting working code from other codebases; you have to follow the license. The AI doesn't "understand the algorithm" at all (or it hasn't been told "don't export the input, you fool"). Obviously a bog-simple AI could make all licenses moot by dumping out exactly what it was fed, and the courts wouldn't permit that.

2. Training an AI model on illustrations (so far) produces "style parodies" which may look similar to an untrained eye (the artist here is annoyed because she would not make art like that, even though to us it looks similar enough). Drawing a picture that looks like Mickey Mouse is a trademark violation, but tracing a picture of the Mouse is both a trademark and a copyright violation.

The first violates some pretty clear legal concepts; the second is closer to violating moral concepts but those are more flexible - if an artist spends years learning to paint in the style of Michelangelo is that immoral?


The problem with this argument is that it's founded in how the AI is used, not how it is made. It's not a compelling reason to ban the tool, it's a compelling reason to regulate its use.

Copilot can produce code verbatim, but it doesn't unless you specifically set up a situation to test it. It requires things like "include the exact text of a comment that exists in training data" or "prefix your C functions the same way as the training data does".

In everyday use, my experience has been that Copilot draws extensively from files that I've opened in my codebase. If I give Copilot a function body to fill in in a class I've already written, it will use my internal APIs (which aren't even hosted on GitHub) correctly as long as there are 1-2 examples in the file and I'm using a consistent naming convention. This isn't copypasta, it really does have a clear understanding of the semantics of my code.

This is why I'm not in favor of penalizing Microsoft and GitHub for creating Copilot. I think there needs to be some regulation on how it is used to make sure that people aren't treating it as a repository of copypasta, but the AI itself is pretty clearly capable of producing non-infringing work, and indeed that seems to be the norm.


Please let’s not start dictating how people should use a piece of software. It would be like ”regulating” Microsoft Word just because people might use it to duplicate copyrighted works.


I'm not saying we should regulate the software, I'm saying we need some rigorous method of ensuring that using the AI tools doesn't put you in jeopardy of accidental copyright infringement.

We most likely don't need new laws, because infringement is infringement and how you made the infringing work is irrelevant. Accidental infringement is already illegal in the US.


I would argue that we _do_ need new laws. AI-generated code is quite different from any other literary work; after all, it was not created by a human.

My own personal opinion is that the AI generated code (or pictures in the case of the article) should be under a new category of literary works, such that it does not receive copyright protection, but also does not violate existing copyright.


This is meaningless though. The majority of AI generated art you see out there is either hand tweaked or post-processed or both. There's human input involved and drawing a line is going to absolutely backfire.


If you presented both the generated image and the "original" to a jury of peers (or even a panel of experts in the field), they would be able to make a determination as to whether the generated image violated the copyright of the presented "original".

Humans tweaking the image is immaterial to this determination: if the human tweaked it so that it no longer seems to violate copyright, then said panel would make that determination as well.


You are arguing that AI-generated means no copyright protection. So you can't tweak it to "not violate copyright", because there literally isn't any.

Of course, you have no way to prove whether any image was or was not generated by AI, so welcome to a new scam for law firms: aggressively suing artists while claiming they suspect AI was used in their works.


The vast majority of paintings weren't created by a human either, but by a paintbrush. We should really ban those too. Just think of all the poor finger-painters who've been put out of a job!


I think it's worth pointing out that Adobe has been doing this for a long time. You can't open or paste images into Photoshop which resemble any major currency.


> Copilot can produce code verbatim, but it doesn't unless you specifically set up a situation to test it.

It does not matter what a service can or cannot do. We do not regulate based on ability, but on action.

The service has an obligation to the license holders of the training data to not violate the license. The mechanism for which the license is violated is irrelevant. The only thing that matters is the code ended up somewhere it shouldn’t, and the service is the actor in the chain of responsibility that dropped the ball.

The prompting of the service is irrelevant. If I ask you to reproduce a block of GPL code in my codebase and you do it, you violated the license. It does not matter that I primed you or lead you to that outcome. What matters is the legally protected code is somewhere it shouldn’t be.


> It does not matter what a service can or cannot do. We do not regulate based on ability, but on action.

Whether we agree with it or not, intellectual property laws have historically been regulated by ability as well as action. That's why blank media often carried additional taxes in some jurisdictions, just in case someone chose to record copyrighted content onto them. And why graphics cards used to include an MPEG royalty in their consumer cost, regardless of whether that user planned to watch DVDs on their computer.

Not saying I agree with this principle. Just that there is already a long history of precedent in this area.

Like a lot of politics, ultimately it just comes down to who has the bigger lobbying budget.


> If I ask you to reproduce a block of GPL code in my codebase and you do it, you violated the license. It does not matter that I primed you or lead you to that outcome. What matters is the legally protected code is somewhere it shouldn’t be.

This isn't accurate. If I reproduce GPL code in your codebase, that's perfectly acceptable as long as you obey the terms of the GPL when you go to distribute your code. In this hypothetical, my act of copying isn't restricted under the GPL license, it's your subsequent act of distribution that triggers the viral terms of the GPL.

The big question that is still untested in court is whether Copilot itself constitutes a derivative work of its training data. If Copilot is derivative then Microsoft is infringing already. If Copilot is transformative then it is the responsibility of downstream consumers to ensure that they comply with the license of any code that may get reproduced verbatim. This question has not been ruled on, and it's not clear which direction a court will go.


> The big question that is still untested in court is whether Copilot itself constitutes a derivative work of its training data.

Microsoft has a license to distribute the code used to train Copilot, and isn't distributing the Copilot model anyway, so it doesn't matter whether the model itself infringes copyright.

Whereas that same question probably does matter for Stable Diffusion.


Given that there's AGPL code in Copilot's training data, it does still matter if Copilot is derivative.


Technically, this GitHub license term means you grant an extra license to GitHub whenever you upload code:

https://docs.github.com/en/site-policy/github-terms/github-t...

As in " including improving the Service over time...parse it into a search index or otherwise analyze it on our servers" is the provision that grants them the ability to train CoPilot.

(also, in case you're wondering what happens if you upload someone else's code: "If you're posting anything you did not create yourself or do not own the rights to, you agree that you are responsible for any Content you post; that you will only submit Content that you have the right to post; and that you will fully comply with any third party licenses relating to Content you post.")


But you may not have the right to grant that extra license if Copilot is determined to violate the GPL; they can yell at you all they want, but they will have to remove the code, as nobody can break someone else's license for you.

It'll have to be tested in court, but likely nobody actually gives a shit.


> But you may not have the rights to grant that extra license if CoPilot is determined to violate the GPL

Which is why that second provision is there to shift liability to you. You MUST have the ability to grant GitHub that license to any code you upload. If you don't, and MS is sued for infringing upon the GPL, presumably Microsoft can name you as the fraudster that claimed to be able to grant them a license to code that ended up in copilot.


Microsoft is selling a service to put potentially copyrighted works into ones code, stripped of and disregarding its original license.


How is that different from a consultant who indiscriminately copies from Stack Overflow?

Tangent to that is the "who gets sued and needs to fix it when a code audit is done?"

Ultimately, the question is then "who is responsible for verifying that the code submitted to production isn't copying from sources that have incompatible licensing?"


The consultants would have to knowingly copy from somewhere. One can hope they're educated on licensing, at least if they expect to get paid.

If Microsoft is so confident in Copilot doing sufficient remixing, then why not train it on their own internal code? And why put the burden of IP vetting on clients who have less info than Copilot?


> How is that different from a consultant who indiscriminately copies from Stack Overflow?

And how is that different from a student learning how to code off Stack Overflow (or anywhere else, for that matter), then reproducing some snippets or learnt code structure in their employment?


That's also an excellent example.

Or a random employee copies some art work that is then published ( https://arstechnica.com/tech-policy/2018/07/post-office-owes... ). You will note all the people that didn't get in trouble there - neither the photographer who created the image, nor Getty in making it available, nor the random employee who used it without checking its provenance.

In all of these cases, it is (or would be) the organization that published the copyrighted work that is liable, for not doing the appropriate diligence: checking what the work is, whether it would be usable, and how it should be licensed.

> The Post Office says it has new procedures in place to make sure that it doesn't make a mistake like this again.

... which is what companies who make use of AI models for generating content (be it art or code) should be doing to ensure that they're not accidentally infringing on existing copyrighted works.


Microsoft just labels other people's code as «public code», as if publicly available code were in the public domain.


Copilot is regurgitating snippets of code still under copyright and not in the public domain. Some may consider publicly available code fair use, but the fact that they're selling access for commercial use may undercut that argument.


How would you regulate this?


There is an area of deep learning research, differential privacy, which focuses on making sure an algorithm cannot leak information about its training set. It is a rigorous concept: you can quantify how privacy-preserving a model is, and there are methods to make a model "private" (at the cost of some performance, I think, for now).


Differential privacy only proves that a model cannot leak more than a certain amount of information about individual samples in the training set. This only guarantees the input is not leaked back exactly; any composition of the training set is still valid output, although in image generation this usually means a very distorted image.

An example of DP in image generation (using GANs): https://par.nsf.gov/servlets/purl/10283631
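
For the curious, the standard mechanism here (DP-SGD) is easy to sketch: clip each example's gradient to bound its influence, then add calibrated Gaussian noise before the update. A minimal illustration in Python/numpy, with made-up hyperparameter values:

    import numpy as np

    def dp_sgd_update(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.01):
        # 1. Clip each per-example gradient so no single sample dominates
        clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                   for g in per_example_grads]
        # 2. Sum, then add Gaussian noise scaled to the clipping bound
        noisy_sum = np.sum(clipped, axis=0) + np.random.normal(
            0.0, noise_multiplier * clip_norm, size=clipped[0].shape)
        # 3. Average and take an ordinary gradient step
        return -lr * noisy_sum / len(per_example_grads)

The more noise relative to the clip bound, the stronger the privacy guarantee and the worse the model tends to perform, which matches the performance cost mentioned above.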


AI image generators also often churn out near-exact replicas of their inputs. For example:

Original: https://static-cdn.jtvnw.net/ttv-boxart/460636_IGDB-272x380....

Copies: https://lexica.art/?q=bloodborne


The AI image generator is revealed to be a lossy compression algorithm which can recall near-identical images to the ones it was trained with. Therefore, the software is conveying copyrighted works. If somebody gave you the model, they violated copyright in doing so. If somebody runs the model on a server, they violated copyright in transmitting the image to you. If you, the recipient of that copyrighted work, go on to redistribute it, you have also violated the copyright. I don't see any difference between these image generators and code generators.


Exact replicas are an issue. If you are using AI image generation to replicate a near-exact image, then that's illegal. But nobody cares if you copy a nice code pattern from GPL code and apply it to your own code base. In the same fashion, nobody should care if you make an image in the same art style.


Inexact replicas are also an issue, otherwise there would be no issue with distributing MP3s of an Audio CD, as it's a lossy format that is only close to the original.

I suspect the courts will treat AI more like a "black box" - they won't care how or why your black box can perfectly play Metallica, only that it does.


> But nobody cares if you copy a nice code pattern from a GPL code and apply it to your own code base

This is not true, smh. See Oracle v. Google.


Oracle v. Google involved actual Java (declarations) code being copied; that it was a derived work wasn't seriously disputed there.


CoPilot will return actual GPL code verbatim.


Yes, and that's why I personally believe that the model itself should be considered a derived work of such code. But OP was specifically talking about "code patterns".


Copying the code verbatim would fall under copying the code pattern. Are you talking about changing the names of the variables or something?


> if an artist spends years learning to paint in the style of Michelangelo is that immoral?

I'd say that artist has gained a lot by studying Michelangelo, including an appreciation for what Michelangelo himself accomplished and insights into how to paint as well or better, and maybe even how to teach that to other people. I don't think we get those benefits from AI models doing that (at least not yet!)


I think we're kidding ourselves to think that some nebulous concept of "the artist's journey" somehow informs the end result in a way that is self-evident in human-produced digital art. Just as with electric signals in the "brain in a vat" thought experiment, with digital art it's pixels. If an algorithm can produce a set of pixels that is just as subjectively good as a human artist, then nobody will be able to tell - and most likely the average person just won't care.

On the other hand, I would say that traditional mediums (especially large format paintings) are relatively safe from AI generation/automation - for now.


> On the other hand, I would say that traditional mediums (especially large format paintings) are relatively safe from AI generation/automation - for now.

Why do you think that? I think large format paintings might be in just as much danger.

There’s a large industry of talented artists in China, Vietnam, etc who copy famous artworks by hand for very low prices. They’re easily accessible online: you upload an image and provide some stylistic details and the artist does the hard work of turning the image into brush strokes. It’s not “automated” but I’ve already ordered one 4’x2’ AI generated painting in acrylic relief for less than the cost of a 1’x1’ from a local community gallery. I put in quite a bit of work inpainting the image to get what I want but it would have been completely impossible to get what I want even six months ago.

I’ve only ever purchased half a dozen artworks in my life and they were all under a few hundred bucks but with this new tech, it just doesn’t make sense to buy an artists’ original work unless it’s for charity. The AI can do the creative work the way I want and there are plenty of artists who are excellent at the mechanical translation (which still requires a lot of creativity, mind)


You don't even have to go to China - I had a very nice painting made from a photograph for a friend, done by another friend's mom who just likes painting landscapes.

It looked great and all I had to do was pay for supplies, which was still less than the cost of the framing.


I didn't know there was an industry for that, I guess I should have figured. I might look into that for my own purposes. Although for what it's worth, when I said "large format paintings" I was thinking of very large paintings - like Picasso's Guernica - larger than something the average person would have hanging in their home. To the point that the cost of producing and transporting it is large enough that a buyer is more likely to take personal interest in the artist, and much less likely to knowingly purchase something AI-generated or otherwise automatically produced.


That is simply a version of the GP's "artists who are excellent at the mechanical translation".

Want someone to paint the ceiling in your mega mansion? Sure.

But now the creative bit can be done by you - or your 8 year old - if you like.


I think we're kidding ourselves to think that clustering features of existing works and iteratively removing noise based on that clustering is somehow comparable to building up human experiences and expressing them through art.

Using the "brain in a jar" thought experiment, you're making the assumption that the iterative denoising process is equivalent to the way the "brain in the jar" would generate art. Since the question is whether or not the processes are equivalent, it seems nonsensical to have to assume their equivalence for your argument.


I don't think the artist's journey necessarily informs the end result in some way - but I believe it can be an important experience for the artist. Then again, artists can still do this in the era of generative art - there's just not as much chance of being rewarded for it. If this leads to fewer people wanting to explore art, then I think we've lost something. But it's not clear to me where things are headed, I guess. This could be a huge boon in letting people who otherwise lacked the artistic ability explore new ways of expressing themselves.


In retrospect I think I may have been overly pessimistic.


And perhaps more importantly regarding (1) than simple regurgitation: code does things. There's a real risk that if you just let Copilot emit output without understanding what that output does, it'll do the wrong thing.

Art is in the eye of the beholder. If the output looks correct as per what you're looking for, it is correct. There's no additional layer of "Is it saying what I meant it to say" that is relevant to anyone who isn't an art critic.


Art is in the eye of the beholder, but it still needs a creator.

That creator had a vision in mind that's unique to them because of their experiences, and I think it's wrong to say that this image can be quantified as a location in an abstract feature space.

So to say "there is no additional layer [to judge goodness] that is relevant to [most people]" assumes there is an algorithmic measure of "goodness" that can be applied to art, which is an assumption you need to make to believe that there's any similarities with AI generated art and human generated other than "they look kinda similar".


Until 100 years from now, when more general purpose AI are having what could be described as experiences, and can be asked to draw a picture of how they feel when thinking about being unplugged/death.

We hoomans love to think we're special, but quantum tubules etc besides, we really are just biological computers running a program developed over our evolutionary/personal histories.


Sure, in a future 100 years from now, when AI is an actual general AI and not the specialized algorithms we have today, one might be able to argue that it does things the same way as a person (although one would hope it does them better instead, since that's the goal).

Until then we are special and we're just pretending these specialized algorithms are replicating the things we don't even understand. Anthropomorphising the algorithms by saying they "learn" and "feel" and "experience" is us as humans trying to draw parallels to ourselves as we find our understanding inadequate to explain what's actually going on.


I'm pretty sure there's a considerable amount of art hanging in museums that was done by students of great artists. I think there are several Mona Lisas, done by da Vinci's students, and they are almost identical to the original.


In fact it's well known that successful artists would have studios with their students churning out art that they'd apply the direction and final touches to.

Which are which has been lost to time in some cases, and the art world is filled with dissertations on it.


> an artist spends years learning to paint in the style of Michelangelo is that immoral?

This is a deceptive comparison. A human learns a style and adds their own ideas. Those ideas are affected by their mood, schooling, beliefs, and coffee this morning.

AI has only the training dataset. If you trained an AI on 1000 copyrighted pictures, the AI can't add its own ideas; it can only remix pixels from the stolen work of other artists.

This is basically like money laundering: as if you melted down stolen gold coins, minted new coins, and then claimed the gold is yours because you made them.


I wouldn't dismiss it so fast. I've seen SD generate some quite creative images, and original as I've been able to determine by searching the training dataset. One example was asking for a picture of someone riding a Vespa, and one of the images had the rider wearing the Vespa fenders as a helmet, louvers and all. I don't see what else to call that but the AI's "own idea".


By deconstructing the "decisions"(to use a disgusting anthropomorphism) that led to either image we can dismiss the "I don't understand, so it must be doing something greater than it is" rhetoric.

The decisions leading up to the human art is the entire human experience leading up to the creation of the art(and possible context afterwards), which we as people tend to put value on.

The "decisions" leading up to the AI art are a series of iterative denoising steps that attempt to recover an image from noisy data by estimating how much the noise differs from the "good looking" image.

So for your "vespa fenders as a helmet" drawing, I don't think that constitutes an algorithm being "creative". If a human were to make the same picture we could rationalize that they're being creative because we can imagine a path where their human experiences led to a new idea. Since the algorithm was only ever made to denoise an image based on its abstract feature-space representation I don't see any way we could rationalize that it created a new idea. The algorithm never "thought" it should use a fender as a helmet, it only found that the best way to denoise the current image to the one described in feature-space was to remove pixels that resulted in the image.

Don't humanize algorithms. They're applied statistics, not a sum of human experiences.


If a calculator adds 2 and 2 and shows 4, is that disgustingly anthropomorphizing the word "add"? If we need a separate word for every informational process, it's going to get awfully messy.

When an idea "pops" into your head, how was that made? Couldn't it also be a similar denoising of patterns in synaptic potentials? We know from many experiments that what something feels like can be quite different from what it actually is.

Is it only that we don't know the exact brain process that makes humans special? And once we inevitably do figure it out, does all human art become meaningless too? I think we need to learn to disconnect process from result and just enjoy the result, wherever it came from.


Those "own ideas" are mostly affected by other things the human has seen - far more than by the coffee.

The difference between a human and an AI model is that the AI can devote itself entirely to looking at art, while a human has to do non-art stuff as well.


If you are willing to throw out the moral reason, then the legal reason is just an empty rule.


There are many legal reasons without moral force behind them beyond "we need to agree on one way or the other" - such as which side of the road to drive on.


In your example, both sides are equally acceptable and we just pick one. How does this apply to the present case?


We made a decision years ago around copyright (we've modified it since but the general concept is "promote the arts by letting artists have reproduction rights for a time"). We could change that in various ways, if we wanted to, even removing copyright entirely for "machine-readable computer code" and leave protections to trade secrets. Even if you argue "no copyright at all is immoral" or "infinite copyright is immoral" it's hard to argue that "exactly author's life + 50 years is the only moral option".

Switching the rules on people during the game is what annoys/angers people, and is basically what these AIs have done (because they've introduced a new player at low effort).


But haven’t we seen examples of generative art that are substantially similar to original artwork and examples where AI regurgitates blocks of art (with watermarks!?)


Re. Point 2:

Artists are granted copyright for their work by default per the Berne Convention. These copyrighted works are then used without consent of the original author for these models.

Additionally, the argument that you can't copyright a style is playing fast and loose with most things that are proprietary, semantically.


A key part of the concept of copyright is that having copyrighted works used without consent is perfectly fine. Copyright grants an exclusive right to make copies of the work. It does not grant the author control over how their work is used, quite the opposite, you can use a legitimately obtained copy however you want without the consent of the author (and even against explicit requirements of the author) as long as you are not violating the few explicitly enumerated exclusive rights the author has.

You do not need an author's consent to dissect or analyze their work or to train a ML model on it, they do not have an exclusive right on that. You do not need an authors consent to make a different work in their style, they do not have an exclusive right on that.


I feel there's a lot missing from this, and some terminology would require clarification (What constitutes "used"?).

Generally speaking, this supposition skirts around the concept of monetizing the work of others, seems at odds with what the Berne Convention stipulates in that context, and arguably violates points 2 and 3 of the three-step test.

That's to say nothing regarding the various interpretations on data scraping laws that preclude monetizing outputs.

I don't feel it's that black and white, personally...


What I mean by "used" means any use where copying and reproduction is not involved.

The Berne three-step test specifies when reproduction is permitted, however, any use that does not involve reproducing the work is not restricted, and monetization does not matter. It's relevant for data-scraping laws because you are making copies of the protected work.


> Additionally, the argument that you can't copyright a style is playing fast and loose with most things that are proprietary, semantically.

This has been true since copyright existed, Braque couldn’t copyright cubism — Picasso saw what he was doing and basically copied the style with nothing to be done aside from not letting him into the studio.


But if I train my own neural network inside my skull using some artist's style, that's ok?

Either a style is copyrightable or it's not. If it's not, then I can't see any argument that you can't use it yourself or by proxy.


The brain-computer metaphor is not a very good one, it's a pretty baseless appeal. Additionally, it's an argument that anthropomorphizes something which has no moral, legal, or ethical discretion.

You do not actively train your brain in remotely similar methods, and you, as an individual, are accountable to social pressures - an issue these companies are trying to avoid with ethically questionable scraping/training methods and research loopholes.

Additionally, many artists aren't purely learning from others to perfectly emulate them, and it's quickly spotted if they are, generally. Lessons learned do not implicitly mean you perfectly emulate that lesson. At each stage of learning, you bias things through your own filter.

Overall, the idea that these two things are comparable feels grotesque and reductionist, and feel quite similar to the "Well I wasn't going to buy it anyway" arguments we've been throwing around for decades to try to justify piracy of other materials.

At the end of the day, an argument that "style can't be copyrighted" ignores a lot of aspects of its definition, including the means, and can be extrapolated into an argument that nothing proprietary should be allowed to exist...


> Overall, the idea that these two things are comparable feels grotesque and reductionist

I agree with you there but the alternative - that they’re not comparable - I find equally grotesque and full of convenient suppositions rooted in romanticism of “the artist”. We’re in uncharted territory with AI finally lapping at the heels of creative professionals and any analogy is going to fall apart.

This feels like something that we should leave to the courts on a case by case basis until there’s enough precedent for a legal test. The question at the end of the day should be about harm and whether an AI algorithm was used as run-around of a specific person’s copyright


Good points.

I was actually just sitting in an AI Town Hall hosted by the Concept Art Association, which had two US copyright lawyers who work at the USCO present, and their thinking is currently along similar lines.

Basically, like you specified, legal precedent needs to be built up on a case-by-case basis, and harm can pretty readily be demonstrated, at least anecdotally, especially since copies of copyrighted work are made during training.

Unfortunately, historically, artists do not generally enjoy the same legal representation or resources that unionized industries with deeper pockets enjoy. It's probably one of the reasons Stability.Ai are being so considerate with their musical variant.

It would have been great if artists were asked before any of this. I could see this going in such a different direction if people were merely asked...


I'm an artist and I work in tech - I'd be very interested in working with the models if I didn't find the idea of using something made out of the labor of my peers repulsive.

Call me a training-set vegan, any model made from opt-in and public domain images I'd use in a heartbeat.


> But if I train my own neural network inside my skull using some artist's style, that's ok?

How well can the network inside your skull manipulate your limbs to reproduce good-quality work in some artist's style?

Our current frameworks for thinking about "fair use", "copyright", "trademark" and the like were thought into existence in an era when the options for the "network inside the skull" were to laboriously learn the skill of drawing, or to use a machine like a printing press or photocopier that produces exact copies.

Availability of a machine that automates previously hand-made things much more cheaply or is much more powerful often requires rethinking those concepts.

If I copy a book putting ink on paper letter by letter manually, that's ok, think of those monks in monasteries who do that all the time. And Mr Gutenberg's machine just makes that ink-on-paper process more efficient...


>How well the network inside your skull can manipulate your limbs to reproduce good-quality work in some artist's style?

An experienced artist can probably do this in a couple weeks, depending on how complex the style is.

>If I copy a book putting ink on paper letter by letter manually, that's ok, think of those monks in monasteries who do that all the time.

According to copyright, no, that's not okay. Copyright does not care about the method of reproduction, it just distinguishes between authorized and unauthorized reproduction. A copyist copying a book by hand without authorization is just as illegal as doing it with a photocopier. Likewise, if you decide to copy a music CD using a hex editor and lots of patience, at the end of the process you will end up with a perfectly illegal copy of the original CD.

So the question stands. Why is studying artwork with eyeballs and a brain and reproducing the style acceptable, but doing the same with software isn't?


unless you are in fact a living and breathing cyborg [in which case, congratulations], the wetware inside your head is not analogous to the neural networks that are producing these images in any but the most loosely poetic sense.


No? The mechanisms are different but the underlying idea is the same - identify important features and replicate those features in a new context. If an AI identifies those features quickly, or if I identify them over a lifetime, what's the difference? If I do that you might say my work is derivative, but you won't sue me. Why is it different if an AI does it?


This comment answers your questions:

https://news.ycombinator.com/item?id=33425414


Not particularly. Parent post is not concerned with or making any claims to special knowledge of the internal details of the modelling in the mind or in the machine, only the output.


> The mechanisms are different but the underlying idea is the same

no.

they are the same as asking a person to say a number between 1 and 6, then asking the same question of a die and concluding that people and dice work the same.

> identify important features and replicate those features in new context

untrue

if you think that that's what people do, obviously you can conclude that AI and humans are similar.

But people don't identify features. People first of all learn how to replicate - mechanically - the strokes, using the same tools as the original artists, until they are able to do it. Most of the time people fail and reiterate the process until they find something they are actually very good at, and only after that do the good ones develop their own style, based either on some artistic style or some artistic meaning.

But the first difference we learn here is that humans can fail to replicate something and still become renowned artists.

An AI cannot do that.

Not on its own.

For example, many probably already know, but Michelangelo was a sculptor.

He was proficient as a painter too, but painting wasn't his strongest skill.

So artists, first of all, are creators, not mere replicators, in many different forms, they are not good at everything in the same way, but their knowledge percolates in other fields related to theirs: if you need to make preparatory drawings for a sculpture, you need to be good at drawing and probably painting (lights, shadows, mood, expressions, are all fundamental for a good sculpture)

Secondly, the features artists derive from other art pieces are not the technical ones, those needed to make an exact replica of the original, but those that make it special.

For example, in the case of Michelangelo, the Pietà has some features that an AI would surely miss.

First of all, the way he shaped the marble was unheard of; it doesn't mean much if you don't contextualize the work and immerse it in the historical period it was created in.

An AI could think that Michelangelo and Canova were contemporaries, while they were separated by three centuries, which makes a lot of difference in practice and in spirit.

But more importantly, Michelangelo's Pietà is out of proportion, he could not make the two figures in the correct scale, proving that even a genius like he was could not easily create a faithful reproduction of two adults one in the lap of the other, with the tools of the 16th century.

The Virgin Mary is very, very young, which was at odds with her role as a grieving mother and, the most important of them all, the Christ figure is not suffering, because Michelangelo did not want to depict death.

An AI would assume that those are all features of Michelangelo's way of sculpting, but in reality they are the result of a mix of the complexity of the work, the time when it was created, the quality and technology of the tools used, and the artist's intentions, which together make the work unique and, ultimately, irreproducible.

If you use an AI to reproduce Michelangelo, everybody would notice, because it's literally something a complete noob or someone with a very bad taste would do.

So to hide the difference, you would have to copy the works of lesser-known artists, making it even more unethical.


respectfully, you're raising a whole lot of arguments here that have nothing to do with any point I was raising, and they don't seem to move this discussion forward in any significant way. The point of this subthread was a user saying the following:

>But if I train my own neural network inside my skull using some artist's style, that's ok?

This post and others uses a lot of flowery language to point out that we train artificial neural networks and real neural networks in different ways. OK, great. I don't think anyone is saying that's not true. What I am saying is that it's irrelevant.

If I am an exceptional imitator of the style of Jackson Pollock and I make a bunch of paintings that are very much in that style but clearly not his work, I'm not going to be sued. My work will be labeled, rightfully so, as derivative, but I have the right to sell it because it's not the same thing. Is that somehow more acceptable because I can only do it slowly and at low volume? What if I start an institute whose sole purpose is training others to make Jackson Pollock-like paintings? What if I skip the people and make a machine that makes a similar quality of paintings with a similarly derivative style? Is that somehow immoral/illegal? Why?

There's a whole lot of hand-wavey logic going on in this thread about context and "the work" and special human magic that only humans can possibly possess, which somehow makes it immoral for an AI to do it. I have yet to see a simple, succinct argument for why that is the case.


> This post and others uses a lot of flowery language to point out that we train artificial neural networks and real neural networks in different ways. OK, great. I don't think anyone is saying that's not true. What I am saying is that it's irrelevant

Maybe my language was too lofty.

The point is: you don't train "your artificial intelligence", because you're not an artificial intelligence, you train your whole self, that is a system, a very complex system.

So you can think in terms of "I don't like death, I don't want to display death"

You can learn how to paint using your feet, if you have no hands.

You can be blind and still paint and enjoy it!

An AI cannot think of "not displaying death" in someone's face, not even if you command it to do it, because it doesn't mean anything, out of context.

> Jackson Pollock

Jackson Pollock is the classic example to explain the concept: of course you can make the same paintings Jackson Pollock made.

But you'll never be Jackson Pollock, because that trick works only the first time, if you are a pioneer.

If you create something that look like Pollock, everybody will tell you "oh... it reminds of Jackson Pollock..." and no one will say "HOW ORIGINAL!"

Like no one can ever be Armstrong again, land on the Moon and say "A small step for man (etc etc)"

Pollock happened, you can of course copy Pollock, but nobody copies Pollock not because it's hard, but because it's cheap AF

So it's the premise that is wrong: you are not training, you are learning.

They are very different concepts.

AIs (if we want to call them "intelligent") are currently just very complex copy machines trained on copyrighted material.

Remove the copyrighted material and their output would be much less than unimpressive (probably a mix of very boring and very ugly).

Remove the ability to view copyrighted material from people and some of them will still come up with an original piece of art.

It happened many times throughout history.


You're typing a lot in these posts but literally every point you're making here is orthogonal to the actual discussion, which is why utilizing the end product of exposing an AI to copyrighted material and exposing a human to copyrighted material are morally distinct.


> which is why utilizing the end product of exposing an AI to copyrighted material and exposing a human to copyrighted material are morally distinct.

sorry for writing in capital letters, maybe that way they will stand out enough for you to focus on what's important.

WE ARE NOT AIS

an AI is the equivalent of a photocopier or of sampling a song to make a new song; there are limits on how much you can copy/use copyrighted material, limits that do not apply TO YOUR EARS, because you hearing a song does not AUTOMATICALLY AND MECHANICALLY translate into a new song. You still need to LEARN HOW TO MAKE MUSIC, which is not about the features of the song, it's about BEING ABLE TO COMPOSE MUSIC.

which is not what these AI do, they cannot compose music, they can mix and match features taken from copyrighted material into new (usually not that new, nor good) material.

If we remove the copyrighted material from you, you can still make music.

You could be deaf and still compose music.

If we remove copyrighted material from AIs they cannot compose shit.

Because the equivalent of a deaf person for an AI that create music CANNOT EXIST - for obvious reasons.

So AIs DEPEND ON copyrighted material, they don't just learn from it, they WOULD BE USELESS WITHOUT IT.

and morally the difference is that THEY DO NOT PAY for the privilege of accessing the source material.

They take, without giving anything back to the artists.

They do not even ask for the permission.

is it clearer now?


I'll try to address your underlying thought, and hope I'm getting it right.

I think you are right to be skeptical and cautious in the face of claims of AI progress. From as far back as the days of the Mechanical Turk, many such claims have turned out to be puffery at best, or outright fraud at worst.

From time to time, however, inevitably, some claims have actually proven to be true, and represent an actual breakthrough. More and more, I'm beginning to think that the current situation is one of those instances of a true breakthrough occurring.

To the surface point: I do not think the current proliferation of generative AI/ML models are unoriginal per se. If you ask them for something unoriginal, you will naturally(?) get something unoriginal. However, if you ask them for something original, you may indeed get something original.


> If we remove copyrighted material from AIs they cannot compose shit.

I wonder in what way you mean that? In any case, the latest Stable Diffusion model file itself is 3.5 GB, which is several orders of magnitude less than the training dataset.

It probably doesn't contain much literal copyrighted data.


You're making much more concise arguments now, I think that makes the discussion more useful and interesting.

I would take the position that it's self evident that if you take the 'training data' away from humans they also can't compose music. If you take a baby, put it in a concrete box for 30 years (or until whatever you consider substantial biological maturity), and then put it in front of a piano it's not going to create Chopin. It might figure out how to make some dings and boops and will quickly lose interest.

Humans also need a huge amount of training data and we, at best, make minor modifications to these ideas to place them into new context to create new things. The difference between average and world class is vanishingly small in terms of the actual basic insight in some domain. Take the greatest composers that have ever lived and rewind them and perform our concrete box experiment and you'll have a wild animal, barely capable of recognizing cause and effect between hitting the piano and the noise it makes.

That world class composer, when exposed to modern society, consumed an awful lot of media for 'free' just by existing. Should they be charged for it? Did they commit a copyright infraction? Why or why not?


You are romanticizing brains. Please stick to logical arguments that can be empirically tested.


I feel like a broken record on this topic lately, but I strongly believe that training ML models on copyrighted works should be legal.

It is clear to anyone that understands this tech that it is not simply "memorizing" or "copying" the training data, even if they can be coaxed into doing this for certain inputs (in the current iteration of the tools).

Ultimately, I think the problem of reproducing certain popular works or code snippets will be solved. One interesting direction here is the toolset of information theory and differential privacy, e.g. proving that certain training inputs cannot be recovered from the weights, or that there is a threshold on how much information can be gleaned from a single input.

It is easy to imagine (because it's nearly there already) a future version of Stable Diffusion (or CoPilot) which provably compresses all training data beyond any possibility of recovery, and yet still produces extremely convincing results which disrupt the creative professions of art and programming.

Until we get to that point, it feels like the only consistent and sensible place to apply regulations is with the model end user. When I use CoPilot, I accept the small (honestly overblown) risk that _maybe_ I won't have the license to use some small snippet of code it spits out. But I'm happy to wear that responsibility, because the boost to productivity is so great that I dread a return to pre-CoPilot world. That is, a world where everyone keeps reinventing the same solution to simple problems over and over and over again.
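
For what it's worth, the standard empirical check for "can this training input be identified from the weights" is a membership-inference test: models tend to have suspiciously low loss on samples they memorized. A toy loss-threshold version, where model_loss is a stand-in for whatever per-sample loss your model exposes:

    import numpy as np

    def flag_memorized(model_loss, candidates, heldout, percentile=5):
        # Samples with loss below nearly all held-out losses were
        # plausibly memorized rather than generalized.
        threshold = np.percentile([model_loss(x) for x in heldout], percentile)
        return [x for x in candidates if model_loss(x) < threshold]

A model that passes such tests (or better, carries a formal differential privacy guarantee) is exactly the "provably compresses beyond recovery" future described above.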


Training being fair use is something I can buy into.

As for actually using the model... personally I still find that unacceptable. Even if the risks are low, we have lots of works in the model, so the risk can still add up.

The idea you have in your head of training without regurgitation is likely not possible. The underlying technology treats the training set as gospel: the system is trained to regurgitate first, and generalization is a happy accident. Likewise, we can't look into a model to check and see what it's memorized, nor can we trace back an output to a particular training example. Which has ethical implications with the way that AI companies crawl the web to get training data; such models almost certainly hold someone's personal information and there's no way to get it out of there aside from curating your training set to begin with.


I mean, in a sense yes the training set is gospel. But these systems are also (generally) tested against held out data.

When you have to model 100TB of images with 4gb of weights there is no way this is possible without learning some kind of patterns and regularity that generalise outside the training set. Most generated items will be novel, and most training items will not be reproducible.

It doesn’t seem radical to suggest that the copying issue will continue to recede as we get better models.

And there are areas of research specifically concerned with _provably_ showing that you cannot identify which items a model was trained on.

Lots of reasons to be optimistic in my view.
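
The back-of-envelope arithmetic makes the point concrete. Assuming roughly LAION-scale figures (both numbers are approximations):

    weights_bytes = 4e9              # ~4 GB of model weights
    n_images = 2e9                   # ~2 billion training images
    print(weights_bytes / n_images)  # 2.0 -- about two bytes per image

Two bytes per image cannot store the images themselves, so whatever the weights retain must overwhelmingly be shared structure, with verbatim recall plausible mainly for heavily duplicated training samples.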


Yup. There’s also this, FTA:

> the original images themselves aren’t stored in the Stable Diffusion model, with over 100 terabytes of images used to create a tiny 4 GB model

Is jpeg compression transformative then? Should a compressed image of something not be copyrightable because “it doesn’t store” the “real image”? How about compressed video? Where do we draw the line?


The difference is that JPEG does store the real image, at least to within a given tolerance (determined by the compression factor). That image is as real as, say, an image on film (also not exact, nor in "original" form).

With Stable Diffusion it's storing the style, but can't reproduce any single input image-- there aren't enough bits [0]. (except by luck, but that's really true for any storage).

[0] https://en.wikipedia.org/wiki/Shannon–Hartley_theorem


The weights of a NN are just a compressed representation of the training data, think lossy zip.

Rank all generated images by similarity to the training data (etc.) and you can see what's stored.

The Shannon-Hartley theorem isn't relevant. A 4GB zip of 100TB of text data can exactly reproduce the initial 100TB for some distributions of that initial dataset.
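
The degenerate case is easy to demonstrate: highly redundant data really does compress almost without limit, as a quick check with zlib shows (a smaller input than 100TB, for practicality):

    import zlib

    data = b"A" * 100_000_000                # 100 MB of the letter "A"
    packed = zlib.compress(data, 9)
    print(len(packed))                       # on the order of ~100 KB
    assert zlib.decompress(packed) == data   # exact reconstruction

As stated, though, this only holds for some distributions; a corpus of billions of distinct photographs is far less redundant than repeated bytes, which is where the disagreement lies.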


if you reproduced an exact image (to the same lossy degree as jpeg) using the NN, then you are violating copyright.

But if you reproduced an image whose style matches another copyrighted image (e.g., blah in the style of starry night), then how does that new image (which didn't exist before) violate existing copyright? You cannot copyright a style.

The NN containing information which _could_ be used to reconstruct an exact image doesn't itself constitute copyright violation - because the right to use information for training a NN is not an exclusive right that the original holder of the training set has.

So either a new law has to come into existence, vis a vis the right to use copyrighted works to train a NN, or the current copyright laws should apply (which implies that NN generated images which are not "exact" copies of existing works don't violate copyright).


If a given model can consistently reproduce an exact image given the same input prompt, why shouldn't the model itself be considered a compressed form of that image?


Determinism is not a copyright violation.

("If a human artist...")


Right, but underneath your premise is a scam, right?

A NN has not learnt to paint: it doesn't coordinate its sensory-motor system with its environment through play, it hasn't developed any taste, it does not discern the aesthetic good from the bad, it has no judgement, and so on ad infinitum.

A NN is just a kNN with an extra compression step. The way all gradient-based "learners" work is to compute distances to pre-given training data. In the case of kNN that data is used exactly; in a NN it's compressed.

There is no intelligence here, there is no learning: it's a trick. It turns out that interpolating a point between prior examples can often look novel and often fool a human observer.

This is largely due to how incredibly tolerant to flaws we are in the cases where NNs are used to perform this trick. We go to great lengths to impart intention, fix communicative flaws, etc. and this is exploited by "AI" to make simple crap seem great by having the observer fill-in the details, perceptually. I see it as a kind of proto-schizophrenia that all people have which usually works if we're dealing with a human, but on everything else produces religions.

In any case, a NN is just a case of a kNN -- which is capable of fooling people exactly the same way, and clearly violates copyright and is a case of theft of intellectual work to make a product you can sell. Adding compression seems irrelevant.
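
To make the kNN framing concrete - a toy sketch of the claim, not a statement about how diffusion models actually work, which the replies below dispute - "generation" by interpolating among nearest training examples looks like this over generic feature vectors:

    import numpy as np

    def knn_generate(query, training_set, k=3):
        # "Generate" by averaging the k training examples nearest the query.
        dists = [np.linalg.norm(query - x) for x in training_set]
        nearest = np.argsort(dists)[:k]
        return np.mean([training_set[i] for i in nearest], axis=0)

An output produced this way can look novel while being entirely a function of the stored training data, which is the sense of "trick" meant above.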


I don't think this interpretation of NNs is correct. There's been a few papers purporting to show this, but afair they used a very tortured definition of "interpolation".

Stable Diffusion is certainly capable of differentiating good from bad. That's why you can tell it to draw good or draw bad.

Not that this point is relevant to my comment. "Play", "taste" and "judgment" can be just as deterministic as a sequence of large matrix operations interspersed with nonlinear layers.


Sure, but then who is torturing matrices to turn them into organic bodies which adapt their musculature to their environment?

Interpolation is forced in the case of NNs, it's a training condition.

And the "kNN interpretation" isnt an interpration, kNNs define what "ideal learning" is in the case of statistical learning, and hence show, it doesnt count as actual learning.

In actual learning we're not interested in whether you can solve prespecified problems but how well you cope when you can't. This is, by definition, not a problem which can be formulated in statistical learning terms and the particular "learning" algorithm here is irrelevant.

In other words accuracy isnt a test of learning. Accuracy is a "non-modal condition" in being fit to a history that actually took place. Learning "in the usual sense" is strictly a modal, "what if" phenomenon, and is assessed by the quality of failure under adverse conditions, not of success.

If one gave any AI/ML system in existence adverse conditions, posed relevant "what ifs" and observed the results, they'd be exposed as the catastrophe they are. None survive any even basic test of "coping well" in these cases.

This is why all breathless AI public relations, ie., academic papers published in the last decade, do not perform any such tests.


so does the number pi constitute a copyright infringement?


No, since it's not a derived work.

And if you can come up with a model that can reproduce images exactly without first getting trained on them, it wouldn't be a derived work, either.


> No, since it's not a derived work.

but why is the set of numbers in a matrix considered derived?

I can trivially derive the number pi.

My original point is that just because something contains the information does not imply that it violates copyright.


Because said set of numbers is produced via a training process that has the original as an input, and a different input would produce a different set of numbers.

You're correct that merely containing the information would not violate copyright - it's all about how that information was produced.


Because that new image wasn't in the training set?


With the examples that we're seeing, the images were in the training set.


I'm actually seeing plenty of new images that are in the same style but are different from any of the images in the train set, like wonder woman in front of a mountain that looks like the setting of "frozen".


> there aren't enough bits

you can create a compressed copy of a file containing 100TB of the letter "A" in much less than 4GB

there could be enough bits in there to reproduce some of the inputs.


What this analogy is saying is that if an image is generic and derivative enough (or massively overrepresented in the training data) it may be possible to reconstruct a very close approximation from the model. If the training data is unbiased, I question the validity of copyright claims on an image that is sufficiently derivative that it can be reproduced in this manner.


By that case all art should be copyrighted since our brain stores a highly compressed version of everything we've seen.

I wager 100 TB => 4GB is different from JPEG compression and more similar to what happens in our brains. "Neural compression" so to speak


> By that case all art should be copyrighted since our brain stores a highly compressed version of everything we've seen.

Good thing this is already the case! https://en.wikipedia.org/wiki/Berne_Convention

> The Berne Convention formally mandated several aspects of modern copyright law; it introduced the concept that a copyright exists the moment a work is "fixed", rather than requiring registration. It also enforces a requirement that countries recognize copyrights held by the citizens of all other parties to the convention.


This doesn't make any sense in this context and I think you know it bud.

I was making the point that stable diffusion != JPEG compression


The fact that it is a different form of compression doesn't change that it's compression. What is the argument here, that a numerical method should have the same rights as a person?


The author of the original post (Andy Baio) found exactly where the line was. He released a (great) chiptune version of the jazz classic Kind of Blue (named Kind of Bloop), fully handled the music copyrights, and was promptly sued by the photographer behind the Kind of Blue cover art, who believed that the pixel art cover of Kind of Bloop was not adequately transformative.

https://waxy.org/2011/06/kind_of_screwed/


> Where do we draw the line?

This is what we have courts and legislation for. I expect there's existing legislation here about what constitutes a different work versus an exact copy but it may need some updates for AI.


If you can't reproduce a quantitatively (not qualitatively) similar likeness of an image then it is not just a compressed image.


Possibly controversial opinion: I think the biggest reason why so many people hold conflicting views on this is because of who the victim is in each case.

The loudest voices complaining they were directly hurt by Copilot's training are open source maintainers. These are exactly the kind of people who we love to root for on here. They're the little guy involved in a labor of love, giving away their work for free (with terms).

On the other hand, the highest-profile victims of Stable Diffusion and DALL-E are Getty Images and company. They're in most respects the opposite of open source maintainers: big companies worth millions of dollars for doing comparatively little work (primarily distributing photos other people took).

Because in the case of images the victim is most prominently faceless corporations, I think our collective bias towards "information wants to be free" shows through more clearly when regarding DALL-E than it does with Copilot.


> On the other hand, the highest-profile victims of Stable Diffusion and DALL-E are Getty Images and company. They're in most respects the opposite of open source maintainers: big companies worth millions of dollars for doing comparatively little work (primarily distributing photos other people took).

It's puzzling to me you acknowledge people are taking these photos and getting a cut from their use through the marketplace, yet still see Getty as the biggest victim.

If Getty could AI-generate their whole portfolio and keep 100% of the sales to themselves they'd do it in a heartbeat (and I'd expect them to partially go that route). The most screwed people are the photographers ("the little guy" in your comparison)


I said Getty is the highest profile victim, not the only victim. They're the ones making waves.


If you're in the freelance art-sphere, the victims are also small artists who have been hustling hard to be able to live from their art.


With the caveat of "Strong opinions, weakly held", my personal take is that creating artificial scarcity is inherently immoral and thus copyright itself is immoral. Training AI on someone's non-private work is then completely fine IMO.

Copyleft is a licensing approach that weakens copyright (and is thus inherently good :)), so using machine learning to weaken copyleft by allowing you to copyright "clones" of copyleft code is bad.

If I try to generalize here, the problem in both cases is only if you produce copyrighted works, especially if you trained on copyleft works. If instead both models would stipulate that all produced works are copyleft I would be much more fine with it (and I feel it would respect the license of the copyleft works it was trained on, even if that may be legally shaky).


> With the caveat of "Strong opinions, weakly held", my personal take is that creating artificial scarcity is inherently immoral and thus copyright itself is immoral. Training AI on someone's non-private work is then completely fine IMO.

Why do you hold that opinion? There's a few very clear benefits to creating artificial scarcity, mostly around incentivizing creation (and sharing!) of innovations.

If we can't create artificial scarcity around ideas, then ideas are in a sense less monetizable than, e.g., creating a piece of furniture. But this is just an accident of the way the world works - I can physically prevent you from taking a chair that I made, but I can't prevent you from taking an idea I had. Why does it make sense that the world works this way? Isn't it a whole lot better to encourage innovation, vs. encouraging more people to make physical objects, just because those are inherently scarce?

(The other side is that innovations also have the great property that copying them isn't depriving anyone else of use of the original idea, but that's a side issue to the encouraging innovation one, IMO.)


> Why do you hold that opinion? There's a few very clear benefits to creating artificial scarcity, mostly around incentivizing creation (and sharing!) of innovations.

I think it is self evident why creating artificial scarcity is immoral, but the point you are trying to make is that from a utilitarian standpoint you think it is preferable to behave in a way that is immoral in the micro scale since it will create greater good in the macro scale. If you agree, then I don't think there's any need to justify my belief here :)

That said, I also think it is unclear that copyright is a net positive for the creation and sharing of innovations. The current state of monetization is not inspiring, since the actual creators usually are not well compensated and money tends to stay with large corporations that are essentially just "rights holders". There are also many factors that actively stifle innovation and creativity.

Patents are the most well-known example, but being unable to borrow chord progressions or characters or storylines from other works is also stifling (you can't exactly release your "edanm's cut of Spiderman Homecoming" publicly, nor can you create your own sequel or alternate interpretation of the story). Quite a few fan games and fan remakes have also met their demise at the hands of aggressive copyright enforcement.

My own suspicion is that if the current models of creation will be disrupted by copyright abolition, we'll just end up seeing that a lot of the money that was spent on them will move to other avenues of funding like Patreon style or Kickstarter style funding for works. We may even see some new models created. I'd also expect that it will actually shift the balance away from large corporations (whose primary value is having lots of money that allows them to hoard rights) to smaller creators which will now have more direct funding available to them.

I also think an interesting case to look at is video games, where the consensus is that a game's mechanics aren't copyrightable, and so whenever a new interesting indie game comes out on Steam, there is a rash of other cool indie games with their own takes and remixes of the same concept - like the glut of roguelike deckbuilders after Slay the Spire, or the current glut of "Vampire Survivors"-likes that have their own interesting takes on the core idea. Eventually, when there's enough buzz around such ideas, they can even penetrate the AAA sphere (where roguelike elements have slowly started to appear). It is also quite common to see games stay in Early Access for a very long time, essentially letting the community fund the future creation and expansion of the game.

> If we can't create artificial scarcity around ideas, then ideas are in a sense less monetizable than, e.g., creating a piece of furniture. But this is just an accident of the way the world works - I can physically prevent you from taking a chair that I made, but I can't prevent you from taking an idea I had. Why does it make sense that the world works this way? Isn't it a whole lot better to encourage innovation, vs. encouraging more people to make physical objects, just because those are inherently scarce?

So as I said, I'm not sure it really will make ideas less monetizable (or that if it will, that it will do so significantly). Even today I can get any video game I want for free (illegally, but that effectively doesn't matter since no one will persecute me for it), and yet I still buy video games. In fact, a strong reason for why I buy video games today is because as a child I had a friend that had easy access to pirated CDs and I'd play a lot of games at their house, fueling my passion for them.

And relatedly, it is no accident that "software is eating the world", it is exactly because it is so easy to share, the low marginal costs make it easy to have a strong worldwide impact without too much real world effort :)


> I think it is self evident why creating artificial scarcity is immoral, but the point you are trying to make is that from a utilitarian standpoint you think it is preferable to behave in a way that is immoral in the micro scale since it will create greater good in the macro scale. If you agree, then I don't think there's any need to justify my belief here :)

I'll start with the end, I don't think the "immoral on the micro scale" idea makes much sense. Like, you can say it's "bad" or "annoying" on the micro scale, but if it is good for society as a whole to create artificial scarcity, it just isn't immoral for society to provide mechanisms to create it.

I also don't think it's self-evident that it's "immoral" on the small scale (though not sure what that means, since artificial scarcity is kind of a society-level mechanism.)

That said, maybe I'm just bumping on your use of the word immoral and we're not really disagreeing.

> That said, I also think that it is unclear that copyright is a net positive increase to creation and sharing of innovations. The current state of monetization is not inspiring since the actual creators usually are not well compensated and money tends to stay with large corporations that are essentially just "right holders". There's also many factors that actively stifle innovation and creativity.

This has been a talking point of people against copyright for a long time (I've been having these discussions for at least 20 years, personally).

But I think that you're basically wrong. It's pretty easy to see that you're wrong too - just look at the state of news, the state of music, etc. In most cases, artists today make far less money than they made before the rise of pirated alternatives. Patreon/other models/etc have helped some, but nowhere near where things were before.

In fact, you talk about the current state of compensation being bad, but I think that it would make more sense to listen to actual artists about whether or not copyright helps them or not. I've listened to a bunch, and most of them couldn't come close to doing what they love without copyright.

Also, personally, I'm a software dev. Most of what I do on a day-to-day basis is create IP. I'm fairly happy that someone can't just come along and repurpose everything I built, for free. Otherwise, I'm fairly sure I'd be out of a job.

(I'm fairly sure that without copyright/IP, most software we use wouldn't exist either.)


How do you propose to keep artists able to pay their bills and live a decent life if you're completely cool with training AIs on them?

Bonus points if you have any actionable scheme beyond waving your hands and talking vaguely about "basic income".

Keep in mind that the life of a professional artist is currently very perilous, anyone working freelance is constantly battling against the social media giants' desire to keep everyone scrolling their site forever. Words like "patreon" and "commission" and links off-site to places an artist can exchange their works for money are poison to The Algorithm and will be hidden.

And also if I am reading this right, you have absolutely no problem with an image generator that's been trained on copyrighted work producing work that's either copyrighted or copylefted? You are utterly fine with disregarding the copyrights of the original artist and/or whoever they may have assigned the copyright to as part of their contract?


> How do you propose to keep artists able to pay their bills and live a decent life if you're completely cool with training AIs on them?

Why isn't it a concern for any other automation?

How do we progress, exactly, if we randomly decide that nothing can disrupt any of the current ways of earning income?

What about, IDK, coal miners?

> And also if I am reading this right, you have absolutely no problem with an image generator that's been trained on copyrighted work producing work that's either copyrighted or copylefted? You are utterly fine with disregarding the copyrights of the original artist and/or whoever they may have assigned the copyright to as part of their contract?

Copyright maximalism is bad. It also doesn't make any sense. Someone learning to reproduce your capabilities by looking at your stuff isn't violating copyright. If we allowed copyright to somehow mean that someone's skills can't be reproduced...


Paul Samuelson[0], the first American winner of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, makes the argument here[1], when discussing the economics of lighthouses, that anything with zero marginal cost priced at anything other than free is, by definition, an economic loss. Therefore you should find other ways to fund lighthouses and, by extension, all media and software.

If an economic loss is currently occurring, that means that, if copyright is abolished, an economic gain will accrue. Where that economic gain is captured is therefore the core focus. What we want to occur is for society and the author to share in this increased economic gain, what we don't want is for monopolistic rent-seekers to grab all of this value for themselves.

I am not yet sure of this, but I think a land value tax would accomplish this goal. One would then find the weight for each artist or piece of software through some market-like means, perhaps only an approximation, and distribute a share of the land value tax revenue to artists and other creators. I think a decent way to find this value is through a revenue-neutral (or slightly negative) opt-in sortition process: when you go to use an artist's work, you are entered into an auction in which the 50% who bid more than the median value get to use the work at the median price, and the 50% who bid less do not get to use the work but receive their bid in cash (see the toy sketch after the links below). This is surely not the full system; it is just me working out how we can move forward from such an unjust system as copyright.

[0]: https://en.wikipedia.org/wiki/Paul_Samuelson

[1]: https://courses.cit.cornell.edu/econ335/out/lighthouse.pdf - page 359, first paragraph
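
To make the median-bid round concrete, here is a toy simulation of the mechanism as described (the names are hypothetical, and the full system would need far more care):

    import statistics

    def median_bid_round(bids):
        # bids: {user: amount}. Bidders above the median pay the median
        # price and get to use the work; the rest get no access but
        # receive their bid back in cash, funded from the tax revenue.
        median = statistics.median(bids.values())
        access = {u: median for u, b in bids.items() if b > median}
        payouts = {u: b for u, b in bids.items() if b <= median}
        return access, payouts

    access, payouts = median_bid_round({"ann": 12, "bob": 5, "cho": 9, "dee": 2})
    # access -> {"ann": 7.0, "cho": 7.0}; payouts -> {"bob": 5, "dee": 2}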


Speaking as another copyright abolitionist,

> How do you propose to keep artists able to pay their bills and live a decent life if you're completely cool with training AIs on them?

1. Software being able to imitate artists doesn't mean people will stop wanting art from other people.

2. If it does, I would say that it's unfortunate, but it's the world we live in. Nobody has a guarantee on being able to make a living doing any conceivable profession whatsoever, and artists are no different. I would very much like to make a living from working on my personal projects, but people don't seem to want to take me up on my offer.


Because when I go to a live performance, watch a movie, browse an art gallery, etc., I am training my brain on copyrighted work. Every artist has done the same. No artist has developed their style in a vacuum.

(See my other comment though, I am not sold on any of this being right).


> Because when I go to a live performance, watch a movie, browse an art gallery, etc., I am training my brain on copyrighted work

You paid for it and enjoyed it in the way the artist intended.

> I am training my brain on copyrighted work.

Too bad your brain alone is useless.

You need good hands to replicate much of the copyrighted works you "trained" your brain on.

> No artist has developed their style in a vacuum.

Some absolutely did, indeed.

Just look at the school of film and animation that artists in the USSR developed while separated from the rest of the world.

They are unique and completely different from what the West was used to (Disney).

https://www.youtube.com/watch?v=2qWBZattl8s

https://www.youtube.com/watch?v=1qrWnS3ULPk


Both links contain great examples of late-1960s Polish animation. Poland was never part of the USSR.

Moreover, just like Czechoslovakia, Poland did not develop its animation style in a vacuum. They had strong artistic connections with many other European countries, especially France.

> They are unique and completely different from what the west was used to (Disney)

Not sure if you are an American... but Europe has a very old and very rich animation tradition. Growing up in Europe I was only vaguely familiar with Disney. The vast majority of animation I watched as a child was European.


Correct, I wrongly used the term USSR to mean the Eastern Bloc.

> Not sure if you are an American... but Europe has a very old and very rich animation tradition

No, I am Italian.

I grew up with these kinds of animation. [1] [2]

Nonetheless, the "animation studio" of the West, the one that won Oscars and was distributed virtually everywhere in the Western Bloc, was Disney.

[1] https://www.youtube.com/watch?v=GV3BqbsyaUk

[2] https://www.youtube.com/watch?v=8Adqk9KD6Fk


> Just look at the school of film and animation that artists in USSR developed while separated from the rest of the World.

So it was developed in a complete vacuum, without even traditional influences like theater, poetry, and storytelling?


They pioneered new techniques in a vacuum because they were cut off.

A "complete vacuum" is a silly standard: we owe dinosaurs the oil we used to build up our modern societies; would you say that AI would be impossible without dinosaurs?

Do we owe the big bang a debt of gratitude?


> In contrast when talking about training code generation models there are multiple comments mentioning this is not ok if licenses weren't respected.

I think one of the differences is that people are seeing non-trivial amounts of copyrighted code being output by AI models.

If a 262,144-pixel (512x512) image has a few 2x2 squares copied directly, you can't tell; that's a tiny fraction of a percent of the image.

If a 300-line source file has 20 lines copied directly from a copyrighted source, well, that is far more blatant.


As an artist, though, you can spot copied parts that share the same 'visual language' and are much larger than 2 pixels: e.g. how someone uses their brushes, or how someone does texture on corrugated metal. Those are footprints as large as a matrix multiplication method; they just can't be quantified as easily, because we would need an AI model to quantify them.


I personally feel the bar for copyrighting code should be considerably higher than 20 lines.


> I personally feel the bar for copyrighting code should be considerably higher than 20 lines.

That highly depends on the lines of code.

One of my (now abandoned) open source React components essentially does some smarter-than-it-probably-should state management in just a handful of LOCs. At least a few hundred people found the clever solution I came up with useful enough to integrate into their own projects.

I've seen a talented graphics programmer hand-optimize routines to gain significant speed boosts, speed boosts that helped save non-trivial amounts of system resources.

And where do you draw the line? That same gfx programmer optimized maybe a dozen functions, each less than 20 lines, but all quite independent of each other. The sum total of his work gave us a huge performance boost over everyone else in the field at the time.

And of course you also have super terse languages like APL, where non-trivial algorithms can easily be implemented in 20 LOC.

But let's move to another medium, the written word, also one of the less controversial areas of copyright (ignoring the USA's penchant for indefinite extension of copyright).

Start with poems: plenty of artistically significant poems come in under 20 lines and surely deserve copyright.

https://tinhouse.com/miracles-by-lucy-corin/

That is a short story, around 22 lines.

The problem is, it is complicated, which is why these are the types of things that get litigated all the time.

Heck, as a profession we cannot even agree on what a line of code is. A LOC in Java is, IMHO, worth less than a LOC in JavaScript, and if you jump to embedded C, wow, that is super terse, unless you count the thousands of lines of #defines describing pinouts and such, but domain knowledge is needed to know that those aren't "real" lines of code.


I think that your code examples are not (and should not be) copyrightable.

Quoting US copyright law (but the same principle is global) "In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work." - when a work combines an idea and its expression, copyright protects only the expression but not the idea itself, no matter how clever or valuable it is. Copyright does not prohibit others to freely copy the method/process/system/etc expressed in the copyrighted work.

Also, there is a general rule of thumb (established in law and precedent) in copyrightability that functionally required aspects can not be copyrighted - in essence, if you create a new, optimized routine that's superior to everything else, then only the "arbitrary" free, creative parts of that code are copyrightable, but you absolutely can't get an exclusive right to that algorithm or technique. If the way you wrote it is the only way to write it, that can't be protected; and if not, others must be able to write stuff that's functionally the same (i.e. gets all the same performance boosts) and varies only in the functionally irrelevant parts. That's one of the reasons, for example, for recipe sites having all that fluff - because you can't get copyright protection on the functional part of the recipe, or some technique in another domain like architecture or computer science. Perhaps you can get a patent on that, but copyright is not applicable for that goal.

So, going back to your examples:

People reimplementing that "smarter-than-it-probably-should state management in just a handful of LOCs" is absolutely permitted by copyright law. If the same state management can be written in many different ways, then copying your code verbatim would be a violation and they would have to reimplement the same idea in different words. But copyright law definitely allows them to copy and reuse your idea without your permission; it doesn't protect the idea, only its specific expression.

Hand-optimized graphics routines may fall into the area where there is only one possible expression for that idea which implements the same method with the same efficiency. If that happens to be the case, the routine is not eligible for copyright protection at all - you can't get a monopoly on a particular effective technique or method using copyright law; patent law covers the cases in which that can or can't be done.

For APL implementations of algorithms - again, the key principle is that copyright definitely allows others to implement the same algorithm. If an obvious reimplementation of the same algorithm results in the same terse APL code, then that's simply evidence that this particular APL code is solely the "idea" (unprotectable), not "creative expression" which would be eligible for copyright protection.


It's not the training of the models that's the problem; it's when the AI spits out "substantial portions of code" (an important term in the GPL and in fair use law) that are exact, sometimes even including exact comments from specific codebases. That does violate the licenses.

There's something quantitative in code that you don't get in drawings; in drawings the unique quality is purely qualitative, so it is hard to demonstrate what exactly was ripped off. When you find your exact words being returned by a code-helper AI, it's hard to pretend it's not directly and plainly copy-pasting code snippets.


I am going to go ahead and say it, some of you people are so far up your own ass that you don't realize it is the exact same thing. All of you are saying it's unique because you code but you don't understand art and can't pick out the things that are clearly copied from artists because it isn't exactly the same.


I don't understand art, you're right, at least from an artist's perspective. Maybe you can enlighten me.

So my current perspective is this: if a picture is drawn in a style similar to yours for example, that's not infringing, but if it's just a scrapbook collage of cutouts of your art it is, except where it's fair use. Would that be right?

So the same applies, if an AI actually can help people write code by learning from existing code, that's fine, but if an AI just copy pastes code blocks that's not.

Where am I going wrong here?


It’s a dumb hill to die on. Doomed to fall to a layer of minor refactoring. If you say that’s the problem, you’ll have nothing to stand on later.


On the contrary: the part that is problematic is the verbatim reproduction of copyrighted code. If that's fixed by a "minor refactoring" then there's no hill to die on. It's not AI code generation per se that's problematic; it's when it does things that break current IP law.

If you want to debate expanding IP law, that's a different discussion and one I would be rather sceptical about. I'd prefer that IP law in general was rolled back, not forward.


Which is a fine point of view. But it’s not the one that many (most?) detractors actually hold.

It would imply that you cannot violate open source licenses by doing something as simple as recreating a codebase in another language, thus eliminating the exact matches.


> Doomed to fall to a layer of minor refactoring

Not quite: there is a reason https://wikipedia.org/wiki/Clean_room_design exists as a concept to work around copyright, and the same concept could hold for ML models.


Clean room design? Did the models train on art the trainers drew themselves?


I have always held the opinion that the GPL and similar copyleft licenses were intended to make sure code stays free (free as in freedom, not as in beer), and that in an ideal world you wouldn't need the GPL or any license at all. At this point I really don't care what Copilot or any of its derivatives produce, and I think in the not-too-distant future we will have machine-code-to-readable-code translation, which will enable more freedom. That is, it really won't matter whether the code ships compiled, when you can "AI decompile" it into human-readable code, make your modifications, and then do with it what you will.

From that view let the data be free.


As long as this copyright violation laundering isn't reserved for the big guys, I'm happy for anything that confuses and delegitimizes the concept of copyright. But it is reserved for the big guys: you're going to get sued to death if you copy any of their work.


GPL folks are completely OK with something like Copilot when the GPL is obeyed, so that all emitted code generated by an AI trained on GPL code is licensed under the GPL again. It's not OK to call our code "public code" and ignore our license.


But by repeating this argument you are strengthening copyright, which is the fundamental evil GPL was made to fight. There surely will be FOSS clones of Copilot in the near future. There is no need to feed the copyright lobby.


Some languages need to be compiled, but others don't.


I think both are inevitable and I'm OK with both. I think a sticking point is that it's considered normal to make your own art in the style of another, but abnormal to copy code verbatim. Art seems clearly to be the former, while there are instances that probably stick in people's minds where Copilot has produced verbatim examples.

Indeed, it seems like code will be vastly more prone to this problem than art, because changing a single pixel is merely a question of aesthetics, whereas code is constrained tightly by the syntax of the language. With a much smaller space of correct results, duplication is likely inevitable.


This is my thinking too. A maximally useful code AI would include verbatim reproduction (since presumably the code it was trained on was written that way for a reason relevant to its function). A maximally useful art AI has comparatively little reason to ever want to output verbatim training inputs.


I've been thinking of a possible resolution to this. For Copilot and similar systems: keep the training data, and alongside the text generation add a search function that scans the generated sequences and returns pointers to the source for close matches over a threshold length (see the sketch below). Example: if Copilot produces the Quake fast inverse square root routine, you'd get a pointer to the source AND to the license. This would give the author credit under permissive free licenses, and would let the user drop the code if it's GPL and they aren't willing to distribute under those terms.
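A minimal sketch of that search step, assuming the training corpus is kept as plain text next to the model (the naive windowed scan and all names here are my own illustration, not how any real system works; production tooling would use suffix arrays or fingerprinting):

    def find_attributions(generated, corpus, threshold=60):
        """corpus is a list of (path, license_id, text) training files.

        Returns (path, license_id, snippet) for every file containing a
        verbatim run of at least `threshold` characters of the generated
        output, so the tool can surface the source and its license.
        """
        hits = []
        for path, license_id, text in corpus:
            # slide a fixed-size window over the generated output
            for start in range(len(generated) - threshold + 1):
                window = generated[start:start + threshold]
                if window in text:
                    hits.append((path, license_id, window))
                    break  # one match per file is enough to flag it
        return hits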

For art, train on contributed images whose authors have agreed to that use. There could be some organization, perhaps a nonprofit, that would own the images; users of the model could credit that organization, and perhaps contribute back their own generated work. That way a legally clean commons could be built and could grow.


Of course, people today are training on copyrighted images, chanting "fair use" because people on the internet tell them it's okay, instead of finding groups of artists who consent to having their art trained on.


I hold neither of those opinions. My take is that agricultural civilization is being digested by technocapital and we're all along for the ride.

That said, the application of copyright to text, vs, code, vs images, has salient differences. The concept of plagiarism in visual artwork exists by analogy, but it's hard to call it coherent.

Music is somewhere in the middle; people have sued successfully over melodies and hooks.

There are things like characters in animation where it's not unheard of, but the balance is more on the side of "great artists steal".


The vast majority of HN's patronage is tech aligned. Try asking a community of artists the same question and see what their responses are. The results might surprise you.


What's funny is that the HN majority was OK with Airbnb "disrupting" hotels and Uber/Lyft "disrupting" taxi services by bending the rules and exploiting legal loopholes, but when AI starts "disrupting" their artwork and code by bending the rules, suddenly disruption becomes a personal problem.

Disrupt onward I say. Humans learn and remix from prior copyrighted work all the time using their brains (consciously chosen or not). So long as the new work is distinguishable enough to be unique there's nothing wrong with these new AI creations.


Because the models are not creating a 1:1 replacement of the original work.

As mentioned before, "style" is not something subject to copyright, and what gets trained is a model of that style. Fine-tuning a model generally means one would not want to recreate the original images, as that would overfit the model and render it essentially useless.

When it comes to code, there is a higher chance of getting a one-to-one clone of the input, as the options for implementing an algorithm, or even a simple function, are dramatically reduced, imo.


> Because the models are not creating a 1:1 replacement of the original work.

Since when did that become a requirement? If those are the rules now, then cutting the final credits is good enough to start torrenting movies.

> When it comes to code, there is a higher chance of getting a one-to-one clone of the input as the options used in creating an algorithm, or even a simple function are dramatically reduced imo.

If you're going to consider each function within a larger work as an individual work, that makes the 1:1 replacement claim more dubious. In order to recognizably imitate a style, one or more features of that style have to be recognizably copied, although no single area of the illustration would have to be. A function is a facet of a complete program just like recognizable features of a style are facets of each work an artist produces. If it helps, consider an artist's style as their own personal utility library.


If I made a scene-for-scene remake of a Disney movie, with an ugly woman for a princess and satirical social commentary injected, it would be defensible as fair use in court.


That's because it is parody, which is explicitly defined as fair use. NNs are not only used for parody.


I think when it comes to art, less than one-to-one clones are often still functionally equivalent in the mind of many viewers. Stylistic and thematic content is often just as, if not more, important than the exact composition. But currently the law does agree that this is not copyrightable. And sometimes independent artists profit and make a name for themselves copping other styles, and I think that's great.

But could it be considered an intellectual and sociological denial-of-service attack when it's scaled to the point where a machine can crank out dozens of derivative works per minute? I'm not sure this is a situation at all comparable to human artists making derivative works. Those involve long periods of concentration, focus, and reflection by a conscious human agent to pull off, thus in some sense furthering the intellectual development of humanity and fostering a deeper appreciation for the source work. The machine does none of that; it's sort of just a photocopier one step removed in hyperspace, copying some of the artists' abstractions instead of their brush strokes.


> when it comes to code, there is a higher chance of getting a one-to-one clone of the input.

I'm not so sure. There's a generated image in the article that I think looks enough like Wonder Woman to cause a lawsuit.

That's just one of a handful of images in the article, and doesn't seem to have been chosen for its similarity to Wonder Woman.


Code is hundreds to many thousands of lines. A line of code is analogous to one color pixel in digital art.


Depends on which lines of code.

I have written projects where I'd consider a handful of lines of code to be the central tenet of the entire project, the thing everything else is built around. Copy those lines, and everything else is scaffolding that falls out naturally from the development process.


But similar code is easier to find at the line level, because Copilot works at the level of a single line or a small function.


Style is not protected by copyright. You can create your own art in the style of any living artist, and this is allowed. AI is automating that process. Some works that the models produce may be too close to an original and would probably be found to infringe if that ever went to court. It'll be up to a judge to look at the original and the AI output and weigh in on whether it's different enough or an elaborate copy.

Code is different; there isn't a style to it other than perhaps indentation and variable naming conventions. Entire sections (that are protected by the GPL) are copied. This by itself isn't the issue; contaminating your codebase is the problem. If your work ends up under the same license as the code sources, and those are properly documented per the license agreements, you're fine. But if you end up violating the GPL and someone knows their code is in use, you are in a tough situation. Again, it'll end up in an expensive courtroom session where a judge has to determine whether enough code was copied to constitute a license violation. That's the scenario you want to avoid in the first place, because for a lot of businesses that kind of lawsuit is too expensive to fight.


Humans used to learn to code from copyrighted works (textbooks) without much reference to OSS or Free Software. Similarly, teaching ML models to code from copyrighted works isn't going to violate copyright more frequently than a human might; and detecting exact copies should be pretty easy by comparing with the corpus used to train it. Software houses already have to worry about infringement of snippets, and things like Codex are just one more potential source.


Those books were purchased and a license granted for such use.


> Those books were purchased

Sometimes. People also borrowed them, read them in libraries, or in later years looked at free textbooks online.

> and a license granted for such use

I never heard of such a thing, and it was never seen as necessary. No student checked the licenses on their textbooks before deciding whether they could read them.


Cognitive dissonance, different persons, hypocrisy.


People often hold conflicting views; they don't like to think about it, though, because it can lead to cognitive dissonance.

That’s one reason why it is probably better to have a derived world view than a contrived world view.


Because there are more programmers than artists in this hive mind.


> why do you think it's ok to train diffusion models on copyrighted work, but not co-pilot on GPL code?

Probably worth pointing out that GitHub has a license to the code on its site (read the fine print) that is independent of any other licenses the code may be available under.

Whether that license applies to training ML models is legally uncharted waters.

Whether it’s right to train those models is another matter.


It's the hypocrisy of it all. Multi-billion dollar corporations whose empires were built on copyright, violating the licenses of other people's code. Why do they get a pass for that while simultaneously shoving DRM and trusted computing down our throats? I hope they get sued for ridiculous sums.


I think any code that's posted in public should be considered free to use by anyone for anything and its corresponding license be ignored and invalid. If you want restrictions on how people use your code, don't post it publicly, or have a proprietary portion that's required for compilation


Your argument is tantamount to expecting Michael Bay to have never seen a Scorsese film or derived influence from it.


Is Michael Bay cloning entire scenes of Scorsese's films and copyrighting them as entirely his own?


Pretending it's not clearly more complicated than that will not convince anyone; it will make them feel condescended to. While Scorsese is Scorsese, a deep learning model is not Michael Bay.


I used that comparison on purpose. Michael Bay leans on a lot of computer driven technology to shoot movies inspired by other more traditional directors. The comparison is direct, if you feel condescended to, I did not intend that.


It's actually fine in both cases


It is totally okay to train Copilot on GPL code, but the resulting generated code should also be released under GPL license, clearly being a derivative work. I don't even know why it is being discussed.


> the majority opinion seems to be that training diffusion models on copyrighted work is totally fine

Well, maybe they were all just downvoted into invisibility. But up to your post, I have seen none.


Can humans learn from copyrighted work?

ps. I'm surprised music has not yet been leveled by AI; can't wait for "dimmu borgir christmas carols" prompts.


Music is somewhat more challenging because you have a few other problems that have to be solved in the pipeline, and source separation is still not a 100% solved problem. Beyond that, audio tagging beyond track level artist/genre is a lot harder than image tagging.

Once you have separated sources for a training data set, it's like text generation, except that instead of a single function of sequence position, you have multiple correlated functions of time. Text generation models can barely maintain self-consistency from paragraph to paragraph, which is a sequence difference of maybe 200 tokens. Now consider moving from token position to a time variable, and adding the requirement that multiple sequences remain coherent both with each other and with themselves over much larger distances.

There are generative music models, but it's mostly stuff that's been trained on midi files for a specific genre or artist, and the output isn't that impressive.

I am also eagerly awaiting "hark the bloodied angel screams" with blastbeats, shrieks and blistering tremolo guitar, though.


I don't think the law will get hammered out until the AI models generate 'major recording artist' inspired songs. Anyone defending AI-generated works on the grounds that artists can't claim 'style' is in for a rude awakening.


Stealing stuff so you can get rich selling it is worse than stealing for private use.

Stealing from idealist volunteers is worse than stealing from a random person.


I don't think copyrighting things is fine.


To be honest, the majority opinion on this just demonstrates how narrow-minded and uncritical many people here are: the clear and obvious juxtaposition doesn't even get their minds churning. It's hard not to notice that half of them merely use AI tools and don't really understand how they work, hence why the silly and incorrect refrain of "your mind is a NN!" keeps recurring here.


Code generation models tend to regurgitate code from the training data much more often than these image models regurgitate images from theirs.

Code generation models therefore need special handling to check whether the generated code falls under someone's copyright.


I do open source on GitHub and I believe both are totally fine.


Maybe we can have a GPL code-generator model, where all generated code is under the GPL?


I think this is a great question, but I think answers should rest on a slightly more detailed understanding of how actually copyright works.

IANAL, but to a first-order approximation: everything is "copyrighted" [1]. The copyright is owned by someone/something. The owner gets to set the terms of the licensing. The rare things not under copyright may have been put explicitly into the public domain (which actually takes some effort), or have had their copyright expire (which takes quite a while; thanks Disney).

So: this is really a question about fair use [2], and about when the terms of licensing kick in, and it should be understood and discussed as such. I don't think anyone who has really thought about this is claiming that the models can't be trained on (copyrighted) material; the consumption of the material is not the problem, is it? The problem is that the models: (1) sometimes recreate particular inputs or identifiable parts of them (like the Getty watermark), or recreate some essential characteristics of their inputs (like possibly trademark-able stylistic elements), AND ALSO, (2) have no way of attributing the output to the input.

Without being able to identify anything specific about the input, it is impossible to know with certainty that the output falls within fair use (e.g. because it was sufficiently transformative), and it is impossible to know how to implement the licensing terms for things that don't fall within fair use. There's just no getting around that with the current crop of models.

The legal minefield is not from (1) or (2), but from (1)+(2), at the moment of redistribution, monetized or not. Even if Copilot was only trained on non-reciprocal licenses (BSD, MIT), there are very likely still licensing terms of use, which may include identifying the original copyright owner. Reciprocal licenses like GPL have more involved licensing terms, but that is not the problem: the problem is failure to identify the original licensing terms. We should not use these models as an opportunity to make an issue about GPL or its authors, or about the business model of companies like Getty; both rest on copyright, and come to our attention because of licensing.

Sorry about the rant. As for your question: I think it may be as simple as the extent to which readers here are producers of inputs to the ML models versus consumers of outputs. It gets personal for coders when models violate the licensing terms of FOSS code, but it feels fun and empowering to wield the models to make images we'd otherwise be unable to access. From my rant above you can tell that, whether it's code or images, I think the whole thing is an IP disaster.

[1] https://en.wikipedia.org/wiki/Berne_Convention

[2] https://en.wikipedia.org/wiki/Fair_use

