
> Copyrighted material, sexual content, political opinions, throw it all in and release it please!

Why copyrighted material? Could we stop celebrating how tech is going to steal everyone's copyrighted works in a massive effort to replace the artists who made it? Why does everyone here hate artists so much? Do they not deserve any rights over their IP, eg, the right to say no when someone wants to make derivative works that replace them from it?




>Could we stop celebrating how tech is going to steal everyone's copyrighted works

Copyright has a fair-use exception for transformative works. It is difficult to look at the LLMs of the day and think, "No, they have not taken the copyrighted works and transformed them into something completely new."

There is no hate of artists here. I don't know how you get from "it's not a copyright violation" to "I hate artists". This is a matter of existing law. It is of course unsettled; the fair-use interpretation has not yet been tested in court. But the same goes for affirming that it is a violation of copyright.

These questions will have their day in court, and until then there is no need to make arguments in an inflammatory tone that borders on personal attack. In the meantime, maybe engage in a conversation about how copyright law would need to change to account for this new technology, rather than condemning people who see a reasonable interpretation of existing law that differs from your own.


So I can have a giant database of all the most recent copyrighted works, on my home computer, as long as I claim I'm using it for training a model? And if I happen to listen to some music or watch some movies too, who's going to know?

I think an argument could be made that it's fine for an LLM to learn from copyrighted works, but maybe it should have to go to the library to do it. Having your own human-accessible copy of those works (at home, or at Meta), doesn't sound as acceptable to me.


Sure, I think you’re right. It’s reasonable to expect a company to at least buy a single copy of each book (or whatever else it ingests) if the work could not otherwise be legally obtained for free. Or something reasonable along those lines.


Yes, let’s shut out the small players now so Microsoft can bilk us


It’s not Microsoft’s fault that people are generally required to buy a book in order to read it. You could say the same thing about any business with a high capital cost to get up and running. Want to start a supermarket? That’s many millions of dollars to plan, buy/rent land, build the facility, stock the shelves, etc. Starting any business costs money, and some cost a lot of money. Suppliers of the materials or feedstock required to start them aren’t to blame. If I want to fine-tune an LLM to be an expert on the Dune book series, should I be allowed to download a copy without paying, or should I have to pay $8 for a copy before I feed it to the model? I’m sure the publisher would want me to pay some ridiculous amount, but that is a different argument from requiring the purchase of a single copy, which anyone would have to do to get access to the content.


Pretty much everything nowadays is copyrighted. If you omit such materials, what are you really left with?

An LLM is a tool, much like the internet is a tool. Yes, someone can use it to steal, but stealing is against the law.

Instead of encoding a criminal justice system into an LLM by omitting the possibility of stealing an artist's work, or omitting the knowledge of physics so someone can't learn how to build a bomb, we should just prosecute people who intentionally use it in those ways.

How often do people get prosecuted for ripping off an artist's style? The criminal justice system "hated artists" long before LLMs, and it's not the responsibility of the tech companies to rectify that, in my opinion.


You're not addressing the massive abuse of the commons this still represents. If artists don't have the right to tell you to fuck off for using their work in training data, they're less likely to publicly show that work, which hurts them because they become less visible and hurts the AI because the training data gets worse.


Go on YouTube and type "copy art style". Now tell me how artists were not stealing from each other.


Artists are generally pretty encouraging to people entering the field and using their stuff as reference for new artists. That actually contributes to art. It's definitely not the same thing as a massive tech corporation trying to automate their livelihoods, but please keep making this flawed argument analogizing two completely different processes.


> It's definitely not the same thing as a massive tech corporation trying to automate their livelihoods, but please keep making this flawed argument analogizing two completely different processes.

Ah yes, the famous massive tech corporation behind stable diffusion.


> Now tell me how artists were not stealing from each other.

Key words being "from each other". AI only takes, it doesn't give anything back, it doesn't inspire. Their ultimate goal is to absorb the entire human history of art and then displace millions of people who received nothing from this transaction they were forced into, just so that some billionaire can afford another yacht.

I have no problems with AI models, but if you want to use art, writing, code, etc. for training, you should restrict your use to public domain works, ask for consent, or commission it. Obviously that would cost a lot of money so big tech companies are once again looking for a free ride.


> AI only takes, it doesn't give anything back, it doesn't inspire.

Totally wrong. There is even a well-known counterexample: dream-like videos generated by AI weren't something we had before. This statement tells me you've never used it.

> just so that some billionaire can afford another yacht

You are heavily mistaken. In the scenario where AI training isn't considered fair use of the material, only megacorps will be able to train their own AI, by spending billions on content.

> Obviously that would cost a lot of money so big tech companies are once again looking for a free ride.

You are constructing your own story while ignoring what is actually happening.

Adobe's and DALL-E's models were trained on datasets they mostly had rights over: OpenAI partnered with Shutterstock, and Adobe has its own photo stock. Big tech companies had no problem finding image content. Stable Diffusion, on the other hand, is open source and open research, but its creators didn't have rights to most of the images they trained on.


They do have an option; they can choose to not publish their work. By digitizing your creation, you are creating a version of your work that can be distributed at practically-free prices with little effort. If that undermines or undervalues the art you make enough, you can choose not to share it.

Even Open Source advocates don't really have the right to stop companies from using open code. The license discourages it, but everyone from Tesla to Nintendo has been caught violating its terms. Publishing stuff on the open web has always had consequences, unfortunately.


There is a lot of mental gymnastics in submitting an artwork publicly on the internet, allowing everyone to copy your art style, make derivative art of it, and have others learn from it, but then claiming that if a machine 'learns' from it, it's "stealing". Just because you don't like a derivative work doesn't mean the derivative work is "stealing" your content. Saying it's stealing is wrong; it's lying to get your point across.

You blame "how tech is going to steal everyone's copyrighted works", yet those works were already in the tech world: on the internet.


It's a lot of mental gymnastics to think machine learning = human learning. Especially on HN, where people should understand that scale matters a lot in the real world.

It's generally considered OK to sell your fanart at comic festivals. But do you think it's okay for Disney to start selling fanart of One Piece without the publisher's permission?

A car and a pair of legs both move you from A point to B point. So why do laws treat automobiles and pedestrians so differently?


> It's a lot of mental gymnastics to think machine learning = human learning.

That's why I have put it in quotes.

> But do you think it's okay that Disney starts selling fanart of One Piece without the publisher's permission?

That's why it's called "fair use". Fanart isn't the only case of fair use; parody, for example, is also allowed.

And AFAIK, no court has yet decided whether AI "learning" is fair use or what its limits are.


> have others learn from it, but if a machine 'learns'

Why did you put one of those in quotes and not the other?


Because these two are similar, yet not the same.


I believe LLMs should be allowed to read/view/consume content and learn from it even if that content has a copyright.

We phrase it as if the material were somehow being copied into the LLM, but that’s not what it’s doing. It’s building a neural graph from the experience of consuming that content.

What would the world be like if humans couldn’t learn, train the weights of the interconnects of their neural tissue, from any material with a copyright?


It’s a form of lossy compression. Can I strip the copyright off an image by JPEG compressing it?

At the very least I think LLMs trained on data that the trainer does not own or have rights to use in that manner should not be copyrightable.


All knowledge is lossy compression.

My thinking “the enemy gate is down” when considering the tokens “Ender’s Game” is my recalling a learned association of those tokens to the given token string.

My knowing that doesn’t strip the copyright. My telling someone the meaning and context of the phrase generally doesn’t strip the copyright away from Orson Scott Card. I’m not reproducing his work but my knowledge of it. Whether I’ve violated his copyright depends on what I do with that knowledge, and how.

We are prosecuting the LLMs for possessing fragments of knowledge. And we’re assuming that the recall of some of those fragments means a copy of that work is in fact contained within the weights.


An LLM is a lossy compression of the internet and I think it should be treated as such. You can't copyright the internet itself.


A mathematical transformation of the data is not enough to qualify as a transformative work. Saving a copyrighted work in a lossy compression format does not negate the copyright.


I wonder if there's a future title like "AI Model Artist", "AI Model Contributor", "AI Model Author", "AI Model Creator" or something for creators who contribute to the models in use.

While I love that we can all immediately benefit from previous works, I feel for the countless artists and creators whose work has been integrated into models with zero compensation. There are already outstanding lawsuits seeking reparations, but this is new legal territory and there will be a lot to learn and decide in the future.


I think if AI art damages the commons to an extreme extent, there's a real possibility that AI companies pay artists to make whatever the fuck they want to keep the training data coming.


We know who copyright is meant to protect, and it's not artists: it's Disney's shareholders. Pay writers and actors a fair cut, and then maybe people will take seriously that copyright is for artists.


It's both. Because we don't live in a child-like, black & white world.

Copyright like many regulatory devices can be used for good, evil and everything in between.


Modern copyright is not a good, it's an evil.


That depends on the jurisdiction, but I err on the side of "it's necessary."

A society without copyright would be a much poorer one, and humans figured that out a long time ago.


> A society without copyright would be a much poorer one

That was before "Attention Is All You Need" and the subsequent work on LLMs that grew out of it. Things are different now.


I don't think you can look to the past to picture a future with AI.


Authors, artists, actors etc whose work have been ripped off by mega corporations like Meta, OpenAI, Google etc would disagree.

Copyright may need to be updated to cater to the new world of AI, but that doesn't mean it is evil as a concept.


How are Modern Copyright laws helping authors, artists, and actors in those cases?


It's allowing them to sue OpenAI for copyright infringement:

https://www.theguardian.com/books/2023/jul/05/authors-file-a...


It's worth noting you can sue for just about anything, but a case could end up being dismissed or you could simply lose it.


That hasn't protected copyrighted works in the past: https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....


Allowing them to block human intellectual progress may not be the long-term win you assume it is.


And some would argue that billion dollar companies like OpenAI, Meta etc should figure out a way to compensate the people who created the content they depend on.


Making the uncensored, unbiased, un-"aligned" results available to all for a fair price is compensation enough, I think. That's what we should be pushing for.

Whether demon or genie, any notion of putting this tech back in the bottle is a non-starter. As is turning it into a giant money grab for the copyright industry.


People should be paid for their labor.


Then they shouldn't give away their work for free.


They're not. They have a copyright. Practicing artists usually benefit professionally from maintaining a public portfolio. Data being public is also notably not a license to use it for whatever purpose you want. Have some respect.


I have a copyleft, for all the good it does my work. Copyright protects artists from nothing, and leaving it uncontested harms the consumer more than the artists.

> Data being public is also notably not a license to use it for whatever purpose you want.

Under the fair use doctrine, it very well could be. It was when Google indexed every book they could buy, to the disdain of the Authors Guild and the titleholders it represented.

> Have some respect.

I will not respect an authority that forces me to rent digital content.


> I will not respect an authority that forces me to rent digital content.

They are very obviously asking you to respect the individual artists.


I can't imagine what goes through the head of someone who thinks they don't have the right to say no. They don't have the right to be listened to or the right to use state power to enforce a colloquial notion of IP and rights with no basis in law or fact, but they surely have the right to say "No!"

In all seriousness, they're afraid because they aren't very pleasant. AI will write things people will like and enjoy hearing. Preachy artists these days just make works they think people ought to want, and ought to like. Getting moralism out of art will be a great technological accomplishment.


Moralism! In art! Heaven forfend. How dare that art have a message, right? Never in history, not once, before the Woke Tens and, I guess, the Woker Twenties did somebody have an opinion and encode it in allegory. It is a purely novel phenomenon and all known media before this dark age was absolutely without message and without any editorial lens to determine what is and is not to be depicted--for that, of course, would communicate values and ideas, and art simply did not do such a thing.

More seriously: if you have this attitude for-reals and aren't Poe's Lawing me, perhaps you need a little more preaching aimed your way because your conception of art and the civic society is the closest thing I can think of to a humanities position being objectively wrong.

The ideological soma of agree-with-you AI being desirable is scary enough stuff. Removing communication from art being a "great technological accomplishment" is fucking dystopic.


I want to see the world where this happens, where everything that made life enjoyable is corporatized, standardized, optimized, one-dimensionalized. Since it will happen over the course of your lifetime, you'll get to see it happen - as things that were fulfilling cease to be, an ennui you can't escape sets in because you built every wall of your cage one by one, and all you can do is regret. And then the Black Mirror helmet comes off your head, everyone else points and laughs at you, and then puts the helmet back on.



