In practice, it's unclear how well avoiding training on NSFW images will work: the original LAION-400M dataset used for both SD versions did filter out some of the NSFW stuff, and it appears SD 2.0 filters out a bit more. The use of OpenCLIP in SD 2.0 may also prevent some leakage of NSFW textual concepts compared to OpenAI's CLIP.
It will, however, definitely not affect the more-common use case of anime women with very large breasts. And people will be able to finetune SD 2.0 on NSFW images anyways.
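For context, the "filters out a bit more" part is basically just dropping rows whose predicted-NSFW score is above some threshold before training. Here's a minimal sketch, assuming the LAION parquet metadata exposes a punsafe column (a classifier's estimated probability that an image is unsafe); the 0.1 cutoff is illustrative, not a confirmed SD 2.0 setting:

```python
# Minimal sketch: filter LAION-style metadata by a predicted-NSFW score.
# Assumes the parquet shard has "punsafe", "url", and "caption" columns;
# the threshold value is illustrative.
import pandas as pd

PUNSAFE_THRESHOLD = 0.1  # lower = stricter, drops more borderline content

df = pd.read_parquet("laion_metadata_shard_0000.parquet")
safe = df[df["punsafe"] < PUNSAFE_THRESHOLD]

print(f"kept {len(safe)} of {len(df)} rows "
      f"({len(safe) / len(df):.1%}) at punsafe < {PUNSAFE_THRESHOLD}")
safe[["url", "caption", "punsafe"]].to_parquet("filtered_shard_0000.parquet")
```

Of course this only removes what the NSFW classifier flags; whatever it misses stays in the training set, which is exactly why it's unclear how well the approach works.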
The main reason the people behind Stable Diffusion are worried about NSFW is that people will use it to generate disgusting amounts of CSAM. If LAION-5B or OpenAI's CLIP have ever seen CSAM - and given how these datasets are literally just scraped off the Internet, they have - then they're technically distributing it. Imagine the "AI is just copying bits of other people's art" argument, except instead of statutory damages of up to $150,000 per infringement, we're talking about time in pound-me-in-the-ass prison.
At least if people have to finetune the model on that shit, you can argue it's not your fault, because someone had to take extra steps to put that material in there.
> If LAION-5B or OpenAI's CLIP have ever seen CSAM
Diffusion models don't need any CSAM in the training dataset to generate CSAM. All they need is random NSFW content alongside safe content that includes children.
So I definitely see an issue with Stable Diffusion synthesizing CP in response to innocuous queries (in terms of optics; the actual harm this would cause is unclear).
That said, part of the problem with the general ignorance about machine learning and how it works is that there will be totally unreasonable demands for technical solutions to social problems. “Just make it impossible to generate CP” I’m sure will succeed just as effectively as “just make it impossible to Google for CP.”
It sometimes generates such content accidentally, yes. Seems to happen more often whenever beaches are involved in the prompt. I just delete them along with thousands of other images that aren't what I wanted. Does that cause anyone harm? I don't think so...
> I’m sure will succeed just as effectively as “just make it impossible to Google for CP.”
So... very, very well? I obviously don't have numbers, but I imagine CSAM would be a lot more popular if Google did nothing to try to hide it in search results.
I remember Louis CK made a joke about this, regarding pedophiles (the ones who are also rapists): what are we doing to prevent this? Is anyone making very realistic sex dolls that look like children? "Ew, no, that's creepy" - well, I guess you would rather they fuck your children instead.
It's one of those issues that you have to be careful not to get too close to, because you get accused by proximity: if you suggest something like what I said above, people might think you're a pedophile. So, in that way, nobody wants to do anything about it.
The underlying idea you have is that the artificial CSAM is a viable substitute good - i.e. that pedophiles will use that instead of actually offending and hurting children. This isn't borne out by the scientific evidence; instead of dissuading pedophiles from offending it just trains them to offend more.
This is the opposite of what we thought we learned from the debate about violent video games, where we said things like "video games don't turn people violent because people can tell fiction from reality". That was the wrong lesson. People confuse the two all the time; it's actually a huge problem in criminal justice. CSI taught juries to expect infallible forensic sci-fi tech, Perry Mason taught juries to expect dramatic confessions, etc. In fact, they literally call it the Perry Mason effect.
The reason why video games don't turn people violent is because video game violence maps poorly onto the real thing. When I break someone's spine in Mortal Kombat, I input a button combination and get a dramatic, slow-motion X-ray view of every god damned bone in my opponent's back breaking. When I shoot someone in Call of Duty, I pull my controller's trigger and get a satisfyingly bassy gun sound and a well-choreographed death animation out of my opponent. In real life, you can't do any of that by just pressing a few buttons, and violence isn't nearly that sexy.
You know what is that sexy in real life? Sex. Specifically, the whole point of porn is to, well, simulate sex. You absolutely do feel the same feelings consuming porn as you do actually engaging in sex. This is why therapists who work with actual pedophiles tell them to avoid fantasizing about offending, rather than to find CSAM as a substitute.
>The reason why video games don't turn people violent is because video game violence maps poorly onto the real thing
I don't believe this is the reason. Practicing martial arts, which maps well onto real-life violence, hasn't made me more violent. Similarly, playing FPS games in VR, which maps much more closely to the real thing than flat-screen games do, doesn't make me want to go shoot people in real life. I don't think people playing paintball or airsoft will turn violent from partaking in those activities. The majority of people are just normal people, not bad people who would ever shoot someone or rape someone.
>You know what is that sexy in real life? Sex.
Why is any porn legal, then? If porn turned everyone into sexual abusers I would believe your argument, but that just isn't true. And even if it were true that a small percentage of people who see porn turn into sexual abusers, I don't think that makes it worth banning porn altogether. I feel there should be a better way, one that doesn't restrict people's freedom of speech.
"Artificially-generated CSAM" is a misnomer, since it involves no actual sexual abuse. It's "simulated child pornography", a category that would include for example paintings.
Hmm, that’s a good point. It seems to be able to “transfer knowledge” for lack of a better term, so maybe it wouldn’t need to be in the dataset at all…
I have no answer to this, but I have seen people mention that artificial CSAM is illegal in the USA, so the question of whether it is better or not is somewhat moot in a very large market where it is already illegal.