
Models don't store any artist's works. They are way too small to do that.
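The "too small" argument is usually back-of-the-envelope arithmetic, sketched below. All three figures are rough, illustrative assumptions, not measurements: a checkpoint of about 4 GB, about 2 billion training images, and about 50 KB for one compressed 512x512 JPEG.

```python
# Rough capacity arithmetic: how many bytes of model weights exist per training image?
# ASSUMPTIONS (illustrative, not measured): ~4 GB checkpoint, ~2 billion training
# images, ~50 KB for a single compressed 512x512 JPEG.
CHECKPOINT_BYTES = 4 * 10**9
TRAINING_IMAGES = 2 * 10**9
JPEG_BYTES = 50 * 10**3


def bytes_per_image(model_bytes: int, n_images: int) -> float:
    """Average number of weight bytes available per training image."""
    return model_bytes / n_images


per_image = bytes_per_image(CHECKPOINT_BYTES, TRAINING_IMAGES)
shortfall = JPEG_BYTES / per_image  # how many times less than one JPEG

print(f"{per_image:.1f} bytes per image")                 # 2.0 bytes per image
print(f"{shortfall:,.0f}x less than one typical JPEG")    # 25,000x less than one typical JPEG
```

This is only an average. It says nothing about individual images: a picture duplicated many times in the training data effectively gets many times its "share" of capacity.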



Getty has alleged that Stable Diffusion sometimes returns some of their copyrighted images[1]. Even if the model seems too small to store the images directly, it seems at least plausible to me that the parameters act as a form of compression, so that the model could output an almost direct copy of an original. I have certainly seen Stable Diffusion emit images that look like a Getty watermark has simply been blurred out.

[1] https://www.reuters.com/legal/getty-images-lawsuit-says-stab...


It doesn't store the original images, but it has learned what the Getty Images watermark looks like and where it is located, because that watermark was repeated millions of times in the training data. So it can sometimes reproduce it.
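A toy illustration of why a repeated pattern survives while individual images do not. This is NOT how diffusion training works; it only assumes a model that is pushed toward the average of its training data. The image width, image count, and watermark positions are hypothetical.

```python
import random

# Toy illustration (NOT how diffusion training works): if a model is pushed toward
# the average of its training data, content that varies between images averages
# out, while a pattern stamped at the same place in every image survives intact.
random.seed(0)

WIDTH = 8
N_IMAGES = 10_000
WATERMARK = {2, 3, 4}  # hypothetical pixel positions stamped in every image


def make_image() -> list[float]:
    """A random 1-D 'image' with the same watermark stamped at fixed pixels."""
    pixels = [random.random() for _ in range(WIDTH)]
    for i in WATERMARK:
        pixels[i] = 1.0
    return pixels


images = [make_image() for _ in range(N_IMAGES)]
mean = [sum(img[i] for img in images) / N_IMAGES for i in range(WIDTH)]

for i, value in enumerate(mean):
    label = "watermark" if i in WATERMARK else "varied content"
    print(f"pixel {i}: {value:.3f} ({label})")
```

The watermark pixels come out at exactly 1.0, while every other pixel drifts toward 0.5. No single training image is recoverable from the result, but the shared watermark is.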

This is why it's important to clean up the training dataset: remove duplicates, images containing watermarks, images that are too similar to each other, and so on.
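One common, simple way to catch near-duplicates is a perceptual "average hash" compared by Hamming distance. The sketch below is not what any particular dataset pipeline actually used. It assumes images are already decoded, converted to grayscale, and downscaled to a small fixed grid, and the threshold of 5 bits is an arbitrary example value.

```python
# Minimal near-duplicate filter using an "average hash" (aHash).
# ASSUMPTIONS: images are already grayscale and downscaled to a small fixed grid;
# the Hamming threshold of 5 bits is an arbitrary example value.


def average_hash(gray: list[list[int]]) -> int:
    """Bit i is 1 if pixel i is brighter than the image's mean brightness."""
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(images: list[list[list[int]]], threshold: int = 5) -> list[int]:
    """Return indices of images to keep, dropping any image whose hash is within
    `threshold` bits of an already-kept image (O(n^2), fine for a sketch)."""
    kept_hashes: list[int] = []
    kept: list[int] = []
    for idx, img in enumerate(images):
        h = average_hash(img)
        if all(hamming(h, k) > threshold for k in kept_hashes):
            kept_hashes.append(h)
            kept.append(idx)
    return kept


# Usage: a 4x4 gradient, a slightly brightened copy, and a flipped version.
a = [[10 * (4 * r + col) for col in range(4)] for r in range(4)]
brighter = [[p + 1 for p in row] for row in a]
flipped = [row[::-1] for row in a[::-1]]
print(deduplicate([a, brighter, flipped]))  # [0, 2]
```

The brightened copy hashes identically to the original and is dropped, while the flipped image differs in all 16 bits and is kept.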


Models don't store any artist's works. They are way too small to do that.

I have close to no knowledge of this subject, but I find it very curious and I'd like to know more, because it seems to me that they only "don't store" the works in the traditional sense. For example, a model can reproduce a quote. I asked ChatGPT:

- "In Game of Thrones, what did Jon Snow say to Arya when he gave her the sword named 'Needle'?"

- It answers: "[...] Stick 'em with the pointy end. [...]"

To me, that indicates the information is there. Maybe we should consider that the model actually stores the information, just in compressed form? Could you ask Midjourney to recreate the Mona Lisa or Mickey Mouse? The right information can't just appear out of thin air. If I recall correctly, someone had some success identifying and modifying the specific neurons or weights in an LLM to change its "opinion" about where Rome is located.


This is not precisely true. It's been shown that image models can reproduce certain works almost exactly (up to very minor differences). It takes some effort to find such pieces but they exist.


It takes a lot of effort, and the results aren't that great (low resolution, bad hands). It's much easier to just find the original image and use that instead.



