This reminds me of something a few friends and I tried a couple of months ago: No matter what prompt was used, neither Midjourney nor Dream Studio could generate an image of a man wearing a red suit jacket with a blue shirt. (We were trying for red suit + blue shirt + white tie... but even just the first two proved impossible.) Presumably the combination is so unusual as to run counter to the training data of the models. Likewise for a forehead with three eyes.
I tried to make Dall-E 2 draw "Santa giving a massage to a woman" (to personalize a gift certificate), and the results were bizarre. Mostly I got women giving Santa a massage, but even those had worse artifacts than usual. When I adjusted the prompt to be more explicit as to who was giving the massage and who was receiving it, the results got increasingly weird.
With some finagling, I got an acceptable image, but it was much worse than I would have expected. On the other hand, the whole thing made for a funny story to tell when I was handing over the present.
This is line art (because I wanted to print it in black and white), but the non-line-art style had the same issue. Also it's Saint Nicholas and not Santa, but I just confirmed that the issue still persists with Santa.
I suspect Dall-E 2 has had some kind of 'negative training' for pornography... i.e., something beyond just filtering porn from the training set. Perhaps they put pornographic images through it with a reversed loss function?
And a woman giving Santa a massage is just too similar to a bunch of porn...
You've misread. Santa is supposed to be giving the massage, but instead the woman is in the generated output. If your assumption is correct about porn having lots of women massaging men, then these outputs point to positive training with porn, rather than negative.
Interesting. Thanks. For some reason, at the time, it didn't work at all -- but I've just double-checked in both Midjourney and Dream Studio, and it works now using the same old prompts. Old sticking points are being swept aside pretty fast...
ETA: They still can't seem to make it with the white tie, though!
It's pretty fascinating to think about the implications of this. Everyone assumes that AI is going to be all-powerful and scary, but these little edge cases just completely stop it in its tracks. It makes me wonder whether any of the self-driving car projects have run into weird edge cases like this that have stopped them from releasing more widely.
I can see many types of clothing completely throwing off the cars in a crosswalk-type situation. Reflections are solved with lidar, but I wonder whether a car would react very unpredictably if a sudden breeze picked up a dense cloud of dust.
It's possible that insurance scams will evolve in the future to target these vehicles using adversarial techniques that would be obvious to humans. How could we even stop that? It'd likely be a cat-and-mouse game.
An astronaut riding a horse is unusual too, but it can make that.
I think the reason is that it doesn't understand compositionality, meaning it doesn't have any logic or "intuition" about the relationships between the colors and items (or eyes and forehead), so it fails to constrain the way they are assembled in the image.
On a similar note to Stable Diffusion refusing to put 3 eyes in the middle of a sci-fi character's forehead: I have been experimenting with GPT-3 rewriting some of my sci-fi stuff. It's really funny, because it immediately tries to steer the plot into the most cliché sci-fi storyline and characterization possible, where all the characters are perfect, almost superhero-like action heroes capable of incredible feats of strength and agility. My characters have a lot of flaws and aren't impressive in an action-movie sort of way, so GPT-3 winds up being almost totally unusable.
This is achievable without copy/pasting eyes: if you're using the Automatic1111 GUI, go to img2img -> inpaint, mask the area for one eye (on the forehead), enter the prompt, and set padding = 0 and denoising accordingly (0.4 - 0.6 would be acceptable). Repeat for all three eyes. You can add practically anything to an image with inpainting, provided your prompt and padding are correct.
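For anyone who prefers doing this programmatically rather than through the Automatic1111 UI, here's a rough sketch of the same idea using the diffusers inpainting pipeline. The model name, file names, and parameter values are my own illustrative choices, not the exact setup described above; strength is roughly the equivalent of the A1111 "denoising" slider.

    # Sketch of the "add an eye via inpainting" workflow with diffusers.
    # Assumes recent diffusers; model and file names are placeholders.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("character.png").convert("RGB").resize((512, 512))
    # White pixels mark the region to repaint -- here, a small patch on the forehead.
    mask_image = Image.open("forehead_eye_mask.png").convert("L").resize((512, 512))

    result = pipe(
        prompt="a single realistic human eye in the middle of the forehead",
        image=init_image,
        mask_image=mask_image,
        strength=0.5,            # roughly the A1111 "denoising strength" of 0.4-0.6
        guidance_scale=7.5,
        num_inference_steps=30,
    ).images[0]
    result.save("character_third_eye.png")

Repeat with a new mask for each additional eye, just like masking one eye at a time in the UI.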
There's a brilliant YouTube animation of a woman who applies one of those anime AI filters to herself. At first she's happy with the results, as it made her face considerably more beautiful. Then she discovers she has a few additional fingers, and her foot has been twisted into a sort of hand/foot/shoe deformity. Then she looks outside and sees one of those AI-generated hellscapes with a black hole where the sun should be.
I only have a 3060 laptop GPU and an img2img run like this barely takes 3 seconds. It's really fun and near real-time if you keep the UNet loaded in VRAM between runs instead of reloading it each time, as calling a one-shot script would.
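A minimal sketch of that "load once, reuse many times" point, using diffusers (model name, prompt, and strength values are just illustrative assumptions):

    # Load the img2img pipeline once, then reuse it in a loop so each run
    # only pays for inference, not for reloading the weights from disk.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")  # loaded once; stays resident in VRAM

    source = Image.open("sketch.png").convert("RGB").resize((512, 512))

    # Each call is now a few seconds on a laptop GPU instead of
    # model-load time plus inference time.
    for strength in (0.3, 0.5, 0.7):
        out = pipe(
            prompt="sci-fi character portrait",
            image=source,
            strength=strength,
            num_inference_steps=20,
        ).images[0]
        out.save(f"variant_{strength}.png")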
I've tried to get Stable Diffusion to draw 3-armed pianists, or pianists with extra fingers, and failed, probably for the same reasons this was difficult.
Making a cyclops was a long-running challenge among the users of Midjourney. Turns out multiprompting “photo of a girl:: big eye”, so that the AI attempts to render both concepts individually and simultaneously in the same image, would do it.
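Midjourney's "::" multiprompt syntax isn't available elsewhere, but a rough Stable Diffusion analogue is to encode the two concepts separately and blend their text embeddings before generation. This is a sketch of that swapped-in technique, not the Midjourney mechanism itself; the model name and blend weights are assumptions.

    # Blend text embeddings of two prompts, loosely mimicking
    # "photo of a girl:: big eye" style multiprompting.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    def embed(prompt: str) -> torch.Tensor:
        """Encode a prompt with the pipeline's own tokenizer and text encoder."""
        tokens = pipe.tokenizer(
            prompt,
            padding="max_length",
            max_length=pipe.tokenizer.model_max_length,
            truncation=True,
            return_tensors="pt",
        )
        with torch.no_grad():
            return pipe.text_encoder(tokens.input_ids.to("cuda"))[0]

    # Weighted average of the two concept embeddings.
    blended = 0.6 * embed("photo of a girl") + 0.4 * embed("big eye")

    image = pipe(prompt_embeds=blended, num_inference_steps=30).images[0]
    image.save("cyclops_attempt.png")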
I didn't really spend any time making the result look good (I am not an artist; I was more interested in the technological aspect than the art aspect).
"each inpainting took about 20 seconds which was quite annoying. But I could envision a future where generation is basically real-time, imagine navigating through possible generations using mouse wheel and tweaking the parameters and seeing the effects in real-time"
This is really funny actually, considering what basic Photoshop tools are capable of out of the box :)
I fully understood what he did, but what he dreams about is already possible now, in real time, in another environment. I guess AI performance will catch up quickly. Meanwhile, combining different programs yields the best result. When I felt the need to play with them, I used 3-4 tools to get a final render I was happy with.
It's already the case that photo editing for novices is a reality, but the level of control is lacking.
Yeah, the entire thing reads like a case of tunnel vision. Even image-gen models are capable of much more than just cherry-picking and "tweaking the parameters", not to mention the entire range of image-manipulation tools.
Model output being defined by the training data is still a valuable observation for the many who are unaware of the limitations, regardless of how obvious it might be to others.
There's a lot of things I try to do in Stable Diffusion that I could do in minutes in Photoshop.
But half the fun is trying to do it entirely inside the generative art tools available. If I were really trying to make something good I'd use both together.
With plugins, yes, but that's not the point. Inpainting is great, but using inpainting to get rid of the masking outline is funny to me. There are healing tools in PS that are designed for this exact purpose and they work in real time :)