Three-eyed forehead in Stable Diffusion (ahrm.github.io)
112 points by hexomancer on Jan 4, 2023 | 51 comments



This reminds me of something a few friends and I tried a couple of months ago: No matter what prompt was used, neither Midjourney nor Dream Studio could generate an image of a man wearing a red suit jacket with a blue shirt. (We were trying for red suit + blue shirt + white tie... but even just the first two proved impossible.) Presumably the combination is so unusual as to run counter to the training data of the models. Likewise for a forehead with three eyes.


I tried to make Dall-E 2 draw "Santa giving a massage to a woman" (to personalize a gift certificate), and the results were bizarre. Mostly I got women giving Santa a massage, but even those had worse artifacts than usual. When I adjusted the prompt to be more explicit as to who was giving the massage and who was receiving it, the results got increasingly weird.

With some finagling, I got an acceptable image, but it was much worse than I would have expected. On the other hand, the whole thing made for a funny story to tell when I was handing over the present.

https://imgur.com/a/3UWLBeV

This is line art (because I wanted to print it in black and white), but the non-line-art style had the same issue. Also it's Saint Nicholas and not Santa, but I just confirmed that the issue still persists with Santa.


I suspect Dall-E 2 has had some kind of 'negative training' for pornography... i.e., something beyond just filtering porn from the training set. Perhaps they put pornographic images through it with a reversed loss function?

And a woman giving Santa a massage is just too similar to a bunch of porn...


You've misread. Santa is supposed to be giving the massage, but instead the woman is in the generated output. If your assumption is correct about porn having lots of women massaging men, then these outputs point to positive training with porn, rather than negative.


The generated images also had almost all of the women wearing heels, wearing short skirts, showing cleavage, ...


Stable Diffusion, using the prompt "red suit jacket on blue shirt":

https://imgur.com/a/bwjQLxv

Further fine-tuning would certainly improve the final result.


Interesting. Thanks. For some reason, at the time, it didn't work at all -- but I've just double-checked with Midjourney and in Dream Studio, and it works now using the same old prompts. Old sticking points are being swept aside pretty fast...

ETA: They still can't seem to make it with the white tie, though!


Still funny how it loves six fingers per hand


Even for humans, hands are very difficult! You usually make use of these: https://thumbs.dreamstime.com/z/isolation-artist-s-wooden-jo...


Just goes to show how weird the people wearing blue shirts under red jackets were in the training images.


On the other hand, I, too, could not draw a hand if my life depended on it.


Oh wow, I had to go back and check after reading your comment. Totally missed that at a quick glance.


And the eyes don't seem to be symmetrically placed.


Generate an image you like.

Open it up in GIMP, paint the shirt blue and jacket red.

Feed resulting image into img2img with a low denoising strength (0.3-0.4 maybe) with the textual prompt of what you want.

This can help you get combinations of things the AI doesn't easily do itself.
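For anyone who wants to try this outside a GUI, here is a rough sketch of that img2img step using the Hugging Face diffusers library; the model name, file names, and the exact 0.35 strength are my own assumptions, not the parent's setup:

    # Rough sketch of the img2img refinement step with the diffusers library.
    # Model, file names and the 0.35 strength are illustrative assumptions.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # The image crudely recolored in GIMP (blue shirt, red jacket).
    init_image = Image.open("recolored.png").convert("RGB")

    # Low strength keeps the composition and colors you painted while
    # letting the model clean up textures, folds and shading.
    result = pipe(
        prompt="a man wearing a red suit jacket over a blue shirt",
        image=init_image,
        strength=0.35,  # roughly the 0.3-0.4 denoising suggested above
        guidance_scale=7.5,
    ).images[0]
    result.save("refined.png")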


This is pretty fascinating to think about in terms of implications. Everyone assumes that AI is going to be all-powerful and scary, but these little edge cases just completely stop it in its tracks. It makes me wonder if any of the self-driving car projects have run into weird edge cases like this that have stopped them from releasing more seriously.

I can see many types of clothing completely throwing off the cars in a crosswalk-type situation. Reflections are solved with lidar, but I wonder whether the car would react very unpredictably if a sudden breeze picked up a dense cloud of dust.

It's a possibility that insurance scams will evolve in the future to target these vehicles using adversarial techniques that are obvious to humans. How could we even stop that? It'd likely be a cat and mouse game.


An astronaut riding a horse is unusual too, but it can make that.

I think the reason is it doesn't understand compositionality, meaning it doesn't have any logic or "intuition" about the relationships between the colors and items (or eyes and forehead), so it fails to constrain the way they are assembled in the image.


> Likewise for a forehead with three eyes.

No Shiva images were included in the training data?


On a similar note to Stable Diffusion refusing to put 3 eyes in the middle of a sci-fi character's forehead: I have been experimenting with GPT-3 rewriting some of my sci-fi stuff. It's really funny because it immediately tries to steer the plot into the most cliché sci-fi storyline and characterization possible, where all the characters are perfect, almost superhero-like action heroes capable of incredible feats of strength and agility. My characters have a lot of flaws and aren't impressive in an action-movie sort of way, so GPT-3 winds up being almost totally unusable.


This is achievable without copy/pasting eyes: If you're using the Automatic1111 GUI, go to img2img -> inpaint, mask the area for one eye (on the forehead), enter the prompt, and set padding = 0 and denoising accordingly (0.4 - 0.6 would be acceptable). Repeat for all three eyes. You can add practically anything to an image with inpainting, provided your prompt and padding are correct.
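For reference, roughly the same workflow outside the GUI might look like this with the diffusers library (model, file names, and prompt are illustrative, not the parent's actual settings; the strength argument on the inpaint pipeline needs a reasonably recent diffusers release):

    # Sketch of one inpainting pass with diffusers; repeat per eye.
    # Model, file names and prompt are made-up placeholders.
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open("portrait.png").convert("RGB")
    # White pixels mark the small forehead patch where one eye should go.
    mask = Image.open("eye_mask.png").convert("RGB")

    result = pipe(
        prompt="a realistic human eye on the forehead",
        image=image,
        mask_image=mask,
        strength=0.5,  # comparable to the 0.4-0.6 denoising mentioned above
    ).images[0]
    result.save("one_more_eye.png")
    # Repeat with a fresh mask for each of the three eyes.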


This is what Invoke's Stable Diffusion canvas solves for.

https://youtu.be/RwVGDGc6-3o


Dall-E has an interesting take on the problem: https://labs.openai.com/sc/JZIuAmvnELh8cMnBsLRVo5qk


Eyes? I thought they were rings/piercings (in the original).


One of the linked resources in the article is a great high-level overview of how Stable Diffusion works:

https://stable-diffusion-art.com/how-stable-diffusion-work/

It’s a quick read and I found it very helpful.


Very good indeed


Recent and related:

Remaking old computer graphics with AI image generation - https://news.ycombinator.com/item?id=34212564 - Jan 2023 (73 comments)


Not really on his forehead.

Six fingers, yeah, and also car wheels are a random mess of spokes and bolts.


There's a brilliant YouTube animation of a woman who applies one of those anime AI filters to herself. At first she's happy with the results, as it made her face considerably more beautiful. Then she discovers she has a few additional fingers, and her foot has been twisted into a sort of hand/foot/shoe deformity. Then she looks outside and sees one of those AI-generated hellscapes with a black hole where the sun should be.

Best horror film of last year.

https://m.youtube.com/shorts/G9D_PL61_pA


I would suggest you give the link and encourage people to watch the 20-second video for themselves before you list everything that happens in it.


I only have a 3060 laptop GPU and an img2img run like this takes barely 3 seconds. It's really fun and near real-time if you keep the UNet loaded in VRAM between runs instead of re-loading it, as calling a script each time would likely do.
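A minimal sketch of what keeping it loaded means in practice, assuming the diffusers library (paths and prompt are made up): construct the pipeline once and call it in a loop, so only the denoising runs on each iteration:

    # Load the model once, then reuse it; the loop body is the fast part.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")  # the slow part; happens exactly once

    init = Image.open("sketch.png").convert("RGB")
    for strength in (0.3, 0.4, 0.5):
        out = pipe(prompt="three-eyed sci-fi character, portrait",
                   image=init, strength=strength).images[0]
        out.save(f"out_{strength}.png")  # each call is now just a few seconds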


Square wheels is another fun example of how AI art is still super bad.


I’ve tried to get Stable Diffusion to draw 3-armed pianists, or pianists with extra fingers, and failed, probably for the same reasons this was difficult.


Just ask it to draw two-armed pianists and it will happily generate scads of three-armed ones.


In Midjourney this could be done using multiprompting, and Automatic1111's webui supports an analogous gesture with the AND keyword.
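From memory, so treat the exact syntax as approximate, the two look something like this:

    Midjourney multiprompt:  photo of a man in a red suit jacket:: blue shirt:: white tie
    Automatic1111 webui:     a red suit jacket AND a blue shirt AND a white tie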


Making a cyclops was a long-running challenge among the users of Midjourney. Turns out multiprompting "photo of a girl:: big eye", so that the AI attempts to make both concepts individually and simultaneously in the same image, would do it.

https://www.reddit.com/r/deepdream/comments/v9jr73/how_can_y...


Pretty cool article; I'd say the final result is kinda underwhelming, though.


I didn't really spend any time making the result look good (I am not an artist; I was more interested in the technological aspect than the art aspect).


Disco Diffusion is worth a go; its images are much more dreamlike.


Looks like it stole Alex Ross's style.


"each inpainting took about 20 seconds which was quite annoying. But I could envision a future where generation is basically real-time, imagine navigating through possible generations using mouse wheel and tweaking the parameters and seeing the effects in real-time"

This is really funny actually, considering what basic Photoshop tools are capable of out of the box :)


I think it's amusing how completely you miss the capability of what OP did vs. what Photoshop can do.

1. Generate me an eye

2. now place 3 of these on a head where I choose

3. merge it in for me, in some way that fits the overall image.

The 4th step (missing due to the time it takes): iterate over a few generations until I see one I like.

3 years from now a novice will be able to do more advanced photo editing than a pro inside Photoshop can right now.


I fully understood what he did, but what he dreams about is already possible now, in another environment, in real time. I guess AI performance will catch up quickly. Meanwhile, combining different programs yields the best result. I used like 3-4 tools to get a final render I was happy with when I felt the need to play with them.

It's already the case that photo editing for novices is a reality, but the level of control is lacking.


Can you show me how "basic photoshop tools" can do this in 20 seconds?: https://ahrm.github.io/images/2023-01-02-three-eyes/inpainti...


It's called feathering the edges of a selection; it creates a blur around the selection.

https://lh3.googleusercontent.com/OFIpc-iFtO1AQty-IJ6beLEGe6...

He already had a good enough lantern; all he had to do was set a tasteful setting for this.


Thank you, I am aware of what feather edges is. Here are the original photos:

https://imgur.com/a/SgQ38bw https://imgur.com/a/Gi0IZtB

Can you apply feather edges for me and compare the results (and let's use the gentlemen's agreement that you will not spend more than 20 seconds)?


I don't have Photoshop :) Photopea, on the other hand, supports this.

I can't do it within the time constraint; if that's the goal, of course AI wins.


Well, you claimed existing tools can do this in "real time", so 20 seconds is way more than that.


I did see the results of my action in real time. Some tasks are easier and faster with PS and some aren't :)


Yeah, the entire thing reads like a case of tunnel vision. Even image-gen models are capable of much more than just cherry-picking and "tweaking the parameters", not to mention the entire range of image manipulation tools.

Model output being defined by the training data is still a valuable observation for many unaware of the limitations, regardless of how obvious it might be for others.


There's a lot of things I try to do in Stable Diffusion that I could do in minutes in Photoshop.

But half the fun is trying to do it entirely inside the generative art tools available. If I were really trying to make something good I'd use both together.


Is Photoshop capable of inpainting what you tell it to inpaint?


With plugins, yes, but that's not the point. Inpainting is great, but using inpainting to get rid of the masking outline is funny to me. There are healing tools in PS that are designed for this exact purpose and they work in real time :)



