My stable diffusion output looks awful. I've been trying to recreate the xkcd about Joe Biden eating sandwiches, so I tried something like "Joe Biden eating a sandwich in the oval office, 4k render, photograph" and I get nightmare fuel: pieces of bread attached to his head, faces that dissolve into random geometric shapes, toppings that melt into hands while a sandwich sits on a plate in front of him, etc.
I had high hopes based on posts from the SD subreddit, and I figured Biden would be well represented in the training data. Am I missing something?
SD isn't great at generating images for detailed, weird prompts (at least not compared to DALL-E 2). If you're not great at prompt writing or just having bad luck, you can use img2img with a rough sketch of what you want, as in the sketch below.
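If you want to try the img2img route programmatically, here's a rough sketch using Hugging Face's diffusers library. The model id, file names, and strength value are all placeholders (and the image kwarg has been renamed across diffusers versions, so check the docs for yours):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Placeholder model id; use whatever SD checkpoint you run locally
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Your rough sketch of the composition, resized to SD's native resolution
init = Image.open("sandwich_sketch.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="Joe Biden eating a sandwich in the oval office, photograph",
    image=init,          # kwarg is `init_image` in older diffusers versions
    strength=0.75,       # 0..1: how far SD is allowed to wander from the sketch
    guidance_scale=7.5,
).images[0]
result.save("biden_img2img.png")
```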
What guidance scale, iteration count, and sampler are you using? Aside from the prompt itself, those are pretty much the most relevant settings to know.
A higher guidance scale typically makes the imagery trippier and more surreal, with more artifacts, so I suspect that's the main culprit here.
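If you want to see the CFG effect for yourself, something like this diffusers-style sketch should do it: same seed on every run, only guidance_scale changes. The model id and seed are placeholders; swap in whatever checkpoint and seed you're actually using:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder model id; substitute your local checkpoint
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "Joe Biden eating a sandwich in the oval office, 4k render, photograph"
seed = 1234  # placeholder; reuse your own seed so runs are comparable

for cfg in (4, 7.5, 12, 18):
    image = pipe(
        prompt,
        guidance_scale=cfg,          # higher = stronger prompt adherence, more artifacts
        num_inference_steps=50,
        # same seed every run, so only the CFG change is visible
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"cfg_{cfg}.png")
```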
I'm pretty curious how far we can get with this prompt, so I'll play with it later today and post my results and findings in a reply to this comment.
Thanks for providing the seed, because that lets me show you exactly how the parameters affect your specific image instead of generating a different "random" one every time.
Here's the exact same seed, prompt, and cfg_scale, but with the steps (iteration count) bumped to 100. (50 generally feels way too low, even for the samplers that handle low step counts reasonably well.)
There's obvious glitchiness in the face. Below is the same run, but with the k_euler_a sampler (I rarely use k_lms; mostly k_euler_a or k_dpm_2_a) at 100 iterations.
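For anyone following along in diffusers rather than a webui: k_euler_a corresponds (roughly) to the Euler ancestral scheduler there, and swapping it in looks something like this, reusing the pipe/prompt/seed from the snippet above. The guidance_scale here is a stand-in for whatever CFG you were already running:

```python
from diffusers import EulerAncestralDiscreteScheduler

# Swap the sampler: rebuild the scheduler from the pipe's existing config
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt,
    guidance_scale=7.5,          # stand-in for your current cfg_scale
    num_inference_steps=100,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]
image.save("euler_a_100.png")
```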
Less glitchiness, but Joe looks more caricature than real, and honestly not all that much like Joe. Let's try the same settings at 150 iterations with the CFG set to 10.
That's a bit closer to what we wanted. Faces are a difficult thing to do, but I think we can figure it out; overall it still feels a bit "wobbly". I've noticed that decreasing the CFG as you increase the iteration count tends to help when you want photos to look more photorealistic. Let's set CFG to 6 and iterations to 200.
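Here's that heuristic as a rough sketch, reusing the same setup: walk the step count up while walking the CFG down, then compare the saved images side by side. The (steps, CFG) pairs are just the ones from this thread, not magic numbers:

```python
# Heuristic: as the step count goes up, ease the CFG down
for steps, cfg in [(150, 10), (200, 6)]:
    image = pipe(
        prompt,
        guidance_scale=cfg,
        num_inference_steps=steps,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"steps{steps}_cfg{cfg}.png")
```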
I would say this looks pretty good, but I think we can do better. The most important part is imo the prompt, and I think we can edit yours to get slightly better results. Here's the result for "portrait of Joe Biden in oval office, dslr" with 200 iterations, CFG at 6, and the k_euler_a sampler.
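Same setup once more with the revised prompt, if you want to reproduce it (again assuming the diffusers-style pipe and placeholder seed from above):

```python
image = pipe(
    "portrait of Joe Biden in oval office, dslr",
    guidance_scale=6,
    num_inference_steps=200,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]
image.save("biden_portrait.png")
```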
That one was probably my favorite (or maybe it was the one before).
Overall, you can play with this almost infinitely. Adding different words to the prompt in different spots can yield pretty different results. And that's not even mentioning all the parameters one can tune.