My stable diffusion output looks awful. I've been trying to recreate the xkcd about Joe Biden eating sandwiches, so I tried something like "Joe Biden eating a sandwich in the oval office, 4k render, photograph" and I get nightmare fuel: pieces of bread attached to his head, faces that dissolve into random geometric shapes, toppings that melt into hands while a sandwich sits on a plate in front of him, etc.
I had high hopes based on posts from the SD subreddit, and I figured Biden would be well represented in the training data. Am I missing something?
SD isn't great at generating images for detailed, weird prompts (at least not compared to DALL-E 2). If you're not great at prompt writing or just having bad luck, you can use img2img with a rough sketch of what you want, as in the sketch below.
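If you want to try the img2img route programmatically, here's a rough sketch using Hugging Face's diffusers library. The model id, file names, and strength value are all placeholders (and the image kwarg has been renamed across diffusers versions, so check the docs for yours):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Placeholder model id; use whatever SD checkpoint you run locally
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Your rough sketch of the composition, resized to SD's native resolution
init = Image.open("sandwich_sketch.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="Joe Biden eating a sandwich in the oval office, photograph",
    image=init,          # kwarg is `init_image` in older diffusers versions
    strength=0.75,       # 0..1: how far SD is allowed to wander from the sketch
    guidance_scale=7.5,
).images[0]
result.save("biden_img2img.png")
```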
What guidance scale, iteration count, and sampler are you using? Aside from the prompt itself, those are pretty much the most relevant settings to know.
A higher guidance scale typically makes the imagery trippier and more surreal, with more artifacts, so I suspect that's the main culprit here.
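If you want to see the CFG effect for yourself, something like this diffusers-style sketch should do it: same seed on every run, only guidance_scale changes. The model id and seed are placeholders; swap in whatever checkpoint and seed you're actually using:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder model id; substitute your local checkpoint
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "Joe Biden eating a sandwich in the oval office, 4k render, photograph"
seed = 1234  # placeholder; reuse your own seed so runs are comparable

for cfg in (4, 7.5, 12, 18):
    image = pipe(
        prompt,
        guidance_scale=cfg,          # higher = stronger prompt adherence, more artifacts
        num_inference_steps=50,
        # same seed every run, so only the CFG change is visible
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"cfg_{cfg}.png")
```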
I'm pretty curious how far we can get with this prompt, so I'll play with it later today and post my results and findings in a reply to this comment.
Thanks for providing the seed, because that lets me show you exactly how the parameters affect your specific image instead of generating a different "random" one every time.
Here's the exact same seed, prompt, and cfg_scale, but with the steps (iteration count) bumped to 100. (50 generally feels way too low, even for the samplers that handle low step counts reasonably well.)
There's obvious glitchiness in the face. Below is the same run, but with the k_euler_a sampler (I rarely use k_lms; mostly k_euler_a or k_dpm_2_a) at 100 iterations.
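For anyone following along in diffusers rather than a webui: k_euler_a corresponds (roughly) to the Euler ancestral scheduler there, and swapping it in looks something like this, reusing the pipe/prompt/seed from the snippet above. The guidance_scale here is a stand-in for whatever CFG you were already running:

```python
from diffusers import EulerAncestralDiscreteScheduler

# Swap the sampler: rebuild the scheduler from the pipe's existing config
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt,
    guidance_scale=7.5,          # stand-in for your current cfg_scale
    num_inference_steps=100,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]
image.save("euler_a_100.png")
```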
Less glitchiness, but Joe looks more caricature than real, and honestly not all that much like Joe. Let's try the same settings at 150 iterations with the CFG set to 10.
That's a bit closer to what we wanted. Faces are a difficult thing to do, but I think we can figure it out; overall it still feels a bit "wobbly". I've noticed that decreasing the CFG as you increase the iteration count tends to help when you want photos to look more photorealistic. Let's set CFG to 6 and iterations to 200.
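Here's that heuristic as a rough sketch, reusing the same setup: walk the step count up while walking the CFG down, then compare the saved images side by side. The (steps, CFG) pairs are just the ones from this thread, not magic numbers:

```python
# Heuristic: as the step count goes up, ease the CFG down
for steps, cfg in [(150, 10), (200, 6)]:
    image = pipe(
        prompt,
        guidance_scale=cfg,
        num_inference_steps=steps,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"steps{steps}_cfg{cfg}.png")
```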
I would say this looks pretty good, but I think we can do better. The most important part is imo the prompt, and I think we can edit yours to get slightly better results. Here's the result for "portrait of Joe Biden in oval office, dslr" with 200 iterations, CFG at 6, and the k_euler_a sampler.
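Same setup once more with the revised prompt, if you want to reproduce it (again assuming the diffusers-style pipe and placeholder seed from above):

```python
image = pipe(
    "portrait of Joe Biden in oval office, dslr",
    guidance_scale=6,
    num_inference_steps=200,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]
image.save("biden_portrait.png")
```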
That one was probably my favorite (or maybe it was the one before).
Overall, you can play with this almost infinitely. Adding different words to the prompt in different spots can yield pretty different results. And that's not even mentioning all the parameters one can tune.