
The reason you can't get the images you want from it is not the noise diffusion process (after all, that's probably the closest analogue to how a human gets a flash of creativity) but the lack of a large language model in SD - the text encoder was deliberately scaled down so the result could fit on consumer GPUs.

DALLE-2 uses a much larger language model and you can explain more complicated concepts to it. Google's Imagen likewise (not released though).

It's mostly a matter of scaling to get this better.



It's not just size but also model architecture. DALLE mini (craiyon.com) has the opposite priority because of its different architecture; you can enter a complex prompt and it will follow it, but it's much slower and the image quality is a lot worse. SD prefers to make aesthetic pictures over listening to everything you tell it.

You can improve this in SD by raising cfg_scale, at the cost of some weird "oversharpening" artifacts. Or, you can make a crappy image in DALLE mini and use that as the img2img input for SD to make it prettier.
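Roughly, with the Hugging Face diffusers library it's something like the sketch below - the checkpoint id, file names, and parameter values are just illustrative, not anyone's exact setup:

```python
# Minimal sketch of the "rough DALLE-mini image -> SD img2img" trick,
# using diffusers. Model id and file names are placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed SD checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Rough composition saved from DALLE mini / craiyon (hypothetical file name).
init_image = Image.open("craiyon_output.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="a red cube balanced on top of a blue sphere, studio lighting",
    image=init_image,
    strength=0.6,         # how much SD is allowed to repaint the init image
    guidance_scale=12.0,  # cfg_scale: higher follows the prompt more, but can oversharpen
).images[0]
result.save("refined.png")
```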

The real sign it's lacking intelligence: if you ask it a question, it won't draw the answer, it'll just draw the question. Of course, they could fix that too; it's got a GPT in it, they just don't let it recurse…


Yeah true, I like dalle-mini :) It did seem to understand the prompts better.

The training set also affects it: the guidance signal (controlled by cfg_scale) competes with the priors the diffusion model learned from the training set, and I've found situations where those priors seem to be encoded too strongly - for example, with very well-known celebs or objects it's difficult to make variations.
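For what it's worth, that competition is visible in how classifier-free guidance is usually implemented in diffusion samplers (a generic sketch, not SD's exact code): the denoiser runs once without the text conditioning and once with it, and cfg_scale scales how far the prediction is pushed away from the unconditional prior.

```python
# Generic classifier-free guidance combination, as commonly implemented.
def guided_noise_pred(eps_uncond, eps_cond, cfg_scale):
    # cfg_scale = 1.0 -> pure conditional prediction.
    # Larger values push further from the unconditional prior, which is why
    # strongly-encoded priors (famous faces, iconic objects) resist variation.
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)
```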

I guess it's interesting that these issues are kind of reflected in humans as well.



