glid-3 is trained specifically on photographic-style images, and generalizes a bit better than the latent diffusion model (LDM).

E.g., for the prompt "half human half Eiffel tower. A human Eiffel tower hybrid", I get mostly normal Eiffel towers from LDM but some sensible results from glid-3.

glid-3 will be worse at things that require detailed recall, like a specific person.

With smaller models you kind of have to generate a lot of samples and pick out the best ones.
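
A rough sketch of that "generate a lot and pick the best" loop, in case it helps. This is not glid-3's actual API: `best_of_n`, `toy_generate`, and `toy_score` are all made-up names for illustration. In practice the generator would be your sampling call and the scorer could be something like an image-text similarity model, or just you picking by eye.

```python
import random
from typing import Any, Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], Any],
    score: Callable[[str, Any], float],
    n_samples: int = 16,
    keep: int = 4,
) -> List[Any]:
    """Generate n_samples candidates (one per seed) and return the top `keep` by score."""
    candidates = [generate(prompt, seed) for seed in range(n_samples)]
    ranked = sorted(candidates, key=lambda s: score(prompt, s), reverse=True)
    return ranked[:keep]


# Hypothetical stand-ins so the sketch runs on its own.
# A real generator would return an image; here it is a dict with a fake quality value.
def toy_generate(prompt: str, seed: int) -> dict:
    rng = random.Random(seed)
    return {"seed": seed, "quality": rng.random()}


# A real scorer would compare the image against the prompt; this one reads the fake value.
def toy_score(prompt: str, sample: dict) -> float:
    return sample["quality"]


best = best_of_n(
    "half human half Eiffel tower. A human Eiffel tower hybrid",
    toy_generate,
    toy_score,
    n_samples=16,
    keep=4,
)
for sample in best:
    print(sample["seed"], round(sample["quality"], 3))
```

Using one seed per candidate keeps the run reproducible, so you can regenerate a sample you liked at a higher step count later.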