Would be fascinated to see the DALL-E output for the same prompts as the ones used in this paper. If you've got DALL-E access and can try a few, please put links as replies!
I agree with you, but for me, Dall·E 2 feels good because 90% of the time I can keep hitting the generate button and massage the prompt until I get something inspirational, surprising, or visually pleasing. Without access to Imagen, it's impossible for me to compare how much of the "realistic feel" of its images is constrained by the taste of the cherry-pickers.
I've started to ask myself if my own creativity is a result of random sampling from the diffusion tapestry of associated memories and experience on that topic.
From my experiments, the LD one doesn't seem to have been trained on as big or as well-tagged a data set - there's a whole bunch of "in the style of X" that the VQGAN knows about but the LD doesn't. That might have something to do with it.
Imagen seems better at capturing details/nuance from the prompt, but subjectively the DALLE-2 images feel more “real” to me. Not sure why. Something about the lighting?