Would be fascinated to see the DALL-E output for the same prompts as the ones used in this paper. If you've got DALL-E access and can try a few, please put links as replies!
I agree with you, but for me, Dall·E 2 feels good because 90% of the time I can keep hitting the generate button and massage the prompt until I get something inspirational, surprising, or visually pleasing. Without access to Imagen, it's impossible for me to compare how much of the "realistic feel" of its images is constrained by the taste of the cherry-pickers.
I've started to ask myself if my own creativity is a result of random sampling from the diffusion tapestry of associated memories and experience on that topic.
From my experiments, the LD one doesn't seem to have been trained on as big or as well-tagged a data set - there's a whole bunch of "in the style of X" that the VQGAN knows about but the LD doesn't. That might have something to do with it.
Imagen seems better at capturing details/nuance from the prompt, but subjectively the DALLE-2 images feel more “real” to me. Not sure why. Something about the lighting?