4o, despite OpenAI's practically draconian content policies, is a pretty big leap forward. I put together a comparison of some of the most competitive generative models (Imagen, 4o, Flux, and MJ7), focused on prompt adherence across increasingly difficult prompts. If Imagen 3 had 4o's multimodal capabilities (being able to keep adjusting a generated image through follow-up prompts), I would say it would be nearly on par with 4o.
https://genai-showdown.specr.net