I tried the "pull or push a glass door with mirror writing" puzzle.
I feel there's a huge difference between GPT-4, which seems to be able to reason logically about the issue and respond with relevant remarks, and Gemini Advanced, which feels a lot more like a stochastic parrot.
Gemini quickly got confused and started talking about "pushing the door towards yourself" and other nonsense. It also couldn't stay on point, and instead started to regurgitate a lot of irrelevant stuff.
GPT-4 is not perfect either; you can still hit cases where it breaks down too.