I tried this on ChatGLM, a frontier foundation model developed by Zhipu.ai and Tsinghua University, and it gave the correct answer:
https://chatglm.cn/share/FoZBJ
A comment from Boris Power, who works at OpenAI:
The top line number for MMLU is a bit gamed - Gemini is actually worse than GPT-4 when compared on normal few shot or chain of thought
https://twitter.com/BorisMPower/status/1732435733045199126
"Applying AI to core search algorithm. We’ve also applied the AI model to our core Bing search ranking engine, which led to the largest jump in relevance in two decades. With this AI model, even basic search queries are more accurate and more relevant."
Large foundation models will change a lot of things!