This comes up time and time again. People claim these models are mind-blowing. But then someone posts an example where the model falls flat on its face, and they just get a bunch of "that's too complex" or "that's the wrong type of thing to ask" replies.
So it ends up that these models are awesome if you ask them questions from a narrow set of things and if you assume what they respond with is correct.
Well, at least in this subthread, the model is only failing at the same things humans fail at too. To see the mind-blowing part, stop treating GPT-4 like the Oracle at Delphi, and start treating it as a "first thing that comes to mind" answer (a.k.a. the inner voice) - and then notice that the failure modes are pretty much the same as with humans. For example, coercing a trick question into a similar-sounding straight question, and answering it before realizing the person asking is an asshole.
I was originally making the point that these models struggle with even basic mathematics (of the true kind, not arithmetic, though of course they struggle with that too). My point here was to play devil’s advocate and be slightly forgiving of the model, since I as a human am likely to be tripped up by similar trick questions. Since we don’t really know ‘how these models think’ (or have much idea of the emergent world model they build), we are stuck in constant debate about whether they’re really quite amazing or absolutely pathetic.