Indeed, and the ability to make heads or tails of slightly-slippery problems of ...

aithrowawaycomm · 2024-10-11T18:01:19 1728669679

In particular the fact that humans sometimes don't do this, taking the bait with extraneous distractions, is almost always a fairly shallow psychological thing rather than an actual cognitive deficit, e.g. OP hypothetically assuming the question had a typo and trying to read the examiner's mind. In education the gotchas really can be unfair if the (human) student has been conditioned to bark answers but the teacher changes things drastically on an exam. I don't think that's an accurate characterization of this study; even if it was that would be a problem with shallow LLM training, not mean-spirited evaluation. But I suspect that "barking answers according to surface characteristics" is as far as transformers can go. It certainly is possible that we just need to train transformers better... but there have been some theoretical results suggesting otherwise. [E.g. transformer LLMs + chain-of-thought is pretty good at O(n) problems but struggles with O(n^2), even if the O(n^2) task is an obvious combination of two O(n) tasks it is able to do.]

That leads to a serious annoyance I have with discussing LLMs - humans' capacity for boredom / cynicism / distraction / laziness being used to excuse away what seems to be deep-rooted limitations in LLMs. It simultaneously misunderstands what a human is and what a machine is. ("Sometimes humans also refuse to work" would be a bad excuse from an auto dealer.)

pishpash · 2024-10-11T22:03:39 1728684219

Psychology is cognitive. Doesn't seem principled to discard that at all.

aithrowawaycomm · 2024-10-12T00:38:27 1728693507

That’s why I specified “fairly shallow psychological thing.”

woopwoop · 2024-10-11T19:34:21 1728675261

My argument is not that slippery problems are unimportant or extraneous, it's that this paper does not convincingly demonstrate that these models are actually especially bad at this kind of reasoning.

aithrowawaycomm · 2024-10-12T08:04:06 1728720246

To be clear the paper's argument isn't that they're "bad at" the reasoning problems, so much as they're not using reasoning to solve them. In terms of getting the answer "turning the crank" with a canned solution can be more effective than reasoning through on deeper principles.

aguaviva · 2024-10-11T20:21:23 1728678083

Noted, and thanks for clarifying. BTW when I get questions with typos/inversions (that are supposed to be logical or mathy questions), I tend to throw them back at the person asking, rather than simply ploughing forward. But I guess I'm the kind of person who does that sort of thing.