
Success doesn’t imply that “reasoning” was involved, and the definition of reasoning is extremely important.

Apple’s recent research summarized here [0] is worth a read. In short, they argue that what LLMs are doing is more akin to advanced pattern recognition than reasoning in the way we typically understand reasoning.

By way of analogy, memorizing mathematical facts and then correctly recalling them does not imply that a person actually understands how to arrive at the answer. This is why “show your work” is a critical part of proving competence in an educational setting.

An LLM providing useful/correct results only proves that it’s good at surfacing relevant information based on a given prompt. The fact that it’s trivial to cause bad results by making minor but irrelevant changes to a prompt points to something other than a truly reasoned response, i.e. a reasoning machine would not get tripped up so easily.
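To make that concrete, here is a rough sketch of the kind of minor but irrelevant change in question (illustrative only; the word problem and the added clause below are invented for this comment, not taken from the paper). Both prompts have the same correct answer, so a system that was actually reasoning should not be swayed by the extra clause:

    # Two prompts with the same correct answer (30 + 45 + 2*30 = 135).
    # The second adds an irrelevant clause that changes nothing about
    # the arithmetic, yet changes like this tend to flip model answers.
    base = (
        "A library receives 30 books on Monday and 45 books on Tuesday. "
        "On Wednesday it receives twice as many as on Monday. "
        "How many books did it receive in total?"
    )
    perturbed = (
        "A library receives 30 books on Monday and 45 books on Tuesday. "
        "On Wednesday it receives twice as many as on Monday, "
        "though a few of the covers are slightly scratched. "
        "How many books did it receive in total?"
    )
    for prompt in (base, perturbed):
        print(prompt, "\n")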

- [0] https://x.com/MFarajtabar/status/1844456880971858028


You’re still suffering from the biases of the parent poster. You are picking and choosing papers that illustrate failures when there are just as many papers that verify successes.

It’s bloody obvious that when I classify success I mean that the LLM is delivering a correct and unique answer for a novel prompt that doesn’t exist in the original training set. No need to go over the same tired analogies, regurgitated over and over again, claiming that LLMs are just reusing memorized answers. It’s a stale point of view. The overall argument has progressed further than that, and we now need more complicated analysis of what’s going on with LLMs.

Sources: https://typeset.io/papers/llmsense-harnessing-llms-for-high-...

https://typeset.io/papers/call-me-when-necessary-llms-can-ef...

And these two are just from a random Google search.

I can find dozens and dozens of papers illustrating failures and successes of LLMs, which further nails my original point: LLMs both succeed and fail at reasoning.

The main problem right now is that we don’t really understand how LLMs work internally. Everyone who claims to know that LLMs can’t reason is making a huge leap to an irrational conclusion: not only does that conclusion contradict actual evidence, but they don’t even know how LLMs work, because nobody knows.

We only know how LLMs work at a high level, and we only understand these things via the analogy of a best-fit curve through a series of data points. Below this abstraction we don’t understand what’s going on.
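To illustrate that analogy (and only the analogy; this is a toy curve fit, not a claim about how transformers are trained), a best-fit curve reproduces and interpolates its data points without anything in the fitted coefficients “knowing” why the data looks the way it does:

    import numpy as np

    # Toy "best fit curve": the model is just whatever coefficients
    # minimize error on the observed (x, y) points.
    xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    ys = np.array([1.0, 2.7, 7.4, 20.1, 54.6])  # roughly e**x, unknown to the fit

    coeffs = np.polyfit(xs, ys, deg=3)
    model = np.poly1d(coeffs)

    print(model(2.5))   # interpolates plausibly near the data
    print(model(10.0))  # extrapolation can be wildly wrong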


> The main problem right now is that we don’t really understand how LLMs work internally.

Right, and this is why claims that models are “reasoning” can’t be taken at face value. This space is filled with overloaded terms and anthropomorphic language that describe some behavior of the LLM, but that doesn’t justify the leap to believing these terms represent the model’s underlying functionality. When terms like “hallucinate” and “understand” are used, they do not refer to the biological processes those ideas stem from, nor do they carry the implications of a system that mimics those processes.

> Everyone who claims to know that LLMs can’t reason is making a huge leap to an irrational conclusion: not only does that conclusion contradict actual evidence, but they don’t even know how LLMs work, because nobody knows.

If you believe this to be true, you must then also accept that it’s equally irrational to claim these models are actually “reasoning”. The point of citing the Apple paper was that there’s currently a lack of consensus and in some cases major disagreement about what is actually occurring behind the scenes.

Everything you’ve written to justify the idea that reasoning is occurring can be used against the idea that reasoning is occurring. This will continue to be true until we gain a better understanding of how these models work.

The reason the Apple paper is interesting is because it’s some of the latest writing on this subject, and points at inconvenient truths about the operation of these models that at the very least would indicate that if reasoning is occurring, it’s extremely inconsistent and unreliable.

No need to be combative here - aside from being against HN guidelines, there just isn’t enough understanding yet for anyone to be making absolute claims, and the point of my comment was to add counterpoints to a conversation, not make some claim about the absolute nature of things.


>If you believe this to be true, you must then also accept that it’s equally irrational to claim these models are actually “reasoning”.

If a correct, novel, low probability conclusion was arrived at from a novel prompt, where neither the prompt nor the conclusion existed in the training set, THEN by logic the ONLY possible way the conclusion was derived was through reasoning. We know this, but we don't know HOW the model is reasoning.

The only other possible way that an LLM can arrive at low probability conclusions is via random chance.
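To put a rough, back-of-the-envelope number on "random chance" (both figures below are assumptions for illustration, not properties of any particular model): emitting one specific multi-token answer by sampling tokens uniformly at random is astronomically unlikely.

    # Illustrative arithmetic only: chance of producing one specific
    # 20-token answer by uniform random sampling from a 50,000-token
    # vocabulary (both numbers are assumed for the sake of illustration).
    vocab_size = 50_000
    answer_length = 20
    p = (1 / vocab_size) ** answer_length
    print(p)  # ~1e-94, i.e. effectively zero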

>The point of citing the Apple paper was that there’s currently a lack of consensus and in some cases major disagreement about what is actually occurring behind the scenes.

This isn't true. I quote the parent comment:

   "What this tells me is there is clearly no “reasoning” happening whatsoever with either model, despite marketing claiming as such." 
Parent is clearly saying LLMs can't reason, period.

>Everything you’ve written to justify the idea that reasoning is occurring can be used against the idea that reasoning is occurring. This will continue to be true until we gain a better understanding of how these models work.

Right, and I took BOTH pieces of contradictory evidence into account and ended up with the most logical conclusion. I quote myself:

   "You have contradictory evidence therefore the LLM must be capable of BOTH failing and succeeding in reason. That's the most logical answer."
>The reason the Apple paper is interesting is because it’s some of the latest writing on this subject, and points at inconvenient truths about the operation of these models that at the very least would indicate that if reasoning is occurring, it’s extremely inconsistent and unreliable.

Right. And this, again, was my conclusion. But I took it a bit further. Read again what I said in the first paragraph of this very response.

>No need to be combative here - aside from being against HN guidelines, there just isn’t enough understanding yet for anyone to be making absolute claims, and the point of my comment was to add counterpoints to a conversation, not make some claim about the absolute nature of things.

You're not being combative and neither am I. I respect your analysis here, even though you dismissed a lot of what I said (see quotations), and even though I completely disagree and believe you are wrong.

I think there's a further logical argument you're not recognizing, and I pointed it out in the first paragraph: LLMs are arriving at novel answers from novel prompts that don't exist in the data set. These novel answers have such a low probability of arising via random chance that the ONLY remaining explanation is covered by the broadly defined word: "reasoning".

Again, there is also evidence of answers that aren't arrived at via reasoning, but that doesn't negate the existence of answers to prompts that can only be arrived at via reasoning.
