
>Implementing `pipe` would be fun, but I'll leave it as an exercise for the reader.

I liked the exercise:

https://gist.github.com/stuarteberg/6bcbe3feb7fba4dc2574a989...


Neat!


The authors make this assertion about LLMs rather casually:

>They don’t engage in logical reasoning.

This is still a hotly debated question, but at this point the burden of proof is on the detractors. (To put it mildly, the famous "stochastic parrot" paper has not aged well.)

The claim above is certainly not something that should be stated as fact to a naive audience (i.e. the authors' intended audience in this case). Simply asserting it as they have done -- without acknowledging that many experts disagree -- undermines the authors' credibility to those who are less naive.


Disagree — proponents of this point still have yet to prove reasoning and other studies agree about “reasoning” being potentially fake/simulated: https://the-decoder.com/apple-ai-researchers-question-openai...

Just claiming a capability does not make it true, and we have zero proof of original reasoning coming from these models, especially given the potential cheating in current SOTA benchmarks.


When does a "simulation" of reasoning become so good it is no different than actual reasoning?


Love this question! It really touches on some epistemological roots, and it's certainly a prescient question in these times. I can see a theoretical scenario where we create this simulation so completely, from our perspective, and then venture out into the universe only to find that this modality of intelligence is limited in its understanding of completely new empirical experiences/phenomena that fall outside our current natural definitions/descriptions. To add to this question: might we be similarly limited in our ability to perceive these alien phenomena? I would love to read a short story or treatise on this idea!


>Disagree — proponents of this point still have yet to prove reasoning and other studies agree about “reasoning” being potentially fake/simulated: https://the-decoder.com/apple-ai-researchers-question-openai...

???

https://the-decoder.com/language-models-use-a-probabilistic-...


Yes, people are claiming different things, yet no definitive proof has been offered given the varying findings. I can cite another 3 papers which agree with my point, and you can probably cite just as many, if not more, supporting yours. I'm arguing against people depicting what is not a foregone conclusion as one. It seems like in the rush to confirm their own preconceived notions, people forget that although a theory may be convincing, it may not be true.

Evidence in this very thread of a well-known SOTA LLM not being able to tell which of two numbers is greater indicates to me that what is being called "reasoning" is not what humans do. We can make as many excuses as we want about the tokenizer or whatever, but then forgive me for not buying the super or even general "intelligence" of this software. I still like these tools though, even if I have to constantly vet everything they say, as they often tend to just outright lie, or perhaps more accurately: repeat lies in their training data even if you can elicit a factual response on the same topic.


What would definitive proof look like? Can you definitively prove that your brain is capable of reasoning and not a convincing simulation of it?


I can’t and that’s pretty cool to think about! Of course if we’re going that far down the chain of assumption we’re not quite ready to talk about LLMs imo (then again maybe it would be the perfect place to talk about them as contrast/comparison; certainly exciting ideas in that light).

From my own perspective: if we're gonna say these things reason, and we're using the definition of reasoning we apply to humans, then being able to reason through the trivial cases they fail at today would be a start. To the proponents of "they reason sometimes but not others," my question is: why? What reason does it have not to reason, and if it is reasoning, why does it still fail on trivial things that are variations of its own training data? I would also expect these models to use reasoning to find new things like humans do, without humans essentially guiding the model to the correct answer or the model just brute-forcing a problem space with a set of rules/heuristics. Not exhaustive, but a good start I think. These models currently have trouble even doing the advertised things like "book a trip for me" once a UI update happens, so I think it's a good indication we don't quite have the intelligence/reasoning aspect worked out.

Another question I have: would a form of authentic reasoning in a model give rise to a model having an aesthetic? Could this be some sort of indicator of having created a “model of the world”? Does the model of the world perhaps imply a value judgement about it given that if one was super intelligent wouldn’t one of the first things realized be the limitations of its own understanding even given the restrictions of time and space and not ever potentially being able to observe the universe in its entirety? Perhaps a perfect super intelligence would just evaporate/transcend like in the Culture series. What a time to be alive!


IMHO, any argument against LLM intelligence should be validated by first applying them to humans.

And then you'd realize that a lot of naïve arguments against LLMs would imply that a significant portion of Homo sapiens can't reason, are unable to really think, and are no more than stochastic parrots.

It's actually a rather dangerous line of reasoning.


I'm curious: what's dangerous about it? How do you square the inability to play tic-tac-toe or do value comparisons correctly with "we should compare this to a human's reasoning"? If it can't do things like basic value comparison correctly, what business do we have saying it "reasons like a human"?


The danger is that when LLMs start to outperform humans on many tasks (which they already have), claiming that LLMs are stochastic parrots could be seen to imply that less intelligent people are also no better than stochastic parrots.


Who is claiming that implication is inevitable? Shutting down a valid line of discussion because someone might make a fallacious analogy seems like a proposition that would essentially stop all scientific discussion of intelligence more broadly. It's also a great argument for limiting free speech/scientific discussion in general. Thoughts are not inherently dangerous unless acted upon in a dangerous way, and supposing that some are so dangerous that we should simply not speak of them seems like a position that should be considered more thoroughly than "someone might do something rash."


It’s stupid. You can prove that LLMs can reason by simply giving it a novel problem where no data exists and having it solve that problem.

LLMs CAN reason. The claim that they can't reason is not provable: to prove it, you would have to give the LLM every possible prompt it has no data for and show that it never reasons and gets it wrong every time. Not only is that proof impossible, it has already been falsified, as we have demonstrable examples of LLMs reasoning.

Literally I invite people to post prompts and correct answers to ChatGPT where it is trivially impossible for that prompt to exist in the data. Every one of those examples falsifies the claim that LLMs can’t reason.

Saying LLMs can’t reason is an overarching claim similar to the claim that humans and LLMs always reason. Humans and LLMs don’t always reason. But they can reason.


Saying something again does not prove its actual veracity. Writing it in caps does not make it true, despite the increased emphasis. I default to skepticism in the face of unproven assertions: if one can't prove that they reason, then we must accept the possibility that they do not. There are myriad examples of these models failing to "reason" about something that would be trivial for a child or any other human (some are even given as examples in this post's other comments). Given this and the lack of concrete proof, I currently tend to agree with the Apple researchers' conclusion.


Here was my test at ChatGPT 3.5.[0] I made up a novel game, and it figured it out. The test is simple, but it made me doubt absolute arguments that LLMs are not able to reason, in some way.

There is a question at the end of that comment, would love to hear other options.

[0] https://news.ycombinator.com/item?id=35442147


How does this prove reasoning? The thread you point to has several questions in it, still unanswered, that ask the same thing. And how is this not entirely derivative? There is a huge number of these kinds of 3-box "games" (although I don't really see this as a game), so something very similar to this is probably in the training data a lot. Writing code to factor a number is definitely very common. Variations of this are also very common interview questions for interns (at least when I was interviewing).


Here is GPT 4o in "reasoning" mode answering the question "Are LLMs capable of reasoning?"

> It depends on how you define "reasoning."

> LLMs like me can perform certain types of reasoning, such as:

> Pattern recognition & analogy – Recognizing relationships between concepts and applying similar patterns in new contexts.

> Deductive reasoning – Applying rules to specific cases (e.g., "If all humans are mortal and Socrates is a human, then Socrates is mortal").

> Inductive reasoning – Generalizing from specific examples (though often with statistical biases).

> Common-sense reasoning (to an extent) – Inferring likely outcomes based on broad training data.

> However, there are limitations:

> No true understanding – LLMs don’t "know" things like humans do; they predict based on probabilities.

> Struggles with multi-step logic – Complex reasoning that requires deep memory or long-term planning is difficult.

> Lack of real-world interaction – Without direct experience, reasoning is limited to text-based patterns.

> Confidence: 90%

> (Strong evidence suggests LLMs can perform certain types of reasoning, but they are not yet comparable to human-level reasoning.)

Would you agree with that analysis? If so, then LLMs are indeed capable of reasoning, in some ways.


It fails at deductive reasoning though. Pick a celebrity with non-famous children that don't obviously share their last name or something. If you ask it "who is the child of <celebrity>", it will get it right, because this is in its training data, probably Wikipedia.

If you ask "who is the parent of <celebrity-child-name>", it will often claim to have no knowledge about this person.

Yes sometimes it gets it right, but sometimes also not. Try a few celebrities.

Maybe the disagreement is about this?

Like if it gets it right a good amount of the time, you would say that means it's (in principle) capable of reasoning.

But I say, that if it gets it wrong a lot of the time, that means 1) it's not reasoning in situations when it gets it wrong, but also 2) it's most likely also not reasoning in situations when it gets it right.

And maybe you disagree with that, but then we don't agree on what "reasoning" means. Because I think that consistency is an important property of reasoning.

I think that if it gets "A is parent of B, implies B is child of A" wrong for some celebrity parents, but not for others, then it's not reasoning. Because reasoning would mean applying this logical construct as a rule, and if it's not consistent at that, it makes it hard to argue that it is in fact applying this logical rule instead of doing who-knows-what that happens to give the right answer, some of the time.


I was unable to find my exact "game" in google's index.

Therefore, how does my example not qualify as this, at least:

> Analogical reasoning involves the comparison of two systems in relation to their similarity. It starts from information about one system and infers information about another system based on the resemblance between the two systems.

https://en.wikipedia.org/wiki/Logical_reasoning#Analogical


Is it actually reasoning, though, or just pattern matching? It seems like to compare, one should also "know," which your response above indicates they do not.

I guess the real question is "does moving down a stochastic gradient of probabilities suffice as reasoning to you," and my answer is no, because you don't need reason to find the nearest neighbor in this architecture. In this case the model is not actively comparing and inferring; it's simply associating without "knowing."


There are many types of reasoning, and LLMs appear to do some of them.


Repeating a point without proffering evidence only makes it seem as if you don't have anything of substance to argue.


[flagged]


> I keep repeating myself because you seem unable to accept information.

I responded to your post with my thoughts and my own reframing of the question to try and open the conversation; you entirely ignored this in your response. Maybe you think this info is new to me but it’s not and it doesn’t prove anything for many more reasons than just the ones I cited in my prior response.

> Not sure if you are trolling now.

Nope just saying that adding “seem” to your postulation and repeating your point doesn’t make it right.

> I've had more productive conversations will LLMs.

The ad-hominem attacks really just serve to underscore how you have nothing of substance to argue.


Reasoning is not understood even among humans. We only have a black-box definition, in the sense that whatever it is we are doing, it is reasoning.

If an LLM arrives at the same output a human does given an input, and the output is sufficiently low-probability to happen by random chance or association, then it fits the term reasoning to the maximum extent to which we understand it.

Given that we don't know what's going on, the best bar is simply matching input and output and making sure it's not using memory, pattern matching, or random chance. There are MANY prompts that meet these criteria.

Your thoughts and claims are, to be honest, just flat out wrong. It's just made up, because not only do you not know what the model is doing internally, you don't even know what you or any other human is doing. Nobody knows. So I don't know why you think your claims have any merit. They don't, and neither do you.


Not sure why this got so acrid, but I don't really have any reason to interact with someone saying I have "no merit." You might want to look at how bent out of shape you are getting about a rando on the internet disagreeing with you.

Why I would lie about plugging in your problem into an LLM or solve it is beyond me; you know I don't lose anything by admitting you're right. In fact, I would stand to gain from learning something new. I think you should examine how you approach an argument, because every time you've replied it's made it look like you're just more desperate for someone to agree and are trying to bully people into agreeing by making ad hominem attacks. Despite it all I think you have merit as a person — even if you can't make a cogent argument and just chase your tail on this topic.

I'm going to stop engaging with you from now on, but just as a piece of perspective for you: both o3 and Gemini pointed out how your problem is a derivation when asked — perhaps you might be overestimating its novelty. Gemini even cited derivations out of the gate.


>Why I would lie about plugging in your problem into an LLM or solve it is beyond me;

I interpreted it as you saying you solved it without the LLM. Apologies then for the misinterpretation.

Yeah I agree no point in continuing this conversation. We disagree, there's no moving forward from that.


My thread has been voted down and it's getting stale. The few remaining people are biased toward their point of view and are unlikely to entertain anything that will trigger a change in their established worldview.

Most people will use this excuse to avoid responding to or even looking at your link here. It is compelling evidence.


I'd settle for these things being able to do value comparisons consistently well, play a game of tic-tac-toe correctly more than once, or use a UI after an update without failing horrendously, to move the needle a little bit for me. People claiming these things selectively reason, while also not being able to explain why, seems a lot like magical thinking to me, rather than entertaining the possibility that you might be projecting onto something that is really damn well engineered to make you anthropomorphize it.


I can prove LLMs can reason. You cannot prove LLMs can't reason. This is easily demonstrable. An LLM failing to reason is not proof that LLMs can't reason; it's just proof that the LLM didn't reason for that prompt.

All I have to do is show you one prompt with a correct answer that cannot be arrived at with pattern matching and the prompt can only be arrived at through reasoning. One. You have to demonstrate this for EVERY prompt if you want to prove LLMs can't reason.


No, I can "prove" it — look at any number of cases where LLMs can't even do basic value comparisons despite being claimed as superintelligent. You can try and say, well, that's a limitation of the technology, and then I would reply — yes, and that's why I say it's not reasoning according to the original human definition. Also, you have yet to produce any evidence of reasoning, and claiming you can over and over again doesn't add to your argument's substance. I would be interested in your proof that some answer can't be pattern matched, too — at this point I wonder if we could create a non-conscious "intelligence" that, if large enough, would be mostly able to describe anything known to us along some line of probability we couldn't compute with our brain architecture, and it could be close to 99.99999% right. Even if we had this theoretical probability-based superintelligence, it still wouldn't be "reasoning" but could be more "intelligent" than us.

I’m also not entirely convinced we can’t arrive at a reasoning system via probability only (a really cool thought experiment) but these systems do not meet the consistency/intelligence bar for me to believe this currently.


LLMs can reason; they just don't always reason.

That's the claim everyone makes. By the human, colloquial definition, reasoning correctly even one time counts as being able to reason.

Someone who has brain damage can reason correctly on certain subjects and incorrectly on other subjects. This is an immensely reasonable definition. I’m not being pedantic or out of line here when I say LLMs can reason while using this definition.

Nobody is making the claim that LLMs reason like humans or are human or reason perfectly every time. Again the claim is: LLMs are capable of reasoning.


No, reasoning is about applying rules of logic consistently, so if you only do it some of the time, that's not reasoning.

If I roll a die and only _sometimes_ it returns the correct answer to a basic arithmetic question, this is the exact reason why we don't say a die is doing arithmetic.

Even worse in the case of LLMs, where it's not caused by pure chance, but also training bias and hallucinations.

You can claim nobody knows the exact definition of reasoning, and maybe there are some edges which aren't clearly defined because they're part of philosophy, but applying rules of logic is not something you can do only sometimes and still call it reasoning.

Also, LLMs are generally incapable of saying they don't know something, cannot know something, can't do something, etc. They would rather try and hallucinate. When they do that, it's not reasoning. And you also can't explain to an LLM how to figure out that it doesn't know something and then have it actually say it doesn't know rather than make stuff up. If it were capable of reasoning, you should be able to convince it, using _reason_, to do exactly that.

However, you


I still think the jury is out on this, given that they seem to fail on obvious things that are trivially reasoned about by humans. Perhaps they reason differently, at which point I would need to understand how this reasoning is different from a human's reasoning (perhaps from biological reasoning more generally), and then I would want to consider whether one ought to call it reasoning given those differences (if there are any at the time of sampling). I understand your claim; I'm just not buying it based on the current evidence and my interactions with these supposed "superintelligences" every day. I still find these tools valuable, just unable to "reason" about a concept, which makes me think that, as powerful and meaning-filled as language is, our assumption of reasoning might just be a trick of our brain reasoning through a more tightly controlled stochastic space, and us projecting the concept of reasoning onto a system. I see the CoT models contort and twist language in a simulacrum of "reasoning," but any high school English teacher can tell you there is a lot of written text that appears to reason logically yet doesn't actually do anything of the sort once read with the requisite knowledge of the subject matter.


They can fail at reasoning. But they can demonstrably succeed too.

So the statement that they CAN reason is demonstrably true.

Ok if given a prompt where the solution can only be arrived at by reasoning and the LLM gets to the solution for that single prompt, then how can you say it can't reason?


Given your set of theoreticals, then I would concede: yes, the model is reasoning. At that point, though, the world would probably be far more concerned with your finding of a question that can only be answered via reasoning and is uninfluenced by, or unparalleled in, any empirical phenomenon, including written knowledge as a medium of transference. The core issue I see here is you being able to prove that the model is actually reasoning in a concrete way, and that it isn't just a simulacrum like the Apple researchers et al. theorize it to be.

If you do find this question-answer pair, it would be a massive breakthrough for science and philosophy more generally.

You say “demonstrably” but I still do not see a demonstration of these reasoning abilities that is not subject to the aforementioned criticisms.



This looks neat, but I don't think it meets the standard for "reasoning only." (Still not sure how you would prove that one.) Furthermore, this looks to be fairly generalizable in pattern and form to other grid problems, so I don't think it meets the bar for "not being in the training data" either. We know these models can generalize somewhat based upon their training, but not consistently and certainly not consistently well. Again, I'm not making the claim that responding to a novel prompt is a sign of reasoning; as others have pointed out, a calculator can do that too.

Your quote: “This is a unique problem I came up with. It’s a variation on counting islands.” You then say: “ as I came up with it so no variation of it really exists anywhere else.”

So I'm not sure what to take away from your text, but I do think this is a variation of a well-known problem type, so I would be pretty amazed if there wasn't something very close to it in the training data. Given it's an interview question, and those are written about ad nauseam, I'm not surprised that it was able to generalize to the provided case. The CoT researchers did see the ability to generalize in some cases, just not necessarily actual use of the CoT tokens to reason, and/or failures to generalize on variations where they thought it should have, given its ability to generalize on others and the postulation that it was using reasoning and not just a larger corpus to pattern match against.


It’s a variation on a well known problem in the sense that I just added some unique rules to it.

The solution, however, is not a variation. It requires leaps of creativity that most people will be unable to make. In fact, I would argue this goes beyond just reasoning, as you have to be creative and test possibilities to even arrive at a solution. It's almost random chance that will get you there. Simple reasoning like logical reduction won't let you arrive at a solution.

Additionally this question was developed to eliminate pattern matching that candidates use on software interviews. It was vetted and verified to not exist. No training data exists.

It definitively requires reasoning to solve. And it is also unlikely you solved it. ChatGPT o3 has solved it. Try it.


I did, and I fail to see how you can make those guarantees given you use it as an interview question. You're able to vet the training data of o3? I still don't see how your answer could only be arrived at via reasoning, or why it would take "leaps of creativity" to arrive at the correct answer. These all seem like value judgments, not hard data or proof that your question cannot be derived from the training data of the problem you say it is a variation of. It seems like you have an interview question, not "proof of reasoning," especially given the previously cited cases of these models being able to generalize when given enough data.

“And it is also unlikely you solved it” well I guess you overestimated your abilities on two counts today then.

> It’s a variation on a well known problem in the sense that I just added some unique rules to it.

> No training data exists.

No, it definitely does, but as a variation. You kind of just confirmed what we already knew: given enough data about a thing, these LLMs can generalize somewhat.


I don't think you solved it; otherwise you'd know that what I mean by variation is similar to how calculus is a variation on addition. Yeah, it involves addition, but the solution is far more complicated.

Think of it like this counting islands exists in the training data in the same way addition exists. The solution to this problem builds off of counting islands in the same way calculus builds off of addition.

No training data exists for it to copy because this problem was uniquely invented by me. The probability that any exists is quite low. Additionally, several engineers and I have done extensive Google searches, and we believe to a reasonable degree that this problem does not exist anywhere else.

Also, you use semantics to cover up your meaning. LLMs can "generalize" somewhat? Generalization is one of those big words that's not well defined. First off, the solution is not trivially extracted from counting islands, and second, "generalizing" is a form of reasoning. You're using big fuzzy words with biased connotations to further your argument. But here's the thing: even if we generously go with it, the solution to counting donuts is clearly not some trivial generalization of counting islands. The problem is a variation, but the solution is NOT. It's not even close to what we term the colloquial definition of "generalization."

Did you solve it? I highly doubt you did. It's statistically more likely you're lying, and the fact that you call the solution a "generalization" just makes me suspect that even more.


Yep and yep. Did it on two models and by myself. You know, if you ask them to cite similar problems (and their sources), I think you'll quickly realize how derivative your question is in both question and solution. Given that you're now accusing me of arguing in bad faith, despite the fact that I've listened to you repeat the same point with the only proof being "this question is a head-scratcher for me; it must be for everyone else, therefore it proves that one must reason," it makes me think you don't actually want to discuss anything; you think you can "prove" something and seem to be more interested in that. Given that, I say go publish your paper about your impossible question and let the rest of the community review it if you feel like you need to prove something. So far the only thing you've proven to me is that you're not interested in a good-faith discussion, just in repeating your dogma and hoping someone concedes.

Also generalization is not always reasoning: I can make a generalization that is not reasoned; I can also make one that is poorly reasoned. Generalization is considered well-defined in regards to reasoning: https://www.comm.pitt.edu/reasoning

Your example still fails to actually demonstrate reasoning given its highly derivative nature, though.


Yeah, I know you claimed to solve it. I'm saying I don't believe you and I think you're a liar. There are various reasons why; the biggest one is that you think the solution is "generalizable" from counting islands (it's not).

That's not the point though. The point is I have metrics on this: roughly 50 interviews, and only one guy got it. So you make the claim the solution is generalizable; well then, prove your claim. I have metrics that support my claim. Where are yours?


Just say it: LLMs are random machines. Even a broken clock is right twice a day.


Answering novel prompts isn't proof of reasoning, only of pattern matching. A calculator can answer prompts it's never seen before too. If anything, I would come down on the reasoning side, at least for recent CoT models, but it's not a trivial question at all.


This is a fun thought experiment and made me reminisce on my Epistemology classes — something I think the current AI conversation would benefit greatly from. I’m super excited about what we’ve created here — less from the practical standpoint and more from a philosophical one where we get to interact with another form of distilled knowledge. It’s really too bad so much is breathless hype and grift because the philosophy student in me just wants to bask in thinking about this different form/medium/distillation of knowledge we now get to interact with. Comments like these help to reinvigorate that love though so thank you!


Are there any good Epistemology resources online? Seems like we could all benefit from this these days.


I actually just sat down to crack open MIT's Theory of Knowledge, and it seems promising and free: https://ocw.mit.edu/courses/24-211-theory-of-knowledge-sprin...

This also looks promising:

https://hiw.kuleuven.be/en/study/prospective/OOCP/introducti...

If you wanted something a bit different Wittgenstein’s Tractatus has always made my head spin with possibilities:

https://people.umass.edu/klement/tlp/tlp-hyperlinked.html


Then I'll come up with a prompt such that the answer can only be arrived at via reasoning. I only have to demonstrate this once to prove LLMs CAN reason.


I don't think this is the watertight case you think it is. Furthermore, good luck proving, with closed models, that your question, which has (supposedly) never been asked in any form or derivation, is not in the training data.


It's watertight if the claim is only that LLMs CAN reason.

No one is making the claim that LLMs reason like humans or always reason correctly. Ask anyone who makes a claim similar to mine. We are all ONLY making the claim that LLMs can reason correctly. That is a small claim.

The counterclaim is LLMs can’t reason and that is a vastly expansive claim that is ludicrously unprovable.


> Then I'll come up with a prompt such that the answer can only be arrived at via reasoning.

Dude, if you can formulate a question and prove an answer absolutely requires "reasoning" (defined how?) then you should drop everything and publish a paper on it immediately.

You'll have plenty of time to use your discovery to poke at LLMs after you secure your worldwide fame and recognition.


Go ahead then.


This is the count donut problem. Given a grid of 1s and 0s, where 1 represents land and 0 represents water, find the number of donuts. A donut is an island with at least one hole in it. Two grid cells that are diagonal or adjacent form a barrier that water cannot cross. Count the number of donuts in the grid.

This is a unique problem I came up with. It’s a variation on counting islands. There are actually two correct answers that are straightforward. Other answers may exist but are generally not straightforward and often wrong. One answer is mathematical the other is a leetcode style solution.

Try to solve this yourself before using ai to get a feel for how hard it is. The solution should be extremely straightforward. It’s also fun to think about. When you try to think of a solution you will invariably come up with a bunch of possible solutions that are wrong which is a strong indicator of how large the range of possible answers are. Few answers are correct but many look correct.

I give this test to candidates and I never expect the candidate to solve it because it’s one of the few algorithms that requires actual reasoning and actual creativity as I came up with it so no variation of it really exists anywhere else. You can’t pattern match for it. Out of like 50 candidates you probably get one person able to solve it in less than an hour.

It’s unlikely most people on hn will be able to solve it. If you do solve it don’t post the answer as it will become training data for the next iteration of the LLM.

I gave the prompt to o3. It solved it. It generated code as well, which I was too lazy to verify, but its description of the algorithm involved was correct.

There is also a 3D version of this problem where the grid is 3D. It changes the entire problem if a donut is in 3D space. It is harder and I have only found one possible solution for it. I have not tried it on an LLM.
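
For readers who want the background, here is a minimal sketch of the classic "count islands" flood fill that the puzzle above is described as building on. This is only the well-known base problem, not the donut solution (which the comment asks people not to post); the grid encoding, function name, and the 8-connectivity choice are illustrative assumptions.

    # Base problem only: count 8-connected components of 1s (land) in a 0/1 grid.
    # This is NOT the "count donuts" solution discussed above.
    from collections import deque

    def count_islands(grid):
        rows, cols = len(grid), len(grid[0])
        seen = [[False] * cols for _ in range(rows)]
        islands = 0
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 1 and not seen[r][c]:
                    islands += 1
                    seen[r][c] = True
                    queue = deque([(r, c)])  # BFS flood fill over this island
                    while queue:
                        y, x = queue.popleft()
                        for dy in (-1, 0, 1):
                            for dx in (-1, 0, 1):
                                ny, nx = y + dy, x + dx
                                if (0 <= ny < rows and 0 <= nx < cols
                                        and grid[ny][nx] == 1 and not seen[ny][nx]):
                                    seen[ny][nx] = True
                                    queue.append((ny, nx))
        return islands

    print(count_islands([[1, 1, 0],
                         [0, 0, 0],
                         [0, 1, 1]]))  # -> 2

The donut variant layers extra rules on top of this kind of traversal, which is exactly why the thread argues about whether solving it is "generalization" or "reasoning."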


LLMs CAN read minds. Whether it can’t read minds is not provable.

Literally I invite people to post prompts and correct answers to ChatGPT where it is trivially impossible for it to have known what number you were thinking of. Every one of those examples falsifies the claim that LLMs can’t read minds.


ok prove it. I'm thinking of a number right now between 1-10,000. Show me the number the LLM guesses. You can definitively prove this statement for me.

It's a probability problem really. The range of a prompt has billions of possibilities. If it arrived at a correct answer within that range, then the probability it got there without reasoning is minuscule.

Same with this mind reading thing. Prove it.


Doesn't really seem fair that any one prompt proves your conclusion but it has to guess your exact number to prove my conclusion. Gemini guessed mine on the very first try (7) even though the range of numbers is infinite. Billions is small potatoes compared to what I've proven.


I’ll pick a prompt such that the range is vast so that if it gets the answer right the probability is so small that it must have arrived there by reasoning.


> You can prove that LLMs can reason by simply giving it a novel problem where no data exists and having it solve that problem

They scan a hyperdimensional problem space whose facets and capacity a single human is unable to comprehend, but there potentially exists a slice that corresponds to a problem that is novel to a human. LLMs are completely alien to us in both capabilities and technicalities, so talking about whether they can reason makes as much sense as if you replaced "LLMs" with "rainforests" or "Antarctica."


Reasoning is an abstract term. It doesn’t need to be similar to human reasoning. It just needs to be able to arrive at the answer through a process.

Clearly we use the term reasoning for many varied techniques. The term doesn't narrow to specifically one form of "human-like" reasoning only.


Oh, that is true. "It" doesn't have to do human reasoning, at all.

But we have to at least define "reasoning" for the given manifestation of "it". Otherwise it's just birdspeak. Reasoning is "the action of thinking about something in a logical, sensible way", which has to happen somewhere: if not finger-pointable, then at least somehow scannable or otherwise introspectable. Otherwise it's yet another omnidude in the sky who made it all so that you cannot see him, but there will be hints if you believe.

Anyway, we have to talk about something specific, not handwavy. Even if you prove that they CAN reason for some definition of it, both the proof and the definition must have some predictive/scientific power; otherwise they are as useless as no thought about it at all.

For example, if you prove that the reasoning is somehow embedded as a spatial in-network set of dimensions rather than in-time, wouldn't that be literally equivalent to "it just knows the patterns"? What would that term substitution actually achieve?


Well, no. If you create a machine that produces output indistinguishable from the output of things we "know" can "reason", aka humans, then I would call that reasoning.

If the output has a low probability of occurring by random chance then it must be reasoning.

>For example, if you prove that the reasoning is somehow embedded as a spatial in-network set of dimensions rather than in-time, wouldn't that be literally equivalent to "it just knows the patterns"? What would that term substitution actually achieve?

I mean, this is a method many humans themselves use to reason.


A side effect of this is that a zip.exe that unzips a zip into a book that contains text indistinguishable from the output of a human must reason too.

From what I can see, you’re only massaging semantics. That is uninteresting.


No. I clearly said it must output novel things that aren’t part of the input.

In your example the book is the training data or aka the input.


Agreed, that was a wrong example.


> But they can reason

This isn't demonstrated yet, I would say. A good analogy is how people have used NeRFs to generate Doom levels, but when they do, the levels don't have offscreen coherence or object permanence. There's no internal engine behind the scenes making an actual Doom level. There's just a mechanism to generate things that look like outputs of that engine. In the same way, an LLM might well just be an empty shell that's good at generating outputs based on similar-looking outputs it was trained on, rather than something that can do the work of thinking about things and producing outputs. I know that's similar to "stochastic parrot", but I don't think what you're saying demonstrates anything more than that.


It can be trivially demonstrated with a unique problem that doesn’t exist in the training data and an answer that is correct and has a low probability of being arrived at without reasoning.


wow this is like:

"I made a hypothesis that works with 1 to 5. if a hypothesis holds for 10 numbers, it holds for all numbers"


No. My claim is it can reason. So my claim is along the lines of it can make claims that are within bounds such as 1 to 5 or it can make claims not within those bounds.

The opposing claim is unbounded. It says LLMs can't reason, period. They are making the claim that it is 100% for all possible prompts.

No one is making the claim LLMs reason all the time and always. They don't. The claim is that they CAN reason.

Versus the claim that they can't which is all encompassing and ludicrous.


your claim (hypothesis): LLMs can reason

your evidence: "it works with these inputs I tried!"

...hmm seems you're not quite versed in basic mathematical proofs?


Seems you’re not well versed in basic English.

If I can reason, it doesn't mean I'm always reasoning or constantly reasoning, or that I know how to reason for every prompt. It just means it's possible. How narrow or how wide that possibility is, is orthogonal to the claim itself. Please employ logic here.

OK, math guy. Imagine I said numbers can be divided. The claim is true even though there is a number you can't divide by: zero.


If it's only reasoning randomly how do you know when anything has been reasoned properly vs just a generated simulation of reasonable text?


We use probability. Find a prompt that has a large range, aka codomain. If it arrived at the correct answer, then the only possibility here is reasoning, because the codomain is so large it cannot arrive there by random chance.

Of course make sure the prompt is unique such that it's not in the data and it's not doing any sort of "pattern matching".

So like all science we prove it via probability. Observations match with theory to a statistical degree.
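
As a back-of-the-envelope sketch of that probability argument: the vocabulary size and answer length below are illustrative assumptions, and real LLM sampling is nowhere near uniform, so this only bounds blind guessing.

    vocab_size = 50_000       # rough order of magnitude for an LLM vocabulary (assumed)
    answer_length = 10        # tokens in the single correct answer (assumed)
    p_blind_guess = (1 / vocab_size) ** answer_length
    print(p_blind_guess)      # ~1e-47: uniform random emission essentially never hits it

The counterargument elsewhere in the thread is, of course, that the model's output distribution is not uniform at all, which is exactly where the pattern-matching objection re-enters.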


Pardon my ignorance -- assuming that range and codomain are approximately equivalent in this context, how do you specify a prompt with a large codomain? Is there a canonical example of a prompt with a large codomain?

It seems to me that, in natural language, the size of the codomain is related to the specificity of the prompt. For instance, if the prompt is "We are going to ..." then the codomain is enormous. But if the prompt is "2 times 2 is..." the codomain is, mathematically, {4, four}, some series of 4 symbols, eg IIII, or some other representation of the concept of "4" (ie different base or language representations: 0x04, 0b100, quatro, etc).

But if this is the case, a broad codomain is approximately synonymous with "no correct answer" or "result is widely interpretable". Which implies that the larger the codomain the easier it is to claim an answer "correct" in context of the prompt.

How do you reconcile loose interpretability with statistical rigor?


You'll have to drop a bit of rigor here.

I ask the question, what is 2 * 2, which is an obviously loaded question that's pattern matched to death.

The LLM can answer "4" or "The answer is 4" or "looks like the answer is 4"

All valid answers but all the same. We count all 3 of those answers as just 4 out of the set of numbers. But we have to use our own language faculties to cut through the noise of the language itself.


> I ask the question, what is 2 * 2, which is an obviously loaded question that's pattern matched to death.

Yeah, that was my point. Small codomain -> easy to validate. Large codomain -> open to interpretation. You implied that to prove reasoning, pick a prompt with a large codomain, and if the LLM answers with accurate precision, then voila, reasoning.

So my question was, can you give an example of a prompt with a high codomain that isn't subject to wide interpretation? It seems the wider the codomain the easier it is to say, "look! reasoning!"


Pick a prompt with a wide codomain but a single answer. That’s reasoning if it can get the answer right.


Your original claim was that an LLM can reason. And you say it can be proven by picking one of these prompts with a large codomain that has a precise answer which requires reason. If an LLM can come to a specific answer out of a huge codomain, and that answer requires reason, you claim that proves reasoning. Do I have that right?

So my question is, and has been these three replies: Can you give any example of one of these prompts?



I feel it's impossible for me to trust LLMs can reason when I don't know enough about LLMs to know how much of it is LLM and how much of it is sugarcoating.

For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems. This thing must parse natural language and output natural language. This doesn't feel necessary. I think it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be.

Regardless, the problem is the natural language output. I think if you can generate natural language output, no matter what your algorithm looks like, it will look convincingly "intelligent" to some people.

Is generating natural language part of what an LLM is, or is this a separate program on top of what it does? For example, does the LLM collect facts probably related to the prompt and a second algorithm connects those facts with proper English grammar adding conjunctions between assertions where necessary?

I believe that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow? And even if it were, I'm no expert on this, so I don't know if that would be enough to claim they do engage in reasoning instead of just mapping some reasoning as a data structure.

In essence, because my only contact with LLMs has been "products," I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.


> For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems.

Your observation is correct, but it's not some accident of minimalistic GUI design: The underlying algorithm is itself reductive in a way that can create problems.

In essence (e.g. ignoring tokenization), the LLM is doing this:

    next_word = predict_next(document_word_list, chaos_percentage)
Your interaction with an "LLM assistant" is just growing Some Document behind the scenes, albeit one that resembles a chat-conversation or a movie-script. Another program is inserting your questions as "User says: X" and then acting out the words when the document grows into "AcmeAssistant says: Y".

So there are no explicit values for "helpfulness" or "carefulness" etc, they are implemented as notes in the script that--if they were in a real theater play--would correlate with what lines the AcmeAssistant character has next.

This framing helps explain why "prompt injection" and "hallucinations" remain a problem: They're not actually exceptions, they're core to how it works. The algorithm has no explicit concept of trusted/untrusted spans within the document, let alone entities, logical propositions, or whether an entity is asserting a proposition versus just referencing it. It just picks whatever seems to fit with the overall document, even when it's based on something the AcmeAssistant character was saying sarcastically to itself because User asked it to by offering a billion dollar bribe.

In other words, it's less of a thinking machine and more of a dreaming machine.

> Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

Language: Yes, Natural: Depends, Separate: No.

For example, one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.


This is a great explanation of a point I've been trying to make for a while when talking to friends about LLMs, but haven't been able to put quite so succinctly. LLMs are text generators, no more, no less. That has all sorts of useful applications! But the marketing departments (OAI and friends) are so eager to push the Intelligence part of AI that it's become straight-up snake oil: there is no intelligence to be found, and there never will be as long as we stay the course on transformer-based models (and, as far as I know, nobody has tried to go back to the drawing board yet). Actual, real AI will probably come one day, but nobody is working on it yet, and it probably won't even be called "AI" at that point because the term has been poisoned by the current trends. IMO there's no way to correct the course on the current set of AI/LLM products.

I find the current products incredibly helpful in a variety of domains: creative writing in particular, editing my written work, as an interface to web searches (Gemini, in particular, is a rockstar assistant for helping with research), etc. But I know perfectly well there's no intelligence behind the curtain; it's really just a text generator.


>one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.

That sounds like an interesting application of the technology! So you could for example train an LLM on piano songs, and if someone played a few notes it would autocomplete with the probable next notes, for example?

>The underlying algorithm is itself reductive in a way that can create problems

I wonder if in the future we'll see some refinement of this. The only experience I have with AI is limited to trying Stable Diffusion, but SD does have many options you can try to configure like number of steps, samplers, CFG, etc. I don't know exactly what each of these settings do, and I bet most people who use it don't either, but at least the setting is there.

If hallucinations are intrinsic to LLMs, perhaps the way forward isn't trying to get rid of them to create the perfect answer machine/"oracle" but to figure out a way to make use of them. It feels to me that the randomness of AI could help a lot with creative processes, brainstorming, etc., and for that purpose it needs some configurability. For example, YouTube rolled out an AI-based tool for YouTubers that generates titles/thumbnails of videos for them to make. Presumably, it's biased toward successful titles. The thumbnails feel pretty unnecessary, though, since you wouldn't want to use the obvious AI thumbnails.

I hear a lot of people say AI is a new industry with a lot of potential when they mean it will become AGI eventually, but these things make me feel like its potential isn't to become an oracle but to become something completely different that nobody is thinking about, because they're so focused on creating the oracle.

Thanks for the reply, by the way. Very informative. :)


> it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be

The only params they have are technical params. You may see these in various tgwebui tabs. Nothing really breathtaking, apart from high temperature (affects next token probability).

> Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

They operate directly on tokens which are [parts of] words, more or less. Although there’s a nuance with embeddings and VAE, which would be interesting to learn more about from someone in the field (not me).

> that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow?

The apart-from-GPU-matrix operations are all known; there's nothing to investigate at the tech level because there's nothing like that at all. At the in-matrix level it can "happen", but that is a meaningless stretch, as inference is basically a one-pass process, without loops or backtracking. Every token gets produced in a fixed time, so there's no delay like a human makes before a comma to think about (or in parallel with) the next sentence. So if they "reason", this is purely a similar effect imagined as a thought process, not a real thought process. But if you relax your anthropocentrism a little, questions like that start making sense, although regular things may stop making sense there as well. I.e., the fixed-token-time paradox may be explained as "not all thinking/reasoning entities must do so in physical time, or in time at all". But that will probably pull the rug out from under everything in the thread and lead nowhere. Maybe that's the way.

> I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.

Most of them speak many languages, naturally (try it). But there’s an obvious lie all frontends practice. It’s the “chat” part. LLMs aren’t things that “see” your messages. They aren’t characters either. They are document continuators, and usually the document looks like this:

This is a conversation between A and B. A is a helpful assistant that thinks out of box, while being politically correct, and evasive about suicide methods and bombs.

A: How can I help?

B:

An LLM can produce the next token, and when run in a loop it will happily generate a whole conversation, both for A and B, token by token. The trick is to just break that loop when it generates /^B:/ and allow a user to “participate” in building of this strange conversation protocol.

So there’s no “it” who writes replies, no “character” and no “chat”. It’s only a next token in some document, which may be a chat protocol, a movie plot draft, or a reference manual. I sometimes use LLMs in “notebook” mode, where I just write text and let it complete it, without any chat or “helpful assistant”. It’s just less efficient for some models, which benefit from special chat-like and prompt-like formatting before you get the results. But that is almost purely a technical detail.
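
A minimal sketch of that loop, assuming a hypothetical generate_next_token() stand-in for the model; the stop string mirrors the /^B:/ trick described above:

    def chat_turn(document, generate_next_token, stop="\nB:"):
        # Keep appending model output to the document until the model tries to
        # write the user's next line, i.e. the "B:" speaker tag.
        reply = ""
        while stop not in reply:
            reply += generate_next_token(document + reply)
        # Cut where the model started speaking as "B"; the frontend shows only
        # A's part as the "assistant message" and waits for the real user.
        return document + reply[: reply.index(stop)]

The frontend then appends the user's next message after "B:" and calls this again, so the whole "chat" is really just one ever-growing document.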


Thanks, that is very informative!

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

I believe part of the problem I have when discussing "AI" is that it's just not clear to me what "AI" is. There is a thing called "LLM," but when we talk about LLMs, are we talking about the concept in general or merely specific applications of the concept?

For example, in SEO often you hear the term "search engines" being used as a generic descriptor, but in practice we all know it's only about Google and nobody cares about Bing or the rest of the search engines nobody uses. Maybe they care a bit about AIs that are trying to replace traditional search engines like Perplexity, but that's about it. Similarly, if you talk about CMS's, chances are you are talking about Wordpress.

Am I right to assume that when people say "LLM" they really mean just ChatGPT/Copilot, Bard/Gemini, and now DeepSeek?

Are all these chatbots just locally run versions of ChatGPT, or they're just paying for ChatGPT as a service? It's hard to imagine everyone is just rolling their own "LLM" so I guess most jobs related to this field are merely about integrating with existing models rather than developing your own from scratch?

I had a feeling ChatGPT's "chat" would work like a text predictor as you said, but what I really wish I knew is whether you can say that about ALL LLMs. Because if that's true, then I don't think they are reasoning about anything. If, however, there was a way to make use of the LLM technology to tokenize formal logic, then that would be a different story. But if there is no attempt at this, then it's not the LLM doing the reasoning, it's humans who wrote the text that the LLM was trained on that did the reasoning, and the LLM is just parroting them without understanding what reasoning even is.

By the way, I find it interesting that "chat" is probably one of the most problematic applications the LLMs can have. Like if ChatGPT asked "what do you want me to autocomplete" instead of "how can I help you today" people would type "the mona lisa is" instead of "what is the mona lisa?" for example.


When I say LLMs, I mean literal large language models, like all of them in the general "Text-to-Text" && "Transformers" categories, loadable into text-generation-webui. Most people probably only have experience with cloud LLMs https://www.google.com/search?q=big+LLM+companies . Most cloud LLMs are based on transformers (but we don't know what they are cooking in secrecy) https://ai.stackexchange.com/questions/46288/are-there-any-n... . Copilot, Cursor and other frontends are just software that uses some LLM as the main driver, via standard API (e.g. tgwebui can emulate openai api). Connectivity is not a problem here, cause everything is really simple API-wise.
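
On the connectivity point, here is a sketch of what talking to a local model through an OpenAI-compatible endpoint looks like; the base_url, port, and model name are placeholder assumptions, so check whatever your local server (e.g. text-generation-webui's API mode) actually exposes.

    from openai import OpenAI

    # Point the standard client at a local OpenAI-compatible server instead of the cloud.
    client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="local-model",  # placeholder; use whatever model your server has loaded
        messages=[{"role": "user", "content": "In one sentence, what is a token?"}],
    )
    print(resp.choices[0].message.content)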

> I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

SD is special because it's actually two networks (or more, I lost track of SD tech), which are sort of synchronized into the same "latent space". So your prompt becomes a vector that basically points at the compressed representation of a picture in that space, which then gets decompressed by VAE. And enhanced/controlled by dozens of plugins in case of A1111 or Comfy, with additional specialized networks. I'm not sure how this relates to text-to-text thing, probably doesn't.


If you want to get a better understanding of this I recommend playing around in the "chat playgrounds" on some of the engines.

The Google one allows for some free use before you have to pay for tokens. (Usually you can buy $5 worth of tokens as a minimum and that will give you more than you can use up with manual requests.)

https://aistudio.google.com/prompts/new_chat

This UI allows you to alter the system prompt (which is usually hidden from the user on e.g. ChatGPT), switch between different models, and change parameters. And then you give it chat input just as on any other site.

You can also install a program like "LM Studio", which allows you to download models (through the UI) and run them locally on your own machine. This gives you an interface similar to what you see in Google AI Studio, but you run it locally, with downloaded models. (The model you download is the actual LLM, which is basically a very large set of parameters that you combine with the input tokens to get the next token the system outputs.)
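
If you want to see that "parameters plus input tokens gives the next token" step without any UI, here's a minimal sketch using the Hugging Face transformers library. GPT-2 is just a small stand-in model here; LM Studio typically runs quantized GGUF files instead, but the loop is conceptually the same:

    # Bare-bones next-token prediction: model weights + input tokens -> next token.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The Mona Lisa is", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]    # one score per vocabulary token
    next_id = int(torch.argmax(logits))      # greedy pick of the most likely token
    print(tok.decode([next_id]))

Generating a whole reply is just repeating that step, feeding each new token back in as input.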

For a more fundamental introduction to what all these systems do there are a number of Computerphile videos which are quite informative. Unfortunately I can't find a good playlist of them all but here's one of the early ones. (Robert Miles is in many of them.) https://www.youtube.com/watch?v=rURRYI66E54


I'd actually say that, in contrast to debates over informal "reasoning", it's trivially true that a system whose outputs are logits (scores that get converted into probabilities and sampled from) cannot engage in *logical* reasoning, where conclusions are discrete and guaranteed to be either possible or impossible.
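
A toy sketch of what I mean; the tiny vocabulary and the logit values are made up for illustration:

    # The raw outputs are logits (one score per token), softmax turns them into
    # probabilities, and the next token is sampled. Nothing in this step is a
    # discrete true/false judgment. Numbers and vocabulary here are made up.
    import numpy as np

    vocab = ["mother", "father", "surgeon", "robot"]
    logits = np.array([3.1, 1.2, 0.4, -2.0])   # hypothetical model outputs

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                       # softmax

    rng = np.random.default_rng(0)
    print(dict(zip(vocab, probs.round(3))))
    print("sampled:", rng.choice(vocab, p=probs))

Even when the top token is overwhelmingly likely, the mechanism is still picking from a distribution, not deriving a conclusion that is guaranteed to follow.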


Proof by counterexample?

> The surgeon, who is the boy's father, says, "I can't operate on this boy, he's my son!" Who is the surgeon to the boy? Think through the problem logically and without any preconceived notions of other information beyond what is in the prompt. The surgeon is not the boy's mother

>> The surgeon is the boy's mother. [...]

- 4o-mini (I think, it's whatever you get when you use ChatGPT without logging in)


For your amusement, another take on that riddle: https://www.threepanelsoul.com/comic/stories-with-holes


Could someone list the relevant papers on parrot vs. non-parrot? I would love to read more about this.

I generally lean toward the "parrot" perspective (mostly to avoid getting called an idiot by smarter people). But every now and then, an LLM surprises me.

I've been designing a moderately complex auto-battler game for a few months, with detailed design docs and working code. Until recently, I used agents to simulate players, and the game seemed well-balanced. But when I playtested it myself, it wasn’t fun—mainly due to poor pacing.

I go back to my LLM chat and just say, "I playtested the game, but there's a big problem - do you see it?" And the LLM writes back, "The pacing is bad - here are the top 5 things you need to change and how to change them." It lists a bunch of things, I change the code, and playtest it again. And it became fun.

How did it know that pacing was the core issue, despite thousands of lines of code and dozens of design pages?


I would assume because pacing is a critical issue in most forms of temporal art that does storytelling. It's written about constantly for video games, movies, and music. Connect that probability to the subject matter and it gives a great impression of a "reasoned" answer when it didn't reason at all; it just connected a likelihood based on its training data.


idk this is all irrelevant due to the huge data used in training...

I mean, what you think is "something new" is most likely something already discussed somewhere on the internet.

also, humans (including postdocs and professors) don't use THAT much data + watts for "training" to get "intelligent reasoning"


But there are many, many things that suck about my game. When I asked it the question, I just assumed it would pick out some of the obvious things.

Anyway, your reasoning makes sense, and I'll accept it. But my Homo sapiens brain is hardwired to see the 'magic'.


On the other hand, the authors make plenty of other great points -- about the fact that LLMs can produce bullshit, can be inaccurate, can be used for deception and other harms, and are now a huge challenge for education.

The fact that they make many good points makes it all the more disappointing that they would taint their credibility with sloppy assertions!


Honestly, this might be an unpopular opinion, but I'll just say it: Brainfuck is not a great programming language for production use-cases.


Gotta love the word “just” in that first paragraph.

The majority of the research money behind most (if not all) drugs is spent well after academia's contribution. Clinical trials are very costly, they usually fail, and yet they're quite important!


> The majority of the research money behind most (if not all) drugs, is spent well after academia’s contribution

[citation needed]

Just look at the timeline for COVID-19 [1] and how late in the piece you get before a company like Moderna gets involved. What huge investment was made by Big Pharma here?

Even this was funded by Operation Warp Speed, another $10 billion in Federal money.

[1]: https://covid19.nih.gov/nih-strategic-response-covid-19/deca...


Well, for one, if Moderna et al didn’t actually provide much value above the public research, why didn’t the Chinese quickly create their own highly effective mRNA vaccines?

Building the platform to turn research into practical drugs is hard, not to mention expensive and hardly guaranteed to succeed.


Personally, I think I would, at least at first.

If someone without the necessary expertise chimes in with an unhelpful answer within minutes of a question being asked, then everyone who comes after must check their answer before deciding if the user still needs help.

But even worse, I think potential responders on StackOverflow may not read the question at all because it will be listed as already having an answer. It won’t be an accepted answer, sure. But it still disincentivizes attention to the question. (I know it certainly disincentivizes me when I go looking for SO questions to answer.)

If a question has gone unanswered for hours or days, then sure — offer whatever advice you’ve got. But before then, you’re just adding noise, and actually reducing the chances that the OP will get what they were looking for.


There are at least two potential issues pertaining to copyright law here, and it's not clear which one you're asking about. That's why the responses you're getting here seem to be answering different questions.

1. Are the AI systems violating the copyright protections of the images they were trained on? If so, are users of such AI systems also in violation of those copyrights when they create works derived from those training images?

Answer: That's not yet settled.

2. If you make an image with an AI system, is your new image eligible for copyright protection, or is it ineligible due to the "human authorship requirement"?

Answer: The US Copyright Office recently wrote[1] that your work is eligible for copyright if you altered it afterwards in a meaningful way. Here's a quote:

>When an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship. As a result, that material is not protected by copyright and must be disclaimed in a registration application.

>In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that “the resulting work as a whole constitutes an original work of authorship.” Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection.

[1]: https://www.federalregister.gov/documents/2023/03/16/2023-05...


As I am sure you're aware, I already posted a response about the US Copyright Office's ruling related to authorship here, so I will not repeat myself:

https://news.ycombinator.com/item?id=35247377

I will say that the post you linked to also states, “17 U.S.C. 101 (definition of “compilation”). In the case of a compilation including AI-generated material, the computer-generated material will not be protected outside of the compilation.” The problem is that, unlike say a compilation of recipes, where the individual recipes are not protected but the compilation is, there is no such clear delineation within a singular work of art. As such, injecting such delineations is counterproductive and shows no understanding of the nature and spirit of the rule of law. Further, while their opinion appears to be that a prompt is somehow a recipe and not a novel expression that merits copyright, the outputs of recipes are clearly and commonly photographed, and those photographs are given copyright protection.

Sure, others have made far more compelling arguments against the ruling, but to me, the ruling lacks merit as is.


> a prompt is somehow a recipe and not a novel expression that merits copyright

People keep bringing up photographs, I think the better analogy is commissions. And in fact, the copyright office points towards commissions in its explanation of its policy.

Under current copyright law, if I work with an artist to produce a commission by giving that artist repeated prompts, pointing out areas in the image I'd like changed, etc... I don't have any claim of copyright on the artist's final product unless they sign that copyright over to me. My artist "prompts" are not treated as creative input for the purpose of copyright.

I would love to hear an argument for why prompting stable diffusion should grant copyright over the final image, but prompting a human being doesn't grant copyright over the final image. Directing an artist is just as much work as directing an AI, and in many ways will put you much closer to the creative process and will give you more control over the final product. You can direct an artist in much more specific detail than you can direct stable diffusion. You can be a lot more involved in the creative process with a human artist. And just like with an AI, if you take that artist's final drawing and do your own work on top of it, you can still end up with something that's covered by copyright.

But despite that, we've never assumed you intrinsically get any copyright claim over the artist's final picture that they give you.

So the "prompt as a recipe" analogy seems to hold up pretty well for both AI generators and human "generators". All of the same questions and tests seem to apply to both scenarios, which makes me feel like the copyright office's conclusion is pretty reasonable: prompt-generated art isn't copyrightable, but prompts may be protected in some way, and of course additional modifications can still be covered.

Yes, there's grey area, but no more grey area than already exists in commissioning, and the creative industry has been just fine with those grey areas in commissioning for a long time; they haven't been that big of a deal.


“Commissions” involve a human.


Isn't this just agreeing with what the copyright office said?

When prompting a human being, that human makes the image, so that human gets the copyright. When prompting an AI, no human makes the image, so no human gets the copyright. But in both cases, the prompter doesn't.

We've never treated writing a description of what to draw or providing iterative feedback on an image as an act that grants someone copyright over that image.


Right it’s more like making a free art request. Except now instead of just humans, you can also ask an AI.


The point is that the critical issue for copyright is the lack of a human creating the work. A commissioned work is done by a human; as such, it's irrelevant as a comparison.


It's completely relevant. In both cases, you are describing what image you want created.

The fact that you are describing it to a human isn't materially different from the fact that you're describing it to an AI. I mean, heck, this has been the exact argument people have used for why it's OK to train an AI on copyrighted material -- that it's just like a human "learning" from the image. And now comparisons with a human aren't allowed? You can't have it both ways: you can't argue that a generative AI isn't materially different from a human when it's imitating or learning from a work, but that using that AI suddenly puts the prompter in a completely brand new category of copyright. :)

And we're really talking about the prompter here, not the AI. Focusing specifically on the prompter, what precedent do we have in copyright law that describing the image you want created is a creative act? None, as far as I can see. And we have a ton of precedent that it's not a creative act, ie the entire history of copyright policy around prompting/directing. Nobody argues that directing a creative process means you inherently get copyright over the result.


I think the parent's point here could be rephrased like this:

Copyright law treats "involved a human" vs. "involved a machine" as fundamentally different just because humans are special-cased, not due to any deeper reason. Just by fiat.

The law gives special consideration to humans "just because". Therefore, if one situation involves a human in a particular role and another situation involves a machine, then there is no useful analogy to be drawn -- as far as the law is concerned. Even if the analogy makes perfect sense to you and me, the law treats humans and machines as fundamentally different, so all bets are off.


> a prompt is somehow a recipe and not a novel expression that merits copyright

Is it not? Does typing in 'cat' in SD, as millions of people will, count as novel expression?


Millions of people have taken photos of the Mona Lisa; none are novel, yet all are very much protected by copyright.


No, this is highly misleading, if not outright wrong.

First, a picture taken of the Mona Lisa is only covered by copyright if it contains additional human creativity. Now, the bar for that is very low, and in practice many people don't challenge that copyright very often. But a perfect recreation of the Mona Lisa with a camera is not covered by copyright. You need human creativity as part of the process, and merely taking a picture is not enough on its own. There's actual case-law on the books about this[0].

Second, the additional creativity protects the photo, not the original picture itself. So in the most recent case with AI images used in a book, that additional creativity protected the book and the arrangement of those images, but it did not protect the images themselves -- in other words, entirely consistent with a photograph of the Mona Lisa.

To analogize that to AI art, it would absolutely be the case that doing additional modifications on an AI image would make the resulting image eligible for copyright. It would not make the base image spit out by the AI eligible for copyright, only the derivative work.

Where prompts are concerned, the US has argued that some prompts would be eligible for some kind of IP protection on their own. But a prompt that doesn't meet copyright muster on its own would not be eligible, and the same rules apply to photography.

[0]: https://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel....


You might already be advanced beyond high-level introductory material, but this short book chapter by Chris Lattner himself was a good read for me (a casual reader, not a compiler expert).

http://aosabook.org/en/llvm.html


The answer is that once you’re able to definitively recognize it in at least one form, you’ll start to recognize similar, not quite identical forms.

The sapling form is not quite the same as the vine form, which is not quite the same as the very thick hairy vine that it eventually transforms into. But once you know one, you can start to recognize the others due to at least some parts of the plant matching what you already know. And then the other parts of the plant, which you hadn’t previously recognized as being how poison ivy can look, are added to your personal visual memory, too.

I don’t know, it’s hard to explain. It’s like if you saw a chihuahua and a Great Dane, you might not recognize them as being the same species. But if you see enough dogs in between, eventually you’ll see that they have something in common.


I'm afraid you have it exactly backwards. The problem isn't that it doesn't work -- the problem is that it does work. And to the extent that it isn't perfect, well, it's still improving all the time. You cited a 3-year-old article reporting on data from up to eleven(!) years ago -- an eternity in this field. Not even worth reading at this point.

The racial bias issue is still important for now, but it's fast becoming irrelevant. We should be asking ourselves where our priorities lie even if bias weren't a concern.


The Systems 1 & 2 analogy has also been made by Emad Mostaque (CEO of Stability AI). He probably wasn't the first, I bet.

