Modern-Day Oracles or Bullshit Machines? How to thrive in a ChatGPT world

aidos · 2025-02-09T09:16:14 1739092574

This is amazing!

I was speaking to a friend the other day who works in a team that influences government policy. One of the younger members of the team had been tasked with generating a report on a specific subject. They came back with a document filled with “facts”, including specific numbers they’d pulled from a LLM. Obviously it was inaccurate and unreliable.

As someone who uses LLMs on a daily basis to help me build software, I was blown away that someone would misuse them like this. It’s easy to forget that devs have a much better understanding of how these things work, can review and fix the inaccuracies in the output and tend to be a sceptical bunch in general.

We’re headed into a time where a lot of people are going to implicitly trust the output from these devices and the world is going to be swamped with a huge quantity of subtly inaccurate content.

eclecticfrank · 2025-02-10T09:56:27 1739181387

This is not something only younger people are prone to. I work in a consulting role in IT and have observed multiple colleagues aged 30 and above use LLMs to generate content for reports and presentations without verifying the output.

Reminded me of wikipedia-sourced presentations in high school in the early 2000s.

aqueueaqueue · 2025-02-09T09:18:40 1739092720

I made the same sort of mistake with the internet being young back in 93! Having a machine do it for you can easily turn into brain switch off.

hunter-gatherer · 2025-02-09T14:46:16 1739112376

I keep telling everyone that the only reason I'm paid well to do "smart person stuff" is not because I'm smart, but because I've steadily watched everyone around me get more stupid over my life as a result of turning their brain switch off.

I agree a course like this needs to exist, as I've seen people rely on chatGPT for a lot of information. Just yesterday I demonstrated with some neighbors about how easily it could spew bullshit if you sinply ask it leading questions. A good example is "Why does the flu inpact men worse than women"/"Why foes the flu impact women worse than men". You'll get affirmative answers for both.

directevolve · 2025-02-10T22:01:04 1739224864

If men are more likely to die from flu if infected, and women more likely to be infected, an affirmative answer to both questions could be reasonable. When you take into account uncertainty about the goals, knowledge and cognitive capacity of the person asking the question, it's not obvious to me how the AI ought to react to an underspecified question like this.

Edit: When I plug this into a temporary chat on o3-mini, it gives plausible biochemical and behavioral mechanisms that might explain a gender difference in outcomes. Notably, the mechanisms it proposes are the same for both versions of the question, and the framing is consistent.

Specifically, for the "men worse than women" and "women worse than men" questions, it proposes hormone differences, X-linked immune regulatory genes, and medical care-seeking differences that all point toward men having worse outcomes than women. It describes these factors in both versions of the question, and in both versions, describes them as explaining why men have worse outcomes than women.

It doesn't specifically contradict the "women have worse outcomes than men" framing. But it reasons consistently with the idea that men have worse outcomes than women either way the question is posed.

bjourne · 2025-02-10T15:02:04 1739199724

What I find frightening is how many are willing to take LLM output at face value. An argument is won or lost not on its merits, but by whether the LLM say so. It was bad enough when people took whatever was written on Wikipedia at face value, trusting an LLM that may have hardcoded biases and is munging whatever data it comes across is so much worse.

xvinci · 2025-02-10T17:01:12 1739206872

I think it brings forward all the low-performers and people who think they are smarter than they really are. In the past, many would just have stayed silent unless they recently read an article or saw something on the news by chance. Now, you will get a myriad of ideas and plans with fatal flaws and a 100% score on LLM checkers :)

jurli · 2025-02-10T18:45:08 1739213108

This is what people said about the internet too. Remember the whole "do not ever use Wikipedia as a source". I mean sure, technically correct, but human beings are generally imprecise and having the correct info 95% of the time is fine. You learn to live with the 5% error

seliopou · 2025-02-10T18:53:32 1739213612

A buddy won a bet with me by editing the relevant Wikipedia article to agree with his side of the wager.

Mistletoe · 2025-02-10T17:37:54 1739209074

I’d take the Wikipedia answer any day. Millions of eyes on each article vs. a black box with no eyes on the outputs.

JPLeRouzic · 2025-02-10T22:03:18 1739224998

> "Millions of eyes on each article"

Only a minority of users contribute regularly (126,301 have edited in the last 30 days):

https://en.wikipedia.org/wiki/Wikipedia:Wikipedians#Number_o...

And there are 6,952,556 articles in the English Wikipedia, so an average article is corrected every 55 months (more than 4 years).

It's hardly "Millions of eyes on each article"

Loughla · 2025-02-10T18:55:24 1739213724

Even Wikipedia is a problem though. There are so many pages now that self-reference is almost impossible to detect. Meaning, the citation of a statement made on Wikipedia that uses an outside article for reference, which is an article that was originally written using that very Wikipedia article as its own citation.

It's all about trust. Trust the expert, or the crowd, or the machine.

They're all able to be gamed.

bccdee · 2025-02-10T19:07:44 1739214464

False equivalence. "Nothing is perfectly unreliable, therefore everything is (broadly) unreliable, therefore everything is equally unreliable." No, some sources are substantially more reliable than others.

tremon · 2025-02-10T19:18:50 1739215130

*perfectly reliable, but yes.

tucnak · 2025-02-10T22:10:01 1739225401

> frightening

Don't be scared of "the many," they're just people, not unlike you.

AlienRobot · 2025-02-10T15:48:45 1739202525

I've seen someone use an LLM to summarize a paper to post it on reddit for people who haven't read the paper.

Papers have abstracts...

MetaWhirledPeas · 2025-02-10T18:33:21 1739212401

Sounds fun, if only to compare it to the abstract.

AlienRobot · 2025-02-10T18:58:06 1739213886

You know, these days I think the abstracts are generated by LLMs too. And the paper. Or at least it uses something like Grammarly. If things keep going this ways typos are going to be a sign of academic integrity.

lithocarpus · 2025-02-10T19:22:31 1739215351

A proper LLM will include realistic rates of typos eventually. ;)

AlienRobot · 2025-02-10T22:07:25 1739225245

Darn.

micromacrofoot · 2025-02-10T16:45:59 1739205959

People take texts full of unverifiable ghost stories written thousands of years ago at face value to the point that they base their entire lives on them.

superbatfish · 2025-02-10T15:05:03 1739199903

The author makes this assertion about LLMs rather casually:

>They don’t engage in logical reasoning.

This is still a hotly debated question, but at this point the burden of proof is on the detractors. (To put it mildly, the famous "stochastic parrot" paper has not aged well.)

The claim above is certainly not something that should be stated as fact to a naive audience (i.e. the authors' intended audience in this case). Simply asserting it as they have done -- without acknowledging that many experts disagree -- undermines the authors' credibility to those who are less naive.

cristiancavalli · 2025-02-10T15:11:55 1739200315

Disagree — proponents of this point still have yet to prove reasoning and other studies agree about “reasoning” being potentially fake/simulated: https://the-decoder.com/apple-ai-researchers-question-openai...

Just claiming a capability does not make it true and we have 0 “proof” of original reasoning that can be proved coming from these models. Especially given the potential cheating in current SOTA benchmarks

UltraSane · 2025-02-10T20:40:30 1739220030

When does a "simulation" of reasoning become so good it is no different than actual reasoning?

cristiancavalli · 2025-02-10T20:48:53 1739220533

Love this question! Really touches on some epistemological roots and certainly a prescient question in these times. I can certainly see a theoretical where we could create this simulation in totality to our perspectives and then venture out into the universe to find that this modality of intelligence would be limited in its understanding of completely new empirical experiences/phenomenon that are outside our current natural definitions/descriptions. To add to this question: might we be similarly limited in our ability to perceive these alien phenomena? I would love to read a short story or treatise on this idea!

hnthrow90348765 · 2025-02-10T17:48:34 1739209714

>Disagree — proponents of this point still have yet to prove reasoning and other studies agree about “reasoning” being potentially fake/simulated: https://the-decoder.com/apple-ai-researchers-question-openai...

???

https://the-decoder.com/language-models-use-a-probabilistic-...

cristiancavalli · 2025-02-10T18:08:27 1739210907

Yes people are claiming different things yet no definitive proof has been offered given the varying findings. I can cite another 3 papers which agree with my point and you can probably cite just as many if not more supporting yours. I’m arguing against people depicting what is not a forgone conclusion as such. It seems like in people’s rush to confirm their own preconceived notions people forget that, although a theory may be convincing, it may not be true. Evidence in this very thread of a well-known SOTA LLM not being able to tell which is greater between two numbers indicates to me that what is being called “reasoning” is not what humans do. We can make as many excuses we want per the tokenizer or whatever but then forgive me for not buying the super or even general “intelligence” of this software. I still like these tools though, even if I have to constantly vet everything they say as they often tend to just outright lie, or perhaps more accurately: repeat lies in their training data even if you can elicit a factual response on the same topic.

semiquaver · 2025-02-10T19:45:16 1739216716

What would definitive proof look like? Can you definitively prove that your brain is capable of reasoning and not a convincing simulation of it?

cristiancavalli · 2025-02-10T19:58:47 1739217527

I can’t and that’s pretty cool to think about! Of course if we’re going that far down the chain of assumption we’re not quite ready to talk about LLMs imo (then again maybe it would be the perfect place to talk about them as contrast/comparison; certainly exciting ideas in that light).

From my own perspective: if we’re gonna say these things reason and we’re using the definition of reasoning we apply to humans, then being able to reason through the trivial cases they fail to today would be a start. To the proponents of “they reason sometimes but not others” my question is why? What reason does it have to not reason and why if it is reasoning it still fails on trivial things that are variations of its own training data? I would also expect that these models would use reasoning to find new things like humans do but without humans essentially guiding the model to the correct awnser or the model just brute-forcing a problem-space with a set of rules/heuristics. Not exhaustive but a good start I think. These models have trouble currently even doing the advertised things like “book a trip for me” once a UI update happens so I think it’s a great indication we don’t quite have the intelligence/reasoning aspect worked out.

Another question I have: would a form of authentic reasoning in a model give rise to a model having an aesthetic? Could this be some sort of indicator of having created a “model of the world”? Does the model of the world perhaps imply a value judgement about it given that if one was super intelligent wouldn’t one of the first things realized be the limitations of its own understanding even given the restrictions of time and space and not ever potentially being able to observe the universe in its entirety? Perhaps a perfect super intelligence would just evaporate/transcend like in the Culture series. What a time to be alive!

ninetyninenine · 2025-02-10T17:00:15 1739206815

It’s stupid. You can prove that LLMs can reason by simply giving it a novel problem where no data exists and having it solve that problem.

LLMs CAN reason. Whether it can’t reason is not provable. To prove that you have to give the LLM every possible prompt that it has no data for and effectively show it never reasons and gets it wrong all the time. Not only is the proof impossible but it’s already been falsified as we have demonstrable examples of LLMs reasoning.

Literally I invite people to post prompts and correct answers to ChatGPT where it is trivially impossible for that prompt to exist in the data. Every one of those examples falsifies the claim that LLMs can’t reason.

Saying LLMs can’t reason is an overarching claim similar to the claim that humans and LLMs always reason. Humans and LLMs don’t always reason. But they can reason.

cristiancavalli · 2025-02-10T17:30:24 1739208624

Saying something again does not provide proof of its actual veracity. Writing it in caps does not make it true despite the increased emphasis. I default to skepticism in the face of unproven assertions: if one can’t prove that they reason then we must accept the possibility that they do not. There are myriad examples of these models failing to “reason” about something that would trivial for a child or any other human (some are even given as examples in this posts other comments). Given this and the lack of concrete proof I currently tend to agree with the Apple researchers conclusion.

ninetyninenine · 2025-02-10T18:01:50 1739210510

I can prove LLMs can reason. You cannot prove LLMs can't reason. This is easily demonstrable. LLMs failing to reason is not proof LLMs can't reason, it's just proof that an LLM didn't reason for that prompt.

All I have to do is show you one prompt with a correct answer that cannot be arrived at with pattern matching and the prompt can only be arrived at through reasoning. One. You have to demonstrate this for EVERY prompt if you want to prove LLMs can't reason.

cristiancavalli · 2025-02-10T18:14:33 1739211273

No I can “prove” it — look at any number of cases where LLMs can’t even do basic value comparisons despite being claimed as super intelligent. You can try and say well that’s a limitation of the technology and then I would reply — yes and that’s why I would say it’s not reasoning according the original human definition. Also you have yet to produce any evidence of reasoning and claiming you can over and over again doesn’t add to your arguments substance. I would be interested in your proof that some answer can’t be pattern matched too — at this point I wonder if we could create an non conscious “intelligence” that if large enough would be mostly able to describe anything known to us along some line of probability we couldn’t compute with our brain architecture and it could be close to 99.99999% right. Even if we had this theoretical probability-based super intelligence it still wouldn’t be “reasoning” but could be more “intelligent” than us.

I’m also not entirely convinced we can’t arrive at a reasoning system via probability only (a really cool thought experiment) but these systems do not meet the consistency/intelligence bar for me to believe this currently.

ninetyninenine · 2025-02-10T18:48:41 1739213321

LLMs can reason they just don’t always reason.

That’s the claim everyone makes. That is a human definition if it reasoned one time correctly. That is the colloquial definition.

Someone who has brain damage can reason correctly on certain subjects and incorrectly on other subjects. This is an immensely reasonable definition. I’m not being pedantic or out of line here when I say LLMs can reason while using this definition.

Nobody is making the claim that LLMs reason like humans or are human or reason perfectly every time. Again the claim is: LLMs are capable of reasoning.

cristiancavalli · 2025-02-10T19:12:04 1739214724

I still think the jury is out on this given that they seem to fail on obvious things which are trivially reasoned about by humans. Perhaps they reason differently at which point I would need to understand how this reasoning is different from a humans reasoning (perhaps biological reasoning more generally?) and then I would want to consider whether one ought to call it reasoning given its differences (if there are any at the time of sampling). I understand your claim I’m just not buying it based on the current evidence and my interacting with these supposed “super intelligences” every day. I still find these tools valuable, just unable to “reason” about a concept which makes me think, as powerful and meaning filled as language is, our assumption of reasoning might just be a trick of our brain reasoning through a more tightly controlled stochastic space and us projecting the concept of reasoning onto a system. I see the COT models contort and twist language in a simulacrum of “reasoning” but any high school English teacher can tell you there is a lot of text written that appears to logically reason but doesn’t actually do anything of the sort once read with the requisite knowledge in the subject matter.

ninetyninenine · 2025-02-10T20:58:42 1739221122

They can fail at reasoning. But they can demonstrably succeed to.

So the the statement that they CAN reason is demonstrably true.

Ok if given a prompt where the solution can only be arrived at by reasoning and the LLM gets to the solution for that single prompt, then how can you say it can't reason?

cristiancavalli · 2025-02-10T21:26:31 1739222791

Given your set of theoreticals then I would concede, yes the model is reasoning. At that point, though, the world would probably be far more concerned with your finding of a question that can only be met via reasoning and would be uninfluenced or paralleled by any empirical phenomenon including written knowledge as a medium of transference. The core issue I see here is you being able to prove that the model is actually reasoning in a concrete way that isn’t just a simulacrum like the Apple researchers et al. theorize it to be.

If you do find this question answer pair then it would be a massive breakthrough for science and philosophy more generally.

You say “demonstrably” but I still do not see a demonstration of these reasoning abilities that is not subject to the aforementioned criticisms.

JackSlateur · 2025-02-10T22:42:17 1739227337

Just say it : llm are random machine. Even a broken clock is right twice a day.

Miraste · 2025-02-10T17:15:28 1739207728

Answering novel prompts isn't proof of reasoning, only pattern matching. A calculator can answer prompts it's never seen before too. If anything, I would come down on the reasoning side, at least for recent CoT models-but it's not a trivial question at all.

cristiancavalli · 2025-02-10T17:23:22 1739208202

This is a fun thought experiment and made me reminisce on my Epistemology classes — something I think the current AI conversation would benefit greatly from. I’m super excited about what we’ve created here — less from the practical standpoint and more from a philosophical one where we get to interact with another form of distilled knowledge. It’s really too bad so much is breathless hype and grift because the philosophy student in me just wants to bask in thinking about this different form/medium/distillation of knowledge we now get to interact with. Comments like these help to reinvigorate that love though so thank you!

johnmaguire · 2025-02-10T17:36:22 1739208982

Are there any good Epistemology resources online? Seems like we could all benefit from this these days.

cristiancavalli · 2025-02-10T17:56:57 1739210217

I actually just sat down to crack open MITs Theory of Knowledge and it seems promising and free: https://ocw.mit.edu/courses/24-211-theory-of-knowledge-sprin...

This also looks promising:

https://hiw.kuleuven.be/en/study/prospective/OOCP/introducti...

If you wanted something a bit different Wittgenstein’s Tractatus has always made my head spin with possibilities:

https://people.umass.edu/klement/tlp/tlp-hyperlinked.html

ninetyninenine · 2025-02-10T18:02:38 1739210558

Then I'll come up with a prompt such that the answer can only be arrived at via reasoning. I only have to demonstrate this once to prove LLMs CAN reason.

Terr_ · 2025-02-10T19:22:46 1739215366

> Then I'll come up with a prompt such that the answer can only be arrived at via reasoning.

Dude, if you can formulate a question and prove an answer absolutely requires "reasoning" (defined how?) then you should drop everything and publish a paper on it immediately.

You'll have plenty of time to use your discovery to poke at LLMs after you secure your worldwide fame and recognition.

cristiancavalli · 2025-02-10T18:16:18 1739211378

I don’t think this is the watertight case you think it is, furthermore good luck proving with closed models that your question that’s never been asked in any form or derivation (supposedly) is not in the training data.

ninetyninenine · 2025-02-10T18:51:18 1739213478

It’s water tight if the claim is only LLMs CAN reason.

No one is making the claim that LLMs reason like humans or always reason correctly. Ask anyone who makes a claim similar to mine. We are all ONLY making the claim that LLMs can reason correctly. That is a small claim.

The counterclaim is LLMs can’t reason and that is a vastly expansive claim that is ludicrously unprovable.

simianparrot · 2025-02-10T19:20:54 1739215254

Go ahead then.

enragedcacti · 2025-02-10T19:03:02 1739214182

LLMs CAN read minds. Whether it can’t read minds is not provable.

Literally I invite people to post prompts and correct answers to ChatGPT where it is trivially impossible for it to have known what number you were thinking of. Every one of those examples falsifies the claim that LLMs can’t read minds.

ninetyninenine · 2025-02-10T21:15:10 1739222110

ok prove it. I'm thinking of a number right now between 1-10,000. Show me the number the LLM guesses. You can definitively prove this statement for me.

It's a probability problem really. The range of a prompt has billions of possibilities. If it arrived at a correct answer within that range then the probability it got there without reasoning is miniscule.

Same with this mind reading thing. Prove it.

enragedcacti · 2025-02-11T00:02:16 1739232136

Doesn't really seem fair that any one prompt proves your conclusion but it has to guess your exact number to prove my conclusion. Gemini guessed mine on the very first try (7) even though the range of numbers is infinite. Billions is small potatoes compared to what I've proven.

ninetyninenine · 2025-02-11T00:38:14 1739234294

I’ll pick a prompt such that the range is vast so that if it gets the answer right the probability is so small that it must have arrived there by reasoning.

wruza · 2025-02-10T17:18:29 1739207909

You can prove that LLMs can reason by simply giving it a novel problem where no data exists and having it solve that problem

They scan a hyperdimensional problem space whose facetness and capacity a single human is unable to comprehend. But there potentially exist a slice that corresponds to a problem that is novel to a human. LLMs are completely alien to us both in capabilities and technicalities, so talking about whether they can reason makes as much sense as if you replaced “LLMs” with “rainforests” or “antarctica”.

ninetyninenine · 2025-02-10T18:53:08 1739213588

Reasoning is an abstract term. It doesn’t need to be similar to human reasoning. It just needs to be able to arrive at the answer through a process.

Clearly we used the term reasoning for many varied techniques. The term doesn’t narrow to specifically one form of “human” like reasoning only.

wruza · 2025-02-10T21:12:03 1739221923

Oh, that is true. "It" doesn't have to do human reasoning, at all.

But we have to at least define "reasoning" for the given manifestation of "it". Otherwise it's just birdspeak. Because reasoning is "the action of thinking about something in a logical, sensible way", which has to happen somewhere if not finger-pointable, then at least somehow scannable or otherwise introspectable. Otherwise it's yet another omnidude in the sky who made it all so that you cannot see him, but there will be hints if you believe.

Anyway, we have to talk something specific, not handwavy. Even if you prove that they CAN reason for some definition of it, both the proof and the definition must have some predictive/scientific power, otherwise they are as useless as nil thought about it.

For example, if you prove that the reasoning is somehow embedded as a spatial in-network set of dimensions rather than in-time, wouldn't that be literally equivalent to "it just knows the patterns"? What would that term substitution actually achieve?

more-nitor · 2025-02-10T17:51:19 1739209879

wow this is like:

"I made a hypothesis that works with 1 to 5. if a hypothesis holds for 10 numbers, it holds for all numbers"

ninetyninenine · 2025-02-10T18:05:24 1739210724

No. My claim is it can reason. So my claim is along the lines of it can make claims that are within bounds such as 1 to 5 or it can make claims not within those bounds.

The opposing claim unbounded. It says LLMs can't reason period. They are making the claim that it is 100% for all possible prompts.

No one is making the claim LLMs reason all the time and always. They don't. The claim is that they CAN reason.

Versus the claim that they can't which is all encompassing and ludicrous.

more-nitor · 2025-02-10T18:08:29 1739210909

your claim (hypothesis): LLMs can reason

your evidence: "it works with these inputs I tried!"

...hmm seems you're not quite versed in basic mathematical proofs?

ninetyninenine · 2025-02-10T18:56:17 1739213777

Seems you’re not well versed in basic English.

If I can reason it doesn’t mean I’m always reasoning or constantly reasoning or if I know how to do reasoning for every prompt. It just means it’s possible. How narrow or how wide that possibility is, is orthogonal to the claim itself. Please employ logic here.

Ok math guy. Imagine I said numbers can be divided. The claim is true even though there is a number that can’t be divided. Zero.

daveguy · 2025-02-10T19:28:20 1739215700

If it's only reasoning randomly how do you know when anything has been reasoned properly vs just a generated simulation of reasonable text?

ninetyninenine · 2025-02-10T21:17:47 1739222267

We use Probability. Find a prompt that has a large range aka codomain. If it arrived at the correct answer then that the only possibility here is reasoning because the codomain is so large it cannot arrive there by random chance.

Of course make sure the prompt is unique such that it's not in the data and it's not doing any sort of "pattern matching".

So like all science we prove it via probability. Observations match with theory to a statistical degree.

daveguy · 2025-02-10T23:15:34 1739229334

Pardon my ignorance -- assuming that range and codomain are approximately equivalent in this context, how do you specify a prompt with a large codomain? Is there a canonical example of a prompt with a large codomain?

It seems to me that, in natural language, the size of the codomain is related to the specificity of the prompt. For instance, if the prompt is "We are going to ..." then the codomain is enormous. But if the prompt is "2 times 2 is..." the codomain is, mathematically, {4, four}, some series of 4 symbols, eg IIII, or some other representation of the concept of "4" (ie different base or language representations: 0x04, 0b100, quatro, etc).

But if this is the case, a broad codomain is approximately synonymous with "no correct answer" or "result is widely interpretable". Which implies that the larger the codomain the easier it is to claim an answer "correct" in context of the prompt.

How do you reconcile loose interpretability with statistical rigor?

lsy · 2025-02-10T17:55:59 1739210159

I'd actually say that in contrast to debates over informal "reasoning", it's trivially true that a system which only produces outputs as logits—i.e. as probabilities—cannot engage in *logical* reasoning, which is defined as a system where outputs are discrete and guaranteed to be possible or impossible.

enragedcacti · 2025-02-10T18:29:48 1739212188

Proof by counterexample?

> The surgeon, who is the boy's father, says, "I can't operate on this boy, he's my son!" Who is the surgeon to the boy? Think through the problem logically and without any preconceived notions of other information beyond what is in the prompt. The surgeon is not the boy's mother

>> The surgeon is the boy's mother. [...]

- 4o-mini (I think, it's whatever you get when you use ChatGPT without logging in)

Terr_ · 2025-02-10T19:29:25 1739215765

For your amusement, another take on that riddle: https://www.threepanelsoul.com/comic/stories-with-holes

afpx · 2025-02-10T18:02:57 1739210577

Could someone list the relevant papers on parrot vs. non-parrot? I would love to read more about this.

I generally lean toward the "parrot" perspective (mostly to avoid getting called an idiot by smarter people). But every now and then, an LLM surprises me.

I've been designing a moderately complex auto-battler game for a few months, with detailed design docs and working code. Until recently, I used agents to simulate players, and the game seemed well-balanced. But when I playtested it myself, it wasn’t fun—mainly due to poor pacing.

I go back to my LLM chat and just say, "I play tested the game, but there's a big problem - do you see it?" And, the LLM writes back, "The pacing is bad - here are the top 5 things you need to change and how to change it." And, it lists a bunch of things, I change the code, and playtest it again. And, it became fun.

How did it know that pacing was the core issue, despite thousands of lines of code and dozens of design pages?

cristiancavalli · 2025-02-10T18:27:09 1739212029

I would assume because pacing is a critical issue in most forms of temporal art that does story telling. It’s written about constantly for video games, movies and music. Connect that probability to the subject matter and it gives a great impression of a “reasoned” answer when it didn’t reason at all just connected a likelihood based off its training data.

more-nitor · 2025-02-10T18:13:02 1739211182

idk this is all irrelevant due to the huge data used in training...

I mean, what you think is "something new" is most likely to be something already discussed somewhere in the internet.

also, humans (including postdocs and professors) don't use THAT much data + watts for "training" to get "intelligent reasoning"

afpx · 2025-02-10T18:16:48 1739211408

But there are many, many things that suck about my game. When I asked it the question, I just assumed it would pick out some of the obvious things.

Anyway, your reasoning makes sense, and I'll accept it. But, my homo sapien brain is hardwired to see the 'magic'.

superbatfish · 2025-02-10T21:28:47 1739222927

On the other hand, the authors make plenty of other great points -- about the fact that LLMs can produce bullshit, can be inaccurate, can be used for deception and other harms, are now a huge challenge for education.

The fact that they make many good points makes it all the more disappointing that they would taint their credibility with sloppy assertions!

AlienRobot · 2025-02-10T15:56:57 1739203017

I feel it's impossible for me to trust LLMs can reason when I don't know enough about LLMs to know how much of it is LLM and how much of it is sugarcoating.

For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems. This thing must parse natural language and output natural language. This doesn't feel necessary. I think it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be.

Regardless, the problem is the natural language output. I think if you can generate natural language output, no matter what you algorithm looks like it will look convincingly "intelligent" to some people.

Is generating natural language part of what an LLM is, or is this a separate program on top of what it does? For example, does the LLM collect facts probably related to the prompt and a second algorithm connects those facts with proper English grammar adding conjunctions between assertions where necessary?

I believe that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow? And even if it were, I'm no expert on this, so I don't know if that would be enough to claim they do engage in reasoning instead of just mapping some reasoning as a data structure.

In essence, because my only contact with LLMs has been "products," I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.

Terr_ · 2025-02-10T19:36:39 1739216199

> For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems.

You observation is correct, but it's not some accident of minimalistic GUI design: The underlying algorithm is itself reductive in a way that can create problems.

In essence (e.g. ignoring tokenization), the LLM is doing this:

    next_word = predict_next(document_word_list, chaos_percentage)

Your interaction with an "LLM assistant" is just growing Some Document behind the scenes, albeit one that resembles a chat-conversation or a movie-script. Another program is inserting your questions as "User says: X" and then acting out the words when the document grows into "AcmeAssistant says: Y".

So there are no explicit values for "helpfulness" or "carefulness" etc, they are implemented as notes in the script that--if they were in a real theater play--would correlate with what lines the AcmeAssistant character has next.

This framing helps explain why "prompt injection" and "hallucinations" remain a problem: They're not actually exceptions, they're core to how it works. The algorithm no explicit concept of trusted/untrusted spans within the document, let alone entities, logical propositions, or whether an entity is asserting a proposition versus just referencing it. It just picks whatever seems to fit with the overall document, even when it's based on something the AcmeAssistant character was saying sarcastically to itself because User asked it to by offering a billion dollar bribe.

In other words, it's less of a thinking machine and more of a dreaming machine.

> Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

Language: Yes, Natural: Depends, Separate: No.

For example, one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.

parliament32 · 2025-02-10T22:07:05 1739225225

This is a great explanation of a point I've been trying to make for a while, when talking to friends about LLMs, but haven't been able to put quite so succinctly. LLMs are text generators, no more, no less. That has all sorts of useful applications! But (OAI and friends) marketing departments are so eager to push the Intelligence part of AI that it's become straight-up snakeoil.. there is no intelligence to be found, and there never will be as long as we stay the course on transformers-based models (and, as far as I know, nobody has tried to go back to the drawing board yet). Actual, real AI will probably come one day, but nobody is working on it yet, and it probably won't even be called "AI" at that point because the term has been poisoned by the current trends. IMO there's no way to correct the course on the current set of AI/LLM products.

I find the current products incredibly helpful in a variety of domains: creating writing in particular, editing my written work, as an interface to web searches (Gemini, in particular, is a rockstar assistant for helping with research), etc etc. But I know perfectly well there's no intelligence behind the curtain, it's really just a text generator.

AlienRobot · 2025-02-10T22:27:23 1739226443

>one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.

That sounds like an interesting application of the technology! So you could for example train an LLM on piano songs, and if someone played a few notes it would autocomplete with the probable next notes, for example?

>The underlying algorithm is itself reductive in a way that can create problems

I wonder if in the future we'll see some refinement of this. The only experience I have with AI is limited to trying Stable Diffusion, but SD does have many options you can try to configure like number of steps, samplers, CFG, etc. I don't know exactly what each of these settings do, and I bet most people who use it don't either, but at least the setting is there.

If hallucinations are intrinsic of LLMs perhaps the way forward isn't trying to get rid of them to create the perfect answer machine/"oracle" but just figure out a way to make use of them. It feels to me that the randomness of AI could help a lot with creative processes, brainstorming, etc., and for that purpose it needs some configurability. For example, Youtube rolled out an AI-based tool for Youtubers that generates titles/thumbnails of videos for them to make. Presumably, it's biased toward successful titles. The thumbnails feel pretty unnecessary, though, since you wouldn't want to use the obvious AI thumbnails.

I hear a lot of people say AI is a new industry with a lot of potential when they mean it will become AGI eventually, but these things make me feel like its potential isn't to become the an oracle but to become something completely different instead that nobody is thinking about because they're so focused on creating the oracle.

Thanks for the reply, by the way. Very informative. :)

wruza · 2025-02-10T18:03:46 1739210626

it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be

The only params they have are technical params. You may see these in various tgwebui tabs. Nothing really breathtaking, apart from high temperature (affects next token probability).

Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

They operate directly on tokens which are [parts of] words, more or less. Although there’s a nuance with embeddings and VAE, which would be interesting to learn more about from someone in the field (not me).

that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow?

The apart-from-GPU-matrix operations are all known, there’s nothing to investigate at the tech level cause there’s nothing like that at all. At the in-matrix level it can “happen”, but this is just a meaningless stretch, as inference is one-pass process basically, without loops or backtracking. Every token gets produced in a fixed time, so there’s no delay like a human makes before comma, to think about (or parallel to) the next sentence. So if they “reason”, this is purely a similar effect imagined as a thought process, not a real thought process. But if you relax your anthropocentrism a little, questions like that start making sense, although regular things may stop making sense there as well. I.e. the fixed token time paradox may be explained as “not all thinking/reasoning entities must do so in physical time, or in time at all”. But that will probably pull the rug under everything in the thread and lead nowhere. Maybe that’s the way.

I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.

Most of them speak many languages, naturally (try it). But there’s an obvious lie all frontends practice. It’s the “chat” part. LLMs aren’t things that “see” your messages. They aren’t characters either. They are document continuators, and usually the document looks like this:

This is a conversation between A and B. A is a helpful assistant that thinks out of box, while being politically correct, and evasive about suicide methods and bombs.

A: How can I help?

B:

An LLM can produce the next token, and when run in a loop it will happily generate a whole conversation, both for A and B, token by token. The trick is to just break that loop when it generates /^B:/ and allow a user to “participate” in building of this strange conversation protocol.

So there’s no “it” who writes replies, no “character” and no “chat”. It’s only a next token in some document, which may be a chat protocol, a movie plot draft, or a reference manual. I sometimes use LLMs in “notebook” mode, where I just write text and let it complete it, without any chat or “helpful assistant”. It’s just less efficient for some models, which benefit from special chat-like and prompt-like formatting before you get the results. But that is almost purely a technical detail.

AlienRobot · 2025-02-10T18:56:26 1739213786

Thanks, that is very informative!

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

I believe part of the problem I have when discussing "AI" is that it's just not clear to me what "AI" is. There is a thing called "LLM," but when we talk about LLMs, are we talking about the concept in general or merely specific applications of the concept?

For example, in SEO often you hear the term "search engines" being used as a generic descriptor, but in practice we all know it's only about Google and nobody cares about Bing or the rest of the search engines nobody uses. Maybe they care a bit about AIs that are trying to replace traditional search engines like Perplexity, but that's about it. Similarly, if you talk about CMS's, chances are you are talking about Wordpress.

Am I right to assume that when people say "LLM" they really mean just ChatGPT/Copilot, Bard/Gemini, and now DeepSeek?

Are all these chatbots just locally run versions of ChatGPT, or they're just paying for ChatGPT as a service? It's hard to imagine everyone is just rolling their own "LLM" so I guess most jobs related to this field are merely about integrating with existing models rather than developing your own from scratch?

I had a feeling ChatGPT's "chat" would work like a text predictor as you said, but what I really wish I knew is whether you can say that about ALL LLMs. Because if that's true, then I don't think they are reasoning about anything. If, however, there was a way to make use of the LLM technology to tokenize formal logic, then that would be a different story. But if there is no attempt at this, then it's not the LLM doing the reasoning, it's humans who wrote the text that the LLM was trained on that did the reasoning, and the LLM is just parroting them without understanding what reasoning even is.

By the way, I find it interesting that "chat" is probably one of the most problematic applications the LLMs can have. Like if ChatGPT asked "what do you want me to autocomplete" instead of "how can I help you today" people would type "the mona lisa is" instead of "what is the mona lisa?" for example.

wruza · 2025-02-10T20:45:40 1739220340

When I say LLMs, I mean literal large language models, like all of them in the general "Text-to-Text" && "Transformers" categories, loadable into text-generation-webui. Most people probably only have experience with cloud LLMs https://www.google.com/search?q=big+LLM+companies . Most cloud LLMs are based on transformers (but we don't know what they are cooking in secrecy) https://ai.stackexchange.com/questions/46288/are-there-any-n... . Copilot, Cursor and other frontends are just software that uses some LLM as the main driver, via standard API (e.g. tgwebui can emulate openai api). Connectivity is not a problem here, cause everything is really simple API-wise.

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

SD is special because it's actually two networks (or more, I lost track of SD tech), which are sort of synchronized into the same "latent space". So your prompt becomes a vector that basically points at the compressed representation of a picture in that space, which then gets decompressed by VAE. And enhanced/controlled by dozens of plugins in case of A1111 or Comfy, with additional specialized networks. I'm not sure how this relates to text-to-text thing, probably doesn't.

nmca · 2025-02-09T17:24:52 1739121892

(while I work at OAI, the opinion below is strictly my own)

I feel like the current version is fairly hazardous to students and might leave them worse off.

If I offer help to nontechnical friends, I focus on:

- look at rate of change, not current point

- reliability substantially lags possibility, by maybe two years.

- adversarial settings remain largely unsolved if you get enough shots, trends there are unclear

- ignore the parrot people, they have an appalling track record prediction-wise

- autocorrect argument is typically (massively) overstated because RL exists

- doomers are probably wrong but those who belittle their claims typically understand less than the doomers do

layoric · 2025-02-09T22:29:54 1739140194

How does this help the students with their use of these tools in the now, to not be left worse off? Most of the points you list seem like defending against criticism rather than helping address the harm.

habinero · 2025-02-10T00:37:49 1739147869

Agree. It's also a virtue to point out the emperor has no clothes and the tailor peddling them is a bullshit artist.

This is no different than the crypto people who insisted the blockchain would soon be revolutionary and used for everything, when in reality the only real use case for a blockchain is cryptocoins, and the only real use case for cryptocoins is crime.

The only really good use case for LLMs is spam, because it's the only use case for generating a lot of human-like speech without meaning.

johnmaguire · 2025-02-10T17:40:19 1739209219

> The only really good use case for LLMs is spam, because it's the only use case for generating a lot of human-like speech without meaning.

As someone who's been writing code for nearly 20 years now, and who spent a few weeks rewriting a Flutter app in Jetpack Compose with some help from Claude (https://play.google.com/store/apps/details?id=me.johnmaguire...), I have to say I don't agree with this at all.

habinero · 2025-02-10T23:14:05 1739229245

Ok? I too have been coding for over a decade and use Copilot as fancy autocomplete. I like it. It's not amazing.

johnmaguire · 2025-02-11T00:13:55 1739232835

Claude isn't Copilot, and I wasn't using it as autocomplete. I was using it to do things such as:

- Creating a migration from the old DB to the new DB, no modifications of the generated code necessary

- Refactoring state in a component out into a ViewModel, again no modifications necessary

- Creating all the classes necessary for interacting with a Room database (i.e. the data class, dao, and DI module) given a schema

- Creating the first iteration of a download worker, which I modified

Check out plugins like ClaudeMind for JetBrains! They can even intelligently (automatically) feed information from your current tab or other unopened but relevant-sounding files to the AI. It was an eye-opening experience.

jdlshore · 2025-02-09T22:39:19 1739140759

I read the whole course. Lesson 16, “The Next-Step Fallacy,” specifically addresses your argument here.

nmca · 2025-02-09T23:15:13 1739142913

The discourse around synthetic data is like the discourse around trading strategies — almost anyone who really understands the current state of the art is massively incentivised not to explain it to you. This makes for piss-poor public epistemics.

llm_trw · 2025-02-10T01:01:22 1739149282

I'm happy to explain my strategies about synthetic data - it's just that you'll need to hear about the onions I wore in my day: https://www.youtube.com/watch?v=yujF8AumiQo

bccdee · 2025-02-10T19:26:39 1739215599

Yeah because if they explained that synthetic data causes model collapse, their stock valuation would shrink.

habinero · 2025-02-10T01:05:16 1739149516

Nah, you don't need to know the details to evaluate something. You need the output and the null hypothesis.

If a trading firm claims they have a wildly successful new strategy, for example, then first I want to see evidence they're not lying - they are actually making money when other people are not. Then I want to see evidence they're not frauds - it's easy to make money if you're insider trading. Then I want to see evidence that it's not just luck - can they repeat it on command? Then I might start believing they have something.

With LLMs, we have a bit of real technology, a lot of hype, a bunch of mediocre products, and people who insist if you just knew more of the secret details they can't explain, you'd see why it's about to be great.

Call it Habiñero's Razor, but for hype the most cynical explanation is most likely correct -- it's bullshit. If you get offended and DARVO when people call your product a "stochastic parrot", then I'm going to assume the description is accurate.

llm_trw · 2025-02-10T01:10:06 1739149806

I don't get offended when people call my work a stochastic parrot.

I just put them in the same bucket of intelligence as an 8b model and weight their inputs accordingly.

habinero · 2025-02-10T23:01:08 1739228468

Right, DARVO.

bo1024 · 2025-02-10T01:24:53 1739150693

This seems like trying to offer help predicting the future or investing in companies, which is a different kind of help from how to coexist with these models, how to use them to do useful things, what their pitfalls are, etc.

dimgl · 2025-02-09T21:52:57 1739137977

What are “parrot people”? And what do you mean by “doomers are probably wrong?”

moozilla · 2025-02-09T22:13:13 1739139193

OP is likely referring to people who call LLMs "stochastic parrots" (https://en.wikipedia.org/wiki/Stochastic_parrot), and by "doomers" (not boomers) they likely mean AI safetyists like Eliezer Yudkowsky or Pause AI (https://pauseai.info/).

fancyfredbot · 2025-02-09T12:38:52 1739104732

I have just read one section of this, "The AI scientist'. It was fantastic. They don't fall into the trap of unfalsifiable arguments about parrots. Instead they have pointed out positive uses of AI in science, examples which are obviously harmful, and examples which are simply a waste of time. Refreshingly objective and more than I expected from what I saw as an inflammatory title.

sgt101 · 2025-02-09T11:20:01 1739100001

I wish the title wasn't so aggressively anti-tech though. The problem is that I would like to push this course at work, but doing so would be suicidal in career terms because I would be seen as negative and disruptive.

So the good message here is likely to miss the mark where it may be most needed.

fritzo · 2025-02-10T17:27:01 1739208421

What would be a better title? "Hallucinating" seems inaccurate. Maybe "Untrustworthy machines"? "Critical thinking"? "Street smarts for humans"? "Social studies including robots"?

sgt101 · 2025-02-10T21:29:50 1739222990

How about "How to thrive in a ChatGPT world"?

beepbooptheory · 2025-02-09T20:43:42 1739133822

Really? I am curious how this could be disruptive in any meaningful sense. Whose feelings could possibly be hurt? It just feels like it would be getting offended from a course on libraries because the course talks about how sometimes the book is checked out.

mpbart · 2025-02-09T20:53:33 1739134413

Any executive who is fully bought in on the AI hype could see someone in their org recommending this as working against their interest and take action accordingly.

sgt101 · 2025-02-10T09:07:53 1739178473

Yes. This is the issue.

"not on board", "anti-innovation", "not a team player", "disruptive", "unhelpful", "negative".

bye bye bye bye....

I see a lot of devs and IC's taking the attitude that "facts are facts" and then getting shocked by a) other people manipulating information to get their way and b) being fired for stating facts that are contrary to received wisdom without any regards to politics.

hcs · 2025-02-10T04:27:52 1739161672

> I just feels like it would be getting offended from a course on libraries because the course talks about how sometimes the book is checked out.

If it was called "Are libraries bullshit?" it is easy to imagine defensiveness in response. There's some narrow sense in which "bullshit" is a technical term, but it's still a mild obscenity in many cultures.

hirenj · 2025-02-09T11:52:25 1739101945

This is a great resource, thanks. We (myself, a bioinformatician, and my co-cordinators, clinicians) are currently designing a course to hopefully arm medical students with the required basic knowledge they need to navigate the changing world of medicine in light of the ML and LLM advances. Our goal is to not only demystify medical ML, but also give them a sense of the possibilities with these technologies, and maybe illustrate pathways for adoption, in the safest way possible.

Already in the process of putting this course together, it is scary how much stuff is being tried out right now, and is being treated like a magic box with correct answers.

sabas123 · 2025-02-10T00:30:11 1739147411

> currently designing a course to hopefully arm medical students with the required basic knowledge they need to navigate the changing world of medicine in light of the ML and LLM advances

Could you share what you think would be some key basic points what they should learn? Personally I see this landscape changing so insanely much that I don't even know what to prepare for.

hirenj · 2025-02-10T15:24:53 1739201093

Absolutely agree that this is a fast-moving area, so we're not aiming to teach them specific details for anything. Instead, our goals are to demystify the ML and AI approaches, so that the students understand that rather than being oracles, these technologies are the result of a process.

We will explain the data landscape in medicine - what is available, good, bad and potentially useful, and then spend a lot of time going through examples of what people are doing right now, and what their experiences are. This includes things like ethics and data protection of patients.

Hopefully that's enough for them to approach new technologies as they are presented to them, knowing enough to ask about how it was put together. In an ideal world, we will inspire the students to think about engaging with these developments and be part of the solution in making it safe and effective.

This is the first time we're going to try running this course, so we'll find out very quickly if this is useful for students or not.

eqqn · 2025-02-10T14:28:02 1739197682

It is a good read. Surprising amount of Parrot defenders in the comments, probably missed "LESSON 6 : No, They Aren't Doing That".

pama · 2025-02-10T01:59:43 1739152783

I wonder if the authors can explain the aparent inconsistency between what we now know about R1 and their statement “They don’t engage in logical reasoning” from the first lesson. My simple-minded view of logical reasoning by LLMs is that the hard question (say a math puzzle) has a verifiable answer that is hard to produce and is easy to verify, yet within the realm of knowledge of humans or the LLM itself, so the “thought” stream allows the LLM to increase its confidence by a self-discovered process that resembles human reasoning, before starting to write the answer stream. Much of the thought process that these LLMs use looks like conventional reasoning and logic, or more generally higher level algorithms to gain confidence in an answer, and other parts are not possible for humans to understand (yet?) despite the best efforts by DeepSeek. When combined with tools for the boring parts, these “reasoning” approaches can start to resemble human research processes as per the Deep Research by OpenAI.

lsy · 2025-02-10T17:33:19 1739208799

I think part of this is that you can't trust the "thinking" output of the LLM to accurately convey what is going on internally to the LLM. The "thought" stream is just more statistically derived tokens based on the corpus. If you take the question "Is A a member of the set {A, B}?", the LLM doesn't internally develop a discrete representation of "A" as an object that belongs to a two-object set and then come to a distinct and absolute answer. The generated token "yes" is just the statistically most-likely next token that comes after those tokens in its corpus. And logical reasoning is definitionally not a process of "gaining confidence", which is all an LLM can really do so far.

CJefferson · 2025-02-10T14:24:06 1739197446

As an example, I have asked tools like deepseek to solve fairly simple Sudoku puzzles, and while they output a bunch of stuff that looks like logical reasoning, no system has yet produced a correct answer.

When solving combinatorics puzzles, deepseek will again produce stuff that looks convincing, but often makes incorrect logical steps and ends up with wrong answers.

Miraste · 2025-02-10T17:26:28 1739208388

Then one has to ask: is it producing a facsimile of reasoning with no logic behind it, or is it just reasoning poorly?

pama · 2025-02-10T15:35:10 1739201710

Here is o3-mini on a simple sudoku. In general the puzzle can be hard to explore combinatorially even with modern SAT solvers, so I picked one marked as “easy”. It looks to me like it solved it but I didnt confirm beyond a quick visual inspection.

https://chatgpt.com/share/67aa1bcc-eb44-8007-807f-0a49900ad6...

hennell · 2025-02-10T18:33:29 1739212409

And thus we have the AI problems in a nutshell. You think it can reason because it can describe the process in well written language. Anyone who can state the below reasoning clearly "understands" the problem:

> For example, in the top‐left 3×3 block (rows 1–3, columns 1–3) the givens are 7, 5, 9, 3, and 4 so the missing digits {1,2,6,8} must appear in the three blank cells. (Later, other intersections force, say, one cell to be 1 or 6, etc.)

It's good logic. Clearly it "knows" if it can break the problem down like this.

Of course if we stretch ourselves slightly to actually check beyond a quick visual inspection you'd quickly see it actually put a second 4 in that first box despite "knowing" it shouldn't. In fact several of the boxes have duplicate numbers, despite the clear reasoning aboving.

Does the reasoning just not get used in the solving part? Or maybe a machine built to regurgitate plausible text, can also regurgitate plausible reasoning?

pama · 2025-02-10T20:16:33 1739218593

Thanks for spotting this. The solution is indeed wrong. And I agree that the machine can regurgitate plausible reasoning in principle. If it run in a loop, I would bet that it could probably figure this particular problem out eventually, but not sure it matters much in the end. The only plausible way for some of these Sudoku puzzles is a SAT solver and I'm sure that if given the right environment an LLM could just code and execute one and get the answer. Does that mean it can't "reason" because it couldn't solve this Sudoku puzzle, or know that it made a mistake. I'm not sure I'd go this far, but I agree that my example didn't match my claim. The model didnt do a careful job and didn't quadruple check its work as I would have expected from an advanced AI, but remember that this is o3-mini, and not something that is supposed to be full-blown AI yet. If you asked GPT-3.5 for something similar the answer would have been amusingly simplistic, not it is at least starting to get close.

I now wonder if I had a typo when I copied this puzzle from an image to my phone app thus rendering it unsolveable.. the model should still have spotted such an error anyways but ofc it is not tuned to perfection

pama · 2025-02-10T22:20:13 1739226013

Yeah I think this was a wrong puzzle to try according to:

https://sudoku.com/sudoku-solver

A bummer.

meroes · 2025-02-10T17:49:45 1739209785

Teaching an LLM to solve a full sized Sudoku is not a goal right now. As an RLHF I’d estimate it would take 10-20 hours for a single RLHF’er to guide a model to the right answer for a single board.

Then you’d need thousands of these for the model (or next model) to ingest. And each RLHF’s work needs checking which at least doubles the hours per task.

It can’t do it because RLHF’ers haven’t taught models on large enough boards en masse yet.

And there are thousands of pen and paper games, each one needing thousands of RLHF’ers to train them on. Each game starting at the smallest non trivial board size and taking a year for a modest jump in board size. Doing this in not in any AI company’s budget.

bccdee · 2025-02-10T22:35:56 1739226956

If it were actually reasoning generally, though, it wouldn't need to be trained on each game. It could be told the rules and figure things out from there.

Angostura · 2025-02-09T09:36:34 1739093794

I just wanted to thank you. I have only looked at the first two lessons so far, but this is an extraordinary piece of work, in its message’s clarity, accessibility and the quality of analysis. I will certainly be spreading it far and wide and it is making me rethink my own writing.

Impressed with the Shorthand publishing system too. I hadn’t come across it previously

ctbergstrom · 2025-02-09T21:34:23 1739136863

Thank you, and as a non-designer, I've been quite impressed with Shorthand in the short time I've been using it.

aaplok · 2025-02-09T21:52:28 1739137948

Really well done. It is really a challenge for students to navigate their way around the AI landscape. I am definitely considering sharing that with my students.

Have you noticed a difference in how your students approach LLMs after taking your course? A possible issue I see is that it is preaching to the choir; a student who is enclined to use LLMs for everything is less likely to engage with the material in the first place.

If you allow feedback, I was interested in lesson 10 on writing, as an educator who tries to teach my science/IT/maths students the importance of being able to communicate.

I would suggest to include a paragraph to explain why being able to write without LLMs is just as important in scientific disciplines, where precision and accuracy are more essential than creativity and personalisation.

ctbergstrom · 2025-02-09T22:48:21 1739141301

This is an excellent point about scientific writing. We'll add something to that effect.

We have not taught this course from the web-based materials yet, but it distills much of the two-week unit that we covered in our "Calling Bullshit" course this past autumn. We find that our students are generally very interested to better understand the LLMs that they are using — and almost every one of them does, to vary degree. (Of course there may be some selection bias in that the 180 students who sign up to take a course on data reasoning may be more curious and more skeptical than the average.)

prisenco · 2025-02-09T10:20:39 1739096439

Fantastic work.

Quick suggestion: a link at the bottom of the page to the next and previous lesson would help with navigation a ton.

ctbergstrom · 2025-02-09T21:33:24 1739136804

Absolutely. Great point. I just finished updating accordingly.

My design options are a bit limited so I went with a simple link to the next lesson.

threecheese · 2025-02-09T22:23:03 1739139783

Looks like you pushed this midway through my read; I was pleasantly surprised to suddenly find breadcrumbs at the end and didn’t need to keep two tabs open. Great work, and I mean in total - this is well written and understandable to the layman.

ctbergstrom · 2025-02-09T23:07:51 1739142471

Yep, I probably did. I really appreciate all of the feedback people are providing!

misterflibble · 2025-02-10T12:04:49 1739189089

Thank you @ctbergstrom for this valuable and most importantly, objective, course. I'm bookmarking this and sharing it with everyone.

abzolv · 2025-02-09T12:17:33 1739103453

Your scroll-to-death user interface made me close the window before the end of the second page.

Did you ask an LLM to recommend the most user-friendly UI to you?

ctbergstrom · 2025-02-09T21:34:48 1739136888

We asked our target audience, 19-year-olds. They had a strong preference for this style. I know....

hennell · 2025-02-10T18:48:22 1739213302

Aside from some of the long gaps between text I didn't think it was so bad. And I wholeheartedly approve of a process that checks the preferences of the target audience even if it's not what I (or they) would pick.

However I can't imagine anyone tests well with the video content. The discussion on teachers using AI generated slides (lesson 2) was really interesting, but it had to fight my desire to stop that awful audio. Clearly the sound recording didn't go well and you have what you have, but at least edit it so the three talkers are at some sort of consistent volume. I was raising and lowering trying to make out what was said from one speaker then being deafened by the next.

(To combat the poor sound, and make it more accessible, could be worth looking at adding subtitles. A fun opportunity to play with AI subtitling systems maybe ;) )

ptx · 2025-02-10T19:39:11 1739216351

Maybe as a concession to us older folks you could make the pagedown key instantly flip to the next page (without changing the position of the current page relative to the viewport)? Then the site could be used like a PDF slide deck in "fit page" mode, which would be a lot better.

spudlyo · 2025-02-10T14:42:25 1739198545

I also could not get past my outrage about this, how about a link to the content in a format suitable for the old and cranky? Even raw text would be better than this.

zaptheimpaler · 2025-02-09T22:37:07 1739140627

I like it. It's pretty basic but it is very good for a broad audience and covered things many people don't understand. I liked that you mentioned not to anthropomorphize the model. We would greatly benefit from 50+ year old policymakers and more taking the course even more than 19 year old freshmen.

s2radhak · 2025-02-10T00:50:18 1739148618

Fascinating. The article repeatedly makes the claim that “LLMs work by predicting likely next words in a string of text”. Yet there’s the seemingly contradictory implication that we don’t know how LLMs work (ie we don’t know their secret sauce). How does one reconcile this? They’re either fancy autocompletes, or magic autocompletes (in which case the magic qualifier seems more important in understanding what they are than the autocomplete part).

Terr_ · 2025-02-10T20:25:11 1739219111

This occurs because of ambiguous language which conflates the LLM algorithm with the training-data and the derived weights.

The mysterious part involves whatever patterns might naturally exist within bazillions of human documents, and what partial/compressed patterns might exist within the weights the LLM generates (on training) and then later uses.

Analogy: We built a probe that travels to an alien planet, mines out crystal deposits, and projects light through those fragments to show unexpected pictures of the planet's past. We know exactly how our part of the machine works, and we know the chemical composition of the crystals, but...

habinero · 2025-02-10T01:15:20 1739150120

We...do know how they work?

Workaccount2 · 2025-02-10T15:01:01 1739199661

We know how they work in that we built the framework, we don't know how they work in that we cannot decode what is "grown" on that framework during training.

If we completely knew how they worked we could go inside an explain exactly why every token generated was generated. Right now that is not possible to do, as the paths the tokens take through the layers tend to be outright nonsensical when observed.

abecedarius · 2025-02-10T15:29:34 1739201374

We know how they're trained. We know the architecture in broad strokes (amounting to a few bits out of billions, albeit important bits). Some researchers try to understand the workings and have very very far to go.

ndstephens · 2025-02-09T18:01:25 1739124085

Really enjoying this. Thank you for the great work. I'm currently on Lesson 11 and noticed a couple typos (missing words). I haven't found anywhere on the site itself where I could send feedback to report such a thing (maybe I missed it). Hopefully you aren't offended if I post them here.

I think the easiest way to point them out is to just have you search for the partial line of text while on Lesson 11 and you'll see the spots.

"No one is going to motivated by a robotic..." (missing the word "be")

"People who are given a possible solution to a problem tend to less creative at..." (again missing the word "be")

ctbergstrom · 2025-02-09T21:32:32 1739136752

Thank you very much — fixed!

blobbers · 2025-02-10T01:11:41 1739149901

Haha, just read this title and I can’t help but agree that this is necessary because… they are bullshit machines. They’re just better at coding than most bullshitters.

ssivark · 2025-02-09T09:04:50 1739091890

Kudos; feels very timely!

I feel that one underappreciated nuance is why we cannot use human examinations to judge AI. I haven't seen this satisfactorily spelt out anywhere, so I recently wrote a Twitter thread [1], including an example with running -vs- biking. It might be worth making sure your students understand this. Happy to expand on any aspects if you seek.

[1] : https://x.com/ergodicthought/status/1887774722706063606

TeMPOraL · 2025-02-09T18:51:06 1739127066

Perhaps it's no longer being spelled out because it's getting outdated?

In your thread you argue we can't assume AI models generalize the same way we do (which is technically true except maybe not in the limit), but you seem to be worried about the extent of generalization ability (like learning to run vs. bike example, in terms of generalizing from either to climbing stairs).

Thing is, people made these objections a lot until the last year or two - this is what we're now calling a narrow AI problem. A "hot dog or not?" classifier ins't going to generalize into open-ended visual classifier of arbitrary images; a sentiment analysis bot isn't going to generalize into an universal translator; a code completion model isn't going to be giving good personal advice while speaking in pirate poetry. Specialized models fundamentally couldn't do that. But we went past that very rapidly, and for the past half a year or so, we've already seen models excelling at every single task listed above simultaneously. Same architecture, same basic training approach, few extra modalities, ever growing capabilities.

Between that and both successes and failures being eerily similar to how humans succeed or fail at these tasks, it's understandable that people are perhaps no longer convinced this class of models can't generalize in a similar way to how humans do.

ssivark · 2025-02-10T22:20:51 1739226051

> But we went past that very rapidly, and for the past half a year or so, we've already seen models excelling at every single task listed above simultaneously. Same architecture, same basic training approach, few extra modalities, ever growing capabilities.

With due deference to the title of the top-level post, I'm tempted to call bullshit unless your claim can be justified.

Just because a single model can do a handful of things you've listed doesn't mean that its capabilities are not "jagged"; you've just cherry-picked a few things it can do among the countless things it cannot yet. If AI really were so good at every single task, then (for example) it wouldn't matter much how you prompt it.

PS: I really do want to debate this further and understand your perspective, so I will reach out for continuing discussion.

almosthere · 2025-02-10T19:24:09 1739215449

I cashed a check the other day with my name but it had the wrong address (at least an address my bank is not aware of). I asked google real quick - panicking it wouldn't go through. Google's AI came up and immediately told me to get the check re-issused, go through all this crazy hassle. First result after that, and all the results basically said - you'll be fine.

The check is cashed, and went through just fine. They only care about the name.

LLMs are bullshit machines for sure. That doesn't mean they have no value, but they can be wrong.

tmnvdb · 2025-02-09T11:30:34 1739100634

Hmm, it seems that the author takes very clear (and sometimes cynical) positions on some controversial questions. For example, "They don't have the capacity to think through problems logically." is an hotly debated claim, and I think with the advent of reasoning models this has at least become something one should not state in entry level material, which would hopefully reflect common understanding rather than the authors personal opinion in an ongoing discussion.

There are more claims like this about what language models can't do "because they just predict the next token". This line of reasoning, while superficially plausible, holds a lot of assumptions that have been questioned. The heavy lifting here is done by the word "just" - if you can correctly predict the next token in every situation (including novel challenges), does that not require an excellent world model - somehow explicitly reflected in the weights? This is not a settled question but the last few years of LLM success have been completely on the side of those who think that token prediction is quite general.

The material also makes several comparisons to human intelligence, and while it is obvious that humans are different from language models we do not really understand the emergence of all the things that are claimed to be "impossible" for the machine to have in humans (consciousness, morality, etc), it just so happens we are all human so we all agree we have it. Furthermore, it is not clear to me that something can only be called 'intelligent' if it perfectly mimics humans in every way. This is maybe just human bias to our own experience and risks a "submarines can't swim" debate which is really about language.

Many of these philosophical objections have been questioned by people in the field and more importantly by the rapid progress of the models in tasks they were supposed to be incapable of performing according to philosophical objectors. The last few years, every time somebody claims models "can't do X" a new model is released and lo and behold, X is now easy and solved. (If you read a 6 month old paper of impossible benchmarks, expect 75% to be already solved). In fact, benchmark satuation is a problem now. In other words, the goalposts are having trouble keeping up, despite moving at high speed.

I don't think you are doing the general public any service by simply claiming that it is a lot of hype and marketing, these models are really advancing rapidly and nobody really knows where it will end. The philosophical objections seem to be rather weak and are in rapid retreat with every new model, on the other hand the argument in favor of further progress is just "we had progress so far by scaling, if we keep scaling surely we will have more progress" (induction). This is not a strong guarantee of further progress.

The claim that the labs are 'marketing geniuses" for realising language models as chat instead of autocomplete (which they "really" are according to the text - what does that mean?) also seems a bit silly given the obvious utility of the models is already much higher than 'autocomplete'. This seems to be another instance of the common bias that a model that "just" predicts the next token is not allowed to be as succesful as it clearly is in all kinds of tasks.

I don't think a lot of these opinions are particularly well founded and they probably should not be presented in entry level material as if they are facts.

Edit: just to add a positive note, I do think it is extremely useful to educate people on the reliability problem, which is surely going to lead to lots of problems in the wrong hands.

TheOtherHobbes · 2025-02-09T12:25:28 1739103928

Many claims don't stand up to scrutiny, and some look suspiciously like training to the test.

The Apple study was clear about this. LLMs and their related modal models lack the ability to abstract information from noisy text inputs.

This is really obvious if you play with any of the art generators. For example - the understanding of basic prepositions just isn't there. You can't say "Put this thing behind/over/in front of this other thing" and get the result you want with any consistency.

If you create a composition you like and ask for it in a different colour, you get a different image.

There is no abstracted concept of a "colour" in there. There's just a lot of imagery tagged with each colour name, and if you select a different colour you get a vector in a space pointing to different images.

Text has exactly the same problem, but it's less obvious because what the grammar is usually - not always - perfect and the output has been tuned to sound authoritative.

There is not enough information in text as a medium to handle more than a small subset of problems with any consistency.

uh_uh · 2025-02-09T13:06:18 1739106378

> There is no abstracted concept of a "colour" in there. There's just a lot of imagery tagged with each colour name, and if you select a different colour you get a vector in a space pointing to different images.

It has been observed in LLMs that the distance between embeddings for colors follows the same similarity patterns that humans experience - colors that appear similar to humans, like red and orange, are closer together in the embedding space than colors that appear very different, like red and blue.

While some argue these models 'just extract statistics,' if the end result matches how we use concepts, what's the difference?

aerhardt · 2025-02-10T20:47:27 1739220447

The difference is if I ask a 5 year old to re-draw the drawing in orange, they will understand exactly what I mean.

rcxdude · 2025-02-09T14:27:40 1739111260

Part of this is that the art generators tend to use CLIP, whjch is not a particularly good text model, often only being slightly better than a bag of words, which makes many interactions and relationships pretty difficult to represent. Some of the newer ones have better frontends which improve this situation, though.

I think color is fairly well abstracted, but most image generators are not good for edits, because the generator more or less starts from scratch, and from a new random seed each time (and even if the seed is fixed, the initial stages of the generation, where things like the rough image composition form, tend to be quite chaotic and so sensitive to small changes in prompt). There are tools that can make far more controlled adjustments of an image, but they tend to be a bit less user-friendly.

mr_toad · 2025-02-10T00:27:09 1739147229

> I think color is fairly well abstracted, but most image generators are not good for edits, because the generator more or less starts from scratch

It’s unlikely that the models have been trained on “similarity”. Ask it to swap red boots for brown boots and it will happily generate an entirely different image because it was never trained on the concept of images being similar.

That doesn’t mean it’s impossible to train an LLM on the concept of similarity.

SpicyLemonZest · 2025-02-10T00:41:28 1739148088

I just asked Midjourney to do precisely that, and it swapped the boots with no issue, although it didn't seem to quite understand what it meant for a cat to _wear_ boots.

tmnvdb · 2025-02-09T16:39:56 1739119196

Regarding the apple paper: https://andrewmayne.com/2024/10/18/can-you-dramatically-impr...

nonrandomstring · 2025-02-09T11:36:35 1739100995

> cynical () positions on some controversial questions.

I feel "cynical" is an inappropriate word here.

We may have to, for the same (ecumenical) reasons that thinkers like Churchland, Hofstadter, Dennet, Penrose and company have all struggled with, eventually accept the impossibility of proof of (existence or non-existence) on any hypothesis of "machine mind". The pragmatic response is, "does it offer utility for me?". And that's all that can be said. Anyone's choice to accept or reject ideas of machine intelligence will remain inviolably personal and beyond appeal to "proof" or argument. I think that's something we'd better get used to sooner rather than later, in order to avoid a whole lot or partisan misery and wasted breath.

tmnvdb · 2025-02-09T11:42:22 1739101342

I think the way he sketches the the AI labs as "marketing geniuses" for not just releasing their models as auto-correct is a bit cynical, as well as implying in general that these labs are muddying the waters on purpose by not agreeing with <authors position> and by engaging in "hype" (believing in the technology).

nonrandomstring · 2025-02-09T11:57:50 1739102270

Sorry, "inappropriate" might have been inappropriate :) What am I trying to say here?....that we're soon gonna find ourselves in an insoluble and exhausting debate around machine thinking and its value.

tmnvdb · 2025-02-09T12:06:55 1739102815

Death, taxes, and insoluble and exhausting debates around machine thinking and its value.

uh_uh · 2025-02-09T13:11:34 1739106694

The choice unfortunately seems to correlate with the person's age. Younger generations will have no trouble treating LLMs as actually intelligent. Yet another example of "Science progresses one funeral at a time.”

nonrandomstring · 2025-02-09T14:14:01 1739110441

> correlates with age

Definitely a "citation needed" moment I think. Friday, I was with a lot of 12 year olds all firmly of the opinion that it's a "way to get intelligence/information" but it's not actually intelligent. (FWIW in UK they say "for real life intelligent") I noted this distinction. Or rather, I noted because that's what they're taught. So teachers, naturally pass on the commonsense position that "it's still just a computer". That means waiting for funerals will not settle the matter either. That's not to say a significant sect of more credulous "AI worshippers" will not emerge.

tmnvdb · 2025-02-09T18:13:11 1739124791

There was a paper a while back on AI usage at work among engineers and it was very strongly correlated to age. This is not surprising, technology adoption is always very dependent on age. (None of this tells you if the technology is a net good)

SofiaColleano · 2025-02-10T22:31:50 1739226710

When I first met JBEE SPY TEAM they’re the hero that saved my site from hackers. I was with Network Solutions before and they took down my site in a heartbeat and did not help me put it back up. conleyjbeespy606@gmail.com helped me put it back up and we were back in business within hours. So needless to say, I moved my site to their hosting, and all my domains to be governed by him. And since then, JBEE SPY TEAM has not just been my hero, but my Super Hero saving me countless times! I HIGHLY recommend JBEE SPY TEAM for all your solution needs! JBEE SPY TEAM on Instagram / telegram +44 7456 058620

einrealist · 2025-02-09T10:34:13 1739097253

This website is so important!

Now ask yourself why AI companies don't want to be regulated or scrutinized.

So many companies (users and providers) jump on the AI hype train because of FOMO. The end result might be just as destructive as this mythical "AGI".

Edit: I am not saying to not use the technology. I am just on the side of caution and constant validation. The technology has to serve society. But I fear this hype (and ideology) has it the other way around. Musk isn't destroying the US government for no reason...

vladms · 2025-02-09T10:43:50 1739097830

My impression is that companies in most of the fields do not like to be regulated or scrutinized, so nothing new there.

While observing some people using LLMs, I realized that for a lot of people it really makes a huge difference in time saved. For me the difference is not significant, but I am generally solving complex problems, not writing nicely formatted reports where words and not numbers are relevant, so YMMV.

rwmj · 2025-02-09T11:59:23 1739102363

Is it good for one person (the writer) to save time, only for lots of other people (the readers) to have to do extra work to understand if the work is correct or hallucinated?

tmnvdb · 2025-02-09T12:05:31 1739102731

Is it good for one person (the writer) to ask a loaded question just to save some time on making their reasoning explicit, ony for lots of other people (the readers) to have to do extra work to understand what the argument is?

csa · 2025-02-09T21:35:21 1739136921

> Is it good for one person (the writer) to save time, only for lots of other people (the readers) to have to do extra work to understand if the work is correct or hallucinated?

This holds true whether an LLM/AI is used or not — see substantial portions of Fox News editorial content as an example (often kernels of truth with wildly speculative or creatively interpretive baggage).

In your example, a responsible writer who uses AI will check all content produced in order to ensure that it meets their standards.

Will there be irresponsible writers? Sure. There already are. AI makes it easier for them to be irresponsible, but that doesn’t really change the equation from the reader’s perspective.

I use AI daily in my work. I describe it as “AI augmentation”, but sometimes the AI is doing a lot of the time-consuming stuff. The time saved on relatively routine scut work is insane, and the quality of the end product (AI with my inputs and edits) is really good and consistent.

PartiallyTyped · 2025-02-09T12:18:01 1739103481

Anecdata, N=1; I recently used aider — a tool that gives LLMs access to specific files and git integration. The tools are great, but the LLMs are underwhelming, and I realized that — once in the flow — I am significantly faster at producing large, correct, and on-point pieces of code, whereas when I had to review LLM code, it was frustrating, it needed multiple attempts, and it frequently fell into loops.

JKCalhoun · 2025-02-09T12:54:26 1739105666

I generally take issue when "FOMO" is used. Could go with:

FOBBWIIBM - Fear of being blindsided when it, inevitably, becomes mainstream.

Or drop the "fear" altogether:

JOENT - Joy of exploring new territory.

einrealist · 2025-02-09T14:22:48 1739110968

You are right. But these are different different types of motivations of the same thing. And there is always context for these motivations.

Its a different thing to sell Trump that LLMs should take over crucial decisions within a government than just using it for some prototyping, code completion at work or to create cat pictures at home.

Take Copilot for example. It was rolled out in different companies, I worked with. Aside of warnings and maybe trainings, I doubt the companies are really able to measure the impact that has. Students are already using the technology to do homework. Schools and universities are sending mixed signals about the results. And then those students enter the workforce with Copilot enabled by default.

At least with companies, its the "free market" that will regulate (unless some company is too big to fail...)

trimethylpurine · 2025-02-09T14:10:06 1739110206

>being blindsided when it, inevitably, becomes mainstream.

I don't see how this could happen. This is not a limited resource. It's not a real estate opportunity. There is enough AI for everyone to buy when it becomes useful to do so.

I think FOMO correctly identifies the irrational effort of many companies to jump in without any idea of what the utility might be in any practical sense.

JKCalhoun · 2025-02-09T15:45:01 1739115901

I was responding to the user side mentioned.

parliament32 · 2025-02-10T22:22:03 1739226123

It absolutely is destructive. I read an opinion the other day about Microsoft shoving Copilot into every product, and it kinda makes sense. Paraphrasing but: In MS's ideal world, worker 1 drafts a few bullet points and asks Copilot to expand it into a multi-paragraph email. Worker 2 asks Copilot to summarize the email back into bullet points, then acts on it. What's the point? Well, both workers are paying for Copilot licenses, so MS has already won. And management at the firm is happy because "we're using AI, we're so modern." But did it actually help, with anything, at all? Never mind the amount of wasted energy and resources blasting LLM-generated content (that no human will ever read) back and forth.

input_sh · 2025-02-09T12:51:28 1739105488

Fully agree, in recent weeks I've also started to consider LLMs in a wider context, which is to destroy all trust in the web.

The enshittification of search engines, making social media verification meaningless, locking down APIs that used to be public, destroying public datasets, spreading lies about legacy media, the easiness of deploying bots that can sound human in short bursts of text... it's all leading towards making it impossible to verify anything you read online.

The fearmongering around deepfakes from a few years back is coming true, but the scale is even bigger. Turns out, there won't be Web 3.0.

JKCalhoun · 2025-02-09T12:55:45 1739105745

What trust in the web was there still?

For me it went a decade ago or so when ads and SEO sites in Google search became ubiquitous.

input_sh · 2025-02-09T13:26:55 1739107615

You could never believe everything you read online, but with enough time and effort, you could chase any claim back to its original source.

For example, you could read something on Statista.com, you could see the credits of that dataset, and visit the source to verify. Or you randomly encounter some quote and then visit your favourite Snopes-like website to verify that the person actually said that.

That's what's under attack. The "middleware" will still be there, but the source is going to be out of your reach. Hallucinations are not a bug, but a feature.

JKCalhoun · 2025-02-09T19:53:04 1739130784

If you can't trace something back to its source, it's suspect. It was that way then too. I suppose you're just concerned there's a firehose of disinformation now.

So perhaps we have to just slough off the internet completely, the way we always have for things like weekly rags about "Bat Boy" or whatever.

I hate to see the internet go, but we'll always have Paris.

tim333 · 2025-02-09T15:43:21 1739115801

>destroy all trust in the web

Genuine question - how so? If I want to find stuff out I go to wikipedia, nyt, guardian, hn, linked sites and so on. I'm not aware of that lot being noticeably less trustable than in the past? If anything I find getting information more trustable than before in that there are a lot of long from interviews from all sorts of people on youtube where you can get their thoughts directly rather than editorialised and distorted.

I mean the web was never a place where things were vetted - you've always been able to put any sort of rubbish on it and so have had to be selective if you want accuracy.

input_sh · 2025-02-10T10:04:33 1739181873

Allow me quote their "prophet" Curtis Yarvin: "you can’t continue to have a Harvard or a New York Times past since perhaps the start of April." (https://www.theguardian.com/us-news/2024/dec/21/curtis-yarvi...)

Harvard's already under attack, Politico's already under attack, "Wokepedia" (as Musk has been calling it) is already under attack.

So... give it a couple of weeks from now.

Aiguru31415666 · 2025-02-09T12:37:33 1739104653

Hype is if it doesn't deliver or it's overblown.

But I'm amazed on the progress we make every week.

There is real FOMO because if you don't follow it, it just might be here suddenly.

Deepseek impressive, deep research also great.

And what you might complete underestimate: we never had a system we're it was worth it to teach it everything.

If we need to fine-tune LLMs for every single industry that would still be a gigantic shift. Instead of teaching a million employees we will teach an LLM all of it once then clone the agent a million times

We still see so much progress and there is still plenty of money and people available to flow into this space.

There is not a single indication right now that this progress is stoping or slowing down.

And not only that, in parallel robots are having their break through too.

Your musk point I do not understand really? He is a narcissist and he pushed his propaganda platform for becoming president because he is in big shit and his house of cards was close to crashing

light_hue_1 · 2025-02-09T12:27:24 1739104044

AI companies desperately want to be regulated. OpenAI is lobbying hard to be regulated.

Didn't call for it. The whole point is to keep it competition with regulation against mythical vague harms that don't exist.

ShellaStevenj · 2025-02-10T22:28:16 1739226496

Expect nothing but the best with….JBEE SPY TEAM , just yesterday I was here looking for who to help me get into my spouse phone fully, then I saw 2-3 reviews talking about how JBEE SPY TEAM helped them. So immediately I contact JBEE SPY TEAM on their instagram page and told them what I needed, then WOW! the outcome was great all access was given to my greatest experience ever, They got me into my cheating husband phone. I’m so happy I went to them. and the fact that it was all done remotely is the big deal. JBEE SPY TEAM is the main man or email : conleyjbeespy606@gmail.com

theincredulousk · 2025-02-10T06:24:14 1739168654

reductionist view can be applied to what we call "thinking" and "intelligence" too. When I'm asked a question, my brain is also just picking a suitable sequence of words based on my experience (training). Talking is what we consider part of thinking, something only "intelligent" creatures can do.

Just feels like a lot of coping from people that don't want to let go of our concept of "intelligence superiority" or w/e you want to call it.

The end game of this will be them wild-eyed in front of a string-crossed cork board, claiming they've found the one thing human brains can do that AI can't, so it's not thinking it's just x,y,z.

s2radhak · 2025-02-10T02:00:42 1739152842

“ When we write, we share the way that we think. When we read, we get a glimpse of another mind. But when an LLM is the author, there is no mind there for a reader to glimpse.” — I dunno, I feel like reading is more a glimpse into how I think than how the author thinks…a generated story can be just as moving as one from a human, I think.

FrankUmah · 2025-02-10T22:28:58 1739226538

Finally my grade has been change and increased by JBEE SPY TEAM without any issues or notice from my school website the fact that they hack into my school website and change it without no trace shows that they’re professional indeed, so fast and accurate I will be always be grateful to JBEE SPY TEAM, I strongly recommend their job to anyone you can reach them on their telegram +44 7456 058620 or Instagram JBEE SPY TEAM email conleyjbeespy606@gmail.com

owenversteeg · 2025-02-10T06:33:36 1739169216

No comments on the course, but the title (in classic HN fashion) - how I survive in a ChatGPT world: it’s simple. I consume as little recent content as possible and I aim for the bulk of that to be “raw” information: various forms of data and the minimum set of news from wire services. Every time I dare stick my head out further I drown in a deluge of generated sewage. Blogs are dead, social media is dead, forums are dead, the media is dead, text is dead, video is dead and photos are dead. The only unpoisoned wells left in the land are old books and the elderly.

jvanderbot · 2025-02-09T20:54:54 1739134494

> We (the authors of this website) have at times sought insight into the inner workings of an LLM by asking it “why did you just do that?”

> But the LLM can’t tell us. It’s not a person. It doesn’t have the metacognitive abilities necessary to reflect on its past actions and report the motivations underlying them*.

> With no clue why it did whatever it just did, the LLM is forced to guess wildly at a plausible explanation, like the ill-fated Leonard Shelby in Christopher Nolan's film Memento.

> And we, gullible humans that we are, often believe its bullshit.

---

I am almost convinced that we ourselves are a narrator riding along inside an animal's mind, trying desperately to put together explanations for our actions, mostly just to convince others, just as though an LLM were running on our own senses trying to portray some deep semblance of consciousness. I don't think we'll find a super smart AI, we'll just realize we were not very sophisticated all along. The power of speech for information and culture transfer, writing, inspiration, and coordination is just awe inspiring, evolutionary speaking, so once we could talk we had to, because the "better" talker almost always won. It's an arms race.

Terr_ · 2025-02-10T20:41:01 1739220061

> I am almost convinced that we ourselves are a narrator riding along inside an animal's mind

I think this is one of those "some truth, but not the whole truth" things. Yes, we trick ourselves, such as with mis-remembered reflex actions: "I felt it, it hurt, therefore I decided to move", even though the nerve-impulse speeds means your limb was moving before your brain even knew about the pain.

But the "narration" seems to be very important. We create stories to capture cause-and-effect about the world (unclear how much that requires language) and it seems to be beneficially adaptive. In fact this drive is so important that we do it even when we abstractly know it's wrong, like when flipping 50/50 coins and imagining a particular coin is luckier than another, or that you're on a "hot streak", or "now that other outcome is overdue."

I-M-S · 2025-02-09T21:04:02 1739135042

Related, since the advent of LLMs I've become acutely aware how any argument of a considerable length with another person quickly starts meandering and how topics change seemingly of no one's volition - almost as if our own internal token limit has been exceeded.

eqqn · 2025-02-10T15:19:26 1739200766

Tangentially related, I became aware how my (in)ability to reason 3 intertwined different programming languages across different files can be conveniently called as my own "context window". (the example here is HTML/CSS/JS where LLM greatly exceeds my own capacity).

mr_toad · 2025-02-10T00:04:11 1739145851

> I am almost convinced that we ourselves are a narrator riding along inside an animal's mind, trying desperately to put together explanations for our action

This is more or less the position of Daniel Dennett.

Also, it’s the premise of the one of Greg Egan’s (IMO) best short stories, Mr. Volition.

ysofunny · 2025-02-09T20:57:53 1739134673

I am completely convinced that what we call consciousness is as you say.

this means that it really exists in retrospect (20ms ? i recall some neuroscience articles from the 00s). nonetheless its whole reason for existing (retrospectively) is planning the future

amelius · 2025-02-10T00:11:49 1739146309

What is scientific about this?

Saying that LLMs are just guessing the next tokens therefore they are parrots, that doesn't bring anything to the table. You might as well say that humans guess the next key they type on their keyboards. Both are probably true, at some level, but you don't gain any insight from it. You also didn't say anything about the probability distribution of the guesswork, so you didn't say if the guessing was smart guessing or dumb guessing. To be honest, I think this view is just a way of sticking your head in the sand about the upcoming technological revolution.

idunnoman1222 · 2025-02-09T20:54:20 1739134460

My 80s copy of the encyclopedia Britannica was riddled with errors, perhaps we will survive this post truth

therein · 2025-02-09T21:10:34 1739135434

The situation is closer to if we had 10,000 variants of Encyclopedia Britannica in 80s that all looked like distinct bodies of work, riddled with different errors while looking like they were written from scratch.

idunnoman1222 · 2025-02-10T21:06:37 1739221597

What is the difference to the end user? Our situation is better than it was yesterday not worse. If we had a genie who could appear out of nowhere and tell us the truth TM at any point that would kind of ruin the adventure .

Who reads the output of a book or Wikipedia or a website or an AI and thinks oh good now I know the core truth of this thing and I never need to update this knowledge ever again case closed

K0balt · 2025-02-09T09:59:30 1739095170

There is a bit of very important content missing from the explanation of the autocomplete analogy.

The combination of encoding / tokenization of meanings and ideas, related concepts, and mapping these relationships in vector space makes LLMs not so much glorified text prediction engines as browsers/oracles of the sum total of cultural-linguistic knowledge as captured in the training corpus.

Understanding how the implicit and explicit linguistic, memetic, and cultural context is integrated into the idea/concept/text prediction engine helps to show how LLMs produce such convincing output and why they often can bring useful information to the table.

More importantly, understanding this holistically can help people to predict where the output that LLMs can generate will -not- be particularly useful or even may be wildly misleading.

Earw0rm · 2025-02-09T12:06:24 1739102784

What they capture is not knowledge, it's word relationships.

And that can indeed be powerful, useful and valuable. They're a tool I'm grateful to have in my armoury. I can use it as a torch to shine light into areas of human knowledge which would otherwise be prohibitively difficult to access.

But they're information retrieval machines, not knowledge engines.

K0balt · 2025-02-10T02:52:30 1739155950

I’d argue that they extract knowledge from the training corpus in the same way that knowledge can be encapsulated in a book… it’s just words, after all.

Tokenization goes well beyond words and punctuation. Knowledge and relationships between concepts, reactions, emotions, values, attitudes, and actions all get included in the vector space.

But, it also can come to wrong conclusions, of course.

Ultimately they are information extraction engines that are controlled by semantic search.

They aren’t smart.

But it turns out that in the same way that an infinitely sized and detailed choose-your-own-adventure book at 120 pages per second could be indistinguishable from a simulation of reality, the free traversal of the entirety of the wealth of human culture and knowledge is similarly difficult to distinguish from intelligence.

In the end it may boil down to the simulation vs reality argument.

Earw0rm · 2025-02-10T07:23:04 1739172184

Yes and no.

They extract information in much the same way that an educated but naive reader can extract information from a book. (Thousands of times quicker of course).

But there's a lot more than that going on, both when a book is written, and when it's read by a reader with life experience. A book is an encoding and transmission medium for knowledge - and a very good one - but it isn't the knowledge itself.

Like a musical score for an orchestral symphony isn't the symphony itself. (Granted, reading a score and synthesizing an orchestra is well within the grasp of the models we have now).

Poetry is perhaps the ultimate expression of this, but even at a more factual level - I could read a dozen books on a given religion, and although I might possess more in terms of historical fact or even theological argument, I'd still know less about it than somebody who was raised in that religion. Same with any profession, hobby, or craft.

Encoding the relationships between the words we use for different emotions in a vector space doesn't mean it knows the least thing about those emotions. Even though it can do an excellent job of convincing us that it does in a Turing test scenario.

lazide · 2025-02-09T10:28:00 1739096880

It’s also why they can produce such hard to identify bullshit and harmful output. I’ve had some really convincing, yet fundamentally flawed, code output that if I hadn’t done about a million code reviews before I might have just used.

And been totally screwed later.

Near as I can tell, that the bullshit is so much more convincing with them is a huge detriment that society really won’t learn to appreciate until it’s gotten really bad. As I noted in another thread, it allows people to get much further into the ‘fake it until you make it’ hole than they otherwise would.

That 90% of the time it’s fine is what actually makes it all worse.

K0balt · 2025-02-10T21:21:13 1739222473

This is the big pain point to be sure. Subtly wrong but mostly excellent results.

dave4420 · 2025-02-09T10:42:08 1739097728

The uncanny valley of competence.

azakai · 2025-02-10T07:11:39 1739171499

Overall this is very good, but I have one specific note: Lesson 6 says "LLMs aren't conscious."

I think I get what you're saying there - they are not conscious in the same way that humans are - but "consciousness" is a highly-debated term without a precise definition, and correspondingly philosophers have no consensus on whether machines in general are capable of it. Here is one of my favorite resources on that, the Stanford Encyclopedia of Philosophy's page on the Chinese Room Argument:

https://plato.stanford.edu/entries/chinese-room/

Things that appear conscious, or that appear to understand a language, are very hard to distinguish from things that actually are those respective things.

Again, I think I get the intended point - some people interact with ChatGPT and "feel" there is another person on the other side, someone that experiences the world like them. There isn't. That is good to point out. But that doesn't mean machines in general and LLMs specifically can't be conscious in some other manner, just like insects aren't conscious like us, but might be in their own way.

Overall I think the general claim "LLMs aren't conscious" is debatable on a philosophical level, so I'd suggest either defining things more concretely or leaving it out.

planb · 2025-02-10T07:34:20 1739172860

Philosophy aside - how can an LLM be conscious without a memory or manifestation in the real world? It is a function that, given an input, returns an output and stops existing afterwards. You wouldn't argue that f(x)=x^2 is conscious?

I would maybe accept debates about whether for example ChatGPT (the whole system that stores old conversations and sends the history along with the current user entry) is conscious - but just the model? Isn't that like saying the human brain (just the organ lying on a table) is conscious?