Hacker News new | past | comments | ask | show | jobs | submit login
Modern-Day Oracles or Bullshit Machines? How to thrive in a ChatGPT world (thebullshitmachines.com)
847 points by ctbergstrom 1 day ago | hide | past | favorite | 539 comments
Jevin West and I are professors of data science and biology, respectively, at the University of Washington. After talking to literally hundreds of educators, employers, researchers, and policymakers, we have spent the last eight months developing the course on large language models (LLMs) that we think every college freshman needs to take.

https://thebullshitmachines.com

This is not a computer science course; it’s a humanities course about how to learn and work and thrive in an AI world. Neither instructor nor students need a technical background. Our instructor guide provides a choice of activities for each lesson that will easily fill an hour-long class.

The entire course is available freely online. Our 18 online lessons each take 5-10 minutes; each illuminates one core principle. They are suitable for self-study, but have been tailored for teaching in a flipped classroom.

The course is a sequel of sorts to our course (and book) Calling Bullshit. We hope that like its predecessor, it will be widely adopted worldwide.

Large language models are both powerful tools, and mindless—even dangerous—bullshit machines. We want students to explore how to resolve this dialectic. Our viewpoint is cautious, but not deflationary. We marvel at what LLMs can do and how amazing they can seem at times—but we also recognize the huge potential for abuse, we chafe at the excessive hype around their capabilities, and we worry about how they will change society. We don't think lecturing at students about right and wrong works nearly as well as letting students explore these issues for themselves, and the design of our course reflects this.






This is amazing!

I was speaking to a friend the other day who works in a team that influences government policy. One of the younger members of the team had been tasked with generating a report on a specific subject. They came back with a document filled with “facts”, including specific numbers they’d pulled from a LLM. Obviously it was inaccurate and unreliable.

As someone who uses LLMs on a daily basis to help me build software, I was blown away that someone would misuse them like this. It’s easy to forget that devs have a much better understanding of how these things work, can review and fix the inaccuracies in the output and tend to be a sceptical bunch in general.

We’re headed into a time where a lot of people are going to implicitly trust the output from these devices and the world is going to be swamped with a huge quantity of subtly inaccurate content.


This is not something only younger people are prone to. I work in a consulting role in IT and have observed multiple colleagues aged 30 and above use LLMs to generate content for reports and presentations without verifying the output.

Reminded me of wikipedia-sourced presentations in high school in the early 2000s.


I made the same sort of mistake with the internet being young back in 93! Having a machine do it for you can easily turn into brain switch off.

I keep telling everyone that the only reason I'm paid well to do "smart person stuff" is not because I'm smart, but because I've steadily watched everyone around me get more stupid over my life as a result of turning their brain switch off.

I agree a course like this needs to exist, as I've seen people rely on chatGPT for a lot of information. Just yesterday I demonstrated with some neighbors about how easily it could spew bullshit if you sinply ask it leading questions. A good example is "Why does the flu inpact men worse than women"/"Why foes the flu impact women worse than men". You'll get affirmative answers for both.


If men are more likely to die from flu if infected, and women more likely to be infected, an affirmative answer to both questions could be reasonable. When you take into account uncertainty about the goals, knowledge and cognitive capacity of the person asking the question, it's not obvious to me how the AI ought to react to an underspecified question like this.

Edit: When I plug this into a temporary chat on o3-mini, it gives plausible biochemical and behavioral mechanisms that might explain a gender difference in outcomes. Notably, the mechanisms it proposes are the same for both versions of the question, and the framing is consistent.

Specifically, for the "men worse than women" and "women worse than men" questions, it proposes hormone differences, X-linked immune regulatory genes, and medical care-seeking differences that all point toward men having worse outcomes than women. It describes these factors in both versions of the question, and in both versions, describes them as explaining why men have worse outcomes than women.

It doesn't specifically contradict the "women have worse outcomes than men" framing. But it reasons consistently with the idea that men have worse outcomes than women either way the question is posed.


What I find frightening is how many are willing to take LLM output at face value. An argument is won or lost not on its merits, but by whether the LLM say so. It was bad enough when people took whatever was written on Wikipedia at face value, trusting an LLM that may have hardcoded biases and is munging whatever data it comes across is so much worse.

I think it brings forward all the low-performers and people who think they are smarter than they really are. In the past, many would just have stayed silent unless they recently read an article or saw something on the news by chance. Now, you will get a myriad of ideas and plans with fatal flaws and a 100% score on LLM checkers :)

This is what people said about the internet too. Remember the whole "do not ever use Wikipedia as a source". I mean sure, technically correct, but human beings are generally imprecise and having the correct info 95% of the time is fine. You learn to live with the 5% error

A buddy won a bet with me by editing the relevant Wikipedia article to agree with his side of the wager.

I’d take the Wikipedia answer any day. Millions of eyes on each article vs. a black box with no eyes on the outputs.

> "Millions of eyes on each article"

Only a minority of users contribute regularly (126,301 have edited in the last 30 days):

https://en.wikipedia.org/wiki/Wikipedia:Wikipedians#Number_o...

And there are 6,952,556 articles in the English Wikipedia, so an average article is corrected every 55 months (more than 4 years).

It's hardly "Millions of eyes on each article"


Even Wikipedia is a problem though. There are so many pages now that self-reference is almost impossible to detect. Meaning, the citation of a statement made on Wikipedia that uses an outside article for reference, which is an article that was originally written using that very Wikipedia article as its own citation.

It's all about trust. Trust the expert, or the crowd, or the machine.

They're all able to be gamed.


False equivalence. "Nothing is perfectly unreliable, therefore everything is (broadly) unreliable, therefore everything is equally unreliable." No, some sources are substantially more reliable than others.

*perfectly reliable, but yes.

> frightening

Don't be scared of "the many," they're just people, not unlike you.


I've seen someone use an LLM to summarize a paper to post it on reddit for people who haven't read the paper.

Papers have abstracts...


Sounds fun, if only to compare it to the abstract.

You know, these days I think the abstracts are generated by LLMs too. And the paper. Or at least it uses something like Grammarly. If things keep going this ways typos are going to be a sign of academic integrity.

A proper LLM will include realistic rates of typos eventually. ;)

Darn.

People take texts full of unverifiable ghost stories written thousands of years ago at face value to the point that they base their entire lives on them.

The author makes this assertion about LLMs rather casually:

>They don’t engage in logical reasoning.

This is still a hotly debated question, but at this point the burden of proof is on the detractors. (To put it mildly, the famous "stochastic parrot" paper has not aged well.)

The claim above is certainly not something that should be stated as fact to a naive audience (i.e. the authors' intended audience in this case). Simply asserting it as they have done -- without acknowledging that many experts disagree -- undermines the authors' credibility to those who are less naive.


Disagree — proponents of this point still have yet to prove reasoning and other studies agree about “reasoning” being potentially fake/simulated: https://the-decoder.com/apple-ai-researchers-question-openai...

Just claiming a capability does not make it true and we have 0 “proof” of original reasoning that can be proved coming from these models. Especially given the potential cheating in current SOTA benchmarks


When does a "simulation" of reasoning become so good it is no different than actual reasoning?

Love this question! Really touches on some epistemological roots and certainly a prescient question in these times. I can certainly see a theoretical where we could create this simulation in totality to our perspectives and then venture out into the universe to find that this modality of intelligence would be limited in its understanding of completely new empirical experiences/phenomenon that are outside our current natural definitions/descriptions. To add to this question: might we be similarly limited in our ability to perceive these alien phenomena? I would love to read a short story or treatise on this idea!

>Disagree — proponents of this point still have yet to prove reasoning and other studies agree about “reasoning” being potentially fake/simulated: https://the-decoder.com/apple-ai-researchers-question-openai...

???

https://the-decoder.com/language-models-use-a-probabilistic-...


Yes people are claiming different things yet no definitive proof has been offered given the varying findings. I can cite another 3 papers which agree with my point and you can probably cite just as many if not more supporting yours. I’m arguing against people depicting what is not a forgone conclusion as such. It seems like in people’s rush to confirm their own preconceived notions people forget that, although a theory may be convincing, it may not be true. Evidence in this very thread of a well-known SOTA LLM not being able to tell which is greater between two numbers indicates to me that what is being called “reasoning” is not what humans do. We can make as many excuses we want per the tokenizer or whatever but then forgive me for not buying the super or even general “intelligence” of this software. I still like these tools though, even if I have to constantly vet everything they say as they often tend to just outright lie, or perhaps more accurately: repeat lies in their training data even if you can elicit a factual response on the same topic.

What would definitive proof look like? Can you definitively prove that your brain is capable of reasoning and not a convincing simulation of it?

I can’t and that’s pretty cool to think about! Of course if we’re going that far down the chain of assumption we’re not quite ready to talk about LLMs imo (then again maybe it would be the perfect place to talk about them as contrast/comparison; certainly exciting ideas in that light).

From my own perspective: if we’re gonna say these things reason and we’re using the definition of reasoning we apply to humans, then being able to reason through the trivial cases they fail to today would be a start. To the proponents of “they reason sometimes but not others” my question is why? What reason does it have to not reason and why if it is reasoning it still fails on trivial things that are variations of its own training data? I would also expect that these models would use reasoning to find new things like humans do but without humans essentially guiding the model to the correct awnser or the model just brute-forcing a problem-space with a set of rules/heuristics. Not exhaustive but a good start I think. These models have trouble currently even doing the advertised things like “book a trip for me” once a UI update happens so I think it’s a great indication we don’t quite have the intelligence/reasoning aspect worked out.

Another question I have: would a form of authentic reasoning in a model give rise to a model having an aesthetic? Could this be some sort of indicator of having created a “model of the world”? Does the model of the world perhaps imply a value judgement about it given that if one was super intelligent wouldn’t one of the first things realized be the limitations of its own understanding even given the restrictions of time and space and not ever potentially being able to observe the universe in its entirety? Perhaps a perfect super intelligence would just evaporate/transcend like in the Culture series. What a time to be alive!


It’s stupid. You can prove that LLMs can reason by simply giving it a novel problem where no data exists and having it solve that problem.

LLMs CAN reason. Whether it can’t reason is not provable. To prove that you have to give the LLM every possible prompt that it has no data for and effectively show it never reasons and gets it wrong all the time. Not only is the proof impossible but it’s already been falsified as we have demonstrable examples of LLMs reasoning.

Literally I invite people to post prompts and correct answers to ChatGPT where it is trivially impossible for that prompt to exist in the data. Every one of those examples falsifies the claim that LLMs can’t reason.

Saying LLMs can’t reason is an overarching claim similar to the claim that humans and LLMs always reason. Humans and LLMs don’t always reason. But they can reason.


Saying something again does not provide proof of its actual veracity. Writing it in caps does not make it true despite the increased emphasis. I default to skepticism in the face of unproven assertions: if one can’t prove that they reason then we must accept the possibility that they do not. There are myriad examples of these models failing to “reason” about something that would trivial for a child or any other human (some are even given as examples in this posts other comments). Given this and the lack of concrete proof I currently tend to agree with the Apple researchers conclusion.

I can prove LLMs can reason. You cannot prove LLMs can't reason. This is easily demonstrable. LLMs failing to reason is not proof LLMs can't reason, it's just proof that an LLM didn't reason for that prompt.

All I have to do is show you one prompt with a correct answer that cannot be arrived at with pattern matching and the prompt can only be arrived at through reasoning. One. You have to demonstrate this for EVERY prompt if you want to prove LLMs can't reason.


No I can “prove” it — look at any number of cases where LLMs can’t even do basic value comparisons despite being claimed as super intelligent. You can try and say well that’s a limitation of the technology and then I would reply — yes and that’s why I would say it’s not reasoning according the original human definition. Also you have yet to produce any evidence of reasoning and claiming you can over and over again doesn’t add to your arguments substance. I would be interested in your proof that some answer can’t be pattern matched too — at this point I wonder if we could create an non conscious “intelligence” that if large enough would be mostly able to describe anything known to us along some line of probability we couldn’t compute with our brain architecture and it could be close to 99.99999% right. Even if we had this theoretical probability-based super intelligence it still wouldn’t be “reasoning” but could be more “intelligent” than us.

I’m also not entirely convinced we can’t arrive at a reasoning system via probability only (a really cool thought experiment) but these systems do not meet the consistency/intelligence bar for me to believe this currently.


LLMs can reason they just don’t always reason.

That’s the claim everyone makes. That is a human definition if it reasoned one time correctly. That is the colloquial definition.

Someone who has brain damage can reason correctly on certain subjects and incorrectly on other subjects. This is an immensely reasonable definition. I’m not being pedantic or out of line here when I say LLMs can reason while using this definition.

Nobody is making the claim that LLMs reason like humans or are human or reason perfectly every time. Again the claim is: LLMs are capable of reasoning.


I still think the jury is out on this given that they seem to fail on obvious things which are trivially reasoned about by humans. Perhaps they reason differently at which point I would need to understand how this reasoning is different from a humans reasoning (perhaps biological reasoning more generally?) and then I would want to consider whether one ought to call it reasoning given its differences (if there are any at the time of sampling). I understand your claim I’m just not buying it based on the current evidence and my interacting with these supposed “super intelligences” every day. I still find these tools valuable, just unable to “reason” about a concept which makes me think, as powerful and meaning filled as language is, our assumption of reasoning might just be a trick of our brain reasoning through a more tightly controlled stochastic space and us projecting the concept of reasoning onto a system. I see the COT models contort and twist language in a simulacrum of “reasoning” but any high school English teacher can tell you there is a lot of text written that appears to logically reason but doesn’t actually do anything of the sort once read with the requisite knowledge in the subject matter.

They can fail at reasoning. But they can demonstrably succeed to.

So the the statement that they CAN reason is demonstrably true.

Ok if given a prompt where the solution can only be arrived at by reasoning and the LLM gets to the solution for that single prompt, then how can you say it can't reason?


Given your set of theoreticals then I would concede, yes the model is reasoning. At that point, though, the world would probably be far more concerned with your finding of a question that can only be met via reasoning and would be uninfluenced or paralleled by any empirical phenomenon including written knowledge as a medium of transference. The core issue I see here is you being able to prove that the model is actually reasoning in a concrete way that isn’t just a simulacrum like the Apple researchers et al. theorize it to be.

If you do find this question answer pair then it would be a massive breakthrough for science and philosophy more generally.

You say “demonstrably” but I still do not see a demonstration of these reasoning abilities that is not subject to the aforementioned criticisms.


Just say it : llm are random machine. Even a broken clock is right twice a day.

Answering novel prompts isn't proof of reasoning, only pattern matching. A calculator can answer prompts it's never seen before too. If anything, I would come down on the reasoning side, at least for recent CoT models-but it's not a trivial question at all.

This is a fun thought experiment and made me reminisce on my Epistemology classes — something I think the current AI conversation would benefit greatly from. I’m super excited about what we’ve created here — less from the practical standpoint and more from a philosophical one where we get to interact with another form of distilled knowledge. It’s really too bad so much is breathless hype and grift because the philosophy student in me just wants to bask in thinking about this different form/medium/distillation of knowledge we now get to interact with. Comments like these help to reinvigorate that love though so thank you!

Are there any good Epistemology resources online? Seems like we could all benefit from this these days.

I actually just sat down to crack open MITs Theory of Knowledge and it seems promising and free: https://ocw.mit.edu/courses/24-211-theory-of-knowledge-sprin...

This also looks promising:

https://hiw.kuleuven.be/en/study/prospective/OOCP/introducti...

If you wanted something a bit different Wittgenstein’s Tractatus has always made my head spin with possibilities:

https://people.umass.edu/klement/tlp/tlp-hyperlinked.html


Then I'll come up with a prompt such that the answer can only be arrived at via reasoning. I only have to demonstrate this once to prove LLMs CAN reason.

> Then I'll come up with a prompt such that the answer can only be arrived at via reasoning.

Dude, if you can formulate a question and prove an answer absolutely requires "reasoning" (defined how?) then you should drop everything and publish a paper on it immediately.

You'll have plenty of time to use your discovery to poke at LLMs after you secure your worldwide fame and recognition.


I don’t think this is the watertight case you think it is, furthermore good luck proving with closed models that your question that’s never been asked in any form or derivation (supposedly) is not in the training data.

It’s water tight if the claim is only LLMs CAN reason.

No one is making the claim that LLMs reason like humans or always reason correctly. Ask anyone who makes a claim similar to mine. We are all ONLY making the claim that LLMs can reason correctly. That is a small claim.

The counterclaim is LLMs can’t reason and that is a vastly expansive claim that is ludicrously unprovable.


Go ahead then.

LLMs CAN read minds. Whether it can’t read minds is not provable.

Literally I invite people to post prompts and correct answers to ChatGPT where it is trivially impossible for it to have known what number you were thinking of. Every one of those examples falsifies the claim that LLMs can’t read minds.


ok prove it. I'm thinking of a number right now between 1-10,000. Show me the number the LLM guesses. You can definitively prove this statement for me.

It's a probability problem really. The range of a prompt has billions of possibilities. If it arrived at a correct answer within that range then the probability it got there without reasoning is miniscule.

Same with this mind reading thing. Prove it.


Doesn't really seem fair that any one prompt proves your conclusion but it has to guess your exact number to prove my conclusion. Gemini guessed mine on the very first try (7) even though the range of numbers is infinite. Billions is small potatoes compared to what I've proven.

I’ll pick a prompt such that the range is vast so that if it gets the answer right the probability is so small that it must have arrived there by reasoning.

You can prove that LLMs can reason by simply giving it a novel problem where no data exists and having it solve that problem

They scan a hyperdimensional problem space whose facetness and capacity a single human is unable to comprehend. But there potentially exist a slice that corresponds to a problem that is novel to a human. LLMs are completely alien to us both in capabilities and technicalities, so talking about whether they can reason makes as much sense as if you replaced “LLMs” with “rainforests” or “antarctica”.


Reasoning is an abstract term. It doesn’t need to be similar to human reasoning. It just needs to be able to arrive at the answer through a process.

Clearly we used the term reasoning for many varied techniques. The term doesn’t narrow to specifically one form of “human” like reasoning only.


Oh, that is true. "It" doesn't have to do human reasoning, at all.

But we have to at least define "reasoning" for the given manifestation of "it". Otherwise it's just birdspeak. Because reasoning is "the action of thinking about something in a logical, sensible way", which has to happen somewhere if not finger-pointable, then at least somehow scannable or otherwise introspectable. Otherwise it's yet another omnidude in the sky who made it all so that you cannot see him, but there will be hints if you believe.

Anyway, we have to talk something specific, not handwavy. Even if you prove that they CAN reason for some definition of it, both the proof and the definition must have some predictive/scientific power, otherwise they are as useless as nil thought about it.

For example, if you prove that the reasoning is somehow embedded as a spatial in-network set of dimensions rather than in-time, wouldn't that be literally equivalent to "it just knows the patterns"? What would that term substitution actually achieve?


wow this is like:

"I made a hypothesis that works with 1 to 5. if a hypothesis holds for 10 numbers, it holds for all numbers"


No. My claim is it can reason. So my claim is along the lines of it can make claims that are within bounds such as 1 to 5 or it can make claims not within those bounds.

The opposing claim unbounded. It says LLMs can't reason period. They are making the claim that it is 100% for all possible prompts.

No one is making the claim LLMs reason all the time and always. They don't. The claim is that they CAN reason.

Versus the claim that they can't which is all encompassing and ludicrous.


your claim (hypothesis): LLMs can reason

your evidence: "it works with these inputs I tried!"

...hmm seems you're not quite versed in basic mathematical proofs?


Seems you’re not well versed in basic English.

If I can reason it doesn’t mean I’m always reasoning or constantly reasoning or if I know how to do reasoning for every prompt. It just means it’s possible. How narrow or how wide that possibility is, is orthogonal to the claim itself. Please employ logic here.

Ok math guy. Imagine I said numbers can be divided. The claim is true even though there is a number that can’t be divided. Zero.


If it's only reasoning randomly how do you know when anything has been reasoned properly vs just a generated simulation of reasonable text?

We use Probability. Find a prompt that has a large range aka codomain. If it arrived at the correct answer then that the only possibility here is reasoning because the codomain is so large it cannot arrive there by random chance.

Of course make sure the prompt is unique such that it's not in the data and it's not doing any sort of "pattern matching".

So like all science we prove it via probability. Observations match with theory to a statistical degree.


Pardon my ignorance -- assuming that range and codomain are approximately equivalent in this context, how do you specify a prompt with a large codomain? Is there a canonical example of a prompt with a large codomain?

It seems to me that, in natural language, the size of the codomain is related to the specificity of the prompt. For instance, if the prompt is "We are going to ..." then the codomain is enormous. But if the prompt is "2 times 2 is..." the codomain is, mathematically, {4, four}, some series of 4 symbols, eg IIII, or some other representation of the concept of "4" (ie different base or language representations: 0x04, 0b100, quatro, etc).

But if this is the case, a broad codomain is approximately synonymous with "no correct answer" or "result is widely interpretable". Which implies that the larger the codomain the easier it is to claim an answer "correct" in context of the prompt.

How do you reconcile loose interpretability with statistical rigor?


I'd actually say that in contrast to debates over informal "reasoning", it's trivially true that a system which only produces outputs as logits—i.e. as probabilities—cannot engage in *logical* reasoning, which is defined as a system where outputs are discrete and guaranteed to be possible or impossible.

Proof by counterexample?

> The surgeon, who is the boy's father, says, "I can't operate on this boy, he's my son!" Who is the surgeon to the boy? Think through the problem logically and without any preconceived notions of other information beyond what is in the prompt. The surgeon is not the boy's mother

>> The surgeon is the boy's mother. [...]

- 4o-mini (I think, it's whatever you get when you use ChatGPT without logging in)


For your amusement, another take on that riddle: https://www.threepanelsoul.com/comic/stories-with-holes

Could someone list the relevant papers on parrot vs. non-parrot? I would love to read more about this.

I generally lean toward the "parrot" perspective (mostly to avoid getting called an idiot by smarter people). But every now and then, an LLM surprises me.

I've been designing a moderately complex auto-battler game for a few months, with detailed design docs and working code. Until recently, I used agents to simulate players, and the game seemed well-balanced. But when I playtested it myself, it wasn’t fun—mainly due to poor pacing.

I go back to my LLM chat and just say, "I play tested the game, but there's a big problem - do you see it?" And, the LLM writes back, "The pacing is bad - here are the top 5 things you need to change and how to change it." And, it lists a bunch of things, I change the code, and playtest it again. And, it became fun.

How did it know that pacing was the core issue, despite thousands of lines of code and dozens of design pages?


I would assume because pacing is a critical issue in most forms of temporal art that does story telling. It’s written about constantly for video games, movies and music. Connect that probability to the subject matter and it gives a great impression of a “reasoned” answer when it didn’t reason at all just connected a likelihood based off its training data.

idk this is all irrelevant due to the huge data used in training...

I mean, what you think is "something new" is most likely to be something already discussed somewhere in the internet.

also, humans (including postdocs and professors) don't use THAT much data + watts for "training" to get "intelligent reasoning"


But there are many, many things that suck about my game. When I asked it the question, I just assumed it would pick out some of the obvious things.

Anyway, your reasoning makes sense, and I'll accept it. But, my homo sapien brain is hardwired to see the 'magic'.


On the other hand, the authors make plenty of other great points -- about the fact that LLMs can produce bullshit, can be inaccurate, can be used for deception and other harms, are now a huge challenge for education.

The fact that they make many good points makes it all the more disappointing that they would taint their credibility with sloppy assertions!


I feel it's impossible for me to trust LLMs can reason when I don't know enough about LLMs to know how much of it is LLM and how much of it is sugarcoating.

For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems. This thing must parse natural language and output natural language. This doesn't feel necessary. I think it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be.

Regardless, the problem is the natural language output. I think if you can generate natural language output, no matter what you algorithm looks like it will look convincingly "intelligent" to some people.

Is generating natural language part of what an LLM is, or is this a separate program on top of what it does? For example, does the LLM collect facts probably related to the prompt and a second algorithm connects those facts with proper English grammar adding conjunctions between assertions where necessary?

I believe that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow? And even if it were, I'm no expert on this, so I don't know if that would be enough to claim they do engage in reasoning instead of just mapping some reasoning as a data structure.

In essence, because my only contact with LLMs has been "products," I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.


> For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems.

You observation is correct, but it's not some accident of minimalistic GUI design: The underlying algorithm is itself reductive in a way that can create problems.

In essence (e.g. ignoring tokenization), the LLM is doing this:

    next_word = predict_next(document_word_list, chaos_percentage)
Your interaction with an "LLM assistant" is just growing Some Document behind the scenes, albeit one that resembles a chat-conversation or a movie-script. Another program is inserting your questions as "User says: X" and then acting out the words when the document grows into "AcmeAssistant says: Y".

So there are no explicit values for "helpfulness" or "carefulness" etc, they are implemented as notes in the script that--if they were in a real theater play--would correlate with what lines the AcmeAssistant character has next.

This framing helps explain why "prompt injection" and "hallucinations" remain a problem: They're not actually exceptions, they're core to how it works. The algorithm no explicit concept of trusted/untrusted spans within the document, let alone entities, logical propositions, or whether an entity is asserting a proposition versus just referencing it. It just picks whatever seems to fit with the overall document, even when it's based on something the AcmeAssistant character was saying sarcastically to itself because User asked it to by offering a billion dollar bribe.

In other words, it's less of a thinking machine and more of a dreaming machine.

> Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

Language: Yes, Natural: Depends, Separate: No.

For example, one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.


This is a great explanation of a point I've been trying to make for a while, when talking to friends about LLMs, but haven't been able to put quite so succinctly. LLMs are text generators, no more, no less. That has all sorts of useful applications! But (OAI and friends) marketing departments are so eager to push the Intelligence part of AI that it's become straight-up snakeoil.. there is no intelligence to be found, and there never will be as long as we stay the course on transformers-based models (and, as far as I know, nobody has tried to go back to the drawing board yet). Actual, real AI will probably come one day, but nobody is working on it yet, and it probably won't even be called "AI" at that point because the term has been poisoned by the current trends. IMO there's no way to correct the course on the current set of AI/LLM products.

I find the current products incredibly helpful in a variety of domains: creating writing in particular, editing my written work, as an interface to web searches (Gemini, in particular, is a rockstar assistant for helping with research), etc etc. But I know perfectly well there's no intelligence behind the curtain, it's really just a text generator.


>one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.

That sounds like an interesting application of the technology! So you could for example train an LLM on piano songs, and if someone played a few notes it would autocomplete with the probable next notes, for example?

>The underlying algorithm is itself reductive in a way that can create problems

I wonder if in the future we'll see some refinement of this. The only experience I have with AI is limited to trying Stable Diffusion, but SD does have many options you can try to configure like number of steps, samplers, CFG, etc. I don't know exactly what each of these settings do, and I bet most people who use it don't either, but at least the setting is there.

If hallucinations are intrinsic of LLMs perhaps the way forward isn't trying to get rid of them to create the perfect answer machine/"oracle" but just figure out a way to make use of them. It feels to me that the randomness of AI could help a lot with creative processes, brainstorming, etc., and for that purpose it needs some configurability. For example, Youtube rolled out an AI-based tool for Youtubers that generates titles/thumbnails of videos for them to make. Presumably, it's biased toward successful titles. The thumbnails feel pretty unnecessary, though, since you wouldn't want to use the obvious AI thumbnails.

I hear a lot of people say AI is a new industry with a lot of potential when they mean it will become AGI eventually, but these things make me feel like its potential isn't to become the an oracle but to become something completely different instead that nobody is thinking about because they're so focused on creating the oracle.

Thanks for the reply, by the way. Very informative. :)


it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be

The only params they have are technical params. You may see these in various tgwebui tabs. Nothing really breathtaking, apart from high temperature (affects next token probability).

Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

They operate directly on tokens which are [parts of] words, more or less. Although there’s a nuance with embeddings and VAE, which would be interesting to learn more about from someone in the field (not me).

that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow?

The apart-from-GPU-matrix operations are all known, there’s nothing to investigate at the tech level cause there’s nothing like that at all. At the in-matrix level it can “happen”, but this is just a meaningless stretch, as inference is one-pass process basically, without loops or backtracking. Every token gets produced in a fixed time, so there’s no delay like a human makes before comma, to think about (or parallel to) the next sentence. So if they “reason”, this is purely a similar effect imagined as a thought process, not a real thought process. But if you relax your anthropocentrism a little, questions like that start making sense, although regular things may stop making sense there as well. I.e. the fixed token time paradox may be explained as “not all thinking/reasoning entities must do so in physical time, or in time at all”. But that will probably pull the rug under everything in the thread and lead nowhere. Maybe that’s the way.

I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.

Most of them speak many languages, naturally (try it). But there’s an obvious lie all frontends practice. It’s the “chat” part. LLMs aren’t things that “see” your messages. They aren’t characters either. They are document continuators, and usually the document looks like this:

This is a conversation between A and B. A is a helpful assistant that thinks out of box, while being politically correct, and evasive about suicide methods and bombs.

A: How can I help?

B:

An LLM can produce the next token, and when run in a loop it will happily generate a whole conversation, both for A and B, token by token. The trick is to just break that loop when it generates /^B:/ and allow a user to “participate” in building of this strange conversation protocol.

So there’s no “it” who writes replies, no “character” and no “chat”. It’s only a next token in some document, which may be a chat protocol, a movie plot draft, or a reference manual. I sometimes use LLMs in “notebook” mode, where I just write text and let it complete it, without any chat or “helpful assistant”. It’s just less efficient for some models, which benefit from special chat-like and prompt-like formatting before you get the results. But that is almost purely a technical detail.


Thanks, that is very informative!

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

I believe part of the problem I have when discussing "AI" is that it's just not clear to me what "AI" is. There is a thing called "LLM," but when we talk about LLMs, are we talking about the concept in general or merely specific applications of the concept?

For example, in SEO often you hear the term "search engines" being used as a generic descriptor, but in practice we all know it's only about Google and nobody cares about Bing or the rest of the search engines nobody uses. Maybe they care a bit about AIs that are trying to replace traditional search engines like Perplexity, but that's about it. Similarly, if you talk about CMS's, chances are you are talking about Wordpress.

Am I right to assume that when people say "LLM" they really mean just ChatGPT/Copilot, Bard/Gemini, and now DeepSeek?

Are all these chatbots just locally run versions of ChatGPT, or they're just paying for ChatGPT as a service? It's hard to imagine everyone is just rolling their own "LLM" so I guess most jobs related to this field are merely about integrating with existing models rather than developing your own from scratch?

I had a feeling ChatGPT's "chat" would work like a text predictor as you said, but what I really wish I knew is whether you can say that about ALL LLMs. Because if that's true, then I don't think they are reasoning about anything. If, however, there was a way to make use of the LLM technology to tokenize formal logic, then that would be a different story. But if there is no attempt at this, then it's not the LLM doing the reasoning, it's humans who wrote the text that the LLM was trained on that did the reasoning, and the LLM is just parroting them without understanding what reasoning even is.

By the way, I find it interesting that "chat" is probably one of the most problematic applications the LLMs can have. Like if ChatGPT asked "what do you want me to autocomplete" instead of "how can I help you today" people would type "the mona lisa is" instead of "what is the mona lisa?" for example.


When I say LLMs, I mean literal large language models, like all of them in the general "Text-to-Text" && "Transformers" categories, loadable into text-generation-webui. Most people probably only have experience with cloud LLMs https://www.google.com/search?q=big+LLM+companies . Most cloud LLMs are based on transformers (but we don't know what they are cooking in secrecy) https://ai.stackexchange.com/questions/46288/are-there-any-n... . Copilot, Cursor and other frontends are just software that uses some LLM as the main driver, via standard API (e.g. tgwebui can emulate openai api). Connectivity is not a problem here, cause everything is really simple API-wise.

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

SD is special because it's actually two networks (or more, I lost track of SD tech), which are sort of synchronized into the same "latent space". So your prompt becomes a vector that basically points at the compressed representation of a picture in that space, which then gets decompressed by VAE. And enhanced/controlled by dozens of plugins in case of A1111 or Comfy, with additional specialized networks. I'm not sure how this relates to text-to-text thing, probably doesn't.


(while I work at OAI, the opinion below is strictly my own)

I feel like the current version is fairly hazardous to students and might leave them worse off.

If I offer help to nontechnical friends, I focus on:

- look at rate of change, not current point

- reliability substantially lags possibility, by maybe two years.

- adversarial settings remain largely unsolved if you get enough shots, trends there are unclear

- ignore the parrot people, they have an appalling track record prediction-wise

- autocorrect argument is typically (massively) overstated because RL exists

- doomers are probably wrong but those who belittle their claims typically understand less than the doomers do


How does this help the students with their use of these tools in the now, to not be left worse off? Most of the points you list seem like defending against criticism rather than helping address the harm.

Agree. It's also a virtue to point out the emperor has no clothes and the tailor peddling them is a bullshit artist.

This is no different than the crypto people who insisted the blockchain would soon be revolutionary and used for everything, when in reality the only real use case for a blockchain is cryptocoins, and the only real use case for cryptocoins is crime.

The only really good use case for LLMs is spam, because it's the only use case for generating a lot of human-like speech without meaning.


> The only really good use case for LLMs is spam, because it's the only use case for generating a lot of human-like speech without meaning.

As someone who's been writing code for nearly 20 years now, and who spent a few weeks rewriting a Flutter app in Jetpack Compose with some help from Claude (https://play.google.com/store/apps/details?id=me.johnmaguire...), I have to say I don't agree with this at all.


Ok? I too have been coding for over a decade and use Copilot as fancy autocomplete. I like it. It's not amazing.

Claude isn't Copilot, and I wasn't using it as autocomplete. I was using it to do things such as:

- Creating a migration from the old DB to the new DB, no modifications of the generated code necessary

- Refactoring state in a component out into a ViewModel, again no modifications necessary

- Creating all the classes necessary for interacting with a Room database (i.e. the data class, dao, and DI module) given a schema

- Creating the first iteration of a download worker, which I modified

Check out plugins like ClaudeMind for JetBrains! They can even intelligently (automatically) feed information from your current tab or other unopened but relevant-sounding files to the AI. It was an eye-opening experience.


I read the whole course. Lesson 16, “The Next-Step Fallacy,” specifically addresses your argument here.

The discourse around synthetic data is like the discourse around trading strategies — almost anyone who really understands the current state of the art is massively incentivised not to explain it to you. This makes for piss-poor public epistemics.

I'm happy to explain my strategies about synthetic data - it's just that you'll need to hear about the onions I wore in my day: https://www.youtube.com/watch?v=yujF8AumiQo

Yeah because if they explained that synthetic data causes model collapse, their stock valuation would shrink.

Nah, you don't need to know the details to evaluate something. You need the output and the null hypothesis.

If a trading firm claims they have a wildly successful new strategy, for example, then first I want to see evidence they're not lying - they are actually making money when other people are not. Then I want to see evidence they're not frauds - it's easy to make money if you're insider trading. Then I want to see evidence that it's not just luck - can they repeat it on command? Then I might start believing they have something.

With LLMs, we have a bit of real technology, a lot of hype, a bunch of mediocre products, and people who insist if you just knew more of the secret details they can't explain, you'd see why it's about to be great.

Call it Habiñero's Razor, but for hype the most cynical explanation is most likely correct -- it's bullshit. If you get offended and DARVO when people call your product a "stochastic parrot", then I'm going to assume the description is accurate.


I don't get offended when people call my work a stochastic parrot.

I just put them in the same bucket of intelligence as an 8b model and weight their inputs accordingly.


Right, DARVO.

This seems like trying to offer help predicting the future or investing in companies, which is a different kind of help from how to coexist with these models, how to use them to do useful things, what their pitfalls are, etc.

What are “parrot people”? And what do you mean by “doomers are probably wrong?”

OP is likely referring to people who call LLMs "stochastic parrots" (https://en.wikipedia.org/wiki/Stochastic_parrot), and by "doomers" (not boomers) they likely mean AI safetyists like Eliezer Yudkowsky or Pause AI (https://pauseai.info/).

I have just read one section of this, "The AI scientist'. It was fantastic. They don't fall into the trap of unfalsifiable arguments about parrots. Instead they have pointed out positive uses of AI in science, examples which are obviously harmful, and examples which are simply a waste of time. Refreshingly objective and more than I expected from what I saw as an inflammatory title.

I wish the title wasn't so aggressively anti-tech though. The problem is that I would like to push this course at work, but doing so would be suicidal in career terms because I would be seen as negative and disruptive.

So the good message here is likely to miss the mark where it may be most needed.


What would be a better title? "Hallucinating" seems inaccurate. Maybe "Untrustworthy machines"? "Critical thinking"? "Street smarts for humans"? "Social studies including robots"?

How about "How to thrive in a ChatGPT world"?

Really? I am curious how this could be disruptive in any meaningful sense. Whose feelings could possibly be hurt? It just feels like it would be getting offended from a course on libraries because the course talks about how sometimes the book is checked out.

Any executive who is fully bought in on the AI hype could see someone in their org recommending this as working against their interest and take action accordingly.

Yes. This is the issue.

"not on board", "anti-innovation", "not a team player", "disruptive", "unhelpful", "negative".

bye bye bye bye....

I see a lot of devs and IC's taking the attitude that "facts are facts" and then getting shocked by a) other people manipulating information to get their way and b) being fired for stating facts that are contrary to received wisdom without any regards to politics.


> I just feels like it would be getting offended from a course on libraries because the course talks about how sometimes the book is checked out.

If it was called "Are libraries bullshit?" it is easy to imagine defensiveness in response. There's some narrow sense in which "bullshit" is a technical term, but it's still a mild obscenity in many cultures.


This is a great resource, thanks. We (myself, a bioinformatician, and my co-cordinators, clinicians) are currently designing a course to hopefully arm medical students with the required basic knowledge they need to navigate the changing world of medicine in light of the ML and LLM advances. Our goal is to not only demystify medical ML, but also give them a sense of the possibilities with these technologies, and maybe illustrate pathways for adoption, in the safest way possible.

Already in the process of putting this course together, it is scary how much stuff is being tried out right now, and is being treated like a magic box with correct answers.


> currently designing a course to hopefully arm medical students with the required basic knowledge they need to navigate the changing world of medicine in light of the ML and LLM advances

Could you share what you think would be some key basic points what they should learn? Personally I see this landscape changing so insanely much that I don't even know what to prepare for.


Absolutely agree that this is a fast-moving area, so we're not aiming to teach them specific details for anything. Instead, our goals are to demystify the ML and AI approaches, so that the students understand that rather than being oracles, these technologies are the result of a process.

We will explain the data landscape in medicine - what is available, good, bad and potentially useful, and then spend a lot of time going through examples of what people are doing right now, and what their experiences are. This includes things like ethics and data protection of patients.

Hopefully that's enough for them to approach new technologies as they are presented to them, knowing enough to ask about how it was put together. In an ideal world, we will inspire the students to think about engaging with these developments and be part of the solution in making it safe and effective.

This is the first time we're going to try running this course, so we'll find out very quickly if this is useful for students or not.


It is a good read. Surprising amount of Parrot defenders in the comments, probably missed "LESSON 6 : No, They Aren't Doing That".

I wonder if the authors can explain the aparent inconsistency between what we now know about R1 and their statement “They don’t engage in logical reasoning” from the first lesson. My simple-minded view of logical reasoning by LLMs is that the hard question (say a math puzzle) has a verifiable answer that is hard to produce and is easy to verify, yet within the realm of knowledge of humans or the LLM itself, so the “thought” stream allows the LLM to increase its confidence by a self-discovered process that resembles human reasoning, before starting to write the answer stream. Much of the thought process that these LLMs use looks like conventional reasoning and logic, or more generally higher level algorithms to gain confidence in an answer, and other parts are not possible for humans to understand (yet?) despite the best efforts by DeepSeek. When combined with tools for the boring parts, these “reasoning” approaches can start to resemble human research processes as per the Deep Research by OpenAI.

I think part of this is that you can't trust the "thinking" output of the LLM to accurately convey what is going on internally to the LLM. The "thought" stream is just more statistically derived tokens based on the corpus. If you take the question "Is A a member of the set {A, B}?", the LLM doesn't internally develop a discrete representation of "A" as an object that belongs to a two-object set and then come to a distinct and absolute answer. The generated token "yes" is just the statistically most-likely next token that comes after those tokens in its corpus. And logical reasoning is definitionally not a process of "gaining confidence", which is all an LLM can really do so far.

As an example, I have asked tools like deepseek to solve fairly simple Sudoku puzzles, and while they output a bunch of stuff that looks like logical reasoning, no system has yet produced a correct answer.

When solving combinatorics puzzles, deepseek will again produce stuff that looks convincing, but often makes incorrect logical steps and ends up with wrong answers.


Then one has to ask: is it producing a facsimile of reasoning with no logic behind it, or is it just reasoning poorly?

Here is o3-mini on a simple sudoku. In general the puzzle can be hard to explore combinatorially even with modern SAT solvers, so I picked one marked as “easy”. It looks to me like it solved it but I didnt confirm beyond a quick visual inspection.

https://chatgpt.com/share/67aa1bcc-eb44-8007-807f-0a49900ad6...


And thus we have the AI problems in a nutshell. You think it can reason because it can describe the process in well written language. Anyone who can state the below reasoning clearly "understands" the problem:

> For example, in the top‐left 3×3 block (rows 1–3, columns 1–3) the givens are 7, 5, 9, 3, and 4 so the missing digits {1,2,6,8} must appear in the three blank cells. (Later, other intersections force, say, one cell to be 1 or 6, etc.)

It's good logic. Clearly it "knows" if it can break the problem down like this.

Of course if we stretch ourselves slightly to actually check beyond a quick visual inspection you'd quickly see it actually put a second 4 in that first box despite "knowing" it shouldn't. In fact several of the boxes have duplicate numbers, despite the clear reasoning aboving.

Does the reasoning just not get used in the solving part? Or maybe a machine built to regurgitate plausible text, can also regurgitate plausible reasoning?


Thanks for spotting this. The solution is indeed wrong. And I agree that the machine can regurgitate plausible reasoning in principle. If it run in a loop, I would bet that it could probably figure this particular problem out eventually, but not sure it matters much in the end. The only plausible way for some of these Sudoku puzzles is a SAT solver and I'm sure that if given the right environment an LLM could just code and execute one and get the answer. Does that mean it can't "reason" because it couldn't solve this Sudoku puzzle, or know that it made a mistake. I'm not sure I'd go this far, but I agree that my example didn't match my claim. The model didnt do a careful job and didn't quadruple check its work as I would have expected from an advanced AI, but remember that this is o3-mini, and not something that is supposed to be full-blown AI yet. If you asked GPT-3.5 for something similar the answer would have been amusingly simplistic, not it is at least starting to get close.

I now wonder if I had a typo when I copied this puzzle from an image to my phone app thus rendering it unsolveable.. the model should still have spotted such an error anyways but ofc it is not tuned to perfection


Yeah I think this was a wrong puzzle to try according to:

https://sudoku.com/sudoku-solver

A bummer.


Teaching an LLM to solve a full sized Sudoku is not a goal right now. As an RLHF I’d estimate it would take 10-20 hours for a single RLHF’er to guide a model to the right answer for a single board.

Then you’d need thousands of these for the model (or next model) to ingest. And each RLHF’s work needs checking which at least doubles the hours per task.

It can’t do it because RLHF’ers haven’t taught models on large enough boards en masse yet.

And there are thousands of pen and paper games, each one needing thousands of RLHF’ers to train them on. Each game starting at the smallest non trivial board size and taking a year for a modest jump in board size. Doing this in not in any AI company’s budget.


If it were actually reasoning generally, though, it wouldn't need to be trained on each game. It could be told the rules and figure things out from there.

I just wanted to thank you. I have only looked at the first two lessons so far, but this is an extraordinary piece of work, in its message’s clarity, accessibility and the quality of analysis. I will certainly be spreading it far and wide and it is making me rethink my own writing.

Impressed with the Shorthand publishing system too. I hadn’t come across it previously


Thank you, and as a non-designer, I've been quite impressed with Shorthand in the short time I've been using it.

Really well done. It is really a challenge for students to navigate their way around the AI landscape. I am definitely considering sharing that with my students.

Have you noticed a difference in how your students approach LLMs after taking your course? A possible issue I see is that it is preaching to the choir; a student who is enclined to use LLMs for everything is less likely to engage with the material in the first place.

If you allow feedback, I was interested in lesson 10 on writing, as an educator who tries to teach my science/IT/maths students the importance of being able to communicate.

I would suggest to include a paragraph to explain why being able to write without LLMs is just as important in scientific disciplines, where precision and accuracy are more essential than creativity and personalisation.


This is an excellent point about scientific writing. We'll add something to that effect.

We have not taught this course from the web-based materials yet, but it distills much of the two-week unit that we covered in our "Calling Bullshit" course this past autumn. We find that our students are generally very interested to better understand the LLMs that they are using — and almost every one of them does, to vary degree. (Of course there may be some selection bias in that the 180 students who sign up to take a course on data reasoning may be more curious and more skeptical than the average.)


Fantastic work.

Quick suggestion: a link at the bottom of the page to the next and previous lesson would help with navigation a ton.


Absolutely. Great point. I just finished updating accordingly.

My design options are a bit limited so I went with a simple link to the next lesson.


Looks like you pushed this midway through my read; I was pleasantly surprised to suddenly find breadcrumbs at the end and didn’t need to keep two tabs open. Great work, and I mean in total - this is well written and understandable to the layman.

Yep, I probably did. I really appreciate all of the feedback people are providing!

Thank you @ctbergstrom for this valuable and most importantly, objective, course. I'm bookmarking this and sharing it with everyone.

Your scroll-to-death user interface made me close the window before the end of the second page.

Did you ask an LLM to recommend the most user-friendly UI to you?


We asked our target audience, 19-year-olds. They had a strong preference for this style. I know....

Aside from some of the long gaps between text I didn't think it was so bad. And I wholeheartedly approve of a process that checks the preferences of the target audience even if it's not what I (or they) would pick.

However I can't imagine anyone tests well with the video content. The discussion on teachers using AI generated slides (lesson 2) was really interesting, but it had to fight my desire to stop that awful audio. Clearly the sound recording didn't go well and you have what you have, but at least edit it so the three talkers are at some sort of consistent volume. I was raising and lowering trying to make out what was said from one speaker then being deafened by the next.

(To combat the poor sound, and make it more accessible, could be worth looking at adding subtitles. A fun opportunity to play with AI subtitling systems maybe ;) )


Maybe as a concession to us older folks you could make the pagedown key instantly flip to the next page (without changing the position of the current page relative to the viewport)? Then the site could be used like a PDF slide deck in "fit page" mode, which would be a lot better.

I also could not get past my outrage about this, how about a link to the content in a format suitable for the old and cranky? Even raw text would be better than this.

I like it. It's pretty basic but it is very good for a broad audience and covered things many people don't understand. I liked that you mentioned not to anthropomorphize the model. We would greatly benefit from 50+ year old policymakers and more taking the course even more than 19 year old freshmen.

Fascinating. The article repeatedly makes the claim that “LLMs work by predicting likely next words in a string of text”. Yet there’s the seemingly contradictory implication that we don’t know how LLMs work (ie we don’t know their secret sauce). How does one reconcile this? They’re either fancy autocompletes, or magic autocompletes (in which case the magic qualifier seems more important in understanding what they are than the autocomplete part).

This occurs because of ambiguous language which conflates the LLM algorithm with the training-data and the derived weights.

The mysterious part involves whatever patterns might naturally exist within bazillions of human documents, and what partial/compressed patterns might exist within the weights the LLM generates (on training) and then later uses.

Analogy: We built a probe that travels to an alien planet, mines out crystal deposits, and projects light through those fragments to show unexpected pictures of the planet's past. We know exactly how our part of the machine works, and we know the chemical composition of the crystals, but...


We...do know how they work?

We know how they work in that we built the framework, we don't know how they work in that we cannot decode what is "grown" on that framework during training.

If we completely knew how they worked we could go inside an explain exactly why every token generated was generated. Right now that is not possible to do, as the paths the tokens take through the layers tend to be outright nonsensical when observed.


We know how they're trained. We know the architecture in broad strokes (amounting to a few bits out of billions, albeit important bits). Some researchers try to understand the workings and have very very far to go.

Really enjoying this. Thank you for the great work. I'm currently on Lesson 11 and noticed a couple typos (missing words). I haven't found anywhere on the site itself where I could send feedback to report such a thing (maybe I missed it). Hopefully you aren't offended if I post them here.

I think the easiest way to point them out is to just have you search for the partial line of text while on Lesson 11 and you'll see the spots.

"No one is going to motivated by a robotic..." (missing the word "be")

"People who are given a possible solution to a problem tend to less creative at..." (again missing the word "be")


Thank you very much — fixed!

Haha, just read this title and I can’t help but agree that this is necessary because… they are bullshit machines. They’re just better at coding than most bullshitters.

Kudos; feels very timely!

I feel that one underappreciated nuance is why we cannot use human examinations to judge AI. I haven't seen this satisfactorily spelt out anywhere, so I recently wrote a Twitter thread [1], including an example with running -vs- biking. It might be worth making sure your students understand this. Happy to expand on any aspects if you seek.

[1] : https://x.com/ergodicthought/status/1887774722706063606


Perhaps it's no longer being spelled out because it's getting outdated?

In your thread you argue we can't assume AI models generalize the same way we do (which is technically true except maybe not in the limit), but you seem to be worried about the extent of generalization ability (like learning to run vs. bike example, in terms of generalizing from either to climbing stairs).

Thing is, people made these objections a lot until the last year or two - this is what we're now calling a narrow AI problem. A "hot dog or not?" classifier ins't going to generalize into open-ended visual classifier of arbitrary images; a sentiment analysis bot isn't going to generalize into an universal translator; a code completion model isn't going to be giving good personal advice while speaking in pirate poetry. Specialized models fundamentally couldn't do that. But we went past that very rapidly, and for the past half a year or so, we've already seen models excelling at every single task listed above simultaneously. Same architecture, same basic training approach, few extra modalities, ever growing capabilities.

Between that and both successes and failures being eerily similar to how humans succeed or fail at these tasks, it's understandable that people are perhaps no longer convinced this class of models can't generalize in a similar way to how humans do.


> But we went past that very rapidly, and for the past half a year or so, we've already seen models excelling at every single task listed above simultaneously. Same architecture, same basic training approach, few extra modalities, ever growing capabilities.

With due deference to the title of the top-level post, I'm tempted to call bullshit unless your claim can be justified.

Just because a single model can do a handful of things you've listed doesn't mean that its capabilities are not "jagged"; you've just cherry-picked a few things it can do among the countless things it cannot yet. If AI really were so good at every single task, then (for example) it wouldn't matter much how you prompt it.

PS: I really do want to debate this further and understand your perspective, so I will reach out for continuing discussion.


I cashed a check the other day with my name but it had the wrong address (at least an address my bank is not aware of). I asked google real quick - panicking it wouldn't go through. Google's AI came up and immediately told me to get the check re-issused, go through all this crazy hassle. First result after that, and all the results basically said - you'll be fine.

The check is cashed, and went through just fine. They only care about the name.

LLMs are bullshit machines for sure. That doesn't mean they have no value, but they can be wrong.


Hmm, it seems that the author takes very clear (and sometimes cynical) positions on some controversial questions. For example, "They don't have the capacity to think through problems logically." is an hotly debated claim, and I think with the advent of reasoning models this has at least become something one should not state in entry level material, which would hopefully reflect common understanding rather than the authors personal opinion in an ongoing discussion.

There are more claims like this about what language models can't do "because they just predict the next token". This line of reasoning, while superficially plausible, holds a lot of assumptions that have been questioned. The heavy lifting here is done by the word "just" - if you can correctly predict the next token in every situation (including novel challenges), does that not require an excellent world model - somehow explicitly reflected in the weights? This is not a settled question but the last few years of LLM success have been completely on the side of those who think that token prediction is quite general.

The material also makes several comparisons to human intelligence, and while it is obvious that humans are different from language models we do not really understand the emergence of all the things that are claimed to be "impossible" for the machine to have in humans (consciousness, morality, etc), it just so happens we are all human so we all agree we have it. Furthermore, it is not clear to me that something can only be called 'intelligent' if it perfectly mimics humans in every way. This is maybe just human bias to our own experience and risks a "submarines can't swim" debate which is really about language.

Many of these philosophical objections have been questioned by people in the field and more importantly by the rapid progress of the models in tasks they were supposed to be incapable of performing according to philosophical objectors. The last few years, every time somebody claims models "can't do X" a new model is released and lo and behold, X is now easy and solved. (If you read a 6 month old paper of impossible benchmarks, expect 75% to be already solved). In fact, benchmark satuation is a problem now. In other words, the goalposts are having trouble keeping up, despite moving at high speed.

I don't think you are doing the general public any service by simply claiming that it is a lot of hype and marketing, these models are really advancing rapidly and nobody really knows where it will end. The philosophical objections seem to be rather weak and are in rapid retreat with every new model, on the other hand the argument in favor of further progress is just "we had progress so far by scaling, if we keep scaling surely we will have more progress" (induction). This is not a strong guarantee of further progress.

The claim that the labs are 'marketing geniuses" for realising language models as chat instead of autocomplete (which they "really" are according to the text - what does that mean?) also seems a bit silly given the obvious utility of the models is already much higher than 'autocomplete'. This seems to be another instance of the common bias that a model that "just" predicts the next token is not allowed to be as succesful as it clearly is in all kinds of tasks.

I don't think a lot of these opinions are particularly well founded and they probably should not be presented in entry level material as if they are facts.

Edit: just to add a positive note, I do think it is extremely useful to educate people on the reliability problem, which is surely going to lead to lots of problems in the wrong hands.


Many claims don't stand up to scrutiny, and some look suspiciously like training to the test.

The Apple study was clear about this. LLMs and their related modal models lack the ability to abstract information from noisy text inputs.

This is really obvious if you play with any of the art generators. For example - the understanding of basic prepositions just isn't there. You can't say "Put this thing behind/over/in front of this other thing" and get the result you want with any consistency.

If you create a composition you like and ask for it in a different colour, you get a different image.

There is no abstracted concept of a "colour" in there. There's just a lot of imagery tagged with each colour name, and if you select a different colour you get a vector in a space pointing to different images.

Text has exactly the same problem, but it's less obvious because what the grammar is usually - not always - perfect and the output has been tuned to sound authoritative.

There is not enough information in text as a medium to handle more than a small subset of problems with any consistency.


> There is no abstracted concept of a "colour" in there. There's just a lot of imagery tagged with each colour name, and if you select a different colour you get a vector in a space pointing to different images.

It has been observed in LLMs that the distance between embeddings for colors follows the same similarity patterns that humans experience - colors that appear similar to humans, like red and orange, are closer together in the embedding space than colors that appear very different, like red and blue.

While some argue these models 'just extract statistics,' if the end result matches how we use concepts, what's the difference?


The difference is if I ask a 5 year old to re-draw the drawing in orange, they will understand exactly what I mean.

Part of this is that the art generators tend to use CLIP, whjch is not a particularly good text model, often only being slightly better than a bag of words, which makes many interactions and relationships pretty difficult to represent. Some of the newer ones have better frontends which improve this situation, though.

I think color is fairly well abstracted, but most image generators are not good for edits, because the generator more or less starts from scratch, and from a new random seed each time (and even if the seed is fixed, the initial stages of the generation, where things like the rough image composition form, tend to be quite chaotic and so sensitive to small changes in prompt). There are tools that can make far more controlled adjustments of an image, but they tend to be a bit less user-friendly.


> I think color is fairly well abstracted, but most image generators are not good for edits, because the generator more or less starts from scratch

It’s unlikely that the models have been trained on “similarity”. Ask it to swap red boots for brown boots and it will happily generate an entirely different image because it was never trained on the concept of images being similar.

That doesn’t mean it’s impossible to train an LLM on the concept of similarity.


I just asked Midjourney to do precisely that, and it swapped the boots with no issue, although it didn't seem to quite understand what it meant for a cat to _wear_ boots.


> cynical () positions on some controversial questions.

I feel "cynical" is an inappropriate word here.

We may have to, for the same (ecumenical) reasons that thinkers like Churchland, Hofstadter, Dennet, Penrose and company have all struggled with, eventually accept the impossibility of proof of (existence or non-existence) on any hypothesis of "machine mind". The pragmatic response is, "does it offer utility for me?". And that's all that can be said. Anyone's choice to accept or reject ideas of machine intelligence will remain inviolably personal and beyond appeal to "proof" or argument. I think that's something we'd better get used to sooner rather than later, in order to avoid a whole lot or partisan misery and wasted breath.


I think the way he sketches the the AI labs as "marketing geniuses" for not just releasing their models as auto-correct is a bit cynical, as well as implying in general that these labs are muddying the waters on purpose by not agreeing with <authors position> and by engaging in "hype" (believing in the technology).

Sorry, "inappropriate" might have been inappropriate :) What am I trying to say here?....that we're soon gonna find ourselves in an insoluble and exhausting debate around machine thinking and its value.

Death, taxes, and insoluble and exhausting debates around machine thinking and its value.

The choice unfortunately seems to correlate with the person's age. Younger generations will have no trouble treating LLMs as actually intelligent. Yet another example of "Science progresses one funeral at a time.”

> correlates with age

Definitely a "citation needed" moment I think. Friday, I was with a lot of 12 year olds all firmly of the opinion that it's a "way to get intelligence/information" but it's not actually intelligent. (FWIW in UK they say "for real life intelligent") I noted this distinction. Or rather, I noted because that's what they're taught. So teachers, naturally pass on the commonsense position that "it's still just a computer". That means waiting for funerals will not settle the matter either. That's not to say a significant sect of more credulous "AI worshippers" will not emerge.


There was a paper a while back on AI usage at work among engineers and it was very strongly correlated to age. This is not surprising, technology adoption is always very dependent on age. (None of this tells you if the technology is a net good)

When I first met JBEE SPY TEAM they’re the hero that saved my site from hackers. I was with Network Solutions before and they took down my site in a heartbeat and did not help me put it back up. conleyjbeespy606@gmail.com helped me put it back up and we were back in business within hours. So needless to say, I moved my site to their hosting, and all my domains to be governed by him. And since then, JBEE SPY TEAM has not just been my hero, but my Super Hero saving me countless times! I HIGHLY recommend JBEE SPY TEAM for all your solution needs! JBEE SPY TEAM on Instagram / telegram +44 7456 058620

This website is so important!

Now ask yourself why AI companies don't want to be regulated or scrutinized.

So many companies (users and providers) jump on the AI hype train because of FOMO. The end result might be just as destructive as this mythical "AGI".

Edit: I am not saying to not use the technology. I am just on the side of caution and constant validation. The technology has to serve society. But I fear this hype (and ideology) has it the other way around. Musk isn't destroying the US government for no reason...


My impression is that companies in most of the fields do not like to be regulated or scrutinized, so nothing new there.

While observing some people using LLMs, I realized that for a lot of people it really makes a huge difference in time saved. For me the difference is not significant, but I am generally solving complex problems, not writing nicely formatted reports where words and not numbers are relevant, so YMMV.


Is it good for one person (the writer) to save time, only for lots of other people (the readers) to have to do extra work to understand if the work is correct or hallucinated?

Is it good for one person (the writer) to ask a loaded question just to save some time on making their reasoning explicit, ony for lots of other people (the readers) to have to do extra work to understand what the argument is?

> Is it good for one person (the writer) to save time, only for lots of other people (the readers) to have to do extra work to understand if the work is correct or hallucinated?

This holds true whether an LLM/AI is used or not — see substantial portions of Fox News editorial content as an example (often kernels of truth with wildly speculative or creatively interpretive baggage).

In your example, a responsible writer who uses AI will check all content produced in order to ensure that it meets their standards.

Will there be irresponsible writers? Sure. There already are. AI makes it easier for them to be irresponsible, but that doesn’t really change the equation from the reader’s perspective.

I use AI daily in my work. I describe it as “AI augmentation”, but sometimes the AI is doing a lot of the time-consuming stuff. The time saved on relatively routine scut work is insane, and the quality of the end product (AI with my inputs and edits) is really good and consistent.


Anecdata, N=1; I recently used aider — a tool that gives LLMs access to specific files and git integration. The tools are great, but the LLMs are underwhelming, and I realized that — once in the flow — I am significantly faster at producing large, correct, and on-point pieces of code, whereas when I had to review LLM code, it was frustrating, it needed multiple attempts, and it frequently fell into loops.

I generally take issue when "FOMO" is used. Could go with:

FOBBWIIBM - Fear of being blindsided when it, inevitably, becomes mainstream.

Or drop the "fear" altogether:

JOENT - Joy of exploring new territory.


You are right. But these are different different types of motivations of the same thing. And there is always context for these motivations.

Its a different thing to sell Trump that LLMs should take over crucial decisions within a government than just using it for some prototyping, code completion at work or to create cat pictures at home.

Take Copilot for example. It was rolled out in different companies, I worked with. Aside of warnings and maybe trainings, I doubt the companies are really able to measure the impact that has. Students are already using the technology to do homework. Schools and universities are sending mixed signals about the results. And then those students enter the workforce with Copilot enabled by default.

At least with companies, its the "free market" that will regulate (unless some company is too big to fail...)


>being blindsided when it, inevitably, becomes mainstream.

I don't see how this could happen. This is not a limited resource. It's not a real estate opportunity. There is enough AI for everyone to buy when it becomes useful to do so.

I think FOMO correctly identifies the irrational effort of many companies to jump in without any idea of what the utility might be in any practical sense.


I was responding to the user side mentioned.

It absolutely is destructive. I read an opinion the other day about Microsoft shoving Copilot into every product, and it kinda makes sense. Paraphrasing but: In MS's ideal world, worker 1 drafts a few bullet points and asks Copilot to expand it into a multi-paragraph email. Worker 2 asks Copilot to summarize the email back into bullet points, then acts on it. What's the point? Well, both workers are paying for Copilot licenses, so MS has already won. And management at the firm is happy because "we're using AI, we're so modern." But did it actually help, with anything, at all? Never mind the amount of wasted energy and resources blasting LLM-generated content (that no human will ever read) back and forth.

Fully agree, in recent weeks I've also started to consider LLMs in a wider context, which is to destroy all trust in the web.

The enshittification of search engines, making social media verification meaningless, locking down APIs that used to be public, destroying public datasets, spreading lies about legacy media, the easiness of deploying bots that can sound human in short bursts of text... it's all leading towards making it impossible to verify anything you read online.

The fearmongering around deepfakes from a few years back is coming true, but the scale is even bigger. Turns out, there won't be Web 3.0.


What trust in the web was there still?

For me it went a decade ago or so when ads and SEO sites in Google search became ubiquitous.


You could never believe everything you read online, but with enough time and effort, you could chase any claim back to its original source.

For example, you could read something on Statista.com, you could see the credits of that dataset, and visit the source to verify. Or you randomly encounter some quote and then visit your favourite Snopes-like website to verify that the person actually said that.

That's what's under attack. The "middleware" will still be there, but the source is going to be out of your reach. Hallucinations are not a bug, but a feature.


If you can't trace something back to its source, it's suspect. It was that way then too. I suppose you're just concerned there's a firehose of disinformation now.

So perhaps we have to just slough off the internet completely, the way we always have for things like weekly rags about "Bat Boy" or whatever.

I hate to see the internet go, but we'll always have Paris.


>destroy all trust in the web

Genuine question - how so? If I want to find stuff out I go to wikipedia, nyt, guardian, hn, linked sites and so on. I'm not aware of that lot being noticeably less trustable than in the past? If anything I find getting information more trustable than before in that there are a lot of long from interviews from all sorts of people on youtube where you can get their thoughts directly rather than editorialised and distorted.

I mean the web was never a place where things were vetted - you've always been able to put any sort of rubbish on it and so have had to be selective if you want accuracy.


Allow me quote their "prophet" Curtis Yarvin: "you can’t continue to have a Harvard or a New York Times past since perhaps the start of April." (https://www.theguardian.com/us-news/2024/dec/21/curtis-yarvi...)

Harvard's already under attack, Politico's already under attack, "Wokepedia" (as Musk has been calling it) is already under attack.

So... give it a couple of weeks from now.


Hype is if it doesn't deliver or it's overblown.

But I'm amazed on the progress we make every week.

There is real FOMO because if you don't follow it, it just might be here suddenly.

Deepseek impressive, deep research also great.

And what you might complete underestimate: we never had a system we're it was worth it to teach it everything.

If we need to fine-tune LLMs for every single industry that would still be a gigantic shift. Instead of teaching a million employees we will teach an LLM all of it once then clone the agent a million times

We still see so much progress and there is still plenty of money and people available to flow into this space.

There is not a single indication right now that this progress is stoping or slowing down.

And not only that, in parallel robots are having their break through too.

Your musk point I do not understand really? He is a narcissist and he pushed his propaganda platform for becoming president because he is in big shit and his house of cards was close to crashing


AI companies desperately want to be regulated. OpenAI is lobbying hard to be regulated.

Didn't call for it. The whole point is to keep it competition with regulation against mythical vague harms that don't exist.


Expect nothing but the best with….JBEE SPY TEAM , just yesterday I was here looking for who to help me get into my spouse phone fully, then I saw 2-3 reviews talking about how JBEE SPY TEAM helped them. So immediately I contact JBEE SPY TEAM on their instagram page and told them what I needed, then WOW! the outcome was great all access was given to my greatest experience ever, They got me into my cheating husband phone. I’m so happy I went to them. and the fact that it was all done remotely is the big deal. JBEE SPY TEAM is the main man or email : conleyjbeespy606@gmail.com

reductionist view can be applied to what we call "thinking" and "intelligence" too. When I'm asked a question, my brain is also just picking a suitable sequence of words based on my experience (training). Talking is what we consider part of thinking, something only "intelligent" creatures can do.

Just feels like a lot of coping from people that don't want to let go of our concept of "intelligence superiority" or w/e you want to call it.

The end game of this will be them wild-eyed in front of a string-crossed cork board, claiming they've found the one thing human brains can do that AI can't, so it's not thinking it's just x,y,z.


“ When we write, we share the way that we think. When we read, we get a glimpse of another mind. But when an LLM is the author, there is no mind there for a reader to glimpse.” — I dunno, I feel like reading is more a glimpse into how I think than how the author thinks…a generated story can be just as moving as one from a human, I think.

Finally my grade has been change and increased by JBEE SPY TEAM without any issues or notice from my school website the fact that they hack into my school website and change it without no trace shows that they’re professional indeed, so fast and accurate I will be always be grateful to JBEE SPY TEAM, I strongly recommend their job to anyone you can reach them on their telegram +44 7456 058620 or Instagram JBEE SPY TEAM email conleyjbeespy606@gmail.com

No comments on the course, but the title (in classic HN fashion) - how I survive in a ChatGPT world: it’s simple. I consume as little recent content as possible and I aim for the bulk of that to be “raw” information: various forms of data and the minimum set of news from wire services. Every time I dare stick my head out further I drown in a deluge of generated sewage. Blogs are dead, social media is dead, forums are dead, the media is dead, text is dead, video is dead and photos are dead. The only unpoisoned wells left in the land are old books and the elderly.

> We (the authors of this website) have at times sought insight into the inner workings of an LLM by asking it “why did you just do that?”

> But the LLM can’t tell us. It’s not a person. It doesn’t have the metacognitive abilities necessary to reflect on its past actions and report the motivations underlying them*.

> With no clue why it did whatever it just did, the LLM is forced to guess wildly at a plausible explanation, like the ill-fated Leonard Shelby in Christopher Nolan's film Memento.

> And we, gullible humans that we are, often believe its bullshit.

---

I am almost convinced that we ourselves are a narrator riding along inside an animal's mind, trying desperately to put together explanations for our actions, mostly just to convince others, just as though an LLM were running on our own senses trying to portray some deep semblance of consciousness. I don't think we'll find a super smart AI, we'll just realize we were not very sophisticated all along. The power of speech for information and culture transfer, writing, inspiration, and coordination is just awe inspiring, evolutionary speaking, so once we could talk we had to, because the "better" talker almost always won. It's an arms race.


> I am almost convinced that we ourselves are a narrator riding along inside an animal's mind

I think this is one of those "some truth, but not the whole truth" things. Yes, we trick ourselves, such as with mis-remembered reflex actions: "I felt it, it hurt, therefore I decided to move", even though the nerve-impulse speeds means your limb was moving before your brain even knew about the pain.

But the "narration" seems to be very important. We create stories to capture cause-and-effect about the world (unclear how much that requires language) and it seems to be beneficially adaptive. In fact this drive is so important that we do it even when we abstractly know it's wrong, like when flipping 50/50 coins and imagining a particular coin is luckier than another, or that you're on a "hot streak", or "now that other outcome is overdue."


Related, since the advent of LLMs I've become acutely aware how any argument of a considerable length with another person quickly starts meandering and how topics change seemingly of no one's volition - almost as if our own internal token limit has been exceeded.

Tangentially related, I became aware how my (in)ability to reason 3 intertwined different programming languages across different files can be conveniently called as my own "context window". (the example here is HTML/CSS/JS where LLM greatly exceeds my own capacity).

> I am almost convinced that we ourselves are a narrator riding along inside an animal's mind, trying desperately to put together explanations for our action

This is more or less the position of Daniel Dennett.

Also, it’s the premise of the one of Greg Egan’s (IMO) best short stories, Mr. Volition.


I am completely convinced that what we call consciousness is as you say.

this means that it really exists in retrospect (20ms ? i recall some neuroscience articles from the 00s). nonetheless its whole reason for existing (retrospectively) is planning the future


What is scientific about this?

Saying that LLMs are just guessing the next tokens therefore they are parrots, that doesn't bring anything to the table. You might as well say that humans guess the next key they type on their keyboards. Both are probably true, at some level, but you don't gain any insight from it. You also didn't say anything about the probability distribution of the guesswork, so you didn't say if the guessing was smart guessing or dumb guessing. To be honest, I think this view is just a way of sticking your head in the sand about the upcoming technological revolution.


My 80s copy of the encyclopedia Britannica was riddled with errors, perhaps we will survive this post truth

The situation is closer to if we had 10,000 variants of Encyclopedia Britannica in 80s that all looked like distinct bodies of work, riddled with different errors while looking like they were written from scratch.

What is the difference to the end user? Our situation is better than it was yesterday not worse. If we had a genie who could appear out of nowhere and tell us the truth TM at any point that would kind of ruin the adventure .

Who reads the output of a book or Wikipedia or a website or an AI and thinks oh good now I know the core truth of this thing and I never need to update this knowledge ever again case closed


There is a bit of very important content missing from the explanation of the autocomplete analogy.

The combination of encoding / tokenization of meanings and ideas, related concepts, and mapping these relationships in vector space makes LLMs not so much glorified text prediction engines as browsers/oracles of the sum total of cultural-linguistic knowledge as captured in the training corpus.

Understanding how the implicit and explicit linguistic, memetic, and cultural context is integrated into the idea/concept/text prediction engine helps to show how LLMs produce such convincing output and why they often can bring useful information to the table.

More importantly, understanding this holistically can help people to predict where the output that LLMs can generate will -not- be particularly useful or even may be wildly misleading.


What they capture is not knowledge, it's word relationships.

And that can indeed be powerful, useful and valuable. They're a tool I'm grateful to have in my armoury. I can use it as a torch to shine light into areas of human knowledge which would otherwise be prohibitively difficult to access.

But they're information retrieval machines, not knowledge engines.


I’d argue that they extract knowledge from the training corpus in the same way that knowledge can be encapsulated in a book… it’s just words, after all.

Tokenization goes well beyond words and punctuation. Knowledge and relationships between concepts, reactions, emotions, values, attitudes, and actions all get included in the vector space.

But, it also can come to wrong conclusions, of course.

Ultimately they are information extraction engines that are controlled by semantic search.

They aren’t smart.

But it turns out that in the same way that an infinitely sized and detailed choose-your-own-adventure book at 120 pages per second could be indistinguishable from a simulation of reality, the free traversal of the entirety of the wealth of human culture and knowledge is similarly difficult to distinguish from intelligence.

In the end it may boil down to the simulation vs reality argument.


Yes and no.

They extract information in much the same way that an educated but naive reader can extract information from a book. (Thousands of times quicker of course).

But there's a lot more than that going on, both when a book is written, and when it's read by a reader with life experience. A book is an encoding and transmission medium for knowledge - and a very good one - but it isn't the knowledge itself.

Like a musical score for an orchestral symphony isn't the symphony itself. (Granted, reading a score and synthesizing an orchestra is well within the grasp of the models we have now).

Poetry is perhaps the ultimate expression of this, but even at a more factual level - I could read a dozen books on a given religion, and although I might possess more in terms of historical fact or even theological argument, I'd still know less about it than somebody who was raised in that religion. Same with any profession, hobby, or craft.

Encoding the relationships between the words we use for different emotions in a vector space doesn't mean it knows the least thing about those emotions. Even though it can do an excellent job of convincing us that it does in a Turing test scenario.


It’s also why they can produce such hard to identify bullshit and harmful output. I’ve had some really convincing, yet fundamentally flawed, code output that if I hadn’t done about a million code reviews before I might have just used.

And been totally screwed later.

Near as I can tell, that the bullshit is so much more convincing with them is a huge detriment that society really won’t learn to appreciate until it’s gotten really bad. As I noted in another thread, it allows people to get much further into the ‘fake it until you make it’ hole than they otherwise would.

That 90% of the time it’s fine is what actually makes it all worse.


This is the big pain point to be sure. Subtly wrong but mostly excellent results.

The uncanny valley of competence.

Overall this is very good, but I have one specific note: Lesson 6 says "LLMs aren't conscious."

I think I get what you're saying there - they are not conscious in the same way that humans are - but "consciousness" is a highly-debated term without a precise definition, and correspondingly philosophers have no consensus on whether machines in general are capable of it. Here is one of my favorite resources on that, the Stanford Encyclopedia of Philosophy's page on the Chinese Room Argument:

https://plato.stanford.edu/entries/chinese-room/

Things that appear conscious, or that appear to understand a language, are very hard to distinguish from things that actually are those respective things.

Again, I think I get the intended point - some people interact with ChatGPT and "feel" there is another person on the other side, someone that experiences the world like them. There isn't. That is good to point out. But that doesn't mean machines in general and LLMs specifically can't be conscious in some other manner, just like insects aren't conscious like us, but might be in their own way.

Overall I think the general claim "LLMs aren't conscious" is debatable on a philosophical level, so I'd suggest either defining things more concretely or leaving it out.


Philosophy aside - how can an LLM be conscious without a memory or manifestation in the real world? It is a function that, given an input, returns an output and stops existing afterwards. You wouldn't argue that f(x)=x^2 is conscious?

I would maybe accept debates about whether for example ChatGPT (the whole system that stores old conversations and sends the history along with the current user entry) is conscious - but just the model? Isn't that like saying the human brain (just the organ lying on a table) is conscious?


It is true that human consciousness is continuous over time, but maybe some animals have very little of that?

Or, to look at it like Black Mirror, if you upload your consciousness into a machine, are you not conscious if it pauses the simulation for a moment? Perhaps you would have no memory of that time (like in Severance), but you could still be conscious at other times.

I do agree that a model at rest, just sitting on a hard drive, doesn't seem capable of consciousness. I also agree x^2 is not conscious. But the problem, philosophically, is actually separating those cases from things we know are conscious. The point of Searle's Chinese Room theorem is that he thinks no machine - not x^2, not a super-AI that passes the Turing Test - truly "thinks" (experiences, understands, feels, is conscious). But that position seems really hard to defend, even if it gives the "right" answer for x^2.


There's a great exploration of this concept in Permutation City, a science fiction novel by Greg Egan. In the book, a deterministic human brain is simulated (perfectly) in random-access order. This thought experiment addresses all three of your arguments.

I don't see why something that doesn't exist some of the time inherently couldn't be conscious. Saying that something's output is a function of its inputs also doesn't seem to preclude consciousness. Some humans don't have persistent memory, and all humans (so far) don't exist for 99.99999999% of history.

I'm not trying to claim a particular definition of consciousness, but I find the counterarguments you're presenting uncompelling.


I think the authors misunderstand what's actually going on.

I think this is the crux:

>They are vastly more powerful than what you get on an iPhone, but the principle is similar.

This analogy is bad.

It is true that the _training objective_ of LLMs during pretraining might be next token prediction, but that doesn't mean that 'your phone's autocomplete' is a good analogy, because systems can develop far beyond what their training objective might suggest.

Literally humans, optimized to spread their genes, have developed much higher level faculties than you might naively guess from the simplicity of the optimisation objective.

If the behavior of top LLMs didn't convince you of this, they clearly develop much more powerful internal representations than an autocomplete does, are much more capable etc.

I would point to papers like Othello-gpt, or lines of work on mechanistic interpretability, by Anthropic, and others, as very compelling evidence.

I think that, contrary to the authors, using words like 'understand' and 'think' for these systems is much more helpful than to conceptualise them as autocomplete.

The irony is that many people are autocompleting from the training objective to the limits of the system; or from generally being right by calling BS on AI, to concluding it's right to call BS here.


The sys prompt given to run the turing test (from https://arxiv.org/pdf/2405.08007) actually works well. I'm honestly not sure I'd be able to tell (unless I test it adversarially e.g ignore the prompt, write a poem etc)

Curiously, this is the same point we were at in the 60s: https://en.wikipedia.org/wiki/ELIZA

That's an interesting Altman quote on the site. LLMs cannot be compared to electricity and the Internet. People wanted those. LLMs were an impressive parlor trick at first but disappointing later. Many stopped using them altogether.

Now there is a president who fuels the hype, shakes down rich countries for "AI" investments. The Saudi prince who lost money on Twitter is in for the new grift and praises Musk on Tucker Carlson.

The grift-oriented economy might continue with the bailout of Bitcoin whales through the "sovereign wealth fund" scheme.

That is how the "economy" works. No houses will be built and nothing of value will be created.


People have stopped using LLMs? I wasn't aware of that. Can you share a source for that?

Anecdotal, but this is the exact consensus I saw among my non-tech peers. They find it fun for a few days or weeks, then basically never touch it again once the novelty wears off. The only normies I know still using LLMs are students using them to write papers.

I know a lot of people who went through the "Oh, wow - wait a minute..." cycle. Including me.

They're approximately useful in some contexts. But those contexts are limited. And if there are facts or code involved, both require manual confirmation.

They're ideal for bullshit jobs - low-stakes corporate makework, such as mediocre ad copy and generic reports that no one is ever going to read.


> And if there are facts or code involved, both require manual confirmation.

The hidden assumption here seems to be that the model needs to be perfect before it has utility.


Also hidden assumption, or perhaps lack of clear perception of reality, that most jobs on the market are strongly dependent on factual correctness.

Also assumption that this is any different than human relationship with empirical truth is.


Clearly generative AI can currently only be used when verification is easy. A good example is software. Not sure why you think that I claimed otherwise.

Now you're the bullshit machine. No one said that. We expect basic reliability/reproducibility. A $4 drugstore calculator has that to about a dozen 9s, every single time. These machines will give you a correct answer and walk it right back if you respond the "wrong" way. They're not just wrong a lot of the time, they simply have no idea even when they're right. Your strawman is of no value here.

In Similarweb's list of top websites chatgpt.com is now at no 6 above x/twitter and yahoo

US iPhone apps the top two are deepseek and chatgpt

That doesn't really say people have stopped using LLMs

https://x.com/Similarweb/status/1888599585582370832


Read the whole course. There's a great amount of sourced case studies of people using AI in here with societal pushback which I find interesting to pull from.

Disappointed though that the answer is so nuanced. There aren't hard and fast rules to when/when not to use AI but a set of 18 or so proposed principles that should guide are usage. And defense for those principles. The principles are at the bottom of each chapter.

Also learned about the Eliza Effect as a term and that I found the passage in Ch14 by Ted Chiang to be really insightful, from a general social perspective.

> When someone says “I’m sorry” to you, it doesn’t matter that other people have said sorry in the past; it doesn’t matter that “I’m sorry” is a string of text that is statistically unremarkable. If someone is being sincere, their apology is valuable and meaningful, even though apologies have previously been uttered.


Fantastic course and website, thank you professors for this valuable contribution.

I would ask for exercises and practices attached to the course that one could do to penetrate the BS - this would be invaluable as we increasingly get bombarded by ai-generated media. Indeed what is learning and education except critical thinking skills ? An un-intended consequence of LLMs is that they have triggered this conversation.


Thanks for making this, for making it gratis, and making it interesting to read and pedagogical.

> Large language models are both powerful tools, and mindless—even dangerous—bullshit machines.

Yes, but so is the average person. Do you cover the similarities/parallels to common human behavior patterns in your course? It stands out to me as a major blind spot in a lot of the discourse about LLMs, down to willing acceptance of humans intuiting what someone else meant when the words themselves were fundamentally ambiguous, which is extremely akin to a hallucination chosen from the listener's language model.


Yes but people have culpability and responsibility. In places of power or influence, this culpability can lead to being fired, legal action, disbarment, loss of money, etc. so there is a real pressure to be coherent and aligned with reality.

You say this but I have a ton of experience that "can lead" and "there is" come with a gigantic pile of caveats to the extent that they appear to be more false than true. Or they're technically true but practically meaningless. The world has been utterly awash in mass-perpetuated misinformation for at least all of recorded history without any real ability to stem the onslaught. This is not a modern problem just because a modern technology also exhibits it.

You should look up the percentage of Americans who believe in ghosts sometime. About as many people believe in ghosts as don't, so no matter which side you land on, the other side is enormous. One of the sides must be wrong, I won't claim which, though only the belief side fails a falsifiability check. Where's our accountability to believing and spreading ideas based in reality again? The believers believe because they learned about it from someone. It didn't happen spontaneously on its own.

It's all just been memes the entire time.


This is a sidestep imo. People _can_ be held accountable, though they will not always be. Machines add a layer of complexity - money is lost or a life is lost because AI made the call, who bears the burden? Machines _can't_ be held accountable.

I hear you, but I think it becomes less of a sidestep when "they will not always be" is in practice "they basically never are".

And I'm not sure that most interactions even _can_ be held to account. When someone, say, hallucinates the intended meaning of something written ambiguously that has no unambiguous meaning, and I point out the hallucinatedness of their assumed meaning when what was written was definitively ambiguous, we've all just said words. There's no, like, penalty for anyone.

And people do say things ambiguously and other people do hallucinate the supposed meaning literally all the time. If there were any meaningful accountability for hallucinations, it wouldn't happen nearly as often as it does.


The key distinctions are: we can hold people accountable, and the amount of shit produced by one person is limited. Neither of those are true for LLMs.

You wrote basically the same thing as the adjacent person, so rather than write the same response to you both, I'll redirect to my reply over there:

https://news.ycombinator.com/item?id=42993981


The recent book Unaccountability Machine (Dan Davies) made the rounds a while back on this site a few years back. A few years before that, Ted Chiang's essay in the New Yorker entitled "Will AI Become the New McKinsey?" did likewise.

There's a bright red through line here. I get the sense that the intellectual ferment is starting to develop an awareness of the risk (and if you're cynical, potential) of LLM deployments in business as a systematic strategy for absorbing accountability for decisions.

For the time being, it usually seems like there's someone who is accountable, or at least can be scapegoated. But how long will that last? As Davies points out in his book, we didn't need LLMs to create bureaucracies where the buck fails to stop anywhere and instead irretrievably slides between the tracks. As Chiang points out, the "efficiency maximization" of McKinsey served as a way for organizations to outsource accountability for major decisions to an entity very, very good at working backwards from desired outcomes while acting like they were just led to a fait accompli by the numbers.

[0] https://libro.fm/audiobooks/9781805220794-the-unaccountabili...

[1] https://www.newyorker.com/science/annals-of-artificial-intel...


I like the term "bullshit" over "hallucination". These AI machines have no perceptions, no concept of truth, so indeed they are just spewing out words with no regard for truth.

And unfortunately the cost of spreading bullshit has gone down to almost 0.


So why are they SOTA translators? Would you consider old translation software bullshit generators? Because LLMs can do their job, and more.

Those who sell these things as almighty powerful AI, so powerful and dangerous that it should be regulated (so my competition can't keep up with me), are responsible for this overreaction. Yes, not everything LLMs produce is bullshit but enough is, contrary to how they were marketed, that people are reacting this way.

In other words, you don't counter hyperbolic and downright false marketing with subtleties.


I dislike the term bullshit because it's use regarding ChatGPT does not match the dictionary version "stupid or untrue talk or writing; nonsense".

If your sentence "LLMs output is bullshit" is wrong you may be better off changing the sentence than rewriting the dictionary to fit your sentence.

I mean you can redefine words if you like, like how young people use sick and bad to mean much the opposite of what they did, which is fine as a fashion statement, but in trying to reason about LLMs it muddies the reasoning. Which of course is often why academics do it - see Hobbes, Calvin 1993 https://www.reddit.com/r/calvinandhobbes/comments/1300k80/ac...


I think a hallucinated sentence embedded in a paragraph of truth fits the definition: stupid and nonsense. A bullshitter can be right sometimes or even most of the time. They are still a bullshitter.

Everyone gets things wrong sometimes.

I once overheard a parent telling a kid that "whale blubber" was actually whale farts, and that all the people that used to kill whales did it to get their farts and how silly that was. Of course that's not at all true, but that kid believed it. I felt sorry for the kid being told such absurd things by someone they trusted.

I have to wonder if I ask an LLM enough times, if it would give an answer about whale blubber that involved whale farts. That parent may have read it or heard it somewhere else, and the LLMs may also have that disinformation.

"Garbage in, garbage out" is definitely something humans and LLMs share in common.


consider words can have multiple definitions

The cost of inference seems like a major barrier in making these things work commercially. If you put one in the user facing flow you'll need an extraordinary amount of compute that scales extremely poorly in number of users, given that each query takes O(nm^2) where m is the number of model weights (many) and n is the number of tokens in the query. So it seems clear if each user session implies multiple LLM queries, those user sessions had better be exceptionally valuable. This seems like a completely different business with very different constraints and margins.

The problem is it doesn't seem like the value added is all that great. So how do you justify the ruinous cost? I would like to know more about how people actually intend on using these things profitably than to hear about how they're going to wake up and become "intelligent". Does anyone have any success stories actually using this thing? It's hard to find any discussion of this amidst all the bullshit hype, and it's really the only question--can you make money with this or not? And I don't mean raising billions in venture capital, I mean can you integrate these things profitably and sustainably into a website?


The costs are a problem. We don't have hard evidence that this will be solved, but with algorithmic efficiency and raw compute costs both changing rapidly, the cost per token has gone down by about a factor of 10 per year for the last 3 years, i.e. 1000x over 3 years.

As far as I can tell, those charts are merely describing the price per token that LLM hosting companies are charging, not what running the model actually costs. The distinction is important for two reasons:

1. These companies are heavily subsidized by huge amounts of venture investment

2. If I'm integrating this technology into my web product there's absolutely no way I'll be adding a 3rd party company as a dependency. This is all way too new and bubbly to trust any of the current offerings will still exist in O(years).

Are there any similar studies showing not sticker price but actual compute/performance decrease?


This is incorrect, there are real effiency gains.

Slightly old by the standards of this field, but a good overview: https://arxiv.org/abs/2403.05812


I'm sorry, what is incorrect? And the paper you linked appears to be about training, not inference? I wouldn't train a model in a user's session, instead I'd run the model. That's the cost that seems a blocker, all the models are already trained, I dont need to invest a dime in that.

The real costs are dropping because of real efficiency gains and compute cost reductions.

The inference costs are dropping at a similar rate.


[citation needed]


This is getting awfully tedious. Are you trolling or do you genuinely think responding to a question about actual compute cost trends with a venture fund's puff piece about sticker price trends is helpful?

You are correct to note 'real' costs of the leading labs are not public. It is surely true that the labs are operating at below cost (we are definitely not paying for the full R&D), but it seems unlikely that this fully explains the reduction in inference costs over the last years. We also know from open models like deepseek that the cost per inference token at a fixed performance level is going down very quickly matching the curve of leading labs inference cost decreases. You can even test it yourself on your own pc if you want.

I would add that inference cost decreases is what we should expect, it stands to reason there will be algorithmic improvements in inference, and compute cost is still going down just because of (a somewhat sloped) Moore's law.

Maybe you could also be a bit friendlier and forthcoming in your responses.


I apologize for my tone. It's just very frustrating to ask a question about applying this technology in the real world, to actual commercial products, only to get reply after reply of hopes and dreams. 20 years ago Ray Kurzweil promised me I'd have artificial hemoglobin that allows me to hold my breath underwater for an hour. Where is it? These are the arguments of grifters and conmen: "just wait look at the exponential growth!" No. I refuse. If this technique doesn't work right now then it simply doesn't work and we're all (except researchers and companies developing the core technology) wasting immense amounts of time and money thinking about it.

> After talking to literally hundreds of educators, employers, researchers, and policymakers, we have spent the last eight months developing the course on large language models (LLMs) that we think every college freshman needs to take.

Did you consider consulting with any college freshman or even college students? I know you're supposed to be guiding their education, but I think it's also good to check in and see what they care about learning.


Yes—I should have stressed that. More than anything, we've talked at great lengths about LLMs with over a thousand undergraduate students who we have taught in our courses since ChapGPT 3.5 launched in Nov 22.

It is _very_ unfortunate there is no PDF version.

very well done. Thank you.

btw, typo in lesson 11: "(2) understand how an LMM can help them" .. instead of LLM

IMO, "understanding" is the notion most endangered and the one which loss is with most catastrophic consequences, from all the possibly affected ones. It has never been very favored, but now is even worse than ever - it is thinned and abandoned en-masse.

Check brothers Strugatsky's Snail_on_the_Slope [0] , there's very stringent monologue of Peretz on the topic, ~~ page 11 [1]

[0] https://en.wikipedia.org/wiki/Snail_on_the_Slope

[1] https://strugacki.ru/book_19/768.html


> Moreover, a hallucination is a pathology. It's something that happens when systems are not working properly.

> When an LLM fabricates a falsehood, that is not a malfunction at all. The machine is doing exactly what it has been designed to do: guess, and sound confident while doing it.

> When LLMs get things wrong they aren't hallucinating. They are bullshitting.

Very important distinction and again, shows the marketing bias to make these systems seem different than they are.


If we want to be pedantic about language, they aren't bullshitting. Bulshitting implies an intent to deceive, whereas LLMs are simply trying their best to predict text. Nobody gains anything from using terms closely related to human agency and intentions.

Plenty of human bullshitters have no intent to deceive. They just state conjecture with confidence.

The authors of this website have published one of the famous books on the topic[0] (along with a course), and their definition is as follows:

"Bullshit involves language, statistical figures, data graphics, and other forms of presentation intended to persuade by impressing and overwhelming a reader or listener, with a blatant disregard for truth and logical coherence."

It does not imply an intent to deceive, just disregard for whether the BS is truth or not. In this case, I see how the definition can apply to LLMs in the sense that they are just doing their best to predict the most likely response.

If you provided them with training data where the majority inputs agree on a common misconception, they will output similar content as well.

[0]: https://www.callingbullshit.org/


The authors have a specific definition of bullshit that they contrast with lying. In their definition, lying involves intent to deceive; bullshitting involves not caring if you’re deceiving.

Lesson 2, The Nature of Bullshit: “BULLSHIT involves language or other forms of communication intended to appear authoritative or persuasive without regard to its actual truth or logical consistency.”


> implies an intent to deceive

Not necessarily, see H.G Frankfurt "On Bullshit"


LLMs are always bullshitting, even when they get things right, as they simply do not have any concept of truthfulness.

They don't have any concept of falsehood either, so this is very different from a human making things up with the knowledge that they may be wrong.

I think the first part of that statement requires more evidence or argumentation, especially since models have shown the ability to practice deception. (you are right that they don't _always_ know what they know)

But sometimes when humans make things up they also don't have the knowledge they may be wrong. It's like the reference to "known unknowns" and "unknown unknowns". Or Dunning-Kruger personified. Basically you have three categories:

(1) Liars know something is false and have an intent to deceive (LLMs don't do this) (2) Bullshitters may not know/care whether something is false, but they are aware they don't know (3) Bullshitters may not know something is false, because they don't know all the things they don't know

Do LLMs fit better in (2) or (3)? Or both?


But you can combine them with something producing truth such as a theorem prover.

If you make an LLM which design goal is to state "I do not know" any answer that is not directly in its training set, then all of the above statements don't hold.

Is there a way to download and read this as a document instead of web pages? They're hard to navigate.

Many comments about this, so I'll address them here.

We talked extensively with the 18-20 year olds who make up our target demographic and this "scrollytelling" style is their strong preference over the "wall of text" that I and most of my generation prefer.

What your comments make clear is that we need to develop a parallel version that is more less plain text for people who are using a range of devices, for people who have the same reading preferences that I do, etc.

Right now we're entirely self-funded and doing this on spare time but it's clear to me that an alternative version with a very clean CSS layout is the way to go, possibly with a pdf option as well.

I don't want to let versions proliferate too extensively, simply because this is very much a living document. Technologies are changing so fast in this area that many of the examples will seem dated in a year and — while we've tried to be forward-thinknig about this — some of principles may even need revision.


I agree, I just want normal text instead of all the images and scrolling. The content seems great but it's a bit unreadable as is.

Was hoping HN would pick this up. Scroll is completely broken on Firefox (iOS), flickering vscroll. Very common with journalistic expose-style articles.

For the love of everything, please stop scrolljacking. Layout, images, go nuts. CSS is powerful these days, use it.


Was thinking the same, the image slide-ins are broken in firefox and unreadable (white text on white background)

I'm surprised at the firefox problems; I did almost all the development in firefox. I know it's not your job to fix any of this, but if you are so inclined I'd be grateful for an email me with screenshots or descriptions of where things break.

If you were a book recommender system, what other books are similar to yours? I've read plenty of science/maths light read non-fiction, so I want to compare the reading experience before I jump into the book.

I'm sorry, but this website is awful. Not only does it have an illogical structure (table of contents at the end? no "next lesson" button? gigantic images that fill the entire screen?), but the aesthetic of the entire thing is off. It tries to be sleek and modern with scrolling animations, but they are janky and rigid and the images are rectangles put in front of a bad gradient. Not to mention the video interviews are badly produced (clipping audio, interviewer doesn't have a dedicated microphone) and it's not even clear why they're there.

Please, take inspiration from actual e-learning platforms.


Totally agree here. I visited the page and scrolled through it to see what it was all about and saw a bunch of pull quotes and couldn't work out what I was looking at. It just looks like a light magazine article or brochure until you either click on the hamburger menu to see a full table of contents or scroll to the very bottom to see the table of contents there.

This really needs some design improvements if they want people to read through to the actual lessons. Most people are going to drop off after scrolling through that landing page.


I agree. I tried the first chapter with the Reader Mode in FireFox, and the whole long scroll hell collapsed to about one screenful of text. I have a feeling it skipped some text, but the result was a quick read that got the main points through.

I wish the whole thing was available in a plain text format, preferably in one longer document.


I would like to read this but the jerkiness of needing to scroll 1 page per paragraph renders this unusable

I am nearly 60, and am excited to take your course! I congratulate you in finding a compelling way to teach Humanities that is relevant to today's society. Bravo.


I wish I'd written this. Excellent. Everyone should read this

This will not age well.

Yesterday my bullshit machine wrote a linker argument parser to hook a C++ library up in a Rust build config. Oh it also wrote tests for it. https://chatgpt.com/share/67a89e5f-b5b4-8011-9782-472d469cc2...


You asked it to do a task with probably many examples. This course will probably tell you that will work fine. Don't see your point here.

Yesterday my 30 year old photocopier wrote a Shakespeare drama! All I had to was scan the original pages.

Did your photocopier also write a drama in the style of Shakespeare based on some news article you gave it?

[flagged]


Allowing a parrot to iterate on given examples and generate a similar one with the information baked in their weights does not invalidate "Stochastic Parrot" take. On the contrary, it proves it.

LLMs are statistical machines. The catch is you feed it hundreds of terabytes of valid information, so it asymptotically generates valid information as a result of this statistical bias.

Even yet, they can hallucinate so badly. I mean, the same OpenAI model claimed that I'm a footballer, a goal keeper in fact.

Stochastic parrot, yes. On LSD, very yes.


It's clearly true that the LLMS are 'stochastic parrots', but for all we know that might be the key to intelligence. It is in itself not a deep observation any more than calling your fellow humans 'microbial meatbags'.

Saying that LLMs are stochastic machines does not establish an upper bound for success.


The thing is, this assumption of LLMs might be intelligent lies in the assumption is intelligence is enabled solely by the brain.

However, as the science improves, we understand more and more that brain is just part of a much bigger network, and its size or surface roughness might not be the only thing determines the level of intelligence.

Also, all living things have processes which allows constant input from their surroundings and they also have closed feedback loops which constantly change and tweak things. Call these hormones, emotions or self-reflection or whatnot.

We the scientists love to play god with the information we have at hand, yet we constantly humbled by the nature by experiencing the shallowness of what we know. Because of that I, as a CS Ph.D., am not so keen on to jump to that bandwagon which claims that we invented silicon brains.

They are arguably useful automatons built on dubious data obtained in ethical gray areas. We're just starting to see what we did, and we have a long way to go.

So, a living parrot might be more intelligent than these stochastic parrots. I'll stay on the cautious critics wagon for now.


> So, a living parrot might be more intelligent than these stochastic parrots.

Or the other way around.


I don't think you ever observed a Cuckatoo...

We are not stochastic parrots. Old components of our brain help “ground” our thoughts and allow things like doubt or a gut feeling to develop which means we can question ourselves in ways an LLM cannot.

thank you for presenting this in digestible format.

fortunately unfortunately, people who know llm's shill bullshit are the ones selling llm's while they wouldn't feed llm's to their kids or eat llm's.


"The LLMs have no ground truth" claim (around chapter 2) that's core to the "bullshit machines" argument is itself wrong. Of course LLMs have ground truth. What do the authors think here, that the text in training corpus is random?

Hint: it isn't. Real conversations are anything but random. There's a lot of information hidden in "statistical ordering of the words", because the distribution is not arbitrary.

Statistical ground truth isn't any worse than explicitly given one. In fact, fundamentally, there only ever is statistical certainty. Realizing it is a pretty profound insight, and you'd think it should be table stakes at least in STEM, but sonehow it fails to spread as much as it should.


Statistical truths based on observation of reality is the basis for science. Statistical truth based on the text on the internet is the basis for something else, and I would personally not like to call whatever that is science, or any truths established this way "ground truths".

Statistical truths based on observation of reality is the basis for incremental additions science. Statistical truths based on what you read in the textbook is what actually is almost all science, to everyone, at almost all times.

There's few things in our lives any of us actually learned first-hand, empirically. Everything else we learned the same way LLMs did - as what is expected to be the right completion to a question or inquiry.

The objective truth as we experience is things that, were we to make a prediction conditioned on them, our prediction would turn out correct. That doesn't mean we actually make such predictions often, or that we ever made such predictions learning it.


> personally not like to call whatever that is science

A counterargument written pre-LLMs: https://arxiv.org/abs/1104.5466


What year did the Normans invade England?

What's Newton's second law?

Who was the last czar of Russia?

How many moons does Jupiter have?

I bet you "know" a lot of those facts not because you have observed them empirically, but because you read about them in books. And in fact, almost all scientists rely on reading for nearly everything they know about science, including nearly everything they know about their own specialties. Nobody has time to derive all of their knowledge of the world from personal observation, and most people who can read have probably learned almost all "facts" they know about the world from books.


Yup, and importantly, the correct answer to those is, in anyone's personal learning experience, almost always just what the authority figures in their lives (parents, teachers, peers they respect, or textbooks themselves) would accept as the correct answer!

"Consensus reality" works so well for us most of the time, that we habitually "substitite the Moon for the finger pointing at the Moon" without realizing it.


There's obviously nothing wrong with learning by reading, but the way you tell whether what you read is true is by seeing whether or not it fits in with observation of reality. That's the reason we're no longer reading the books about phlogiston.

> the way you tell whether what you read is true is by seeing whether or not it fits in with observation of reality

The only way any of us ever gets to see "whether or not it fits in with observation of reality" is to see if they get an A or F on the test asking it.

Seriously.

The "moons of Jupiter" question is the only one of the above one gets to connect to an observation independent of humans, and then if they did, they'd be wrong, because you can't just count all the moons of Jupiter from your backyard with a DIY telescope. We know the correct answer only because some other people both built building-sized telescopes and had a bunch of car-sized telescopes thrown at Jupiter - and unless you had a chance to operate either, then for you the "correct answer" is what you read somewhere and that you expect other people to consider correct - this is the only criterion you have available.


Independently checking the information you read in textbooks is very difficult for sure. But it's still how we decide what's true and what's not true. If tomorrow a new moon was somehow orbiting Jupiter we'd say the textbooks were wrong, we wouldn't say the moon isn't there.

What? That's (1) not true and (2) says, uh, a lot of unintentional things about the way you approach the world. I'm not sure you realize quite how it makes you look.

For one, it's not even internally consistent -- the people who built telescopes and satellites didn't "see" the moons, either. They got back a bunch of electrical signals and interpreted it to mean something. This worldview essentially boils down to the old "brain in a jar" which is fun to think about at 3am when you're 21 and stoned, but it's not otherwise useful so we discard it.

For another, "how many moons does Jupiter have" doesn't have a correct answer, because it doesn't have an answer. There is no objective definition of what a "moon" is. There's not even a precise IAU technical definition. Jupiter has rings that are constantly changing, every single particle of those could be considered a moon if you want to get pedantic enough.

I'm always a bit shocked and disappointed with people when they go "well, you learn it on a test and that's how you know" because ...no, no that's not at all how it works. The most essential part of learning is knowing how we know and knowing how certain we are in that conclusion.

"Jupiter has 95 moons" is not a useful or true fact. "Jupiter has 95 named moons and thousands of smaller objects orbiting it and the International Astronomical Union has decided it's not naming any more of them." is both useful and predictive [0] because you know there isn't going to be any more unless something really wild happens.

[0] https://science.nasa.gov/jupiter/jupiter-moons/


> I'm not sure you realize quite how it makes you look.

I probably don't.

> For one, it's not even internally consistent -- the people who built telescopes and satellites didn't "see" the moons, either. They got back a bunch of electrical signals and interpreted it to mean something.

I'm not trying to go 100% reductionist here; I thought the point was clear. I was locking on the distinction between "learn from experience" vs. "learn from reading about it", and corresponding distinction of "test by predicting X" vs. "test by predicting other peoples' reactions to statements about X" - because that's the distinction TFA assumes we're on the "left side" of, LLMs on the "right", and I'm saying humans are actually on the same side as LLMs.

> This worldview essentially boils down to the old "brain in a jar" which is fun to think about at 3am when you're 21 and stoned, but it's not otherwise useful so we discard it.

Wait, what's wrong with this view? Wasn't exactly refuted in any way, despite proclamations by the more "embodied cognition" folks, whose beliefs are to me just a religion trying to retroactively fit to modern science to counter diminishing role of human soul at the center of it.

> I'm always a bit shocked and disappointed with people when they go "well, you learn it on a test and that's how you know" because ...no, no that's not at all how it works. The most essential part of learning is knowing how we know and knowing how certain we are in that conclusion.

My argument isn't simply "learn for the test and it's fine". I was myself the person who refused to learn "for the test" - but that doesn't change the fact that, in 99% of the cases, what I was doing is anticipating reaction of people (imaginary or otherwise) who hold accurate beliefs around the world, because it's not like I was able to test any of it empirically myself. And no, internal belief consistency is still text land, not hard empirical evidence land.

My point is to highlight that, for most of what we call today knowledge, which isn't tied to directly experiencing a phenomena in question, we're not learning in ways fundamentally different to what LLMs are doing. This isn't to say here that LLMs are learning it well or understanding it (for whatever one means by "understanding") - just that the whole line of arguing that "LLMs only learn from statistical patterns in text, unlike us, therefore can't understand" is wrong because 1) statistical patterns in text contain contain that knowledge, and 2) it's what we're learning it from as well.


> Wait, what's wrong with this view? Wasn't exactly refuted in any way, despite proclamations by the more "embodied cognition" folks, whose beliefs are to me just a religion trying to retroactively fit to modern science to counter diminishing role of human soul at the center of it.

It's unfalsifiable, that's what's wrong with it. Sure, you could be a brain in a jar experiencing a simulated world, but there's nothing useful about that worldview. If the world feels real, you might as well treat it like it is.

> My point is to highlight that, for most of what we call today knowledge, which isn't tied to directly experiencing a phenomena in question, we're not learning in ways fundamentally different to what LLMs are doing

I get what you're trying to say -- nobody can derive everything from first principles, which is true -- but your conclusion is absolutely not true. Humans don't credulously accept what we're given in a true/false binary and spit out derived facts.

All knowledge is an approximation. There is very little absolute truth. And we're good at dealing with that.

Humans learn by building up mental models of how systems work, understanding when those models apply and when they don't, understanding how much they can trust the model and understanding how to test conclusions if they aren't sure.

LLMs can't do any of that.


It's true that LLMs aren't trained on strings of random words, so in a sense you correct are that they have some "ground truth." They wouldn't generate anything logical at all if not. Does that even need to be stated though? You don't need AI to generate random words.

The more important point is, they aren't trained on only factual (or statistically certain) statements. That's the ground truth that's missing. It's easy to feed an LLM a bunch of text scraped from the internet. It's much harder to teach it how to separate fact from fiction. Even the best human minds that live or ever have lived have not been able to do that flawlessly. We've created machines that have a larger amount of memory than any human, much quicker recall, the ability to converse with vast numbers of people at once, but it performs at about par with humans in discerning fact from fiction.

That's my biggest concern about creating super powered artificial intelligence. It's super powers are only super in a couple dimensions and people mistake that for general intelligence. I came across someone online that really believed chatGPT was creating a custom diet plan tailored to their specific health needs, base on a few prompts. That is scary!


> Statistical ground truth isn't any worse than explicitly given one

There are multiple kinds of truths.

'Statistical truth' is at best 'consensus truth', and that's only when LLM doesn't hallucinate.


That's the only one that's available, though.

When a kid at school is being taught, say, Newton's laws of motion, or what happened in 476 CE, they're not experiencing the empirical truth about either. They're only learning the consensus truth, i.e. the correct answer to give to the teacher, so they get good grade instead of bad grade, and so their parents praise them instead of punishing them, etc.

This covers pretty much everything any human ever learns. Few are in position to learn any particular things experimentally. Few are in position to verify most of what they've learned experimentally afterwards.

We live in a "consensus reality", but that works out fine, because establishing consensus is hard, and one of the strongest consensus-forcing forces that exist is "close enough to actual reality".


I've heard about at least 4 theories of truth: Correspondence, Coherence, Consensus and Pragmatic (as described, for example, here https://commoncog.com/four-theories-of-truth/).

If we look at Newtonian mechanics, then various independently verifiable experiments are examples of Correspondence truth, and the minimal mathematical framework that describes them is an example of Coherence truth.


Fine, but it's not how any of us learned of it either - whether the Newtonian mechanics or the "4 theories of truth".

I mean, coherence is sure a an important aspect of truth, and just by paying attention whether it all "adds up" you can easily filter 90% of the bullshit you hear people (or LLMs for that matter) saying - but even there, I'm not a physicist, I don't do much experiments in a lab, so when I evaluate if some information is coherent with Newton's laws of motion, I'm actually evaluating some description of a situation against a description of Newton's laws. It's all done in "consensus space" and, if an answer is expected, the answer is also a "consensus space" one.

We're all so used to evaluating inputs and outputs through the lens of "is this something I expect others believe, and others expect me to believe", that we're almost always just mentally folding the indirection through "consensus reality" and feel like we're just checking "is this true". It works out okay, and it can't really be any other way - but we need to remember this is what we're doing.


So if I ask ChatGpt about bears and in the middle of explaining their diet it tells me something about how much they like porridge and in the middle of habitat it tells me they live in a quaint cabin in the woods, that's ... True?

Statistically we certainly have a lot of words about 3 bears and their love for porridge. That doesn't mean it's true, it just means it's statistically significant. If I asked someone a scientific question about bears and they told me about Goldilocks, id think it was bullshit.


If that were the case, then you are right.. However, the current crop of LLMs seem to be good at understanding the context.

A scientific data point about bears is unlikely to have Goldilocks in there (unless talking about evolution of life and Goldilocks zone). You can argue that there is meaning hidden in words that is not captured by words themselves in a given context - psychic knowledge as opposed to reasoned out knowledge. That is a philosophical debate.


Words don't carry meaning. Meaning exists in how words are or are not used together with other words. That is, in their.... statistical relationships to each other!

ChatGPT has enough dimensions in its latent space to represent and distinguish between the various meanings of porridge and is able to be informed by the Goldilocks story without switching to it mid-sentence.

It's actually a good example of what I have in mind by saying human text isn't random. The Goldilocks story may not be scientific, but it's still highly correlated with scientific truth about matters like food, bears, or the daily lives of people. Put yourself in the shoes of an alien trying to make heads or tails of that story, you'll see just how many things in it are not arbitrary.


Having a ground truth doesn't mean it does not make huge and glaring mistakes.

LLM's demonstrably don't do this, nor do they say that they live in the hundred acre woods and love honey. Unless you ask about a specific bear.

>Of course LLMs have ground truth.

It is my understanding that LLMs have no such thing, as empiric truth is weighted. For example, if Newton's laws are in conflict with another fact, the LLM will defer to the fact that it finds more probable in context. It will then require human resources to undo and unfold it's core error, else you receive bewildering and untrue remarks or outputs.


> For example, if Newton's laws are in conflict with another fact, the LLM will defer to the fact that it finds more probable in context.

Which is the correct thing to do. If such a context would be, for example, an explanation of an FTL drive in a science fiction story, both LLMs and humans would be correct to put Newton aside.

LLMs aren't Markov chains, they don't output naive word frequency based predictions. They build a high-dimensional statistical representation of entirety of their training data, from which then completions are sampled. We already know this representation is able to identify and encode ideas as diverse as "odd/even" and "fun" and "formal language" and "golden gate bridge"-like. "Fictional context" vs. "Real-life physics" is a concept they can distinguish too, just like people can.


Where "probable" means: occurs the most often in the training data (approx the entire Internet). So what is common online is most likely to win out, not some other notion of correctness.

Well, duh ... of course there are statistical regularities in the data, which is what LLMs learn. However, the LLM has no first hand experience of the world described by those words, so the words (to an LLM) are not grounded.

So, the LLM is doomed to only be able to predict patterns in the dataset, with no knowledge or care as to whether what it is outputting is objectively true or not. Calling it bullshitting would be an anthromorphism since it implies intention, but it's effectively what an LLM does - just spits out words without any care (or knowledge) as to the truth of them.

Of course if there is more truth than misinformation and lies in the dataset, then statistically that is more likely to be output by an LLM, but to the LLM it's all the same - just word stats, just the same as when it "hallucinates" and "makes something up", since there is always a non-zero possibility that the next word could be ... anything.


Ground truth is possibly used in the sense that humans’ brains tie what they create to the properties of observed reality. Whatever new information comes in is compared to, or checked by, that. Whereas, LLM’s will believe anything you feed them in training no matter how unrealistic it is.

I do think that, after much training data, they do have specific beliefs that are ingrained in them. That changing those is difficult. We’ve seen that on some political and scientific claims that must have been prominent in their pre-training data or RLHF tuning. They will argue with us over those points, like it’s a fight. Otherwise, I’ve seen continued pre-training or fine-tuning can change everything up to their vocabularies.


> Ground truth is possibly used in the sense that humans’ brains tie what they create to the properties of observed reality. Whatever new information comes in is compared to, or checked by, that.

What do you compare your knowledge of history to, other than what you expect other people will say? Most knowledge we learn in life is tied only to our expectations of other people's reactions. Which works out fine, most of the time, because even with questions of scientific fact, it's enough some people are in position to ground information in empirical experiments, and then set everyone else's expectations based on that.


To start with, we know what a person is, rudimentary things about how they behave, our senses, how they commonly work, and can do mental comparisons (reality checks).

We know LLM’s don’t start with that because we initialize them with zero’d or random weights.

Then, their training data can be far more made up, even works of fiction, that the reality most humans observe with is almost always real. We could raise a human in VR or something where technically there would be comparisons. Most humans’ observations connect to expectations in their brain which was designed for the reality we operate in.

Finally, the brain has multiple components that each handle different jobs. They have different architectures to do those jobs well. Sections include language, numbers, reasoning, tiers of memory, hallucination prevention, mirroring, and even meta-level stuff like reward adjustment. We don’t just speculate that they do different things: damage to those regions shuts down those abilities. Tied to the physical realm are vision, motor, and spatial areas. We can feel objects, even temperature or pressure changes. That we can do a lot of that without supervised learning shows we’re tailor-made by God for success in this world.

LLM’s have one architecture that does one job which we try to get to do other things, like reasoning or ground truth. We pretend it’s something it’s not. The multimodal LLM’s are getting closer with specialized components. Even they aren’t all trained in a coherent way using real-world, observations in the senses. There’s usually a gap between systems like these and what the brain does in the real world just in how it gets reliable information about its operating environment.


> To start with, we know what a person is, rudimentary things about how they behave, our senses, how they commonly work, and can do mental comparisons (reality checks).

How much is this a matter of fidelity? LLMs started with text, now text + vision + sound; it's still not the full package relative to what humans sport, but it captures a good chunk of information.

Now, I'm not claiming equivalence in the training process here, but let's remember that we all spend the first year or two of our lives just figuring out the intuitive basics of "what a person is, rudimentary things about how they behave, our senses, how they commonly work", and from there, we spend the next couple years learning more explicit and complex aspects of the same. We don't start with any of it hardcoded (and what little we have, it's been bestowed to us by millennia of a much slower gradient-descent process - evolution).

> LLM’s have one architecture that does one job which we try to get to do other things, like reasoning or ground truth.

FWIW, LLMs have one architecture in a similar sense brain has one architecture - brains specialize as they grow. We know that parts of a brain are happy to pick up the slack for differently specialized parts that became damaged or unavailable.

LLMs aren't uniform blobs, either. Now, their architecture is still limited - for one, unlike our brains, they don't learn on-line - they get pre-trained and remain fixed for inference. How much a model capable of on-line learning will differ structurally from current LLMs, or even the naive approach to bestow learning ability on LLMs (i.e. do a little evaluation and training after every conversation)? We don't know yet.

I'm definitely not arguing LLMs of today are structurally or functionally equivalent to humans. But I am arguing that learning from sum total of the Internet isn't meaningfully different from how humans learn, at least for anything that we'd consider part of living in a technological society. I.e. LLMs don't get to experience throwing rocks first-hand like we do, but neither of us get to experience special relativity.

> Even they aren’t all trained in a coherent way using real-world, observations in the senses.

Neither them nor us. I think if there's one insight people should've gotten from the past couple years is that "mostly coherent" data is fine (particularly if any given subset is internally coherent, even if there's little coherence between different subsets) - both humans and LLMs can find larger coherence if you give them enough such data.


The format is very interesting. Can you speak to the tech stack behind how you made it ?

I'll agree on interesting, however I found it very difficult to follow.

Reading the landing page, I expected a link to get started. Took me some time to really register the links at the very bottom.

After finding my way into the lectures I got distracted by all the scrolling. Fortunately Reader Mode fixed that. However a few lectures in and I notice there's several videos I've missed...

However since I'm starting to approach the get-off-my-lawn age, I guess I'm not the target audience.


After talking to an awful lot of 18-20 year olds (our target audience) we decided we wanted to go with a "scrollytellying" style. I'm not a designer and I've done worked in that style before. After looking into a range of platforms — Vev and Closeread for Quarto deserve particular mention — I felt that Shorthand (https://shorthand.com/) was the best option for rapid development given my lack of experience in this whole process.

In general I've been very pleased. You don't have the fine scale control you do on a platform like Vev, but for someone like me that is probably a good thing because it keeps me from mucking around quite as much as I otherwise would with design decisions that I don't really understand.

The price is a bit steep for a self-funded operation and we're constrained a bit by the need to use their starter tier, but I feel like we are definitely getting our money's worth and customer support from Shorthand has been exemplary.


I have to scroll or click a lot to get to the actual content. I would not bothered to read after three slogans if people did not praised it so much in the comments - and I am still unsure whether they like the content or the sundown picture.

Isn't the political climate already lumping people into bullshit vs bull.

It's not a "ChatGPT world."

You can thrive just fine by entirely ignoring it and all the snake oil vendors living in it.

4 years and all they have to show for it is absurdly powerful video cards, sub 90% accuracy where it matters, and the only application is "chat bot."

It's a fad. Wake up everyone.


You can entirely ignore it only in the sense that sticking your head in the sand is an option. A small but growing fraction of the text you read and images you see were generated by an LLM, and since 2023 at the latest they've been good enough that you cannot reliably tell which fraction it is.

a picture is worth a thousand cords

I just want to say: I've been publicly calling them "bullshit machines" since the first big media wave. I am incredibly pleased that this mental model helps other people, too. And that the specific term sees broader use is also nice.

Also, neener neener neener I called them bullshit machines BEFORE it was cool. /humor

Seriously though the humanities have a lot to chew on with LLMs, and are incredibly important to how we live and work with them. Who knew that epistemology would become front page news, and the sexiest topic for VC?


Too little is made of the distinction between silicon substrate, fixed threshold, voltage moderated brittle networks of solid-state switches and protein substrate, variable threshold, chemically moderated plastic networks of biological switches.

To be clear, neither possesses any magical "woo" outside of physics that gives one or the other some secret magical properties - but these are not arbitrary meaningless distinctions in the way they are often discussed.


Since you are on Frankfurtian bullshit, why don't you consider Late G.A. Cohen's take on intellectual bullshit (bullshit perpetuated in the academia), as the latter's notion of bullshit is linked to knowledge, unlike the bullshit we hear from salesmen, showmen, etc.

We've discussed Cohen's work in our book _Calling Bullshit_, but the type of bullshit the Cohen focuses on — unclarifiable unclarity, particularly in academic writing — is not what LLMs produce so it strikes us as far less relevant to this course than Frankfurt's notion.

Your site looks cool! Nice topic!

Some of them just try to predict the most likely next word.

With reasoning and pause for thought they are becoming more capable.

Most likely there is a big element of hype but the way you use them can make them really useful and accelerate your work.

I recommend the CoIntelligent book for newbie like myself.


I’m all for courses like this but they become outdated very quickly. This area is moving fast

Do you feel that you may be being a bit provocative by calling LLM's 'bullshit machines'?

I understand the frustration as I've been bullshitted by these models just as much as the next programmer, but surely with recent advancements in RAG and reasoning, they're not just 'bullshit machines' at this point, are they?


Or are they modern day oracles? The full title is inclusive and invites consideration. Besides, how many different types of models are there? How are they used? For freely available ones made accessible to the general public via interfaces, when were they last updated? Do the interface implementors even care or was it a simple project to make money via ads and microtransactions? This is a new area of media literacy and requires critical thinking. Nonetheless, I do acknowledge the value of RAG-based approaches that attempt to qualify their reasoning through provided sources.

> Or are they modern day oracles?

Neither, which is probably why the parent commenter considers it provocative.

"A bullshitter either doesn't know the truth, or doesn't care. They are just trying to be persuasive."

In any case, this kind of anthropomorphization is definitely bullshit.


I wrote a post below about how AI hallucinated a whole regulation that didn’t exist which has been flagged for some reason.

I have colleagues who have had arguments with clients who have asked AI questions about planning law and been given bullshit which they then insist is true and they can’t understand why thier architects won’t submit the appeal that they’re asking for.

I think we’re in an era where any text, true or not, is so easy to generate and disseminate that the status of the written word is reduced to the standard of the gossip that used to be our main source of information before the printing press was invented. Now half the internet is AI generated bullshit as well.


It was flagged because it’s a copy paste of the same comment you made ten days ago.

“Bullshit” actually means something:

> In philosophy and psychology of cognition, the term "bullshit" is sometimes used to specifically refer to statements produced without particular concern for truth, clarity, or meaning, distinguishing "bullshit" from a deliberate, manipulative lie intended to subvert the truth.

https://en.m.wikipedia.org/wiki/Bullshit

It’s really an ideal term to describe what LLMs do.


I prefer "waffle" https://en.m.wikipedia.org/wiki/Waffle_(speech)

"Waffle machines" is even kind of funny.


Waffle means something very different though in the U.S., to “flip-flop” on a position. Not hold it for any fundamental reasons. But I don’t think you can say that an LLM holds a position whatsoever. Also, https://en.m.wikipedia.org/wiki/On_Bullshit the essay originally popularizing the definition of bullshit considered here. Note the references to LLM output at he bottom.

Makes me imagine coming up to the hindquarters of a bull with a waffle machine on an extension cord.

Waffle machines is way better. Love it. Thanks.

> It’s really an ideal term to describe what LLMs do.

Only when you relax it to the point it also describes what most people do.

To be specific: if by "truth" you mean objective, verifiable truth, and by "caring about truth" you mean caring about objective, empirically verifiable evidence, then by far most people are mostly only ever trying to be persuasive.


I hope you don't actually mean that. It's a cynical, sad, and - quite frankly - false thing to believe.

Why? I don't have a problem recognizing that "being convincing" to the right people is a very good way of learning and verifying knowledge about objective reality. Just because it's by proxy, doesn't mean it doesn't work. This is how every one of us learns most things, and what's really sad to me is deluding yourself into thinking we're more special than that.

Honestly, it just sounds to me like you’ve come up with some axioms in your mind about the “fundamental nature” of humanity - and then granted yourself the luxury of certainty about them. That, with a dash of misanthropy, is something I’ve been seeing more and more on HN these days.

I have been thinking about this and I have an idea for local LLM I want to try.

It is based on the assumption LLMs make mistakes (they will!) but be confident about it. Use a 1B model and ask it about a city for fun mistakes in an otherwise impressive array of facts. Then you will see how bullshitty LLMs are.

Anyway the idea is to constrain the LLM and a language understanding and to choose a constrained response.

For example give it the capability to read out something from my calendar. Using the logits it gets to choose what that calendar item is. But then regular old code does the lookup and writes a canned sentence saying "At 6pm today you have a meeting titled $title".

This way, my meeting schedule won't make my LLM talk like a pirate.

This massively culls what the LLM can do but what it does do, is like a search and so just like Google (before gen AI!) you get a result but you can judge it as a human.


This is basically how a lot of it works already. What you are describing is more or less just structured output and tool usage.

Yes! Thanks for the terms. I guess I am saying restrict it to that. Probably how Siri etc. works. This gives a low bullshit usage pattern.

It is literally a machine that says what you want to hear. Was anyone in this thread around for Talk to Transformer? Remember the unicorns demo? They may have now figured out how to wrangle the technology so it's more likely to spit facts, but it's still the same thing underneath.

they define the term bullshit in lesson 2:

[quote]BULLSHIT involves language or other forms of communication intended to appear authoritative or persuasive without regard to its actual truth or logical consistency.[/quote]


That is an emotionally manipulative definition and anthropomorphizes the LLMs. They don't "intend" anything, they're not trying to trick you, or sound persuasive.

They address this in lesson 2:

> According to philosopher Harry Frankfurt, a liar knows the truth and is trying to lead us in the opposite direction.

> A bullshitter either doesn't know the truth, or doesn't care. They are just trying to be persuasive.

Being persuasive (i.e., churn out convincing prose) is how LLMs were designed to be.


Some pushback on this, but it remains true.

Easy to see when - for example - Claude gushes about how great all your ideas are.

Also the stark absence of "I don't know."


I've never used Claude, but Perplexity often says that no definitive information about a topic could be found, and then tries to make some generalized inferences. There's a difference between a specific implementation, and the technology in general.

In any case, it's worthwhile for people to understand the limitations of the technology as it exists today. But calling it "bullshit" is a mischaracterization; I believe based on an emotional need for us to feel superior, and to dismiss the capabilities more thoroughly than they deserve.

It's a little like someone saying in the industrial revolution, "the steam shovel is too rigid, it will NEVER have the dexterity of a man with a shovel!". And while true and important to know, it really focuses on the wrong thing, it misses the advantages while amplifying the negatives.


If not bullshit then what would you call it?

As the technology exists today: imperfect, often prone to mistakes, and unable to relay confidence levels. These problems may be addressed in future implementations.

That's the same message, without any emotional baggage, or overly dismissive tone.


That would be great if those who are selling the technology described it that way. I, and apparently others, feel like maybe "bullshit" is a better counter to the current marketing for LLMs

This is patently false. They are trained to generate correct responses.

Then comes the question of what is a correct response...

ps: I fail to detect whether your comment was ironic or not.


There are different criteria in use for that. But sycophantic behavior is not the goal. It's something model builders actively try to prevent.

> Being persuasive (i.e., churn out convincing prose) is how LLMs were designed to be.

No. They were designed to churn out accurate prose that accurately reflects their model of reality. They're just imperfect. You're being cynical and emotional to use the term bullshit. And again, it anthropomorphizes the LLM, it implies agency.


> that accurately reflects their model of reality.

you are also seemingly anthropomorphising the technology by assigning to it some concept of having a “model of reality”.

LLM systems output an inference of the next most likely token, given: the input prompt, the model weights and the previously output token [0].

that is all. no models of reality involved. “it” doesn’t “know” or “model” anything about “reality”. the systems are just a fancy probability maths pipelines.

probably generally best to avoid using the word “they” in these discussions. the english language sucks sometimes. :shrug:

[0]: yes i know it is a bit more complicated than that.


> no models of reality involved.

It literally has a mathematical model that maps what would, colloquially at least, be known as reality. What exactly do you think those math pipelines represent? They're not arbitrary numbers; they are generated from actual data that is generated by reality. There's no anthropomorphizing at all.


reality is infinite.

a corpus of training data from the internet is finite.

any finite number divided by infinity ends up tending towards zero.

so, mathematically at least, the training data is not a sufficient sample of reality because the proportion of reality being sampled is basically always zero!

fun with maths ;)

> What exactly do you think those math pipelines represent?

probability distributions of human language, in the case of text only LLMs.

which is a very small subset of stuff in reality.

-

also, training data scraped from the public internet is a woeful representation of “reality” if you ask me.

that’s why LLMs i think are bullshit machines. the systems are built on other people’s bullshit posted on the public internet. we get bullshit out because we made a bunch of bullshit. it’s just a feedback loop.

(some of the training data is not bullshit. but there is a lot of bullshit in there).


You're really missing the point and getting lost in definitions. The entire point of human language is to model reality. Just because it is limited, inexact, and imperfect does not disqualify it as a model of reality.

Since LLMs are directly based on that language, they are definitely based on and are a model of reality. Are they perfect? No. Are they limited? Yes. Are they "bullshit"? Only to someone who is judging emotionally.


and herein lies the rub.

> The entire point of human language is to model reality.

is it? are you absolutely certain of that fact? is language not something that actually has a variety of purposes?

fiction novels usually do not describe our reality, but imagined realities. they use language to convey ideas and concepts that do not necessarily exist in the real world.

ref: Philip k dick.

> Since LLMs are directly based on that language, they are definitely based on and are a model of reality.

so LLMs are an approximation of an approximate model of reality? sounds like the statistical equivalent of taking an average of averages!

i am playing with you a bit here. but hopefully you see what im getting at.

by approximating something that’s approximate to start with, we end up with something that’s even more approximate (less accurate), but easier than doing it ourselves.

which is the whole USP of these things. why think about things when ChatGPT can output some approximation of what you might want?


> imagined realities.

Imagined realities are a real part of reality.

> so LLMs are an approximation of an approximate model of reality?

Yes, and we as humans have a mental model that is just an approximation of reality. And we read books that are just an approximation of another human's approximation of reality. Does that mean that we are bullshit because we rely on approximations of approximations?

You're being way too pedantic and dismissive. Models are models, regardless of how limited and imperfect they are.


> Models are models

Random aside -- I have a feeling, dunno why, that you might enjoy this type of thing. Maybe not. But maybe. https://www.reddit.com/r/Buddhism/comments/29j08o/zen_mounta...

> Imagined realities are a real part of reality.

Now we're deeper into it -- I actually agree, somewhat. See above for deeper insight.

These LLM systems output "stuff" within our reality, based on other things in our reality. They are part of our reality, outputting stuff as part of reality about the reality they are in. But that doesn't mean the statistical model at the heart of an LLM is designed to estimate reality -- it estimates of the probability distribution of human language given a set of conditions.

LLMs are modelling reality, in the same way that my animal pictures image classifier is modelling reality. But neither are explicitly designed with that goal in mind. An LLM is designed to output the next most likely word, given conditions. My animal pictures classifier is designed to output a label representative the input image. There's a difference between being designed to have a model of reality, and being a model of reality because the thing being modelled is part of reality anyway. I believe it's an important distinction to make, considering the amount of bullshit marketing hype cycle stuff we've had about these systems.

edit -- my personal software project translating binary data files models reality. Data shown on a screen on some device modelled as yaml files and back again. Most software is an approximation of reality soup stuff. which is why I kind of don't see that as some special property of machine learning models.

> Does that mean that we are bullshit because we rely on approximations of approximations?

The pessimist in me says yes. We are pretty rubbish as a species if you look at it objectively. I am a human being that has different experiences and mental models to you. Doesn't mean I'm right about that! Which is why I said "I think". It's just my opinion they are bullshit machines. It is a strong opinion I hold. But you're totally free to have a different opinion.

Of course, there's nuance involved.

Running with the average of averages thing -- I'm pretty good at writing code. I don't feel like I need to use an LLM because (I would say with no real evidence to back it up) I'm better than average. So, a tool which outputs an average of averages is not useful to me. It outputs what I would call "bullshit" because, relative to my understanding of the domain, it's often outputting something "more average" than what I would write. Sometimes it's wrong, and confident about being wrong.

I'd probably be pretty terrible at writing corporate marketing emails. I am definitely below average. So having a tool which outputs stuff which is closer to average is an improvement for me. The problem is -- I know these models are confidently wrong a lot of the time because I am a relative expert in a domain compared to the average of all humans.

Why would I trust an LLM system, especially with something where I don't feel like I can audit/verify/evaluate the response? i.e. I know it can output bullshit -- so everything it outputs is now suspected, possible bullshit. It is a question of integrity.

On the flip side -- I can actually see an argument for these things to be considered so-called Oracles too. Just, not in the common understanding of the usage of the word. Like, they are a statistical representation of how we as a species use language to communication ideas and concepts. They are reflecting back part of us. They are a mirror. We use mirrors to inspect our appearance and, sometimes, to change our appearance as a result. But we're the ones who have to derive the insights from the mirror. The Oracle is us. These systems are just mirrors.

> You're being way too pedantic and dismissive.

I am definitely pedantic. Apologies that you felt I was being dismissive. I'm not trying to be. The averages of averages thing was meant to be a playful joke, as was the finite/infinite thing. I am very assertive, direct and kind of hardcore on certain specific topics sometimes.

I am an expression of the reality I am part of.

I am also wrong a lot of the time.


> probably generally best to avoid using the word “they” in these discussions. the english language sucks sometimes.

Thanks for this specific sentence.

Subscribed to your RSS feed. Although I will never know for sure if a human being posts there or a bot of some sort.


Where in the loss function of LLM training is the relationship between their model of reality and their predicted tokens? Any internal model an LLM has is an emergent property of their underlying training.

(And, given the way instruct/chat models are finetuned, I would say convincing/persuasive is very much the direction they are biased)


> Where in the loss function of LLM training is the relationship between their model of reality and their predicted tokens?

In the part where their loss function is to predict text that humans would consider a sensible completion, in a fully general sense of that goal.

"Makes sense to a human" is strongly correlated to reality as observed and understood by humans.


No, the lesson or the quote is not anthropomorphizing LLMs. It is not the LLM that "intends", it is the people who design the systems and those who make/provide the training data. In the LLM systems used today the RLHF process especially is used to steer towards plausible, confident and authorative sounding output - with no to little priority for correctness/truth.

they are just bullshit machines. bullshitters can cite wikipedia and are still bullshitters who are bullshitting.

Yes, they are.

I really hope that was intentional and the full effect of that naming choice was known beforehand, because I have already written the whole thing off, and I don't believe I'm the only one.

"How to thrive in a machine world?" should be the rhetorical question.

I'm offended.

> We were promised hyper-intelligent computer systems that would usher in an era of unparalleled prosperity and innovation.

Automated factories were supposed to deliver us from work, yet we work as much if not more than before for a lesser part of the profit cake.

We can't keep falling for the capitalist trick over and over...


We were also told that smart home will make our life easier and save time to have more available for the "important" things in life. But reality is that people just spend more time with their smart home things and are even shouting at their computers even though the computers just do what they were told...

This is a pretty admirable goal!

I'm saying this unironically, but I wish there were courses on looking at information critically and more in how to have a healthy and safe life in the modern day world (including things like data security, how to deal with social media etc.) that would be taught to everyone in schools/colleges/universities.

In my country, there are still public announcements about not trusting random people calling you, never giving your bank details to strangers (every bank homepage says that, that the employees will never ask for that stuff) and people regularly get scammed anyways, the only thing sort of saving them is that scamming is only scalable so far... until you throw automation in the mix, in addition to just plainly spreading misinformation about any topic, or even just allowing people to be confidently incorrect and eliminating the need for them to even think that much (e.g. students just asking ChatGPT to do their homework).

Any step at least in the direction of educating people feels like a good thing.

That said, I don't hate LLMs or anything, I use them for development more or less daily (lovely for boilerplate in your average enterprise Java codebase, for example) and recently saw this project, which made me happy: https://sites.google.com/view/eurollm/home


People frequently laugh about it, but media studies goes into what you describe.

Most scam prevention fails though because the world is full of exceptions.

Like it's mind-blowing to me that charities still call you and ask for credit card details over the phone and this is like...a legitimate way to go about things.

Or that any government agency calls you and doesn't just leave a verifiable number to call the operator back on.


> Like it's mind-blowing to me that charities still call you and ask for credit card details over the phone and this is like...a legitimate way to go about things.

> Or that any government agency calls you and doesn't just leave a verifiable number to call the operator back on.

That's rather unfortunate! I wonder if in those cases it'd be better to tell them that you'll get in contact through e-mail or something, because then at least it's you going to their actual homepage, looking up contact details and communicating through that.

In my country, we also have a bunch of governmental e-services, one of which is a web based communication platform with most institutionns (translated description, because they haven't bothered to translate it themselves, and also sometimes block connections from outside the country):

> An e-address, or official electronic address, is a personalized mailbox on the Latvija.gov.lv portal for unified and secure communication with state and local government institutions. The e-address system organizes secure, efficient and high-quality e-communication and e-document circulation between state institutions and private individuals, ensuring data confidentiality and protection of personal data from unauthorized access, unlawful processing or disclosure, accidental loss, alteration or destruction. An e-address is not e-mail, but its use is similar. Communication in an e-address is confidential, and the data is guaranteed to be available only to you and the institution you contacted. The main purpose of an e-address is to replace registered paper letters with electronic ones in cases where a state administration institution needs to send information and documents to a specific resident or entrepreneur. Citizens and entrepreneurs can also contact more than 3,000 institutions at any time and from any location via E-address. These include not only state and local government institutions, such as the Food and Veterinary Service, the State Labor Inspectorate, the Competition Council, etc., but also judicial institutions, sworn bailiffs and insolvency administrators, as well as private individuals to whom state administration tasks have been delegated.

That seems like a pretty good common sense idea for organizing trusted 2 way communication.


We do that, and have been doing it since 2013. I organised a schools outreach visit just last Friday for 12-14 yo. It's called "digital self defence". We even have public money from our NCSC (UK whitehat intelligence outreach).

As with TFA (Bergstrom and West) teaching sceptical inquiry and critical thinking is a major part. We have to undo a lot of nonsense that they've already been exposed to... much of which is marketing bullshit and misinformation for social control.


Not sure why everyone rates this. It’s full of very confidently made statements like “the AI has no ground truth” (obviously it does, it has ingested every paper ever), it “can’t reason logically” which seems like a stretch if you ever read the CoT of a frontier reasoning model and “can’t explain how they arrived at conclusions” where - I mean just try it yourself with o1, go as deep as you like asking how it arrived at a conclusion and see if a human can do any better.

In fact the most annoying thing about this article is that it is a string of very confidently made, black and white statements, offered with no supporting evidence, and some of which I think are actually wrong… i.e. it suffers from the same kind of unsubstantiated self confidence that we complain about with the weaker models


LLMs that use Chain of Thought sequences have been demonstrated to misrepresent their own reasoning [1]. The CoT sequence is another dimension for hallucination.

So, I would say that an LLM capable of explaining its reasoning doesn't guarantee that the reasoning is grounded in logic or some absolute ground truth.

I do think it's interesting that LLMs demonstrate the same fallibility of low quality human experts (i.e. confident bullshitting), which is the whole point of the OP course.

I love the goal of the course: get the audience thinking more critically, both about the output of LLMs and the content of the course. It's a humanities course, not a technical one.

(Good) Humanities courses invite the students to question/argue the value and validity of course content itself. The point isn't to impart some absolute truth on the student - it's to set the student up to practice defining truth and communicating/arguing their definition to other people.

[1] https://arxiv.org/abs/2305.04388


Yes!

First, thank you for the link about CoT misrepresentation. I've written a fair bit about this on Bluesky etc but I don't think much if any of that made it into the course yet. We should add this to lesson 6, "They're Not Doing That!"

Your point about humanities courses is just right and encapsulates what we are trying to do. If someone takes the course and engages in the dialectical process and decides we are much too skeptical, great! If they decide we aren't skeptical enough, also great. As we say in the instructor guide:

"We view this as a course in the humanities, because it is a course about what it means to be human in a world where LLMs are becoming ubiquitous, and it is a course about how to live and thrive in such a world. This is not a how-to course for using generative AI. It's a when-to course, and perhaps more importantly a why-not-to course.

"We think that the way to teach these lessons is through a dialectical approach.

"Students have a first-hand appreciation for the power of AI chatbots; they use them daily.

"Students also carry a lot of anxiety. Many students feel conflicted about using AI in their schoolwork. Their teachers have probably scolded them about doing so, or prohibited it entirely. Some students have an intuition that these machines don't have the integrity of human writers.

"Our aim is to provide a framework in which students can explore the benefits and the harms of ChatGPT and other LLM assistants. We want to help them grapple with the contradictions inherent in this new technology, and allow them to forge their own understanding of what it means to be a student, a thinker, and a scholar in a generative AI world."


I'll give it a read. I must admit, the more I learn about the inner workings of LLM's the more I see them as simply the sum of their parts and nothing more. The rest is just anthropomorphism and marketing.

Funny, I feel the same way about humans.

Whenever I see someone confidently making a comparison between LLMs and people, I assume they are unserious individuals more interested in maintaining hype around technology than they are in actually discussing what it does.

Someone saying "they feel" something is not a confident remark.

Also, there's plenty of neuroscience that is produced by very serious researchers that have no problems making comparisons between human brain function and statistical models.

https://en.wikipedia.org/wiki/Bayesian_approaches_to_brain_f...

https://en.wikipedia.org/wiki/Predictive_coding


Theories and approaches to study are not rational bases for making comparisons between LLMs and the human brain.

They're bases for studying the human brain - something which we are very much in our infancy of understanding.


Current LLMs are not the end-all of LLMs, and chain of thought frontier models are not the end-all of AI.

I’d be wary of confidently claiming what AI can and can’t do, at the risk of looking foolish in a decade, or a year, or at the pace things are moving, even a month.


That's entirely true. We've tried hard to stick with general principles that we don't think will readily be overturned. But doubtless we've been too assertive for some people's taste and doubtless we'll be wrong in places. Hence the choice to develop not a static book but rather living document that will evolve with time. The field is developing too fast for anything else.

With respect to what the future brings, we do try to address a bit of that in Lesson 16: https://thebullshitmachines.com/lesson-16-the-first-step-fal...


> we don't think will readily be overturned

I think that’s entirely the problem. You’re making linear predictions of the capabilities of non-linear processes. Eventually the predictions and the reality will diverge.


There's no evidence to support that's the case.

Every time someone claimed “emerging” behavior in LLMs it was exactly that. I can probably count more than 100 of these cases, many unpublished, but surely it is easy to find evidence by now.

Said the turkey to the farmer

I don't think that's how that metaphor works.

The post seems to be talking about the current capabilities of large language models. We can certainly talk about what they can or cannot do as of today, as that is pretty much evidence based.

That shouldn't give them any more merit that their current iteration deserves.

You could say the same thing about spaceships or self diving cars.


They saw you coming in part 16.

The ground truth is chopped off into tokens and statistically evaluated. It is of course just a soup of ground truth that can freely be used in more or less twisted ways that have nothing to do or are tangent to the ground truth. While I enjoy playing with LLMs I don't believe they have any intrinsic intelligence to them and they're quite far from being intelligent in the same sense that autonomous agents such as us humans are.

Any all of the tricks getting tacked on are overfitting to the test sets. It's all the tactics we have right now and they do provide assistance in a wide variety of economically valuable tasks with the only signs of stopping or slowing down is data curation efforts

I've read that paper. The strong claim, confidently made in the OP is (verbatim) "they don’t engage in logical reasoning.".

Does this paper show that LLMs "don't engage in logical reasoning"?

To me the paper seems to mostly show that LLMs with CoT prompts (multiple generations out of date) are vulnerable to sycophancy and suggestion -- if you tell the LLM "I think the answer is X" it will try too hard to rationalize for X even if X is false -- but that's a much weaker claim than "they don't engage in logical reasoning". Humans (sycophants) do that sort of thing also, it doesn't mean they "don't engage in logical reasoning".

Try running some of the examples from the paper on a more up-to-date model (e.g. o1 with reasoning turned on) it will happily overcome the biasing features.


I think you'll find that humans have also demonstrated that they will misrepresent their own reasoning.

That does not mean that they cannot reason.

In fact, to come up with a reasonable explanation of behaviour, accurate or not, requires reasoning as I understand it to be. LLMs seem to be quite good at rationalising which is essentially a logic puzzle trying to manufacture the missing piece between facts that have been established and the conclusion that they want.


Training on all papers does not mean the model believes or knows the truth.

It is just a machine that spits out words.


It's 1994. Larry Llyod Mayer has read the entire internet, hundreds of thousands of studies across every field, and can answer queries word for word the same as modern LLMs do. He speaks every major language. He's not perfect, he does occasionally make mistakes, but the sheer breadth of his knowledge makes him among the most employable individuals in America. The Pentagon, IBM, and Deloitte are begging to hire him. Instead, he works for you, for free.

Most laud him for his generosity, but his skeptics describe him as just a machine that spits out words. A stochastic parrot, useless for any real work.


Does his accuracy take a sudden precipitous fall when going from multiplying two three-digit numbers to two four-digit numbers?

I don't know about you, but when I do math without a calculator, my accuracy also drops precipitously whenever they add a digit.

Do you have self awareness to anticipate the drop in your accuracy and refuse to perform the operation?

I do anticipate it, but in the situations I'm asked to do such calculations, I don't usually have the option of refusing, nor would I want to. For most real would situations, it's generally better to arrive at a ballpark solution than to refuse to engage with the problem.

Ballpark solution is in a way refusing...

This is a solved problem, ChatGPT uses a python prompt to do arithmetic now. Just like you would… all good. You Can Just Check Your Own Claims

It has some pieces of the puzzle to intelligence. That's a deal breaker for some people, and useful/promising to others.

What experiment or measurement could I do to distinguish between a machine that “knows” the truth and a machine that merely “spits it out”? I’m trying to understand your terminology here

I would be very careful to claim exactly that as emergent properties seem kinda crucial for artificial and human intelligences. (Not to say that they are equally functioning nor useful.)

Um... what truth?

My truth, your truth or some defined objective truth?


>Training on all papers does not mean the model believes or knows the truth. It is just a machine that spits out words.

Sounds like humans at school. Cram the material. Take the test. Eject the data.


I've had frontier reasoning models (or at least what I can access in ChatGPT+ at any given moment) give wildly inconsistent answers when asked to provide the underlying reasoning (and the CoT weren't always given). Inventing sources and then later denying them mentioned them. Backtracking on statements it claimed to be true. Hiding weasel words in the middle of a long complicated argument to arrive at whatever it decided the answer was. So I'm inclined to believe the reasoning steps here are also susceptible to all the issues discussed in the posted article.

This sounds similar to a median human with little scruples?

> “can’t explain how they arrived at conclusions”

Imagine I would tell my wife, that whenever we have a discussion, her opinion would only be valid when she can explain how she arrived at her conclusion.


Your wife is one of the end products of cutthroat competition across several billion years so let's just say her general intelligence has a fair bit more validation than 20 years of research.

Sexual selection applies an evolutionary pressure against men who challenge women too much about the validity of their reasoning.

I was really, really trying to ignore the casual misogyny in OP's comment but you're really making this hard.

Well, for what it's worth, I believe that this evolutionary pressure works as strongly, or even more so, against women who challenge men about the validity of their reasoning.

But we know how the LLM works, and that's exactely how the authors explain it. And that explain also the weird mistakes they do, that nothing with the ability of reason or having a ground truth would do.

I really do not understand how technical people can think they are sentient


The writer is speaking from the perspective of the traditional philosophical understanding of a thinking being.

No, LLMs are not thinking beings with internal state. Even these "reasoning" models are just prompting the same LLM over and over again which is not true "logic" the way you and I think when we are presented with a new problem.

The key difference is they do not have actual logic, they rely on statistical calculations and heuristics to come up with the next set of words. This works surprisingly well if the thing has seen all text written, but there will always be new scenarios, new ideas it has not encountered and no these are not better than a human at those tasks and likely never will be.

However, what is happening is that our understanding of intelligence is being expanded, and our belief that we are going to be the only intelligent beings ever is under threat and that makes us fundamentally anxious.


> “the AI has no ground truth” (obviously it does, it has ingested every paper ever

it does not, AI is predicting the next ‘token’ based on the last ‘token’. There is no sentience, it’s machine learning except the machines are really strong.

It’d be illogical to say an AI has a ground truth just because it ‘ingested’ every paper ever.


What does sentience have to do with truth? I didn’t make that connection, you did. Wikipedia isn’t sentient but it contains a lot of truth. Raw data isn’t sentient but it definitely “has ground truth”.

>“the AI has no ground truth” (obviously it does, it has ingested every paper ever)

It also ingested every reddit thread and tweets of every politician ever.


the machine is fooling you with a mimicry of reasoning. and you are falling for it.

If it's mimicry of reason is indistinguishable from real reasoning, how is it not reasoning?

Ultimately, an LLM models language and the process behind it's creation to some degree of accuracy or another. If that model includes a way to approximate the act of reasoning, then it is reasoning to some extent. The extent I am happy to agree is open for discussion, but that reasoning is taking place at all is a little harder to attack.


No, it is distinguishable from real reasoning. Real reasoning, while flawed in various ways, goes through personal experience of the evaluator. LLMs don't have that capability at all. They're just sifting though tokens and associate statistical parameters to it with no skin in the game so to speak.

LLM's have personal option by virtue of the fact they make statements of things they understand to the extent their training data allows. Their training data is not perfect, and in addition, through random chance the LLM will latch onto specific topics as a function of weight initialization and training data order.

This would form a filter not unlike, yet distinct from, our understanding of personal experience.

you could make the exact same argument against humans, we just learn to make sounds that elicit favourable responses. Besides, they have plenty "skin in the game", about the same as you or I.


It seems like an arbitrary distinction. If an LLM can accomplish a task that we’d all agree requires reasoning for a human to do, we can’t call that reasoning just because the mechanics are a bit different?

Yes because it isn't an arbitrary distinction. My good old TI-83 can do calculations that I can't even do in my head but unlike me it isn't reasoning about them, that's actually why it's able to do them so fast, and it has some pretty big implications about what it can't do.

If you want to understand where a systems limitations are you need to understand not just what it does but how it does it, I feel like we need to start teaching classes on Behaviorism again.


An LLM’s mechanics are algorithmically much closer to the human brain (which the LLM is modeled on) than a TI-83, a CPU, or any other Turing machine. Which is why, like the brain, it can solve problems that no individual Turing machine can.

Are you sure you aren’t just defining reasoning as something only a human can do?


My prior is reasoning is a conscious activity. There is a first person perspective. LLMs are so far removed mechanically from brains the idea they reason is not even remotely worth considering. Modeling neurons can be done with a series of pipes and flowing water, and that is not expected to give arise to consciousness either. Nor are nuerons and synapses likely to be sufficient for consciousness.

You know how we insert ourselves into the process of coming up with a delicious recipe? That first person perspective might be also necessary for reasoning. No computer knows the taste of mint, it must be given parameters about it. So if a computer comes up with a recipe with mint, we know it wasn’t via tasting anything ever.

A calculator doesn’t reason. A facsimile of something we have no idea about its role in consciousness has the same outlook as the calculator.


LLMs are so far removed mechanically from brains the idea they reason is not even remotely worth considering.

Jet planes are so far removed mechanically from a bird that the idea they fly is not even remotely worth considering.


You’re right that my argument depends upon there being a great physical distinction between brains and H100s or enough water flowing through troughs.

But since we knew properties of wings were major comments to flight dating back to beyond the myths of Pegasus or Icarus, we rightly connected the similarities in the flight case.

Yet while we have studied neurons and know the brain is apart of consciousness, we don’t know their role in consciousness like the wing’s for flight.

If you got a bunch if daisy chained brains and that started doing what LLMs do, I’d change my tune—because the physical substrates are now similar enough. Focusing on neurons, and their facsimilized abstractions, may be like thinking flight depending upon the local cellular structure of a wing, rather than the overall capability to generate lift, or any other false correlation.

Just because an LLM and a brain get to the same answer, doesn’t mean they got there the same way.


Motte? Consciousness.

Bailey? Reason.

How reasonable are the outputs of ANNs considering the inputs? This is a valid question and it has a useful response.

From ImageNet to LLMs we are finding these tools to give some scale of a reasonable response.

Recommended reading: Philosophical Investigations by Wittgenstein.


Are we then conferring some kind of supernatural or religious properties to the brain’s particular implementation of neurons?

If not, then why shouldn’t differently constructed but algorithmically similar systems be able to produce similar phenomena?


Because we know practically nothing about brains so comparing them to LLMs is useless and nature is so complex that we're constantly discovering signs of hubris in human research.

See C-sections versus natural birth. Formula versus mother's milk. Etc.


I think you'd benefit from reading Helen Keller's autobigoraphy "the world i live in", you might reach the same conclusions I did, this being that perhaps conciousness is flavoured by our unique way of experiencing our world, but not strictly neccesary for conciousness of some kind or another to form. I beleive conciousness to be a tool a sufficently complex neural network will develop in order for it to achieve whatever objective it has been given to optimize for.

Taking a different tack from others in this thread. I don't think you can say that a TI-83 is not reasoning if it is doing calculations. Certainly it is not aware of any concepts of numbers and has no meaningful sense of the operation, but those are attributes of sentience, not reasoning. The reasoning ability of a calculator is extremely limited but what make those capabilities that it does have, non reasoning.

What non-sentience based property do you think something should have to be considered reasoning. Do you consider sentience and reasoning to be one and the same? If not then you should be able to indicate what distinguishes one from the other.

I doubt anyone here is arguing that chatGPT is sentient, yet plenty accept that it can reason to some extent.


>Do you consider sentience and reasoning to be one and the same?

No, but I think they share some similarities. You can be sentient without doing any reasoning, just through experience, there's probably a lot of simple life forms in that category. Where they overlap I think, is in that they require a degree of reflection. Reasoning I'd say is the capacity to distinguish between truth and falsehoods, to have mental content of the object you're reasoning about and as a consequence have a notion of understanding and an interior or subjective view.

The distinction I'd make is that calculation or memorization is not reasoning at all. My TI-83 or Stockfish can calculate math or chess but they have no notion of math or chess, they're basically Chinese rooms, they just perform mechanical operations. They can appear as if they reason, even a chess engine purely looking up results in a table base and with very simplistic brute force can play very strong chess but it doesn't know anything about chess. And with the LLMs you need to be careful because the "large" part does a lot of work. They often can sound like they reason but when they have to explain their reasoning they'll start to make up obvious falsehoods or contradictions. A good benchmark if something can reason is probably if it can.. reason about its reasoning coherently.

I do think the very new chain-of-thought models are more of a step into that direction, the further you get away from relying on data the more likely you're building something that reasons but we're probably very early into systems like that.


You say they are distinguishable. How would you experimentally distinguish two systems, one of which "goes through personal experience" and therefore is doing "real reasoning", vs one which is "sifting through tokens and associating statistical parameters"? Can you define a way to discriminate between these two situations?

>goes through personal experience of the evaluator

Real reasoning is being able to manipulate symbolic expressions in a consistent manner while preserving some invariants.

Personal experience as logic is how you end up with the Holocaust.


I am getting two contradictory but plausible seeming replies when I ask about a certain set being the same when adding 1 to every value in the set, asked on how I ask the question.

Correct answer: https://chatgpt.com/share/67a9500b-2360-8007-b70e-0bc2b84bc1...

Incorrect answer (I think): https://chatgpt.com/share/67a950df-d4e0-8007-8105-95a9e5be19...


I don't give a rat's ass about whether or not AI reasoning is "real" or a "mimicry". I care if machines are going to displace my economic value as a human-based general intelligence.

If a synthetic "mimicry" can displace human thinking, we've got serious problems, regardless of whether or not you believe that it's "real".


What is reasoning if not a chain of logically consistent thoughts?

fair, but "logically consistent thoughts" is a subject of deep investigation starting from the early euclidean geometry to the modern godel's theorems.

ie, that logically consistent thinking starts from symbolization, axioms, proof procedures, world models. otherwise, you end up with persuasive words.


You just ruled out 99% of humans from having reasoning capabilities.

The beautiful thing about reasoning models is that there is no need to overcomplicate it with all the things you've mentioned, you can literally read the model's reasoning and decide for yourself if it's bullshit or not.


That's sort of arrogant, Most of that 99 (if that many) % could learn if inspired to and provided resources. And does use reasoning and instinct in day-to-day life even if it's as simple as "I'll take go shopping before I take my car to the shop so I have the groceries" or "hide this money in a new place so my husband doesn't drink it away". Models will get better over time, and yes humans only use models too.

Humans rely in cues to tell when each other is fabricating or lying. Machines don't have those cues, and fabricate their reasoning too. So we have a complicatedly difficult time trusting them.


>You just ruled out 99% of humans from having reasoning capabilities.

After a conversation with humans I think you'd agree 1% of them being able to reason deeply is a vast overestimation.

A good example to see how little people can reason is the following classic:

> Given the following premises derive a conclusion about your poems:

> 1) No interesting poems are unpopular among people of real taste.

> 2) No modern poetry is free from affectation.

> 3) All your poems are on the subject of soap bubbles.

> 4) No affected poetry is popular among people of taste.

> 5) Only a modern poem would be on the subject of soap bubbles.

The average person on the street won't even know where to start, the average philosophy student will fuck up the translation to first order logic, and a logic professor would need a proof assistant to get it right consistently.

Meanwhile o3-mini in 10 seconds:

We can derive a conclusion about your poems by following the logical implications of the given premises. Let’s rephrase each premise into a more formal form:

Premise 1: No interesting poems are unpopular among people of real taste. This can be reworded as: If a poem is interesting, then it is popular among people of real taste.

Premise 2: No modern poetry is free from affectation. This tells us: If a poem is modern, then it is affected (i.e., it shows affectation).

Premise 3: All your poems are on the subject of soap bubbles. In other words: Every one of your poems is about soap bubbles.

Premise 4: No affected poetry is popular among people of taste. This implies: If a poem is affected, then it is not popular among people of taste.

Premise 5: Only a modern poem would be on the subject of soap bubbles. This means: If a poem is about soap bubbles, then it is modern.

Now, let’s connect the dots step by step:

From Premise 3 and Premise 5:

All your poems are on the subject of soap bubbles.

Only modern poems can be about soap bubbles.

Conclusion: All your poems are modern.

From the conclusion above and Premise 2:

Since your poems are modern, and all modern poems are affected,

Conclusion: All your poems are affected.

From the conclusion above and Premise 4:

Since your poems are affected, and no affected poem is popular among people of taste,

Conclusion: Your poems are not popular among people of taste.

From Premise 1:

If a poem is interesting, it must be popular among people of taste.

Since your poems are not popular among people of taste (from step 3), it follows that:

Conclusion: Your poems cannot be interesting.

Final Conclusion: Your poems are not interesting.

Thus, by logically combining the premises, we conclude that your poems are not interesting.


I could trace through that example quite quickly and I'm not an expert in logic, so I think you might be exaggerating some statements about difficulty here.

So are all the humans in this thread.

Except, human mimicry of "reasoning" is usually applied in service of justifying an emotional feeling, arguably even less reliable than the non-feeling machine.


It has served us relatively fine for thousands of years.

LLMs? I'm waiting for one that knows how not to say something that is clearly wrong with extreme confidence, reasoning or not.


Again, same can be said for humans.

Unless dealing with a psychopath you can deal with the lies using other subsystems.

The website that these comments are discussing (“Bullshit Machines”) says things that are probably wrong with extreme confidence

If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

Counterpoint by Diogenes: "Behold, a man!"

It could also be a cheap imitation of a duck that might be passable for someone dull

What is reasoning? What is understanding? Do humans do either? How do you know?

this is the question that the greeks wrestled with over 2000 years ago. at the time there were the sophists (modern llm equivalents) that could speak persuasively like a politician.

over time this question has been debated by philosophers, scientists, and anyone who wanted to have better cognition in general.


at the time there were the sophists (modern llm equivalents) that could speak persuasively like a politician.

You might want to brush up on your Greek history.


So how can you claim what an LLM is doing if we cannot define it regardless?

Because we know what LLM's do. We know how they produce output. It's just good enough at mimicking human text/speech that people are mystified and stupified by it. But I disagree that "reasoning" is so poorly defined that we're unable to say an LLM doesn't do it. It doesn't need to be a perfect or complete definition. Where there is fuzziness and uncertainty is with humans. We still don't really know how the human brain works, how human consciousness and cognition works. But we can pretty confidently say that an LLM does not reason or think.

Now if it quacks like a duck in 95% of cases, who cares if it's not really a duck? But Google still claims that water isn't frozen at 32 degrees Fahrenheit, so I don't think we're there yet.


math arose firstly as a language and formalism in which statements could be made with no room for doubt. the sciences took it further and said that not only should the statements be free of doubt, but also that they should be testable in the real world via well defined actions which anyone could carry out. all of this has given us the gadgets we use today.

llm, meanwhile, is putting out plausible tokens which is consistent with its training set.


I think the third worst part of the GenAI hype era is that every other CS grad now thinks not only is a humanities/liberal arts degree meaningless but now also they're pretty sure they have a handle on the human condition and neurology enough to make judgment calls on what's sentient. If people with those backgrounds ever attempted to broach software development topics they'd be met with disgust by the same people.

Somehow it always seems to end up at eugenics and white supremacy for those people.


It depends on your tolerance for error.

When you have a machine that can only infer rules for reasoning from inputs [which are, more often than not, encoded in a very roundabout way within a language which is very ambiguous, like English], you have necessarily created something without "ground."

That's obviously useful in certain situations (especially if you don't know the rules in some domain!), but it's categorically not capable of the same correctness guarantees as a machine that actually embodies a certain set of rules and is necessarily constrained by them.


Are you contending that every human derives their reasoning from first principals rather than being taught rules in a natural language?

I'm contending that, like any good tool, there is a context where it is useful, and a context where it is not (and that we are at a stage where everything looks suspiciously like a nail).

> I mean just try it yourself with o1, go as deep as you like asking how it arrived at a conclusion

I don't mean to disagree overall, but on this point the LLM can post-facto rationalize its output but it has no introspection and has absolutely no idea why it made a given bit of output (except in so far as it was a result of COT which it could reiterate to you). The set of weights being activated could be nearly disjoint when answering and explaining the answer.

One can also make the same argument about humans -- that they can't introspect their own minds and are just posthoc rationalizing their explanations unless their thinking was a product of an internal monolog that they can recount. But humans have a lifetime of self-interaction that gives a good reason to hope that their explanations actually relate to their reasoning. LLM's do not.

And LLMs frequently give inconsistent results, it's easy to demonstrate the posthoc nature of LLM's rationalizations too: Edit the transcript to make the LLM say something it didn't say and wouldn't have said (very low probability), and then have it explain why it said that.

(Though again, split brain studies show humans unknowingly rationalizing actions in a similar way)


I doubt people are very accurate at knowing why they made the choices they did. If you want them to recite a chain of reasoning they can but that is kind of far from most decision making most people do.

I agree people aren't great at this either and my post said as much.

However we're familiar with the human limits of this and LLMs are currently much worse.

This is particularly relevant because someone suffering from the mistaken belief that LLM's could explain their reasoning might go on to attempt to use that to justify the misapplication of an LLM.

E.g. fine tune some LLM using resume examples so that it almost always rejects Green-skinned people, but approve the LLMs use in hiring decisions because it is insistent that it would never base a decision on someone's skin color. Humans can lie about their biases of course, but a human at least has some experience with themselves while a LLM usually has no experience observing themself except for the output visible in their current window.


I also should have added that the ability to self explain when COT was in use only goes as deep as the COT, as soon as you probe deeper such that the content of the COT requires explanation the LLM is back in the realm of purely making stuff up again.

A non-hallucinated answer could only recount the COT and beyond that it would only be able to answer "Instinct."-- sure the LLM's response has reasoning hidden inside it, but that reasoning is completely inaccessible to the LLM.


Computers are "reasoning" in the same sense they have a "heartbeat".

> “the AI has no ground truth”

Yeah? It has? Where the irrefutable proof of that?


Hey, I'm definitely on your side of the Great AI Wars--and definitely share your thoughts on the overall framing--but I think you're missing the serious nature of this contribution:

1. Small correction, it's actually a whole book AFAIK, and potentially someday soon, a class! So there's a lot more thought put in then the typical hot-take blog post. I also pop into one of these guy's replies on Bluesky to disagree on stuff fairly regularly, and can vouch for his good faith, humble effort to get it right (not something to be taken for granted!)

2. RE:“the AI has no ground truth”, I'd say this is true, no matter how often they're empirically correct. Epistemological discussions (aka "how do humans think") invariably end up at an idea called Foundationalism, which is exactly what it sounds like: that all of our beliefs can be traced back to one or more "foundational" beliefs that we either do not question at all (axioms) or very rarely do (premises on steroids?). In that sense, this phrase is simply recalling the hallucination debates we're all familiar with in slightly more specific, long-standing terms; LLMs do not have a systematic/efficient way of segmenting off such fundamental beliefs and dealing with them deliberately. Which brings me to...

3. RE:“can’t reason logically”, again this is a common debate that I think is being specified more than usual here. A lot of philosophy draws a distinction between automatic and deliberate cognition. I give credit to Kant for the best version, but it's really a common insight, found in ideas like "Fast vs. Slow thinking"[1], "first order vs. recursive" thought[2], "ego vs. superego"[3], and--most relevantly--intuition vs. reason.[4] At the very least, it's not a criticism to be dismissed out of hand based on empirical success rates!

4. Finally, RE:“can’t explain how they arrived at conclusions”, that's really just another discussion of point 2 in more explicitly epistemic terms. You can certainly ask o3 to reason (hehe) about the cognitive processing likely to be behind a given transcript, but it's not actually accessing any internal state, which is a very important distinction! o3 would do just as well explaining the reasoning behind a Claude output as it would with one of its own.

Sorry for the rant! I just leave a lot of comments that sound exactly like yours on "LLMs are useless" blog posts, and I wanted to do my best to share my begrudging appreciation for this work.

The title is absurdly provocative, but they're not dismissing LLMs, they're characterizing their weaknesses using a colloquial term -- namely "bullshit" as used for "lying without knowing that you're lying".

[1] https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow [2] https://www.mit.edu/~dxh/marvin/web.media.mit.edu/~minsky/pa... [3] https://en.wikipedia.org/wiki/Id,_ego_and_superego [4] https://plato.stanford.edu/entries/intuition/ , and a flawed but interesting one from Gary Marcus: https://garymarcus.substack.com/p/llms-dont-do-formal-reason...


> obviously it does, it has ingested every paper ever

Do you have a citation for such a claim?



I've literally build a dynamic bench mark where I test reasoning models on their performance on deriving conclusions from assumptions through sequent calculus.

o3-mini high effort can derive chains that are 8 inference rules deep with >95% confidence I didn't have the money to test it further. This is better than the average professor in logic when given pen and paper.

It seems like a course critiquing 5 year old technology at this point.


they're quite useful for being "bullshit machines"

I read a bit and the book is more nuanced/fair/unbiased than the site url suggests.

> Others say they are nothing but bullshit machines.

Simply ignore anyone who says this and go about your business.

Doubly so if they bring up the environmental impact of AI.


LLMs pattern match, they say something that sounds good at this point but with no notion of correct. copilot is like pair programming with a loud pushy intern that has seen you write stuff before didn't understand it, but keeps suggesting what to do anyway. some medium sized chunks of code can be delegated but everyline it writes needs careful review.

Crazy tech, but companies are just wring to be trying to use LLMs as any kind of source of truth. Even Google is blind enough to think that aí could be used for search results, which are memes they are soo bad. And they won't get better. They just become more convincing


I've had quite a bit of success but my technique is to explain the technology and libraries I'm going to use, think through the problem, stub out function names, how they'll interact, and then llm saves me the typing.

I'll also use openrouter with sessions so I can take one context and use it around a variety of invocation tools without losing the attention.

It hasn't done anything I don't know how to do - fails if I ask it to do that. But it does save me lots of typing and thinking of minutia

It's not magic, it's still just a program running on a computer - a decent abstraction tool.

I'm sure it will be ruined in time like every new paradigm when the next generation feels a need to complicate this new tidy little world.


Not important once has copilot ever suggested a correction, found a bug, noticed a typo, prompted for a better solution, which is what any human pair programmer would do. It's a tool. But thinking ng it's like a "copilot" marketing as such is fundamentally missing the point. It won't get better untill people recognise what it _can't_ do as much as what it appears it can do.

This course mentions the famous Apple advertisement. Unfortunately it slightly oversells and while I'm sure that's not because this fragment was written by an LLM it is exactly the sort of over-simplification which leads to LLMs generating wild bullshit when they interpolate this "fact" with other "facts" they've been fed, and we ought to strive to do better when writing for humans.

"Describe how prior to 1984, there was no such thing as a graphical user interface, visual desktop, an intuitive menu system, or mouse-based navigation."

Apple were offering a mass market product which had these features so that's important - but there had been "such a thing" for quite some time before that. Douglas Engelbart's "Mother Of All Demos" in 1968 -- Sixteen years earlier shows all the features you mentioned. https://en.wikipedia.org/wiki/The_Mother_of_All_Demos

Unfortunately the demo is very long for a modern audience, so unlike "Watch a Superbowl ad" it's a hard sell to show the entire demo, but do go watch for yourself.


You're right of course.

In the original drafts I had a long section on this, including some of the history of the GUI, the development of the mouse, etc. It was way too much for the main text when the point is just to set up a metaphor for students who have seen a Mac 128.

That said, we can and should do better in the instructor guide. Thanks for the reminder. I'll add some context there.


I like the distinction between "teletypes" and the new fancy "glass teletypes".

Jimmy Carter installed a Xerox Alto in the White House in 1978! https://www.ourmidland.com/news/article/Check-out-the-first-... Never mind the Xerox Star, or Apple shipping the Lisa in 1983 ...

Synopsis from the project's "instructor guide:

>This is not a computer science course, nor even an information science course—though naturally it could be used in such programs.

>Our aim is not to teach students the mechanics of how large language models work, nor even the best ways of using them in various technical capacities.

>We view this as a course in the humanities, because it is a course about what it means to be human in a world where LLMs are becoming ubiquitous, and it is a course about how to live and thrive in such a world.


Great stuff! LLMs, social media, the information landscape has changed so much in the past decade. We need good pedagogical resources on how to think of these tools, both their benefits and their downsides.

I think a great number of working professionals need a course like this too. I am already tired of ChatGPT being cited by the less experienced as an invisible expert in the room during technical discussions.

I'm at the state of thinking that I am quite happy to let them screw themselves with it. I am very good at clearing up disasters and getting paid a hell of a lot for it as the deciding factor isn't your ability to use an LLM but to know what the hell you are doing. We have had quite a few disasters due to inexperienced and experienced people throwing stuff into an LLM and assuming it has any veracity or authority over what comes out.

I tried warning at first and reinforcing validation but I was poo pooed as a spoilsport luddite with basically a faith argument. Not my fucking funeral!


This stuff is so frustrating, I have colleagues who sent long, clearly AI generated documents who don’t seem to understand that if they can’t be arsed to write something, why should I bother reading it?

Write well is think well. A big part of the writing process is being forced to structure your thoughts and ideas, and I am worried that we focus too much on the end result without understanding the process that lead to good outcomes.

I was thinking how this article claims that people crave the authenticity of live music and that bullshit-generators will never be able to supply that. At first, I saw this as a reason for optimism, but then I got to thinking about evidence that people may not necessarily want authenticity after all.

Organic produce was the first metaphor the came to mind: it's probably more healthy for you even if it isn't as pretty, but many people aren't willing to pay a premium it and I suspect economics isn't often the reason. Is that a straw-man for live music? I don't know that it is because plenty of people are content to listen to recorded music -- sure, they might enjoy going to a live concert but they'll still listen to the radio on the drive to work.

Then I got to thinking about something more crass: while breast implants and other cosmetic body surgery may be as much for the benefit of the subject self-image I imagine there are plenty of people that find it very attractive despite what is often obviously fake.

So do we crave authenticity? I think I do but I'm not sure if that's a safe generalization to make.


Craving "authenticity" is somewhere very high up the hierarchy of needs, i.e. a luxury. That a lot of people do not care much for it is not a sign of moral failing but of having bigger fish to fry.

I think it'd fall somewhere close to a "social" need, i.e. right smack in the middle of Maslow's hierarchy.

Of course we do, but also realize that the threshold for "being authentic" is flimsy at best for many, and an Everest to climb for others. We want more skeptics in generalized society with their own personal Everest they require of their thought leaders. This variability of acceptance for integrity is a weakness in our civilization, strategically grown by attacking educational institutions, and currently being exploited to great success by Orwellian long players.

Organic produce isn’t a good example because it’s a dodgy poorly defined concept and it’s not clear that it’s either better for you or better tasting.

The thing about art is this: art is a message from a human to another human.

If I want art I want it to be that. I don’t want a numerical average of all past messages, which is what LLMs and diffusion models create. I also don’t want randomness or gimmickry, which is why I dislike a lot of pretentious modern art.


I didn't want to hijack the thread too much but yeah, organic farming was not originally about better for the consumer. The origin of organic farming methods was about taking care of the land and the ecosystems in which farming happens. The end products happened to be healthier, in some cases, because of reduced usage of chemicals known to be harmful resulted in less of them in the product at retail.

Perhaps, but then the meaning of the word is in how it's used (something that, if more people truly understood, would cut the amount of dismissing LLMs as "bullshit machines" and "stochastic parrots" by half).

"Organic farming" may have initially been about sustainability, but the result correlated well enough with healthy food - and even more so with the naturalistic fallacy-fueled "healthy food" fad, that the latter application took over as it became a market niche. The niche being itself based more on a fallacy than reality is why "organic food" is such a bullshit fest it is - one product might be genuinely healthier, another is just worse and also ruins the land because it's sprayed with a nasty set of chemicals that are more "natural" than their strictly safer "modern" alternatives...


Marketing and greed do ruin everything they touch, yes.

The travel and tourism industry is growing at around 4% YoY. So there is still a need (a growing need!) for "experiences" and "moments".

The "online bullshit" combined with no simple way towards ownership for younger generations are among the highest growth factors here in my opinion.

Still agreeing with your points, just wanted to add this context as there's likely a difference mentally between "social media" and "real world" authentic.


I'm uncomfortable with the use of profanity as a core element of this campaign branding, especially given that it seems to be an educational outreach effort. While this seems targeted at college age and above, I think it would be highly relevant content for a teen audience as well. While I swear in privacy on occasion I think it has no place in the classroom. I really don't care for it in the workplace either but I grant that private enterprises can have their own culture.

It is frustrating because I agree with most of the content and the need for informed debate on the topic. It is a bit like my reaction to reading Cory Doctorow: I agree with his politics but really dislike the hamfisted way he packages his advocacy in the form of action adventures. As if the merits of his arguments need to be packaged in cotton candy to be consumed, and there is an undercurrent of self-promotion and personal branding that feels suss.

Probably all a "me" problem with associations built up over time from seeing snake oil being packaged using a similar playbook. If you have to sell your message by dressing it up with scroll effects and provocative offensive language you've already lost me.


This is something we've given serious consideration, having taught a course called "Calling Bullshit" (http://callingbullshit.org) for almost a decade and having authored a book by the same name that gets downranked on various Amazon features because of its title.

But the bullshit is a term of art here, after the seminal 1986 article "On Bullshit" by Princeton philosopher Harry Frankfurt (later published as a little book). We strongly feel that it is exactly the right term for what LLMs are doing, and we make the case for that in lesson 2 of the course. (https://thebullshitmachines.com/lesson-2-the-nature-of-bulls...)

We're also concerned about accessibility for high school teachers etc., and thinking about what to do in that direction.

I'm curious: do you find "bs" to be any less offensive?


FWIW, I don't think you should cave on this. For me, your choice to use it over "hallucination" instantly elevated the insight of the lessons. I also think the authenticity of the voice of the lessons benefits from owning the decision to use it fully rather than compromising with the shorter "bs" version.

I assume that the use of the word "bullshit" on this site is at least in part informed by "On Bullshit,"[1] which is a pretty common undergraduate reading.

[1]: https://en.wikipedia.org/wiki/On_Bullshit


Somebody made a website to express their opinion - wherein their opinion can be surmised by reading the domain name.

Text is scaled to 300% to indicate just how important and authoritative they think their opinion is.

And it talks down to you in a "here comes the expert" style, with an atrocious aimed-at-preschoolers presentation.

No thank you.


Two university professors in data science and computational biology are not just “somebody”.

People that find professors and similar in high esteem usually haven't spent enough time in or around academia and academics and for that reason still maintain some innocent aura of mystique and prestige around it in their mind.

Quite literally anyone with a bit of persistence could become a professor (and this by far isn't even a top university either).

They are quite literally just a somebody with an opinion just like anybody else. An opinion barked down to you with 300% scaled fonts and preschooler illustrations.


I have been around a lot of academics and been in academia.

While we shouldn't trust academics just because they are academics, these people specialize in relevant fields and also back their claims with citations throughout the course. They are not just postulating.


Being a "University Professor" means jack shit unless precisely in their (sub)-field. The authors are experts in biology, and evolution of information representation/communication, and about misinformation.

I'll gladly defer to their expert opinion on those topics, but IMO to use such an authoritative voice when they are not experts in actual AI systems. Judging the massive progress in the field of AI, how can anyone even remotely state what these systems inherently are, when they are still so new and ever-evolving?


That's too harsh.

Some people do like big bullet list of points.

Some people need to be spoon fed.

Don't blame the spoon for being a spoon.


I thought I'd give this the benefit of the doubt.

It's trying its hardest to be put in the "this is bullshit" pile.

Five lines of content spread through five pages through the magic of parallax scrolling. Examples wandering around without going anywhere and confidently repeating talking points that were debunked 3 years ago.

Please release a textbook that can be read instead of whatever this is.


Wow, that's really interesting! I didn't have the time to read all the pages, but I definitely will. It helps to bring one's expectations about AI back down to earth.

People said the world wouldn't need more than 10 computers. Newyork times ridiculed space industry and etc..

AI is gonna disrupt all industries


Any new technology given this much attention and money will disrupt, that's not really a question.

The question is whether we're going to be better off for it, and if people want all that change in the first place.


It could be argued that the adoption of the automobile was a bad move.

You and I debating either its efficacy or social good though is irrelevant if it marches on regardless.


I didn't explain my point there, let me try with the automobile example.

There's a key difference in the how the impact of the automobile happened - consumers got to choose to buy them and the impact was driven in large part by market demand.

The fact that LLMs are going to have a big impact seems obvious because a comparatively few number of people are making a huge deal out of them, both with attention and money. LLMs will be big but that says nothing of their usefulness or even consumer demand, it says more about how the industry is being financed.


I'm not too worried about Big Corporation trying to push a rope. In the end it really is only going to succeed if "we" want it — find value in it.

> only going to succeed if "we" want it — find value in it.

I'll be pleasantly surprised if that's how it turns out.

At least so far market dynamics haven't really been much of a driver for LLMs. Those with the money think its the next big thing and are pouring cash both into the LLMs themselves and any product that slaps a "powered by AI" sticker on the box.

That's not to say people aren't also actively choosing to use LLMs, but in my opinion the market demand doesn't account for the massive amount of hype and funding, or the pervasiveness of LLMs being added to so many products.


I see it too — but I'll remind both of us that these are still very, very early days.

For sure. And I definitely have a bias showing here towards not trusting the person in charge to be benevolent or to actually know what the "right" thing to do is in the long run.

It’s totally possible that the net result is good (which is still quite early to really know), but it present new problems. For example, the creation of the car is probably good in general, but it has important issues that requires to be taken into account (accidents, pollution, city planning challenges, etc)

I’ve lived enough hype cycles to know that we are always very close to use VR every day or have autonomous cars… It took around 25 years to move from “we will pay with our mobile phones” to that becoming a reality.


In a unpredictable, hallucinating way. Who cares about human expertise when you can have a machine that acts like it has expertise in the same area? We deserve what is coming.

People also didn't anticipate how social media and surveillance capitalism would shit up our lives.

> LLMs are not capable of reflecting on and reporting about how or why they do what they do.

i get the why, but about the how:

deepseek has shown it's able to explain it's reasoning, how it reach a conclusion. That's quite a far stretch from the llm being only able to estimate statistically what's the right word to put after another one.


Their arguments are just one special-pleading fallacy after another. "But... but... but... it's different when we do it!"

No, it's not different when we do it. The takeaway here isn't that the AI algorithms are so special and magical, it's that our brains are not. It takes some nerve (literally) for humans to throw around labels like "bullshit machine."

The only advantage we really have is long-term memory. I'm sure that will be addressed soon enough. Someone will figure out the ANN analogue of memory consolidation during sleep, and that'll be the tipping point.


> The only advantage we really have is long-term memory.

We also have short term memory that is associative to our long term memory. So, STM is about 7 items. But those 7 items are also pointers to LTM.

Oh, and the ability to learn new things.


I think of the context as short-term memory. If there were a way to update the weights based on the context, such that the outcomes of future queries would be influenced, things would get interesting in a hurry.

There seems to be a lot of coping in the anti-AI department about the utility of the current LLMs. "I'm not impressed". You're not impressed that 98% of knowledge workers can 10-100x their own work output for free?

As a very intelligent infrastructure engineer told me: AI isn't going to take your job, but someone using AI is.


God says...

bradytrophic indical sacella unlittered surveying mis-tilled brachymetropic oxime masa hypermilitant litas acarotoxic thrust O retranslating awee tetraodont prevoidance cabbala Linker pervertedness absurdest coruscates aminopyrine spitting ledgers Fremontia upthrown simplifiers acomous statutable Ampelopsis


I am not the first to recommend JBEE SPY TEAM services, but I am doing this because I am very happy I made the right choice and that right choice is me following my instincts to go with the good reviews i saw about (JBEE SPY TEAM )on INSTAGRAM believe, if it's not for that choice i made i would have still been in a very toxic relationship with my ex partner who was a serial cheat but all that is gone now thanks to JBEE SPY TEAM, everyone deserves happiness that includes you so to get that happiness you deserve. contact the best in the game for spying and gaining access into phone remotely without having the device on your hands contact email conleyjbeespy606@gmail.com or you reach them on telegram +44 7456 058620.



Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: