LLMs have functional states that correspond to those emotions. In particular, you can extract a concept vector that corresponds to a given emotion, and steering with that concept vector causes observable changes in behavior that roughly match what you'd expect from the analogous emotion. Anthropic (and Chris Olah's team in particular) conclusively demonstrated this: https://transformer-circuits.pub/2026/emotions/index.html
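For anyone unfamiliar with the technique, here is a toy sketch of one common recipe (difference of means). It assumes made-up 3-dimensional activations, and the linked paper's actual extraction and steering procedure may well differ. `concept_vector`, `steer`, and `projection` are hypothetical names. You average the hidden activations on "emotional" inputs, subtract the average on neutral inputs, then add a scaled copy of that direction to a hidden state.

```python
# Toy illustration of a difference-of-means concept vector and activation steering.
# The activations below are invented 3-dimensional vectors, not real model data.

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def concept_vector(emotion_acts, neutral_acts):
    """Direction pointing from the neutral mean toward the emotional mean."""
    return [e - n for e, n in zip(mean(emotion_acts), mean(neutral_acts))]

def steer(hidden, direction, alpha):
    """Add alpha times the concept direction to a hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

def projection(hidden, direction):
    """Dot product: how strongly a hidden state points along the direction."""
    return sum(h * d for h, d in zip(hidden, direction))

# Hypothetical activations recorded at a single layer.
worried = [[1.0, 0.5, 0.0], [1.5, 0.5, 0.5]]
neutral = [[0.25, 0.5, 0.0], [0.25, 0.5, 0.5]]

v = concept_vector(worried, neutral)
h = [0.0, 0.5, 0.25]
h_steered = steer(h, v, alpha=2.0)
print(projection(h, v), projection(h_steered, v))  # prints 0.0 2.0
```

In a real model the vectors have thousands of dimensions and the steered state is fed back into the remaining layers; the behavioral change is what the paper measures.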
> A natural question is whether these emotion concept representations bear any meaningful relationship to human emotional experience. We would urge caution in drawing strong conclusions.
> We therefore suggest interpreting our results as evidence that models represent emotion concepts, and that these representations influence their behavior, rather than as evidence that models feel or experience emotions in the way humans do.
To say that LLMs experience emotion is a bit like saying a thermometer feels cold.
Yes, I was being sarcastic when I said "To say that LLMs experience emotion is a bit like saying a thermometer feels cold."
The paper spells it out, albeit in a slightly convoluted way: models can exhibit concepts of emotion... and given that there is no scientific consensus on what emotions are, it is hard to argue that these "concepts" are anything like emotions.
They talk about emotion vectors, blah blah, but the wording is clearly about "concepts of emotions", not actual emotions.
And yes, reading a book gives you a concept of what it is like to be that character, including their emotions. That is what language communicates, and it is hardly surprising if you ask me.
Decades ago, long before anyone had heard of a large language model, I wrote programs that responded to a random in-game event, such as the death of a friend, by outputting statements that the program itself was grieving. LLMs are doing nothing more advanced than that. There's no justification for blurring the line between an AI model appearing to have emotions and actually having them.
I'm fine with the idea that a machine can be "worried" it won't be able to accomplish a task, and copes with this "worry" by cheating a little and making the task seem done. (I don't like that this happens, but I'm fine with the idea that "worry" in this context is a functional emotion.)
See also https://arxiv.org/abs/2603.10011, and Gemma has tried to delete itself after failing at a task. I'm not saying the machines "feel" or that we should have deep empathy for them, and this could totally have been learned in pretraining, but functional emotions are not a crazy idea.
Anthropic's research keeps going from strength to strength in interpretability. Publicly releasing the code so other labs can benefit from it is also a great move: very values-aligned, and it improves the overall AI safety ecosystem.
Never heard of this before. Is it a real thing? I mean, in the context of psychedelic experiences. I've tried DMT a few times expecting the legendary all-healing trip everyone talks about, but it never worked.
> Of course it knows what it output a token ago...
It doesn't know anything. It has a bunch of weights that were updated by the previous stuff in the token stream. At least our brains, whatever they do, certainly don't function like that.
I don't know much (or really anything) about how our brains function, but the idea of a neuron sending an electrical output when the sum of the strengths of its inputs exceeds some value seems to me like "a bunch of weights" getting repeatedly updated by stimulus.
To you it might be obvious that our brains are different from a network of weights being reconfigured as new information comes in; to me it's not so clear how they differ. And I don't feel I know the meaning of the word "know" clearly enough to establish whether something that can emit fluent text about a topic is somehow excluded from "knowing" about it by the way it was constructed.
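The neuron description above is roughly the textbook threshold-unit model (McCulloch-Pitts style), which is a heavy simplification of real neurons. Here is a toy sketch with made-up weights. `fires` and `hebbian_update` are hypothetical helpers. It pairs such a unit with a simple Hebbian update, where inputs that were active when the unit fired get strengthened, as one way weights could be "updated by stimulus":

```python
def fires(inputs, weights, threshold):
    """Threshold unit: output 1 if the weighted input sum exceeds the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

def hebbian_update(inputs, weights, output, rate):
    """Hebbian rule: strengthen weights of inputs that were active when the unit fired."""
    return [w + rate * x * output for x, w in zip(inputs, weights)]

weights = [0.5, 0.5, 0.5]
threshold = 0.75
stimulus = [1, 1, 0]

out = fires(stimulus, weights, threshold)  # 0.5 + 0.5 = 1.0 > 0.75, so the unit fires
weights = hebbian_update(stimulus, weights, out, rate=0.25)
print(out, weights)  # prints 1 [0.75, 0.75, 0.5]
```

Real neurons involve spike timing, neuromodulation, and much more that this ignores.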
How close are you to saying that a repair manual "knows" how to fix your car? I think the conversation here is really around word choice and anthropomorphization.
The problem is, people think word choice influences capabilities: when people redefine "reasoning" or "consciousness" or so on as something only the sacred human soul can do, they're not actually changing what an LLM is capable of doing, and the machine will continue generating "I can't believe it's not Reasoning™" and providing novel insights into mathematics and so forth.
Similarly, the repair manual cannot reason about novel circumstances, or apply logic to fill in gaps. LLMs quite obviously can - even if you have to reword that sentence slightly.
You're making an argument Descartes formalized in the 1600s (and folks have been making long before him). It's a cute philosophical puzzle, but we assume that there's no Descartes' Demon fiddling with our thoughts and that we have a continuous and personal inner life that manifests itself, at least in part, through our conscious experience.
If anything, this confirms it for me. On his about page, there's this:
> Hi there, I am Loïc Baumann, I’m from Paris area, France
> I develop, since early 90s, first assembly, then C++ and nowadays mostly .net.
> My area of interest are 3D programming, low-latency/highly-scalable/performant solutions and many other things.
Compare that style to what's in this most recent blog post. The about page has mildly ungrammatical constructions typical of an ESL writer and a plain, straightforward style; the post has breathless, feed-optimized "not x, but y" constructions, triplet/rule-of-three constructions, and perfect native-speaker grammar with an oddly hollow tone. Or look at this post from 2018, which is radically different even at a concrete syntactic level (no emdashes): https://nockawa.github.io/microservice-or-not-microservice/ I'm sure he has technical chops, and it's cool that he worked on DOTS, but I would bet a very large amount of money that he wrote the bullet points describing this project and then prompted GPT 5.3 to expand them into a blog post to "save time".
I agree that this triggered my AI writing senses. Points in favor:
- "It’s not an accident — it’s driven by the same physics." The classic "it's not x, it's y", with an em-dash thrown in for good measure
- "Typhon brings these into the component storage model — not as bolted-on workarounds, but as first-class citizens." More "not x, but y", this time with a leading clause joined by an emdash
- "Blittable, unmanaged, fixed-size, stored contiguously per type — that’s the ECS side." Short, punchy list of examples, emdash'd to a stinger, again typical of LLM writing
- "Schema in code, not SQL. Components are C# structs with attributes, not DDL statements. Natural for game developers, unfamiliar territory for database administrators. If your team thinks in SQL, this is a paradigm shift." This whole mini-paragraph is the x/y style, combined with the triplet / rule-of-three, just at the sentence scale. And then of course, the stinger at the end.
Definitive, no, but it certainly has a particular flavor that reads as LLM output to me.
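The tells listed above are easy to count crudely. This is a toy sketch using hand-written regexes that are rough guesses. `llm_tells` and `NOT_X_BUT_Y` are hypothetical names. It will miss plenty of LLM text and flag plenty of human writing, so it is not a reliable detector:

```python
import re

EM_DASH = "\u2014"

# Rough pattern for "not X, but Y" and "not X \u2014 it's Y" constructions.
# Illustrative guess only, not a validated detector.
NOT_X_BUT_Y = re.compile(
    r"\bnot\b[^.;]{1,60}?(?:,|\u2014)\s*(?:but|it[\u2019']s)\b",
    re.IGNORECASE,
)

def llm_tells(text):
    """Count a few stylistic tells; high counts prove nothing on their own."""
    return {
        "em_dashes": text.count(EM_DASH),
        "not_x_but_y": len(NOT_X_BUT_Y.findall(text)),
    }

sample = (
    "It\u2019s not an accident \u2014 it\u2019s driven by the same physics. "
    "Typhon brings these into the component storage model \u2014 "
    "not as bolted-on workarounds, but as first-class citizens."
)
print(llm_tells(sample))  # prints {'em_dashes': 2, 'not_x_but_y': 2}
```

Run on the first two quoted sentences from the post, it finds two em-dashes and two "not x, but y" style constructions.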
This was definitely covered in my middle school classes (although those were 40 years ago). Standard US public school. We spent a fair amount of time discussing the Antipope, it always sounded like such a cool job name.
We also read Genesis in English classes (from a literary perspective).