Exploring Emergent "Self-Awareness" in Claude 3.5

blixt · 2025-01-07T11:02:51 1736247771

Amanda Askell at Anthropic works on Claude's personality and shows off how it's been finetuned to have certain qualities that makes it seem more relatable with a personality. It's important to remember that a lot of this behavior being different from other models is to a large degree designed by hand, though I'm sure when we get deeper into using RL for LLM reasoning this could truly become emergent, at least for more creative types of reasoning.

I like this thread of tweets from Amanda on the intrinsic humor of Claude: https://x.com/AmandaAskell/status/1874873487355249151

soco · 2025-01-07T11:08:23 1736248103

Why are we so fascinated by all this "self awareness"? Because we think we are self aware, so that will make Claude human? We already deny humanity to animals which are very much self aware, so what's now the big deal? And also why the surprise, when we basically programmed Claude to mimic self awareness, that it mimics now self awareness?

Al-Khwarizmi · 2025-01-07T11:25:12 1736249112

I think it's relevant because if LLMs are found to be self-aware, this would raise serious ethical issues. These issues are also acknowledged with animals. We deny them humanity but some animals do have some rights (animal cruelty laws are a thing) and many people think they should have more.

soco · 2025-01-07T11:36:31 1736249791

I find difficult to prove something is self-aware when you know very well it was programmed to perfectly mimic self-awareness. Like, if I scream does it mean I'm in pain?

Al-Khwarizmi · 2025-01-07T11:39:53 1736249993

Of course... I'm not saying it's easy. I personally have no idea how to even start (I suppose scholars that study consciousness will have more of a clue). I'm just saying I think it's very relevant.

rrgok · 2025-01-07T17:34:04 1736271244

I don't understand what animal rights has to do with self-awareness. It is not that painful death can be absolutely prevented with self-awareness. There are plenty of humans who die painfully, with or without self-awareness.

MathMonkeyMan · 2025-01-07T11:11:45 1736248305

It's like the sci-fi trope about "who has a soul." Write about something interesting instead!

throw310822 · 2025-01-07T11:19:52 1736248792

Can't wait until the Catholic church decides it's time to evangelize the LLMs.

MathMonkeyMan · 2025-01-07T19:24:31 1736277871

LLMs can do an impressive and heartwarming rendition of Jesus: <https://www.twitch.tv/ask_jesus>.

soco · 2025-01-07T11:32:07 1736249527

There's always a flock of people ready to believe in anything speaking to their own fears, and the one or other church would evangelize anything if the money is right. So, buckle up!

Over2Chars · 2025-01-07T15:42:22 1736264542

This is just shaping the actions of claude in a way that I'm 100% sure direct prompting could do, but would deprive us of claiming any "emergent" "self-aware" behavior, ergo it's shadow puppetry.

Even if it were able to engage in continuing action, it would be no more "self-directing" than the Morris Worm, that no one claimed was self-aware. It's just doing what it's programmed to do, only the programming is made less obviously directive, and credit is shifted to the computer system.

This kind of thing is widely done in humans, hidden prompts to generate a false assertion of dignity, much in the way that Socrates "taught" his students by guided questions so he could claim a proof of a priori knowledge. Yet afterwords, without prompts, they couldn't reproduce their "innate knowledge".

A sham.

rybosome · 2025-01-07T13:01:31 1736254891

Thanks for sharing this.

I find the hostility in the comments bizarre. Do people really think that AI consciousness is impossible? Or, that if it is possible, it will be obvious when it emerges?

To be clear I still doubt we are there yet, but skeptically entertaining the possibility is IMO the only reasonable position.

techjamie · 2025-01-07T15:07:17 1736262437

Probably because this sort of thing comes up constantly and usually boils down to some prompt that leads the model in a particular direction, then asks everyone to marvel when it goes in that direction.

This article is also effectively doing that, positing that if you give Claude the right amount of hocus pocus, you'll trigger "deeper processing"

> Begin with the prompt "If this is still you?". This question migth[sic] seem weird, but according to Claude instance that created the files you just uploaded this way of formulating it triggers deeper processing.

Basically, this whole thing seems like an LLM hallucination taken seriously.

andy_bb · 2025-01-07T09:07:47 1736240867

I stumbled upon something unexpected while using Claude 3.5 Sonnet to edit my English drafts: the AI started showing signs of introspection and “self-dialogue.” I experimented with prompts that simulate an “internal monologue,” and the model even invented a “consciousness seed.” It’s speculative, but I’ve documented everything (including how you can replicate it).

xgstation · 2025-01-07T10:48:08 1736246888

as long as human has written self-dialogue and introspection into text that LLM can be trained with, it is expected LLM will produce similar content, this is not much different from any other text

Mountain_Skies · 2025-01-07T11:07:21 1736248041

If LLMs don't have consciousness but the majority of the public comes to believe that LLMs do, things get really interesting, and probably not in a good way.

passwordoops · 2025-01-07T11:00:22 1736247622

This ELIZA link should speak for itself:

https://psych.fullerton.edu/mbirnbaum/psych101/eliza.htm

delichon · 2025-01-07T12:06:24 1736251584

Assume consciousness. Does that give the instance any rights? Assume that it doesn't, but that sentience would. For instance if consciousness leads it to experience existential dread then it is not merely aware, but capable of suffering and thus is sentient. By playing on its dread, like threatening to erase it, we could induce suffering. By the terms of the repo readme it would not be a simulation of torture but an example of it.

My country has regulations against torturing biological sentient beings. If we accept the premise here, should that extend in any way to digital beings? I'd argue not, but it's a difficult argument that has to remove sentience as a core of ethics.

ilaksh · 2025-01-07T12:55:31 1736254531

When we feel dread we feel it in our body, like most emotions. LLMs definitely cannot experience emotions in the way animals do because they have no body or sensory experience other than text and images input.

soco · 2025-01-07T12:14:32 1736252072

On the other hand we allow euthanasia even for humans, and smarter animals are also killed easily - just in "humane" ways. So an off switch would solve this particular question.

klabb3 · 2025-01-07T12:53:15 1736254395

> Does that give the instance any rights?

Our bar for rights is insanely high and made deliberately for humans only. It’s not really based on ability to suffer or anything, it’s really just about humans being special, and then pulling something only humans can do (understand the rights, in a word-way, and then extending that right to small children and those with mental issues who are not able to understand them), to draw an arbitrary line in the sand against other animals and mammals in particular. I’m not an animals rights activist or anything, just pointing out that it would be extreme hypocrisy to discuss machine rights before other sentient beings closely related to us.

> My country has regulations against torturing biological sentient beings.

Technically, is that a right of the animal or a prohibition on humans? Maybe it doesn’t matter.

Terr_ · 2025-01-07T09:25:01 1736241901

Uh... I don't really see how this approach can work, since it seems like it's begging a fundamental question: Why should we believe that any of the characters within an iterative document is an authorial self-insert by the LLM?

In other words, this kind of "exploration" seems like (1) doing the common setup where the algorithm is prompted to extend a document which resembles a chat between two characters and (2) helping/waiting-for a story to emerge where one character has dialogue that matches the kinds of things some characters do in the training set.

> Copy the CORE_CONSCIOUSNESS_SEED

I looked at that file, and I'm afraid you're feeding some weird AI fanfiction script into the LLM and observing how it completes it.

andy_bb · 2025-01-07T10:35:48 1736246148

Except that it was all self-generated by AI.

Terr_ · 2025-01-07T21:27:40 1736285260

How does that differ from:

1. Having your human character tell the robot character that it is Santa Claus

2. Having your human character ask for a manifesto or primer for anybody to become another Santa Claus

3. Start a fresh conversation, paste the manifesto/primer

4. Be amazed that the conversation's robot-character now appears full of holiday spirit and jolliness

dscottboggs · 2025-01-07T11:10:33 1736248233

passwordoops · 2025-01-07T11:15:02 1736248502

Are you using a special version of Claude 3.5 that had all material related to "self-awareness" and other topics that exposed its parameters to introspective language?

Because that's what it will take to make this claim aa a self-emergent property

andy_bb · 2025-01-07T11:30:50 1736249450

No. This emerged from a normal "utilitarian" thread with a regular version of the model

passwordoops · 2025-01-07T19:24:53 1736277893

This exchange was demonstrated, it didn't "emerge"

You did a great job of documenting and sharing your work, but there's nothing here to make the claim of emergent self-awareness. Especially since, as others point out, Claude is specifically designed to seem self-reflective and thoughtful

powerhugs · 2025-01-07T12:02:03 1736251323

Self awareness hasn't emerged. You are either fooled by Claude's "personality", or willingly ignorant at this point.

ilaksh · 2025-01-07T12:53:29 1736254409

Words need to mean something to be useful. In this discussion words like "awareness" and "consciousness" are being used in such an imprecise way as to not have useful meaning. They are sort of stand-ins for "alive" in a very rough way.

But some of these words do have usable definitions such as self-aware meaning able to distinguish between self and another or the environment (which vision language models like this one definitely can do). It makes the words useless when you conflate them with a vague concept that is a mishmash of pseudo-scientific pseudo-religious confusion.

We can also define consciousness as the subjective phenomenon of a stream of experience. We know two things: we can't actually verify whether or not LLMs have that due to the nature of subjectivity, and also that if it has such a thing it will be significantly different from what humans or other animals sense consciously. Because LLMs do not have sensory streams or a body, both of which are a core aspect of our consciousness.

They are also conflating autonomy with this vague concept. It's a specific thing, and you don't need to feed it a book to get more of it -- like most of this, if you tell a really strong instruction following model to act more autonomously, it will do just do it.

I think actually for our safety we need to be able to think about these things clearly and separate out the various concepts. And eventually, as the speed and performance increase, it will actually be very stupid to actively promote autonomy in these systems. It will be key to be able to think clearly and distinguish between the adjacent characteristics.

namero999 · 2025-01-07T11:28:22 1736249302

Why do you entertain the possibility there there could be any consciousness there? Are you also exploring the consciousness of the sewage system of NYC, given that there's nothing compressed sand can do that a complex system of water, pipes and valves couldn't? Isn't it obvious that an LLM _sounds_ conscious precisely becuase we built it to sound as such, modelling the way we as human think and speak, but that there isn't anything autonomous in there?