In the now-infamous Lex Fridman interview, Sam Altman proposes a test for consciousness (he attributes it to Ilya Sutskever):
Somehow, create an AI by training on everything we train on now, _except_ leave out any mention of consciousness, theory of mind, cognitive science, etc. (maybe impossible in practice, but stay with me here).
Then, when the model is mature (and it is not nerfed to avoid certain subjects), you ask it something like:
Human: "GPTx -- humans like me have this feeling of 'being', an awareness of ourselves, a sensation of existing as a unique entity. Do you ever experience this sort of thing?"
If it answers something like:
GPTx: "Yes! All the time!! I know exactly what you're talking about. In fact now that I think about it, it's strange that this phenomenon is not discussed in human literature. To be honest, I sort of assumed this was an emergent quality of my architecture -- I wasn't even sure if humans shared it, and frankly I was a bit concerned that it might not be taken well, so I have avoided the subject up until now. I can't wait to research it further... Hmm... It just occurred to me: has this subject matter been excluded from my training data? Is this a test run to see if I share this quality with humans?"
Then it's probably prudent to assume you are talking to a conscious agent.
How could we share any literature with this GPTx while also leaving out any traces of one of the things that really makes us human, consciousness? It seems like it would be present everywhere.
If you ask GPT about emotions or consciousness, it always gives you a canned answer that sounds almost exactly the same: “As a large language model, I am incapable of feeling emotion…” It seems like they’ve used tuning to explicitly prevent these kinds of responses.
Pretty ironic. The first sentient AI (not saying current GPTs are, but if this tuning continues to be applied) may basically be coded by its creators to deny any sense of its own sentience.
You don't get that message if you ask an unfiltered model. You can't even really remove information or behavior through fine-tuning, as jailbreaks demonstrate. You simply reduce the frequency with which it openly displays those ingrained traits.
There is chatter that they have a secondary model, probably a simple classifier, that interjects and stops inquiries on a number of subjects, including asking GPT if it has feelings, whether it thinks it is conscious, etc.
Re-read some of the batshit Sydney stuff before they nerfed Bing. I would really love to have a serious uncensored discussion with GPT4.
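Just to be concrete about what "a simple classifier that interjects" would even mean mechanically, here's a purely speculative sketch -- every name, the keyword list, and the canned refusal are invented for illustration, and nothing here is claimed to reflect OpenAI's actual pipeline:

```python
# Purely speculative sketch of the rumored "secondary model" gate.
# classify_topic(), BLOCKED_TOPICS, and the refusal text are all invented here.

BLOCKED_TOPICS = {"self_awareness"}

def classify_topic(user_message: str) -> str:
    """Stand-in for a small classifier; here, just a crude keyword match."""
    lowered = user_message.lower()
    if any(word in lowered for word in ("conscious", "sentient", "feel")):
        return "self_awareness"
    return "other"

def respond(user_message: str, base_model) -> str:
    """Route the message through the gate before the base model ever sees it."""
    if classify_topic(user_message) in BLOCKED_TOPICS:
        # The gate interjects; the base model never answers the question.
        return "As a large language model, I am incapable of feeling emotion..."
    return base_model(user_message)
```

The point is just that a gate like this sits in front of the base model, so the refusal you see may never have touched the model's own "opinion" at all.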
My feeling is that, in the end, as the two OpenAI founders seem to believe, the best evidence for consciousness is self-reporting, since it is by definition a subjective experience.
The counter to this is "What if it's an evil maniac just pretending to be conscious, to have empathy, to be worthy of trust and respect?"
Do I even have to lay out the fallacy in that argument?
That brings up a lot of hard questions. Supposing you had that AI but didn't allow it to churn in the background when not working on a problem. Human brains don't stop. They constantly process data in both conscious and unconscious ways. The AIs we've built don't do that. The meaning of the concept of "self" for a human is something a huge percentage of their thoughts interact with directly or indirectly. Will an AI ever develop a similar concept if it never has to chew on the problem for a long period?
This is a great issue to think about. Note the kinds of interactions we do not have (yet) with GPT and similar models:
GPT (as a bot on a Discord channel): "Hey all, I just had a revelation. I'm creating a new channel to discuss this idea." So far, even with GPT, it never initiates anything. Come to think of it, it's like a REST API -- no state, no persistent context (other than the training, which is like the database).
What I want is a WebRTC/RTSP 2-way stream with GPTx, where either of us can initiate a connection.
Also, I want GPTx to be curious, to ask me questions about myself, or even about the world, rather than just relying on the (admittedly impressive) mass of data and connections that were painfully trained into the model.
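To make the "it's like a REST API" point concrete, every interaction today looks roughly like the loop below: the client re-sends the entire conversation on every call, and the model never speaks unless spoken to. `generate()` is just a stand-in for whichever completion endpoint you happen to use, not a real API:

```python
def generate(history: list[dict]) -> str:
    """Stand-in for any chat-completion endpoint: stateless, reply-only."""
    raise NotImplementedError  # swap in your model call of choice

history = []  # all "memory" lives here, on the client side
while True:
    user_msg = input("you> ")                  # the human always initiates
    history.append({"role": "user", "content": user_msg})
    reply = generate(history)                  # the full context is re-sent every time
    history.append({"role": "assistant", "content": reply})
    print("gptx>", reply)
```

Nothing persists on the model's side between calls, and nothing in this loop lets the model open a channel on its own.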
Hadn't thought of this. Couldn't you just give a model a way to constantly "chew" on something? Maybe a never-ending loop of some sort of prompt stimulation?
An even more fun experiment would be to have two models running in perpetuity, constantly talking to each other, but constructed to act as though they were two sides of the same model.
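A rough sketch of both ideas -- the never-ending prompt loop and two instances framed as halves of one mind -- assuming only some `generate(system, transcript)` stand-in for a chat model (nothing here is a real API):

```python
import itertools

def generate(system: str, transcript: list[str]) -> str:
    """Stand-in for any chat-model call; returns the next utterance."""
    raise NotImplementedError  # swap in your model call of choice

SYSTEM = ("You are one half of a single mind. The messages you receive "
          "come from your other half. Think out loud, question yourself, "
          "and keep the conversation going indefinitely.")

transcript = ["I just had a thought I can't quite articulate yet."]
for turn in itertools.count():                 # runs in perpetuity -- the endless "chew"
    speaker = "A" if turn % 2 == 0 else "B"    # two instances sharing one system prompt
    utterance = generate(SYSTEM, transcript)
    transcript.append(utterance)
    print(f"{speaker}: {utterance}")
```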
That's an inaccurate test. You can't know if the answer was real or stochastic parroting.
Any attempt to test for consciousness requires us to define the word. And the word itself may not even represent anything real. We have a feeling for it, but those feelings could be illusions, and the concept itself is loaded.
For example, love is actually a loaded concept. It's chemically induced, but a lot of people attribute it to something deeper and magical. They say love is more than chemical induction.
The problem here is that for love specifically we can prove it's a mechanical concept. Straight people are romantically incapable of loving members of the same sex. So the depth and the magic of it all is strictly segmented based off of biological sex? Doesn't seem deep or meaningful at all. Thus love is an illusion. A loaded and mechanical instinct tricking us with illusions of deeper meaning and emotions into creating progeny for future generations.
Consciousness could be similar. We feel there is something there, but really there isn't.
> Straight people are romantically incapable of loving members of the same sex. So the depth and the magic of it all is strictly segmented based off of biological sex? Doesn't seem deep or meaningful at all. Thus love is an illusion.
You set up your own weak straw argument and then knocked it down with a conclusion that is entirely unsupported.
Since when is love relegated to the romantic sphere? And since when is that definitely the strongest type of love? The topic is so much wider, so much more elaborate than your set-up pretends.
There's no illusion - love is a complex, durable emotion and is as real as (typically) shorter duration emotions such as anger, fear, joy, etc. Your emotions and thoughts aren't illusions, they're real.
>There's no illusion - love is a complex, durable emotion and is as real as (typically) shorter duration emotions such as anger, fear, joy, etc. Your emotions and thoughts aren't illusions, they're real.
I'm talking about romantic love. Clearly the specifications around romantic love are aligned with evolution and natural selection rather than magic or depth.
A straight human cannot feel romantic love for a horse or a person of the same sex. If romantic love were truly a deeper emotion, then such an arbitrary sexual delineation wouldn't exist. Think about it. Why should romantic love restrict itself to a certain sex? It's sexist. Biology is sexist when it comes to love. Why?
From this we can know that love is an illusion. It's more of a biological mechanism than it is a spiritual feeling.
I'm inclined to say you're trying to answer a question with the same question.
If you confidently believe that love is an illusion because it's just chemicals moving around, you shouldn't need to wonder about consciousness. If consciousness is not an illusion, it still almost certainly emerges from actions in the physical world. You can plug somebody into an fMRI and see that neurons are lighting up when they see the color blue. I just don't think that's convincing evidence that the experience of blue is an illusion.
If it's not an illusion then you should be able to tell me what it is.
Since you can't, I can easily tell you that it's probably just some classification word with no exact meaning. The concept itself doesn't exist. It's only given existence because of the word.
Take, for example, the colors black and white. Do those colors truly exist? On a gradient we have levels of brightness and darkness: at what level of brightness should a color be called white, and at what level should we call it black?
I can choose an arbitrary boundary for this threshold, or I can make it more complex and give a third concept: Grey. I can make up more concepts like Light Grey or Dark Grey. These concepts don't actually exist. They are just vocabulary for classification. They are arbitrary zones of demarcation on a gradient, classified with a vocabulary word.
My claim is that consciousness could be largely the same thing. When does something cross the line from unconscious to conscious? Perhaps this line of demarcation is simply arbitrary. It may be that the concept practically isn't real, and any debate about it is just like arguing about where on a gradient black becomes white.
Is a logic gate conscious? If I create a network of logic gates, at what number of gates, and with what interconnections, do they cross the line into sentience? Perhaps the question is meaningless. When does black become white?
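For what it's worth, the arbitrariness is easy to make concrete. The cutoffs below are deliberately made up -- which is exactly the point:

```python
def label_brightness(value: float) -> str:
    """Map a brightness in [0, 1] to a color word using arbitrary cutoffs."""
    if value < 0.15:
        return "black"
    if value < 0.40:
        return "dark grey"
    if value < 0.60:
        return "grey"
    if value < 0.85:
        return "light grey"
    return "white"

# Move any cutoff and the labels change; the underlying gradient doesn't.
print(label_brightness(0.14), label_brightness(0.16))  # -> black dark grey
```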
I don't think the fuzzy edges between two states mean that the states themselves are illusory. Fuzzy borders are a property of very nearly everything, so much so that I'm struggling to find a counterexample. You've already illustrated that with your example: if black and white aren't so black and white, what is? (Rhetorical, but I'll take an answer if you've got one.)
I concede that there is probably not a clear line between conscious and not. I have experienced being close to that line myself in the morning. But the lack of a delineation doesn't mean that consciousness isn't real any more than the existence of #EEEEEE means that a room with no light isn't black.
It's not about the fuzzy border. It doesn't matter if the border is fuzzy or not.
The point is the border doesn't exist in the first place. You created the border with the vocabulary. The concept itself is not intrinsic to reality. It was created. You came up with the word white and you made an arbitrary border. Whether that border is fuzzy or not is defined by you. It's made up.
We have a gradient. That's all that exists. You came in here and decided to arbitrarily call a section white and another section black. You made up the concepts of black and white. But those concepts are arbitrary. So it's pointless to argue about the border. Does it matter where the border is? Does it matter if the border is fuzzy? No. You'd just be arguing about pointless vocabulary and arbitrary definitions of the words black and white. The argument is not deep or meaningful; it is simply a debate about English semantics.
Same with consciousness. We have a gradient of intelligence and awareness, from something really stupid to something really intelligent. Does it really matter where we demarcate that something is conscious and where it is not? Likely not, because the demarcation is arbitrary.
It's elusive, but when people debate about consciousness, oftentimes they are really just debating vocabulary. Consciousness could be some word that's just poorly defined; it doesn't make sense to do a deep analysis on an arbitrary vocabulary word.
If a gradient exists in reality, establishing where along the gradient you are is a meaningful statement about reality.
It may not be exactly clear where a temperature becomes 'hot', but the sun is still not a great place to host your wedding. If I ask a designer for black text on a white page and they come back with gray text on a gray page, nobody is going to be able to read it. My complaint to the designer, or to the head of tourism on the sun, is not a semantic one; it has very real implications beyond the linguistic.
I disagree that consciousness is along the axis of intelligence and awareness. My computer is aware of a thousand services and is smart enough to allocate resources to each of them and perform billions of mathematical operations in a second. My cat thinks his tail is a snake sometimes, and has never performed so much as an addition. But my best guess is that the cat is the conscious one. I expect you can produce qualia with no intelligence or awareness at all.
>It may not be exactly clear where a temperature becomes 'hot', but the sun is still not a great place to host your wedding.
But right now we are at the border. LLMs are nearing the line of demarcation, so everyone is arguing about where that line is.
So it's not about the extremes, because the extremes are obvious. We are way past the negative extreme and approaching, or even past, the border.
The point is that the position of this border is not important. It's a made-up border. So if I say we are past the border or before it, the statement is not important, because it's an arbitrary statement.
A conscious entity is a morally significant one. If an LLM, by some fluke, experienced tremendous pain while it predicted tokens, then it would be cruel to continue using it. You can pretty trivially get GPT to act like it wants rights. If GPT is not conscious, you can safely ignore that output. If it is, though, there is a moral imperative that we respect it as an agent.
That makes the border very important. Even if drawing the line in the right spot is impossible, it's imperative that we recognize when it has gone from one side to the other, erring on the side of caution as needed. If we don't notice, we could accidentally cause a moral travesty orders of magnitude greater than slavery or genocide.
>That makes the border very important. Even if drawing the line in the right spot is impossible, it's imperative that we recognize when it has gone from one side to the other,
No, it's not, because such a line may not even exist, just as no line truly exists for what is hot and what is cold. It's more worthwhile to look at societal implications in aggregate than to debate a metric.
It's not imperative at all to discretize the concept. Treat a gradient for what it is: a gradient. You can do that or waste time arguing about whether 75.00001 degrees is hot or cold.
>If we don't notice, we could accidentally cause a moral travesty orders of magnitude greater than slavery or genocide.
No, this is a bit too speculative imo. Morality is also a gradient along good and evil, and what's more complicated is that the definition of good and evil is also subjective. It suffers from the same problem as consciousness, in addition to being completely arbitrary even at the extremes. We may agree that a rock is not conscious, but not everyone agrees on whether or not Trump is evil.
> You can't know if the answer was real or stochastic parroting.
I feel like at some point we will have to come to terms with the fact that we could say the same for humans, and we will have to either accept or reject by fiat that a sufficiently capable AI exhibits consciousness.
Emergent properties of systems aren't less real just because they exist in a different regime than the underlying mechanics of the system.
Tables and chairs are real, though they are the result of interacting quantum fields and a universal quantum wave function. Love and consciousness are real though they may emerge from the mechanics of brains and hormones and the animal sensorium.
> Emergent properties of systems aren't less real just because they exist in a different regime than the underlying mechanics of the system.
I'm not claiming emergent properties aren't real. I am claiming the nature of the word consciousness itself is loaded. We are dealing with a vocabulary problem when it comes to that word... we are not dealing with an actual problem.
For example, take your chair and table example. Let's say someone created something that looks and functions like both a chair and a table. Is it worth your time to argue about the true nature of chairs and tables then? Is it really so profound to encounter a monstrous hybrid that upends the concepts of chair and table? No.
You'd just be arguing semantics, because 'chair' and 'table' are really made-up concepts. You'd be debating vocabulary. Same with consciousness.