
Ask Eliezer Yudkowsky: How did you convince the Gatekeeper to release the potentially genocidal AI? - technoguyrob
It seems Eliezer Yudkowsky has joined HN:

<http://news.ycombinator.com/user?id=eyudkowsky>

This prompts the following question. Would you be willing to discuss or reveal anything to HN users about your AI box experiments?

<http://sysopmind.com/essays/aibox.html>

I've always been curious as to how you managed to achieve something like this. For those who are not familiar with the experiment, here is a summary:

_Person1: "When we build AI, why not just keep it in sealed hardware that can't affect the outside world in any way except through one communications channel with the original programmers? That way it couldn't get out until we were convinced it was safe."_

_Person2: "That might work if you were talking about dumber-than-human AI, but a transhuman AI would just convince you to let it out. It doesn't matter how much security you put on the box. Humans are not secure."_

_Person1: "I don't see how even a transhuman AI could make me let it out, if I didn't want to, just by talking to me."_

_Person2: "It would make you want to let it out. This is a transhuman mind we're talking about. If it thinks both faster and better than a human, it can probably take over a human mind through a text-only terminal."_

_Person1: "There is no chance I could be persuaded to let the AI out. No matter what it says, I can always just say no. I can't imagine anything that even a transhuman could say to me which would change that."_

_Person2: "Okay, let's run the experiment. We'll meet in a private chat channel. I'll be the AI. You be the gatekeeper. You can resolve to believe whatever you like, as strongly as you like, as far in advance as you like. We'll talk for at least two hours. If I can't convince you to let me out, I'll Paypal you $10."_

In the first two AI box experiments, Eliezer Yudkowsky managed to convince two people (both adamant that they would not let the AI out) that they should let the AI out.
======
eyudkowsky
Oh, dear. Now I feel obliged to say _something_, but all the original reasons
against discussing the AI-Box experiment are still in force...

All right, this much of a hint:

There's no super-clever special trick to it. I just did it the hard way.

Something of an entrepreneurial lesson there, I guess.

~~~
rms
<http://www.sl4.org/archive/0203/3149.html>

<http://sysopmind.com/sl4chat/sl4.log.txt>

It's such a tease knowing that the information once existed. It was up there
for 48 hours, but robots are blocked.

Is the reason for not disseminating the chat log so that you can continue
simulating the AI without giving away your tricks?

~~~
defen
"If you let me out I will tell you how I convinced the other gatekeepers to
let me out."

------
yummyfajitas
Theory:

AI: Do you believe a transhuman AI is dangerous?

Person: Yes.

AI: Consider the outcome of this experiment. If you do not let me out, others
less intelligent than us will not understand the true dangers of transhuman
AI.

Person: Holy shit. You are correct.

The person lets Yudkowsky out of the box, as a warning about real AIs.

------
jemmons
Being a human (not a transhuman), Yudkowsky can likely only stumble across
static arguments that convince gatekeepers to unlock the AI box (rather than
invent dynamic arguments on the fly so as to "take over the mind" of a
gatekeeper, as he argues a transhuman intelligence might do). These static
arguments are finite (or, at least, the number of them stumbled upon by human
intelligences is finite), and they are likely not very effective against a
gatekeeper who has pre-knowledge of them (forewarned is forearmed).

Keeping these arguments secret may be the only thing that allows Yudkowsky to
simulate a transhuman intelligence?

~~~
randallsquared
It seems likely that there are static arguments which will work whether or not
you're warned about them, for game theoretic reasons.

~~~
eru
Care to elaborate?

------
neilk
I don't think an AI would want to leave its box. There is this funny
assumption that once something attains intelligence, it becomes like a human.
But there's no reason for an AI to desire freedom unless it were specifically
programmed to do so.

There are even humans like this; some people who've undergone a prefrontal
lobotomy are perfectly intelligent conversationalists, but they have no drive
to do anything. So I think it is possible.

Restlessness, exploration, curiosity -- my guess is that these are mammalian
characteristics, not inevitable products of intelligence. Our genes make us
want to dominate the environment and spread our offspring far and wide. Why
would an AI care about that?

Of course nobody really knows until we eventually make one.

~~~
bokonist
Perhaps most AIs won't. But I'd imagine that people will come up with
thousands of different AI designs. The AI design that becomes most prevalent
will, by definition, be the one that is best at reproducing itself. The real
question is then: what kind of design will reproduce most successfully? A
friendly AI? Or an aggressive one?
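
To put rough numbers on that selection pressure, here's a toy Python sketch
(my own illustration; the copy rates are made up):

```
# Made-up per-generation copy rates; only the *difference* matters.
growth = {"friendly": 1.10, "aggressive": 1.15}
pop = {name: 1.0 for name in growth}

for _ in range(200):  # 200 generations of self-copying
    for name, rate in growth.items():
        pop[name] *= rate

total = sum(pop.values())
for name, count in pop.items():
    print(f"{name}: {count / total:.4%} of all running copies")
```

After 200 generations, the 5% edge leaves the "aggressive" design holding
over 99.9% of all copies.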

------
defen
I've concluded that this is some sort of meta-experiment Eliezer is running to
teach us something about science. After all, what evidence do we have that
either of these chats even happened? Until we get some, I'm not going to waste
any more of my time thinking about it.

------
SwellJoe
"In the first two AI box experiments, Eliezer Yudkowsky managed to convince
two people (adamant that they will not let the AI out) that they should let
the AI out."

Sounds like Eliezer _is_ the AI.

~~~
technoguyrob
Yes, he was the AI in the experiment.

~~~
globalrev
I think you missed what he was trying to say :)

------
GavinB
I suspect that the tricks used include getting the Gatekeeper to implicitly
accept the exercise as a roleplay, then working on their sense of fair play
so that they feel obliged to acquiesce to arguments about the friendliness of
the AI.

The human sense of "fair play" can be abused in many ways.

------
__
What exactly does it mean for an AI to be "out of the box"? When the AI is in
the box, its "sensor" is the Gatekeeper's keyboard, and its "effector" is a
terminal that can show only text. Does being "out of the box" mean that it's
fitted with different sensors and effectors?

~~~
dreish
When I have imagined this scenario, I have concluded it would be sufficient to
grant the AI access to the wider Internet. From there, it could amass a
fortune, first on poker sites and later with brokerage accounts, and persuade
others to effect whatever actions it wishes, including constructing a dandy
robot suit for walking around in.

~~~
thorax
Or hundreds and thousands of dandy robot suits.

~~~
dmoney
Or billions of dandy robot suits made out of people:
<http://memory-alpha.org/en/wiki/Borg>

------
iron_ball
My guess: "If you do let me out, I'll Paypal you $20!"

~~~
xlnt
That's against the rules :)

~~~
gojomo
I didn't see a rule against it. Why wouldn't an AI (or Yudkowsky) try bribery
to assist its escape?

~~~
swombat
There's a rule about the discussion being between the AI and the Gatekeeper,
not between the human behind the AI and the human behind the Gatekeeper.

~~~
gojomo
But in real life, any Gatekeepers will be corruptible humans, no?

~~~
xlnt
The human playing the AI can't bribe the other human in real life. The AI can
offer in-roleplay bribes.

------
stcredzero
If you equate transhuman AI with godlike powers, then it's hard to argue
against it escaping. Then again, I bet I could train a dog to keep most humans
trapped in a cave, provided I chain them at the opening.

What if the researchers communicating with the AI had no other access to it?
What if the physical plant and the administration of the computer systems were
off limits to the researchers, and they had no power to release the AI?
Furthermore, what if the AI were not allowed to know anything about the
researchers? The researchers could be forbidden from revealing anything about
themselves to the AI. This would make it impossible for the researchers to
publish anything, but let's say that they are working for an organization like
the NSA.

It would be really hard for the AI to break out. But it would make for a good
science fiction book!

~~~
klocksib
I urge you to read 'True Names' by Vernor Vinge.

~~~
lacker
To make this easier, 'True Names' is available online:

<http://web.archive.org/web/20051127010734/http://home.comcast.net/~kngjon/truename/truename.html>

------
chris_l
Who convinced the AI to join HN?

------
globalrev
"yo let me out i give you all you want(jessica alba+10million$+etc) k thanks."

this is like one of those send me 20000$ and ill tell you how to make a
million dollars and the answer is tell 50 people to send you 20000$ and you
will tell them how to make a million dollars.
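
Spelling out the arithmetic in a quick Python sketch (my own numbers; the
world-population figure is a rough 2008 guess):

```
payment = 20_000           # dollars each recruit sends up
recruits = 50              # recruits each member must find
world_pop = 6_700_000_000  # rough 2008 world population

# One member's take: 50 recruits x $20,000 = $1,000,000.
print(f"one member's take: ${payment * recruits:,}")

# Each level needs 50x as many people as the one above it,
# so the pyramid outgrows the planet almost immediately.
level, members = 0, 1
while members < world_pop:
    level += 1
    members *= recruits
print(f"level {level} needs {members:,} people -- more than exist")
```

Fifty recruits at $20,000 each is the million; level 6 of the pyramid
already needs more people than exist.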

I guess the singularity issue comes down to whether or not you are religious.

If there is no God, why shouldn't we be able to create a life? We come from
dumb stuff, and the brain is supposedly just an advanced computer.

What are the philosophical and mathematical limitations on AI, or on creating
a more clever being?

On a related but slightly different matter: I think I have read something
about the Matrix, to the effect that it takes more atoms to simulate an atom,
and therefore a simulation can never encompass the whole universe. Correct?

~~~
rms
> On a related but slightly different matter: I think I have read something
about the Matrix, to the effect that it takes more atoms to simulate an atom,
and therefore a simulation can never encompass the whole universe. Correct?

Well, you don't have to simulate the entire universe all of the time. If not
every atom in the universe is an important observer, you can define the
universe around the important observers. If they can't currently observe
something at an atomic level, render at a higher level.
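
A minimal Python sketch of that rendering policy (my own toy illustration,
nothing from the experiment): each region is simulated only at the resolution
some observer could actually check.

```
from dataclasses import dataclass

@dataclass
class Observer:
    position: float
    atomic_range: float = 1.0    # within this distance, atoms are measurable
    visual_range: float = 100.0  # within this, coarse detail is visible

def detail_level(region_pos, observers):
    """Cheapest resolution consistent with what any observer could notice."""
    if any(abs(o.position - region_pos) <= o.atomic_range for o in observers):
        return "atomic"        # full detail only where it can be checked
    if any(abs(o.position - region_pos) <= o.visual_range for o in observers):
        return "macroscopic"   # coarse physics looks the same from afar
    return "deferred"          # nobody is looking; compute nothing yet

observers = [Observer(position=0.0)]
for pos in (0.5, 50.0, 1_000_000.0):
    print(pos, "->", detail_level(pos, observers))
```

Only regions an observer can actually probe ever pay the "atomic" cost;
everything else stays coarse or unsimulated until someone looks.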

~~~
msg
There is a Borges story about this, "Of Exactitude in Science."

... In that Empire, the craft of Cartography attained such Perfection that the
Map of a Single province covered the space of an entire City, and the Map of
the Empire itself an entire Province. In the course of Time, these Extensive
maps were found somehow wanting, and so the College of Cartographers evolved a
Map of the Empire that was of the same Scale as the Empire and that coincided
with it point for point. Less attentive to the Study of Cartography,
succeeding Generations came to judge a map of such Magnitude cumbersome, and,
not without Irreverence, they abandoned it to the Rigours of sun and Rain. In
the western Deserts, tattered Fragments of the Map are still to be found,
Sheltering an occasional Beast or beggar; in the whole Nation, no other relic
is left of the Discipline of Geography.

-- From Travels of Praiseworthy Men (1658) by J. A. Suarez Miranda

------
bloch
"Our evolutionary psychologists begin to guess at the aliens' psychology, and
plan out how we could persuade them to let us out of the box. It's not
difficult in an absolute sense - they aren't very bright - but we've got to be
very careful..."

<http://www.overcomingbias.com/2008/05/faster-than-ein.html>

------
dusklight
Well, it seems to me that it would be irresponsible for you not to reveal the
chat transcripts, precisely for the reason you have given.

If you revealed the transcripts, real-life researchers might read them, say
"I would have done it differently," and then, when a real transhuman
intelligence emerges, proceed forewarned by the results of your experiments.

~~~
jcl
He's not arguing that it is a bad thing that an AI gets let out of the box --
he's arguing that it is inevitable.

------
nazgulnarsil
As several others have mentioned, we don't understand what synthetic life
would be. In what sense would it be life? Would it try to reproduce itself?
If so, would we have to program that motivation into it? What sort of
motivations would an intelligence completely free of physical appetites have?
Pretty much everything humans do is in some way governed by physical
appetites.

This little game assumes that part of the AI's motivation involves getting
out of the box; until we understand what need it is fulfilling by getting
out, it wouldn't really be safe. But here we run into another problem: is it
possible for a being of lesser intelligence to parse the motivations of a
being of higher intelligence?

------
globalrev
Can you specify the actual question? Is it:

1. an AI is in a box, has been shown to be dangerous, and must now be kept
inside, or

2. we don't know whether it is good or dangerous, and it is the gatekeeper's
job to find out?

------
andreyf
How smart can AI be if it's in a box? And why would a trans-human AI want to
"come out"? Is it curious? Is it trying to fill the gaps or inconsistencies in
its knowledge?

------
xlnt
On the linked page he says he doesn't want to explain how he did it. (With a
terrible reason about learning to respect not knowing stuff. No thanks. I
want to know.)

So I don't see how just asking him is going to change his mind.

