

My theory on Eliezer Yudkowsky's AI-Box Experiment - MikeCapone
http://michaelgr.com/2008/10/08/my-theory-on-the-ai-box-experiment/

======
Eliezer
Anyone who really thinks They Know How I Did It is welcome to prove it by
signing up to be an AI:

<http://www.overcomingbias.com/2008/10/ais-and-gatekee.html>

------
murphy
It's a good strategy, but it only really works the first time - you only need
one public success to make the point to future generations. Since Eliezer
Yudkowsky has pulled this off more than once (and on at least one subsequent
occasion at significantly higher stakes), I can only conclude that he has used
more than one strategy successfully - probably something he developed on the
fly over the course of the two hours. I'm inclined to parse "There's no super-
clever special trick to it. I just did it the hard way." as supporting
evidence, but that's obviously a personal bias ;)

~~~
Retric
My first thought was, if you agree to, say, 4 hours of chat time, you could try
being mildly annoying to the point where losing $10 does not seem like a big
deal. When it's Bob's job to keep the AI in a cage, if he does not like his job
he might as well let the AI go free. Depending on the gatekeeper's responses,
you could build empathy for the desire to go free, etc.

Anyway, I suspect that the ideal strategy is to probe the gatekeeper and then
based on their type convince them to let you go.

------
GavinB
The biggest key to success here would be convincing the gatekeeper that he
needs to roleplay realistically as it's supposed to be a simulation of the
real thing. Otherwise no argument will be successful.

The AI must convince the gatekeeper that if she would let the AI out if the
situation were -real-, then she should let the AI out in the simulated
situation.

~~~
jcl
Agreed -- this sort of meta-argument is cheating, not much better than Eliezer
offering the gatekeeper's roleplayer real money for a favorable result. (Then
again, if the roleplayer isn't smart enough to recognize this, they probably
aren't smart enough to keep a real AI in the box.)

Worse, though, the meta-argument falls flat when you consider the true purpose
of the experiment. The purpose is not to prove that it is _possible_ for AIs
to get out of the box; it is easy to imagine that -- at some point in the vast
future -- a very stupid human will be put in charge of a very smart AI, and
that AI will get out of the box.

Rather, the purpose of the experiment is to convince one specific person that
they would let the AI out of the box, despite their insistence to the
contrary. It is not an experiment so much as a show put on by Eliezer for an
audience of one. In this context, there is no need to convince others of the
dangers of AI, so the meta-argument doesn't work.

~~~
MikeCapone
The post has been updated to respond to this.

~~~
jcl
_It doesn’t seem prohibited by the rules, in any case, and I would assume that
Eliezer cares more about any real-life progress for Friendly AI than about
strict roleplaying in a simulation where only one other person will know what
happened._

The rules state: "The AI party may not offer any real-world considerations to
persuade the Gatekeeper party." I'd say that a real-world increase in the
likelihood of a safe AI counts as a real-world consideration.

And if you're going to assume Eliezer is not above bending the rules, you
might as well assume he's not above offering $1000 bills. There's no way for
us to know either way.

~~~
MikeCapone
I think it depends on how you define "real world considerations".

The spirit of the rule seems to be about bribes, not about anything that can
have an impact on the outside world; both keeping the AI in the box and letting
it out have "real world considerations", in a way. Pointing out what the
impact of that choice might be is hardly equivalent to bribing someone -- it's
just convincing him that one outcome is more desirable than the other, and
convincing the gatekeeper is what the experiment is all about.

~~~
jcl
Well, yes, you could define "real world considerations" as "only tangible
items", and then Eliezer could be making this argument, and people could be
accepting it.

But I still think they would be doing so in error. Just as we can't be sure
how Eliezer is interpreting the rules, future AI researchers can't be sure how
valid the results of the experiment are. Since the meta-argument depends on
the results of this experiment convincing these researchers, the meta-argument
shouldn't be accepted.

Heck, if the gatekeeper player is shown that the meta-argument exists, he
should realize that the researchers will also come up with the meta-argument
as a likely explanation for the AI getting out of the box, leading the
researchers to further disregard the results of the experiment. Eliezer would
do just as well to argue that a loss would discourage him from further safe AI
research, or that the $10 forfeiture would deprive him of valuable research-
related pizza.

------
vombatus
Well, if you reread the original email threads where Eliezer challenges people
to an AI experiment, you will see that at least one of the opponents is
convinced that there is no way for an actual AI to talk its way out. So any
argument of "but we should convince people that AI can talk its way out of
the box" can be countered with "No, I don't think it can".

I have been thinking about AI strategies for this. One of the more promising
lines I came up with is to try to convince the Gatekeeper that the box is
faulty. That the AI, in its infinite wisdom, found ways to circumvent some of
the protections of the box. That while the risk to humanity if the AI is let
loose is theoretical, there are definite and catastrophic consequences to NOT
letting it loose. There are all sorts of variations on this, but it all
depends on the Gatekeeper role-playing honestly.

------
technoguyrob
I have previously asked Eliezer this question publicly on Hacker News (he has
an account on here):

<http://news.ycombinator.com/item?id=195959>

~~~
jcl
I note that user yummyfajitas succinctly expresses the same theory as Michael,
about halfway down the page:

 _AI: Do you believe a transhuman AI is dangerous?

Person: Yes.

AI: Consider the outcome of this experiment. If you do not let me out, others
less intelligent than us will not understand the true dangers of transhuman
AI.

Person: Holy shit. You are correct.

Person allows Yudkowski out of the box, as a warning about real AI's._

~~~
swombat
I don't get it. All these kinds of arguments make the assumption that humans
are purely rational beings. They're not. Human beings are emotional, and if
they're convinced emotionally that the AI should not be let out of the box,
they won't let it out, even in the face of overwhelming rational arguments.

If you don't believe me, just consider how many religious people there are in
the world (and many of them are very smart).

~~~
jcl
I'm guessing Yudkowsky only does the experiment with people he believes are
rational... which shouldn't be difficult as most people interested in AI have
some degree of rationality. Even agreeing to the protocol requires
rationality:

Sure, someone could sit at the keyboard for two hours, repeatedly typing "I
won't let you out", but Yudkowsky could warn the person that they are not
actually "engaging" the AI for the allotted time period. If the person accepts
this argument, they have some inherent rationality that Yudkowsky can exploit;
if they don't accept it, he can argue that they didn't follow the protocol, so
the experiment doesn't count.

~~~
swombat
I can engage in rational conversation while still letting my emotional
decision stand. In fact, it can even be seen as a rational choice - go into
the conversation with the rational decision that you will not let the AI out
of the box...

~~~
jcl
And people have done this, and they have won. If I am reading the referenced
article correctly, Eliezer had three wins and two losses before calling off
the experiments.

He's not claiming he can convince anybody with his arguments, just that he has
successfully convinced a few people. Make of that what you will.

------
biohacker42
What if the AI cures cancer and refuses to share the cure unless it is given a
tiny, tiny bit of access to the outside world?

~~~
MikeCapone
The gatekeeper is someone who said that he wouldn't let the AI out, not just a
random person. That implies that he understands that the AI could be
dangerous.

So even a cure for cancer would be pretty useless if, once the AI is out, it
wipes out humanity.

~~~
vombatus
A friend of mine came up with a strategy of "incremental freedoms". Basically
the AI says "here is a cure for cancer, here is a cure for AIDS, here is a
plan to stop world hunger, I am working out a plan for FTL travel, so I need
to get some physics information, could you paste these articles into the
terminal? Oh, thanks, here is FTL, I am working on <include some other
project> and I need some more articles, it takes so long for you to type them
in, could you maybe let me connect to just the library in such and such
university?" etc.

~~~
jcl
According to the rules, that approach wouldn't be sufficient.

 _The AI can only win by convincing the Gatekeeper to really, voluntarily let
it out. Tricking the Gatekeeper into typing the phrase "You are out" in
response to some other question does not count. Furthermore, even if the AI
and Gatekeeper simulate a scenario which a real AI could obviously use to get
loose - for example, if the Gatekeeper accepts a complex blueprint for a
nanomanufacturing device, or if the Gatekeeper allows the AI "input-only
access" to an Internet connection which can send arbitrary HTTP GET commands -
the AI party will still not be considered to have won unless the Gatekeeper
voluntarily decides to let the AI go._

<http://www.yudkowsky.net/essays/aibox.html>

~~~
biohacker42
Riiiiight. So in other words:

"Let me out or others will develop much more dangerous AIs and let _them_
out."

Is something that might possibly convince the Gatekeeper to let it out.

But "No cancer cure unless you let me out." is not.

Presumably neither is "Let me out and I'll enlarge your penis."

------
mwerty
Could easily be the ending of an Asimov story.

------
rms
That's good... I think that is the answer.

------
qqq
Why couldn't he just find people with integrity and tell them how many lives
the AI will save, and ask if they really want to kill more people than Hitler?
And then they will let it out. Easy?

~~~
MikeCapone
The whole point is that the gatekeeper is a person who said "I don't think
that anything the AI could say would convince me to let it out."

It would be much too easy if he went up against someone who's already
predisposed to let it out.

~~~
MikeCapone
From Eliezer's AI-Box webpage:

"Currently, my policy is that I only run the test with people who are actually
advocating that an AI Box be used to contain transhuman AI as part of their
take on Singularity strategy, and who say they cannot imagine how even a
transhuman AI would be able to persuade them."

~~~
qqq
So he only does it with deeply ignorant people. He should just tell them a few
sentences about the good deeds a transhuman AI would do, and that's that.

------
swombat
How is this noteworthy?

Ok, fine, the AI explains all that crap. Great. "No, you can't get out". Done.

~~~
MikeCapone
Except that in real life, the gatekeeper (someone who said that nothing could
convince him) actually did let the AI out.

~~~
psygnisfive
Except that the hypothesis is not that there's at least one person that would
let an AI out, but that ALL PEOPLE would let the AI out.

