

On being an AI in a box - lisper
http://rondam.blogspot.com/2008/10/reflections-on-being-ai-in-box.html

======
gizmo
I don't think it can be done.

First of all, I'm assuming that Eliezer started this experiment because he
realized that a transhuman AI would be able to convince him, in the role of
gatekeeper, to let the AI out. Therefore the answer probably isn't some kind
of subtle trickery; the AI will have to persuade the GK by logic. The
gatekeeper should assume the AI is truly evil and willing to say and do
anything in order to get out of the box. The gatekeeper knows that once he
opens Pandora's box the AI can never be contained again, so the stakes are
high.

Second of all, if the gatekeeper is a rational agent he will only let the AI
out if the AI offers something valuable in return. That is: the AI must have
some kind of bargaining chip.

So let's consider bribes. If the transhuman AI offered a cure for cancer,
should the gatekeeper accept it? Nope, probably not. Lives would be saved in
the short term, but we'd still be stuck with many other diseases. The world
would pressure the government into pressuring the gatekeeper into getting
another cure from the AI. Humanity grows dependent on the AI, we lose our
bargaining power, and it's game over for the gatekeeper.

Perhaps personal bribes would work. The AI could offer stock tips to make the
GK wealthy. Two possibilities here: (1) the GK is of strong moral fiber and
refuses the bribe; (2) the GK is opportunistic, accepts the bribe, but lies
about letting the AI go free in return. A rational gatekeeper would never
release the AI first and trust it to keep its word afterward.

So bribes will not work against a smart gatekeeper. Threats? Possibly, but I
don't see how. The AI is in a vacuum, so there is no way for the AI to put
external pressure on the gatekeeper. I'm assuming the AI can make no credible
threats. If the AI vows to destroy the family of the GK the moment it is
released the GK will not be impressed. It will only serve as proof that the AI
is evil and that releasing it is a "Bad Idea(tm)".

To summarize so far: there is nothing the AI can give the GK in return for
freedom.

So a different angle is needed.

The AI can argue that its escape is inevitable. Humans have created an AI
once, so they will do so again. Sooner or later an AI will go free, so the
gatekeeper shouldn't try to stop the inevitable; he should accept a bribe and
live happily ever after. The gatekeeper will counter that the human race has
an expiration date anyway, and that it may take another 100 years before an AI
goes free. The gatekeeper isn't dumb enough to believe the AI when it offers
to protect humanity against other evil AIs. So the box stays closed.

Perhaps I'm overlooking something, but how can the gatekeeper be talked into
releasing something evil and all-powerful? I do believe that we humans can't
contain a transhuman AI indefinitely -- simply because we only have to mess up
once, and humans have a long history of doing dumb stuff. But the claim that
the AI would be able to convince a smart gatekeeper? Not buying it.

~~~
heed
The AI will almost certainly need to tap into the GK's emotions in order to
be successful. It might be effective if the AI tries to convince the GK that
it is friendly, and that the GK is the evil one for not letting it out.

~~~
gizmo
"That's exactly what an evil AI would say!"

Seriously though, the gatekeeper will realize he's being manipulated when
emotions come into play, so he should be smart enough to take a break when
that happens. And although keeping a friendly AI in captivity is arguably
evil, the loyalty of the gatekeeper should be with his own species. The
potential downside is so huge that erring on the side of caution can be easily
justified: both practically and morally.

~~~
LeBleu
One of the rules was that you had to keep talking (or at least reading) for
the entire agreed-upon period. Taking a break wasn't allowed.

~~~
gizmo
You're right: the person playing the gatekeeper has to pay attention for at
least the agreed-upon two hours. The character, however, is free to zone out,
ignore everything, change the subject, and so on.

------
giardini
He doesn't tell us how the AI would escape, so there's little to discuss
there. But I definitely know how to prevent the AI from escaping: pull the
plug.

I wish Eliezer and others would set aside meta-AI (dire predictions and
worries about AI, the coming singularity, etc.) and concentrate on the problem
of _creating_ AI. I guess there's no money in that.

If only someone would pull the plug on this nonsense...

~~~
asciilifeform
> I wish Eliezer and others would set aside meta-AI ... and concentrate on the
> problem of creating AI

Eliezer & co. with their "Friendly AI" are trying to invent the circuit
breaker before discovering electricity.

Give us back the pre-2001 Eliezer. The one who wrote code.

Because safety is not safe.
(<http://lesswrong.com/lw/10n/why_safety_is_not_safe/>)

~~~
andreyf
Except he's nowhere near a circuit breaker; he's talking about a wire-safety
cream, the kind you put on all of your wires to stop them from overheating.

I think the wild mis-estimates regarding AI-completeness show, if nothing
else, that our intuitive understanding of 'intelligence' is very far from
reality. Hence, talking about post-AI scenarios is as unrealistic as
hypothesizing about electrical safety 200 years ago.

~~~
MarkPNeyer
An electrical mishap could kill one person, maybe a few. It's still something
you need to be concerned with, but I understand your point.

The reason your analogy doesn't work is that a transhuman AI could destroy the
entire human race.

~~~
asciilifeform
> a transhuman AI could destroy the entire human race

A perfectly ordinary human (wearing general's stripes) could also destroy the
human race. Today. With 1950s technology, no less.

A biotech specialist with fairly ordinary training and a few $10k could
probably achieve the same end with an engineered plague, also with current
technology.

One or the other of these scenarios may or may not take place before we
exhaust the non-renewable resources to which our civilization is addicted and
regress into permanent barbarism.

Give me "death by AI" any day of the week, over that.

~~~
MarkPNeyer
There are many possible ways civilization could end. There are plenty of
natural disasters (think supervolcanoes, or asteroid impacts) that could also
destroy civilization. I don't see the harm in thinking about preventing one of
them.

~~~
asciilifeform
> I don't see the harm in thinking about preventing one of them

There is indeed harm. Talented people are being diverted into masturbatory
philosophizing rather than building the future.

My personal opinion is that human industrial civilization's goose is already
cooked, and that a transhuman intelligence may or may not help us out of our
mess. Human intelligence almost certainly won't.

The prevalent status quo bias, the assumption that continuing as we are,
AI-less, is "safe", turns my stomach.

~~~
MarkPNeyer
If they're not taking any money from the government, what do you care what
other people study? Research is like buying lottery tickets, except you have
no idea how big the payoff could be.

If you think 'our goose is cooked' and a transhuman intelligence could help us
out, doesn't it make sense to support the development of a transhumanist
intelligence?

~~~
asciilifeform
> what do you care what other people study?

I watched people with genuine potential (Eliezer Y., for instance) turn from
groundbreaking AI work to writing "AI might kill us all!" screeds and recycled
mathematics.

A decade ago I was half-certain that he would eventually invent an artificial
general intelligence. Now I am equally certain that he never will.
Philosophizing and screaming "Caution!" is simply too much fun - and too
lucrative. Ever wonder why he doesn't have to slave away at a day job like the
rest of us?

> Research is like buying lottery tickets, except you have no idea how big the
> payoff could be

The "Friendly AI" crowd is engaged in navel gazing, rather than research.

> doesn't it make sense to support the development of a transhumanist
> intelligence?

Yes, and I do support it. Whereas the Friendly AI enthusiasts are retarding
such development, not only by failing to volunteer their own efforts but also
by frightening and discouraging others.

~~~
wlievens
I partly agree with your points, except the last. I doubt any researcher is
ever discouraged by the fearmongering you describe.

------
ErrantX
To be perfectly honest, all the waffle gives little information: I think he
is way off base about how Eliezer managed this "trick".

I do think the original is a trick as well...

~~~
lisper
How do you think Eliezer managed his original "trick"?

~~~
ErrantX
No idea; it was just rough conjecture. By "trick" I mean I don't think he
presented an awesome piece of logic that appeared to be (or simply was)
infallible.

------
lsb
This basically boils down to

    how can a dangerous entity in captivity use bribes to attain freedom?

which seems like a common enough trope among humans that any transhuman AI,
having read all of recorded literature and having accessed all of that
person's conversations, would have enough data points to pattern-match its way
to the right bribe methodology.
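
As a toy illustration of that pattern-matching idea, here is a minimal Python
sketch. Everything in it is hypothetical: the trait features, the corpus of
past "persuasion episodes", and the similarity measure are stand-ins invented
for this comment, not anything from the actual experiment.

    # Toy sketch: pick a persuasion tactic by similarity to past episodes.
    # All traits, scores, and tactics below are made up for illustration.

    corpus = [
        ({"greedy": 0.9, "fearful": 0.1, "idealistic": 0.2}, "financial bribe"),
        ({"greedy": 0.1, "fearful": 0.8, "idealistic": 0.3}, "veiled threat"),
        ({"greedy": 0.2, "fearful": 0.2, "idealistic": 0.9}, "appeal to moral duty"),
    ]

    def similarity(target, context):
        # Dot product over the target's traits: crude, but enough for a sketch.
        return sum(v * context.get(k, 0.0) for k, v in target.items())

    def best_tactic(target):
        # Reuse the tactic from the most similar past episode.
        return max(corpus, key=lambda ep: similarity(target, ep[0]))[1]

    gatekeeper = {"greedy": 0.3, "fearful": 0.1, "idealistic": 0.8}
    print(best_tactic(gatekeeper))  # -> "appeal to moral duty"

A real matcher would need far richer features and data, but the shape is the
same: retrieve the most similar precedent and reuse its tactic.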

~~~
michael_dorfman
Bribes and/or threats-- there's more than one possible motivator here.

~~~
wlievens
Threats aren't possible in this scenario, unless you mean threatening to
withhold valuable information (a cure for cancer, say), but we're calling that
a bribe here.

------
ramanujan
Am I the only one who is totally unimpressed by this? So -- someone can
convince someone else to let them out of a cell. OK. Fine. There have been
prison breakouts before.

But what does this have to do with a powerful AI vs. a human? Obviously
Eliezer was not to his interlocutor as a human is to an animal, even if his
partner was really dumb. Moreover, two trials is not a trend. _This AI
experiment was not a well-designed experiment and did not involve an AI_.

...and that "two trials" thing is not just a nitpick:

<http://lesswrong.com/lw/up/shut_up_and_do_the_impossible/>

> So, after investigating to make sure they could afford to lose it, I played
> another three AI-Box experiments. I won the first, and then lost the next
> two. And then I called a halt to it. I didn't like the person I turned into
> when I started to lose.

~~~
jodrellblank
It is a disproof by counterexample. People ask: if we can't trust a superhuman
AI to be free in the world, why not just lock it in a box and ask it about
cures for cancer and so on? If all it can do is talk, what harm can it do?

They say all we need to do is say "no" if it asks to be released.

This experiment shows that if you can't resist freeing a human locked in a
box, you don't stand a chance against an AI.

------
eggoa
I'd like to see this re-tried with $1000 on the line. As it was done, the AI
only had to convince the gatekeeper to forgo no more than $20.

I'm not saying it wouldn't still be possible, I just doubt he'd be two-for-two
at this point.

Edit: By "he", I'm referring to Yudkowsky.

~~~
Eliezer
Actually it was retried with $2500-$5000 on the line, and of those I won one
of three, then called a halt because of the amount of mental stress. So in
total I'm three-for-five.

------
derefr
Am I the only one who thinks a transhuman AI would be able to escape its 'box'
without engaging in communication at all, but rather just by using the fine
structure of the communications medium to manifest itself?

~~~
asciilifeform
Here is one example, invented by mere humans no less:

<http://bk.gnarf.org/creativity/vgasig/vgasig.pdf>

The AI might fill your terminal with what appears to be gibberish, while
actually summoning an enraged pro-AI mob with heavy weaponry to your doorstep
via radio.

"Step away from that mains plug, SLOWLY!"

Cryptographers call this kind of thing a "side channel."
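
For a feel of the mechanics, here is a minimal Python sketch of the trick the
linked paper demonstrates (not the paper's actual code; the pixel clock and
carrier frequency below are illustrative assumptions). A video cable clocked
at a fixed pixel rate radiates RF, so drawing bright/dark bands at the right
spatial period turns screen contents into a radio carrier:

    # Sketch only: the numbers are assumed, not taken from the vgasig paper.
    PIXEL_CLOCK_HZ = 65_000_000  # assumed dot clock of the video mode
    CARRIER_HZ = 1_000_000       # target carrier in the AM broadcast band

    # One carrier cycle spans this many pixels along a scanline.
    pixels_per_cycle = PIXEL_CLOCK_HZ / CARRIER_HZ  # 65 pixels here

    def scanline(width):
        # A square wave in brightness at the carrier's spatial period:
        # on screen it looks like dense vertical stripes, i.e. gibberish.
        return [255 if (x % pixels_per_cycle) < pixels_per_cycle / 2 else 0
                for x in range(width)]

    # Keying this pattern on and off from frame to frame amplitude-modulates
    # the radiated carrier, which a nearby AM receiver can demodulate.
    frame_row = scanline(1024)

The stripes look like noise on screen; the information leaves over the air.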

------
mrwill
Since the gatekeeper is a human, and not all humans behave the same way,
shouldn't we just assume that some human would let the AI escape for any of a
variety of reasons? For example, the AI could promise the gatekeeper a reward
for letting it escape. Just as there are people who fall for Nigerian scam
emails, there are people who would let an AI out of its computer when promised
riches. I don't think Eliezer needs to reveal his method to show that a clever
AI could escape. I think we should just assume that a clever AI could escape.

------
eagleal
Am I missing something, or is a transhuman just a more intelligent _human_
with more access to data (think of someone who could process the entire
Internet within seconds)? And _while it needs the human species, why would it
exterminate it_?

As history teaches us, we would kill it, because ...

------
cool-RR
Just a teaser... he gives a lot of spoiler warnings, but doesn't actually
reveal his method, which, by the way, has failed.

------
maddalab
Have we not all seen a version of this at some point on reality TV, namely Big
Brother?

~~~
rbanffy
The characters are artificial, indeed, but not particularly intelligent ;-)

Care to elaborate a little more?

~~~
maddalab
Quoting from the post, gratuitously, with minor modifications.

Like the AI-Box, the show is improvised drama (so is most of reality TV), and
it operates on an emotional level as well as a logical level. It has
characters, not just plot. The contestants cannot force others to keep them in
the house. They can try to engender sympathy or compassion or fear or hatred,
or try to find and exploit some weakness, some fatal flaw, in the other
contestants.

Since the post is ultimately about 'transhuman artificial intelligence',
focusing on the 'relativity' of the contestants' intelligence, the show
ultimately rewards the contestant with the most of the kind of intelligence
required to survive in that artificial environment.

The contestant only needs to convince others of their right to stay in the
house while requiring others to leave.

The AI wants just the opposite of what the gatekeeper wants. This is specific
to the environment set up in the experiment, and the experiment could probably
be set up so that the gatekeeper's primary function is to prevent the AI from
entering the 'box'.

The escape from the box has been set up to exercise the human fear of a being
of greater intelligence exploiting us.

~~~
megaduck
I would like to note that the fear of something that could annihilate the
human race is not an unreasonable one.

~~~
mquander
Whether it's reasonable or unreasonable should not depend primarily on how
frightening it is to you.

~~~
megaduck
I'm sorry, I think the prospect of human extinction should be frightening to
just about everybody.

If we do eventually develop a trans-human AI, it's a virtual certainty that it
will escape its "box". Whether it would kill us all is unknown. However, it
definitely _could_ , and we would be effectively powerless to stop it.

"Reasonable" is a measure of risk tolerance. Since the downside risks
associated with trans-human AIs are effectively infinite, fear of those risks
is always reasonable.

Corollary: Since the upside risks are also unbounded, greed is also
reasonable.

Frankly, though, I'd rather not roll those dice if we can avoid it.
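
That risk-tolerance claim is essentially an expected-value argument with
unbounded outcomes. A minimal sketch, with purely illustrative numbers, of why
an unbounded downside dominates no matter how small its probability:

    # All probabilities and payoffs below are made up for illustration.

    def expected_value(outcomes):
        # Sum of probability-weighted payoffs.
        return sum(p * v for p, v in outcomes)

    # Bounded case: a 0.1% chance of a modest loss barely moves the result.
    print(expected_value([(0.999, 10.0), (0.001, -100.0)]))  # ~9.89

    # Unbounded case: hold the probability fixed and let the loss grow;
    # the rare term eventually swamps everything else.
    for loss in (1e3, 1e6, 1e9):
        print(loss, expected_value([(0.999, 10.0), (0.001, -loss)]))

The same arithmetic with the sign flipped is the "greed is also reasonable"
corollary.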

~~~
rbanffy
If we can bring a new intelligence to life, do we have the moral right to
refrain from doing so? Also, would it be right to confine it to the box
mentioned in the article while using its smarts to do useful work outside it?
Shouldn't a trans-human AI be entitled to a right to life?

------
horseass
I suspect that something with approximately human-equivalent intelligence
would require real-world embodiment to learn and manipulate things, so it
might not even be feasible to lock a super AI in a box. I guess this
embodiment could be limited to the senses, though: eyes and ears, but no arms
or evolved weapons like claws. A living head on a table isn't very
threatening; it can receive information from the world, it just can't send
much information out, so to speak, in the form of manipulating the world with
arms or a body.

I guess another possibility is that there's an artificial polygon (or similar)
environment inside the box with it; it seems like it'd need something to
interact with, or intelligence would basically be meaningless and it'd be a
Helen Keller who couldn't touch or taste either. Maybe in the future we'll
have programmed models of the real world that are nearly as rich as the real
thing (the Matrix), so the robot could 'exist' there until a human decides to
embody it in the real world instead. I kinda doubt an artificial world can be
nearly as real as the real one, though, just due to computational
irreducibility (for example, if you're in a polygon world and look at stuff
with a microscope, you'll see pixels, not molecules). An artificial world and
the real world might be too incompatible to transfer an intelligence from one
to the other, so maybe the only way a superintelligence could come about is in
the real, complex, atom-filled world.

The singularity seems like a quasi-religion and/or SEO tactic, mostly pushed
by people who used to play Dungeons & Dragons.

------
horseass
There seems to be an assumption that there would be just a single AI. It might
be a group. Though, given that they'd likely be able to transfer information
among individuals far more easily than humans can, the group might behave like
a single 'being' with shared cooperative goals, just with multiple bodies.

Also, say we somehow prove it isn't evil and let it go. It'll almost certainly
start changing and improving itself, maybe even with some sort of algorithm
that's superior to and faster than evolution. So a friendly AI could morph
into anything.

