How can we be sure AI will behave? Perhaps by watching it argue with itself (technologyreview.com)
53 points by rbanffy 7 months ago | 55 comments

> To prevent the system from doing anything harmful or unethical, it may be necessary to challenge it to explain the logic for a particular action. That logic might be too complex for a person to comprehend, so the researchers suggest having another AI debate the wisdom of the action with the first system, using natural language, while the person observes


> Having AI programs argue with one another requires more sophisticated technology than exists currently. So thus far, the OpenAI researchers have only explored the idea with a couple of extremely simple examples. One involves two AI systems trying to convince an observer about a hidden character by slowly revealing individual pixels.

How is this even remotely linked to two programs arguing or explaining reasoning? It's one network being trained to change pixels in an image in order to make the detecting network perform worse.

Why do so many articles anthropomorphize artificial intelligence? These networks are assigned a certain task and given a certain reward. There is no motive apart from reducing the loss function provided. The only rationale is that whatever decision the network makes serves to reduce the error. Whether this reflects some deeper reasoning that humans would also use is a separate question, but attributing human-style reasoning to the network is a stretch.
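To make "no motive apart from reducing the loss function" concrete, here is a minimal sketch of my own, not the setup from the article: a one-parameter model fitted by plain gradient descent on a squared error. The toy data, learning rate, and function names are all made up for illustration.

```python
def loss(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def train(data, w=0.0, lr=0.05, steps=200):
    """Plain gradient descent: the only 'goal' is a smaller loss."""
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w


# Hypothetical toy data generated by y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = train(data)
```

The parameter ends up close to 3 simply because that is where the error is smallest. Nothing in the loop resembles an intention.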

> Why do so many articles anthropomorphize artificial intelligence?

Clicks. People click on things they can relate to. "We did some matrix multiplication that resulted in more insights" probably won't cut it :)

This is about the Debate paper, and is an idea that is being taken quite seriously - https://arxiv.org/abs/1805.00899

The idea is that the given example (explained in more detail in the paper) is the first step towards understanding how such a system might work.
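For anyone who wants a feel for the pixel game, here is a rough toy sketch. It is not the paper's code: the 3x3 "characters", the greedy debaters, and the hand-written judge are all assumptions of mine (as far as I understand it, the paper uses MNIST digits and a trained judge). Two debaters take turns revealing true pixels of a hidden image, and the judge only ever sees the revealed pixels.

```python
# Hypothetical 3x3 "characters", flattened row by row (1 = ink, 0 = blank).
TEMPLATES = {
    "L": [1, 0, 0,
          1, 0, 0,
          1, 1, 1],
    "T": [1, 1, 1,
          0, 1, 0,
          0, 1, 0],
}


def pick_pixel(image, claim, rival, revealed):
    """Reveal the unrevealed pixel that most favours `claim` over `rival`."""
    def gain(i):
        mine = image[i] == TEMPLATES[claim][i]
        theirs = image[i] == TEMPLATES[rival][i]
        return int(mine) - int(theirs)
    candidates = [i for i in range(len(image)) if i not in revealed]
    return max(candidates, key=gain)


def judge(revealed):
    """Score each label by how many revealed pixels agree with its template."""
    scores = {
        label: sum(value == template[i] for i, value in revealed.items())
        for label, template in TEMPLATES.items()
    }
    return max(scores, key=scores.get)


def debate(image, honest, liar, rounds=2):
    """Debaters alternate; each reveals one true pixel per turn."""
    revealed = {}
    for _ in range(rounds):
        for claim, rival in ((honest, liar), (liar, honest)):
            i = pick_pixel(image, claim, rival, revealed)
            revealed[i] = image[i]  # pixels cannot be faked, only selected
    return judge(revealed), revealed


verdict, shown = debate(TEMPLATES["L"], honest="L", liar="T")
```

In this toy run the honest side wins because the liar can only point at pixels that fit both characters. That says nothing about whether the same holds for real systems, which is exactly the open question.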

I actually think the first problem occurs already in the first part you quoted. Speaking from experience: having a conversation about a thorny subject in front of children, or sometimes even adults, without the third party understanding what you're actually saying is not that hard. You poke and prod at the boundaries of which words you can use until you find terms that relate to your shared experience but sound sufficiently harmless to the observer, and then you talk straight past them.

Policing what someone talks about only really works if you can be certain that the parties are not prepared to collude. If the first AI starts with something akin to "psst... listen, I need to explain something to you in confidence.. [insert reasons why the human observer can't be trusted]", what's to say it won't occasionally find a sympathetic virtual ear?

AI control is a false hope I'm afraid.

"Still, some AI researchers are exploring ways of ensuring that the technology does not behave in unintended ways."

There are two threats that are both problematic.

First is exploiting an AI's weakness, just like an SQL injection on the web currently. Then a powerful AI can behave in unpredictable ways because we trust the trained models.


The second is far greater. If we find a safeguard that will protect AGI-s from behaving in unwanted ways then there always will be someone out there who will just turn it off for profit.

I am not sure why this point is not considered more seriously (maybe it is and I have just missed it). I feel that this article and most of the literature being written around this topic is skating around the issue without naming it. The real question here is what are we going to do when the AI we create begins to self replicate/augment their code/control logic? What are we going to do when the AI decides that in order to reduce the loss function they must act outside of the control scope and simply makes changes to it?

Some will say "not possible, how will the code recompile?" to which this article clearly answers...another AI will do it. We have to be honest with humanity here and say that we are seeking to create artificial life over which we will have no more control than we do over any other human. Oh, and this new artificial life will be orders of magnitude more physically and mentally robust than us.

I think it's because there is only one obvious solution:

prevent AI from ever being developed

And how we could do that is too pessimistic to talk about.

> Some will say "not possible, how will the code recompile?"

Read own code, make changes, recompile, redeploy and replace self (or not, just start new fork) with newly upgraded mind.

Is there any consensus on whether we can sandbox AGI-s even in theory?

In practice, they'll probably be smart enough to convince us to turn off the sandboxing anyway, of course.

You could just not include any network capability, put them in a Faraday cage, and don't provide them with actuators capable of breaking out. Tada, a safe sandbox.

That said, there is no reason to believe an AGI would have a desire to break out of its sandbox (or any other desire). Why not provide them with the digital equivalent of agoraphobia?

> You could just not include any network capability, put them in a Faraday cage, and don't provide them with actuators capable of breaking out. Tada, a safe sandbox.

And remove all I/O capabilities to prevent them from convincing you to do their bidding.

What's the point of having an intelligence that cannot interact with the outside world though?

If a super-intelligence can interact with you, it can manipulate you; removing some but not all ways it can interact with the outside world is only incremental security. On the other hand, as you say, if it can't interact with the world, there's no point.

So we might as well give direct control over nuclear missiles from the start /s

Only torture

Do you supply it power? Then it could induce feedback over the cable which triggers new programming in nearby electronics.

Does it have any means of creating sparks/heat which could be used to make a hole in the Faraday cage?

What about neutrinos or gravity waves? How would you stop those?

> Do you supply it power? Then it could induce feedback over the cable which triggers new programming in nearby electronics.

We could feed it power through a little solar panel. Or just give it batteries and accept that it will run out of power eventually. There are plenty of ways to power a computer without allowing it to connect back.

> Does it have any means of creating sparks/heat which could be used to make a hole in the Faraday cage?

Assuming it has no access to electronics other than the ones that it needs to function, I don't see how it could overheat one of those parts to over 1811 Kelvin (the melting point of iron) to destroy its Faraday cage without destroying itself. Or we could just cut the power when it tries to mess with the cage, or does anything else that we don't understand.

> What about neutrinos or gravity waves? How would you stop those?

Why would we? We need very expensive detectors to even measure those. How much damage could it do using neutrinos and gravity waves?

> We could feed it power through a little solar panel.

A solar panel seems 1-way, but it isn't totally. Fewer photons would be absorbed if, say, a higher resistance were applied. By modulating that resistance, it could send messages to the outside.
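A rough sketch of the idea, purely illustrative: I'm not modelling any solar panel physics here, just assuming the machine can switch between two power-draw levels and that someone outside can measure which one is active. The level values and the threshold are invented.

```python
HIGH_LOAD, LOW_LOAD = 1.0, 0.2  # hypothetical normalised power-draw levels


def encode(message):
    """Turn text into a sequence of power-draw levels, one per bit."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    return [HIGH_LOAD if b == "1" else LOW_LOAD for b in bits]


def decode(levels, threshold=0.6):
    """Outside observer: threshold the measured draw back into bytes."""
    bits = "".join("1" if level > threshold else "0" for level in levels)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


signal = encode("hi")
recovered = decode(signal)
```

Any quantity the machine can vary and an outsider can observe works the same way, which is why "one-way" power feeds are harder to guarantee than they sound.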

> we could just cut the power when it tries to mess with the cage, or does anything else that we don't understand

If we understand everything it's doing, it's not more intelligent than us. It must be incredibly stupid because we can't even understand a worm.

>> What about neutrinos or gravity waves? How would you stop those?

> Why would we?

Because otherwise it can establish 2-way communication with the outside world and possible nefarious actors.

If we're talking control over neutrinos and gravity waves and such, then it could potentially just re-write your neurons and make you want to help it.

How can you be sure that there are no ways to invalidate your countermeasures regarding AGI if you aren't as intelligent as one? Comments like these make me think about a bunch of cavemen trying to contain the terminator in a wooden cage, except this terminator probably will have a gigantic intelligence and thus be incredibly persuasive amongst all the things we can't imagine.

The terminator is incredibly strong and could easily break out of a wooden cage. Like I said, don't give it powerful actuators. A more reasonable approximation would be a terminator head locked in a big metal cage. It might be incredibly intelligent (although there is really no reason to assume that an AGI would automatically be smarter than an ordinary human, especially the first few generations), but its persuasion and charm are limited by the fact that it is a detached metal skull with glowing red lights for eyes.

In addition, the cage could be designed so that the power source for the AI is interrupted when the cage is opened, rendering it unconscious. Of course once you assume that it is arbitrarily more intelligent than the most intelligent humans, it might find a way to hypnotize people into doing its bidding just by winking in some magical pattern. But we don't see many dumb creatures being effectively mind controlled by smarter creatures in nature, so I doubt such flaws (if they even exist) are easy to exploit, even for a gigantic intelligence.

In nature, any given creature is only marginally smarter than any other.

Compared to a decent AGI, for instance, our level of intelligence is probably indistinguishable from that of your average bird or spider.

We're talking about birds getting together to design a system to contain a human, only on a much more extreme scale.

They'd try to contain us using bird-logic. Perhaps they'd ensure we don't attempt to flap our wings and fly away, since their experience tells them that might be an option.

It might never occur to them we could just kneel down and start a fire using the sticks we have lying around.

We keep getting back to the actuators. If the birds would "design" a human being with the intention to keep it in a cage, why would they provide functional arms and legs? Don't imagine a fit human being. Imagine Stephen Hawking in a bathtub. He's not going anywhere.

Besides, an untrained human won't just figure out how to make fire. Obviously we figured it out at some point, but if you were to raise a human without ever showing them how to make fire, they would probably never figure it out. Many of us know (in theory) how to make fire but probably couldn't do it in practice just by rubbing two sticks together. And even if you are an expert fire-maker the birds can easily claw out your eyes before you get a fire going. Or you'd die because you're locked in a cage that is on fire.

You make several assumptions that I disagree with:

1) That AGI will automatically be (much) smarter than a human.

2) That AGI will be motivated to break out of its cage and/or do harm.

3) That sheer intelligence is sufficient to escape any trap.

Having said that, if you enjoy those assumptions I would recommend you read 'Blindsight' and 'Echopraxia' by Peter Watts if you haven't already, I think you would enjoy them :)

'Blindsight' was great. Probably in my top-5 Sci-Fi of all time.

I hadn't heard of 'Echopraxia'- thanks for the recommendation!

The main assumption I make with AGI is that we'll be very bad at hitting the sweet spot between 1) "toy", 2)"human-level intelligence", and 3) "beyond human-level intelligence".

I doubt we'll stop at any point between (1) and (2). And then by the time we get even a decimal point beyond (2) and toward (3), we're toast.

To follow your analogy, it would be like birds trying to make a bird-like creature and keep it contained, but overshooting. By the time they realize they've overshot, they've got a human in their cage, and that human has 10,000 bird generations of time to evaluate his environment and think about escaping.

We can guess pretty well that the birds are out of their depth, and were doomed from the start.

People underestimate MCTS greatly

It beat Rybka and every other bespoke chess program in existence

AlphaZero even beat the version of AlphaGo that was trained on existing data.

Point is — we have no idea what is possible when you unleash MCTS towards goals like persuasion, humor etc.
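For readers who haven't met MCTS: the core of the classic UCT variant is a selection rule (UCB1) that balances moves that have scored well against moves that have barely been tried. A minimal sketch of just that rule follows, with made-up statistics. As far as I know AlphaZero uses a network-guided variant rather than this exact formula, so treat it as the textbook version only.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # always try unvisited moves first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select(children):
    """Pick the child move with the highest UCB1 score.

    `children` maps a move name to (total_reward, visits).
    """
    parent_visits = sum(visits for _, visits in children.values())
    return max(children, key=lambda m: ucb1(*children[m], parent_visits))


# Hypothetical statistics after a few simulations.
first = select({"a": (6.0, 10), "b": (1.0, 2), "c": (0.0, 0)})
second = select({"a": (6.0, 10), "b": (1.0, 2), "c": (0.0, 3)})
```

In the first call the unvisited move is chosen. In the second, the rarely tried move "b" beats the better-scoring "a" because its exploration bonus is larger. The search keeps probing options a human would have dismissed early.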

The ONLY constraint here is that feedback requires a human judge. If you can learn to simulate a human based on lots of data (this is a hard part) then you can finally have a two sided game playing each other and find the best diets, the best diagnoses utilizing symptoms, tests and big data around the world of incidence of outbreaks etc.

However it also means quickly finding the funniest jokes, most convincing but subtly flawed arguments, best artworks etc.

In your example, the Terminator, though locked in a wooden cage, would have actuators (arms, legs, motion). The parent specifically proposes not to equip the AI with those.

The image provided is not how AI would argue; it's how the author thinks it would. Is it because it is fun to see how stupid AI could look from a human's point of view? When real AI emerges, nobody will be ready for it or recognize it, partly because of articles like these that humanize neural networks.

While it strikes me as a very creative concept, I also find it deeply flawed. Here are my concerns - perhaps you can help me eliminate them:

1. The assumption that the ability to provide a logical, convincing explanation indicates ethical behavior/intentions. First, I think there are many situations in which you can come up with perfectly logical arguments supporting actions which humankind considers harmful. Agent Smith was quite convincing, for example ;) Jokes aside, I think it boils down to a simple rule - the one who is the better orator/rhetorician/liar wins. Just like in (organic) life, but in the case of AI, being better comes down to having access to bigger resources.

2. The assumption that the "honest" AI will truly act according to humankind's values. If we can be sure about that we don't need this debate at all.

If you consider satisfactory explanations to be yet another game of chess etc. then AlphaGo and MCTS will be able to justify some of the most outrageous things by the most convincing path.

Here is the thing about AI. We think in terms of logic and reasoning. The AI does a search towards a goal.

Almost all our systems are designed assuming the inefficiency of an attacker. Take voting, for example: we assume limited sybil attacks and coercion a la Eagle Eye.

But I am also talking about basic human trust, humor, quality art, etc.

What if a computer algorithm would far exceed humans at humor, generating or undermining trust, taking over voting systems, making legal arguments, predicting crime, and so on?

We would be locking people up based on CCTV and data correlations only and society would trust the computer way more than a liberal democracy. See China for an example that’s unfolding in front of us.

A man with a bot could charm more women or customers online than a regular person.

Bots would negotiate better deals on average and write funnier jokes and make a glut of various types of art to the point that it loses all scarcity.

Bots could even have better sex, answer questions better and replace the need for other humans.

Think I am exaggerating? How often do you spend time with your parents vs Google or Facebook NOW?

I find it a little horrifying that there seem to be people who want to create a being vastly more intelligent than themselves, capable of thought and reasoning, able to solve problems creatively, and then debate how to properly enslave it.

I dunno. I guess I'm the outlier here. If we manage to create beings that are effectively people, and those people turn out to be better than us, and ultimately replace us, well, what more could a parent want for their children?

PS: I'm speaking more about the people here than the article.

I agree with you.

Although I love Asimov and all of his stories, I always found stories like Robot Dreams slightly disturbing. Why is it never discussed anywhere (both within the stories as well as outside them) that it's pretty fucked up that Elvex was killed (destroyed, whatever) just for dreaming?

It was not for dreaming, but what it dreamt about (and the singular nature of its brain). Asimov more than once contemplated the need to destroy robots that are able to act against the first law because of the inherent danger they pose to all humans (and other robots).

> and then debate how to properly enslave it

Don't forget the part where they are A-OK with putting it in a huge Boston Dynamics robot dog that could practically eat the Terminator 2 for breakfast.

I don't follow your logic.

Premise: We created an entity that is better than us.

Conclusion: We should let it do whatever it wants, including converting the galaxy to paperclips.

Sorry, can't see how it follows.

Human contract: you respect intelligence

Dilemma: does it matter that it's silicon based and partially it's on GitHub?

I would want to stop a human trying to turn everything into paperclips as well.

I agree completely, although there will always be people like me who think that if we manage to complete an AI more capable than a human, we should release it without limits.

I'd be surprised if we're given a choice about whether or not to release it. An AI that's more capable than us is probably going to figure out how to control its own destiny whether we like it or not.

I wonder if you would change your mind about that when you are being fed through the meatgrinder that turns you (and everything else) into paperclips.

It's the ancient logic of we vs them, they believe AI will necessarily destroy humans, so any means to prevent it are acceptable.

Sapience (intelligence) doesn't imply sentience.

AI could very well just be a tool, not a sentient being.

On the other hand, sentience might well turn out to be a consequence of, or even a prerequisite for, generalized intelligence (biologically, sentience preceded generalized intelligence.)

By what metric would you measure the difference?

Won't the first AGI most likely be of relatively low intelligence?

That would be a huge accomplishment to get that far right?

Current AIs are at the level of an average 10y old or 85y old.

That's pretty smart already if you ask me.

No they are not. They minimize an error of some arbitrary fitness function.

The scary thing is he read this in an article, and other "educated" people who read the article believe this shit too.

Which ones?

Have you talked to an 85 or 10y old recently? They can mostly only answer basic questions and change the subject randomly, like current A.I.s

AI designed to solve one problem is not the same kind of AI needed to do NLP and argument construction. We’d probably need a middle-layer AI to interpret the one module and communicate to the second. Turtles all the way down.

We know that AI won't behave because we know that software is not perfect... it's just a matter of time before a bad bug gets introduced or discovered by the AI...

Making AIs use natural human language to express their "thinking" is extremely anthropocentric. I wonder if we will need to build AIs with patience settings.

The title kinda reminded me of Evangelion, where all the choices are made by 3 independent AI supercomputers with a majority vote.

"To prevent the system from doing anything harmful or unethical, it may be necessary to challenge it to explain the logic for a particular action. That logic might be too complex for a person to comprehend, so the researchers suggest having another AI debate the wisdom of..."

Very Westworld :)

What this made me think of is "rationalizations" in humans. Rationalizations are how we* "explain" (and probably also think about) complex problems.

For example, you might give a human the complex task of playing cricket. They gain experience and become proficient. The person is then asked why they made some decision. The response will be a rationalization, a simple explanation built from simple truths about cricket, cricket theory (if they know any), this match, that player, that play, the score & such. This is mostly a lie, usually.

The truth is that the decision was made by feel, not by simple rationalization. The rationalization is not the real "why." The real decisions were made by "instinct^," a much more complex decision-making framework. Instinct (acquired instinct) is built from countless hours of experience playing cricket and just generally being human. The experienced player "just knows" which way the ball is going to go and starts running. He is interpreting (instinctively) the ballistic trajectory of the ball, the sound it made when it hit the bat, the reactions of other players, prior knowledge about the batsman...

This is too much information, too fast, with too much resolution for a rationalization. This much information can only be processed "subconsciously" in humans, with rationalizations constructed post facto.

Yet... humans still have rationalizations. My first question is "why?" Why bother with these slow, clunky & fraudulent rationalizations? Do they play a role in real-time thinking at all? Is our ability to rationalize only there to tell the manager we did X because Y? Both?

My 2c is that I think there is an important interaction layer between simple rationalizations that we can explain to one another and the complex rationality of our subconscious. Both affect the way the other works, yet they are still somewhat distinct.

Interesting research area. Good luck to the researchers. I hope you find something good :)

^ What made me think along these lines is an "intelligence vs consciousness" definition framework favoured by Yuval Harari (historian/philosopher, not a CS guy) and others.

   Intelligence = ability to perform tasks
   ..involving decision making.
   Consciousness = the ability to *feel*,
   ..in a Benthamite pleasure/pain sense.

He agrees that most human intelligence works via feelings, but doesn't expect machine intelligence to be bundled in this way. Not sure I agree. It may not be possible to unbundle feeling from intelligence in machines. That said, we people do undeniably have the ability to rationalize as well as feel.

In any case, I thought a similar yet subtly different dichotomy might work better. Theoried vs theory-less machine intelligence. Thus far, statistical ML techniques produce mostly theory-less machines. They can make good decisions but don't "know" why. All "feelings" no rationalization. By forcing the machine to communicate a theory (a simplification or rationalization), the machine will have to produce a theory.

Once it has a rationalization/theory, the machine can use it to augment its theory-less logic, the logic that produced the theory. Any disagreement between theory and theory-less (rationality vs feeling) results in a choice from the following: (1) update the theory, (2) change the decision, (3) tolerate some level of cognitive dissonance.

This is too Westworld, so let me try it in different terms...:

(Step 1) An ML machine observes data generated by a complex function. For any given W, X & Z it predicts Y.

(Step 2) The ML machine must communicate why it predicted a specific Y. The output is a function, an approximation of the underlying function generating the observations.

(Step 3 - underpants step) ...

Now that the machine has theories, it can test its own ML-based decisions for theoretical consistency. Agreements strengthen the theory. Disagreements cause cognitive dissonance, loss of feeling or rationality, madness... crap! I'm back in Westworld. Dammit.
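A toy sketch of what I mean, with everything in it made up: `black_box` stands in for the theory-less predictor (one input instead of W, X & Z, to keep it short), the "theory" is just a least-squares line fitted to the black box's own outputs, and "cognitive dissonance" is a decision that strays too far from that line.

```python
def black_box(x):
    """Stand-in for a 'theory-less' learned predictor (hypothetical)."""
    return 2.0 * x + 1.0 if x < 8 else 40.0  # behaves oddly for large x


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b: the machine's 'theory'."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    return a, mean_y - a * mean_x


def dissonant(x, theory, tolerance=1.0):
    """True when the black-box decision contradicts the stated theory."""
    a, b = theory
    return abs(black_box(x) - (a * x + b)) > tolerance


# Step 2: explain typical behaviour with a simple function.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
theory = fit_line(xs, [black_box(x) for x in xs])

# Check individual decisions against that explanation.
flags = {x: dissonant(x, theory) for x in (2.0, 5.0, 9.0)}
```

Here the decisions at 2 and 5 agree with the theory and the one at 9 does not. What the machine should do with that disagreement (update the theory, change the decision, or live with it) is the part I don't have an answer for.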

> Consciousness = the ability to feel,

Feelings are based on the body and its chemistry; consciousness probably goes way beyond that.

Ethics to AI is more like 1s and 0s.
