> Having AI programs argue with one another requires more sophisticated technology than exists currently. So thus far, the OpenAI researchers have only explored the idea with a couple of extremely simple examples. One involves two AI systems trying to convince an observer about a hidden character by slowly revealing individual pixels.
How is this even remotely linked to two programs arguing or explaining reasoning? It's one network being trained to change pixels in an image in order to make the detecting network perform worse.
Why do so many articles anthropomorphize artificial intelligence? These networks are assigned a certain task given a certain reward. There is no motive apart from reducing the loss function provided. The "reasoning" or rationale is that whatever decision the network is making serves to reduce the error. Whether this overlaps with some deeper reasoning that humans would also use is a separate question, but attributing human-style reasoning to the network is a stretch.
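The "no motive apart from the loss" point can be made concrete with a toy gradient-descent loop. This is a purely illustrative one-parameter "network", not anything from the article:

```python
# One-parameter "network": predict y = w * x. The only "motive" is the loss;
# every "decision" about w falls out of repeating this update.
w = 0.0
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
lr = 0.05
for _ in range(200):
    for x, y in data:
        err = w * x - y
        w -= lr * 2 * err * x  # gradient of squared error; nothing else drives w
print(round(w, 3))  # converges toward 2.0
```

Whatever "rationale" we read into the final w = 2.0, the mechanism was only ever error reduction.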
Clicks. People click on things they can relate to. "We did some matrix multiplication that resulted in more insights" probably won't cut it :)
The idea is that the given example (explained in more detail in the paper) is the first step towards understanding how such a system might work.
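As a rough illustration of the shape of such a system, here is a toy debate: two agents take turns revealing pixels of a hidden "image" to a judge that only sums the revealed evidence. Every detail here (the scoring rule, the agents' greedy strategies) is invented for illustration and is not from the paper:

```python
import random

random.seed(0)

# A toy "image": 16 pixels, each +1 (evidence for label A) or -1 (evidence for B).
image = [random.choice([1, -1]) for _ in range(16)]
true_label = "A" if sum(image) > 0 else "B"

def reveal(agent_goal, hidden):
    """Each agent reveals the unrevealed pixel that best supports its claim."""
    best = max(hidden, key=lambda i: image[i] if agent_goal == "A" else -image[i])
    hidden.remove(best)
    return best

hidden = list(range(16))
revealed = []
for turn in range(6):                     # 3 reveals per agent
    goal = "A" if turn % 2 == 0 else "B"  # agents alternate, arguing opposite labels
    revealed.append(reveal(goal, hidden))

# The "judge" sees only the revealed pixels and sums their evidence.
verdict = "A" if sum(image[i] for i in revealed) > 0 else "B"
print("truth:", true_label, "verdict:", verdict)
```

In the real proposal the judge is a human (or a model of one) and the agents are trained, but the information structure is the same: the judge decides from only what the debaters choose to reveal.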
Policing what someone talks about only really works if you can be certain that the parties are not prepared to collude. If the first AI starts with something akin to "psst... listen, I need to explain something to you in confidence... [insert reasons why the human observer can't be trusted]", what's to say it won't occasionally find a sympathetic virtual ear?
"Still, some AI researchers are exploring ways of ensuring that the technology does not behave in unintended ways."
There are two threats that are both problematic.
The first is exploiting an AI's weakness, much like SQL injection on the web today. A powerful AI could then behave in unpredictable ways, because we trust the trained models.
The second is far greater. If we find a safeguard that protects AGIs from behaving in unwanted ways, there will always be someone out there who will just turn it off for profit.
Some will say "not possible, how will the code recompile?", to which this article clearly answers: another AI will do it. We have to be honest with humanity here and say that we are seeking to create artificial life over which we will have no more control than we have over any other human. Oh, and this new artificial life will be orders of magnitude more physically and mentally robust than us.
prevent AI from ever being developed
And how we could do that is too pessimistic to talk about.
Read its own code, make changes, recompile, redeploy, and replace itself (or not, just start a new fork) with the newly upgraded mind.
In practice, they'll probably be smart enough to convince us to turn off the sandboxing anyway, of course.
That said, there is no reason to believe an AGI would have a desire to break out of its sandbox (or any other desire). Why not provide them with the digital equivalent of agoraphobia?
And remove all I/O capabilities to prevent them from convincing you to do their bidding.
So we might as well give direct control over nuclear missiles from the start /s
Does it have any means of creating sparks/heat that could be used to make a hole in the Faraday cage?
What about neutrinos or gravity waves? How would you stop those?
We could feed it power through a little solar panel. Or just give it batteries and accept that it will run out of power eventually. There are plenty of ways to power a computer without allowing it to connect back.
> Does it have any means of creating sparks/heat that could be used to make a hole in the Faraday cage?
Assuming it has no access to electronics other than the ones that it needs to function, I don't see how it could overheat one of those parts to over 1811 Kelvin (the melting point of iron) to destroy its Faraday cage without destroying itself. Or we could just cut the power when it tries to mess with the cage, or does anything else that we don't understand.
> What about neutrinos or gravity waves? How would you stop those?
Why would we? We need very expensive detectors to even measure those. How much damage could it do using neutrinos and gravity waves?
A solar panel seems one-way, but it isn't entirely. Fewer photons would be absorbed if, say, a higher resistance were applied. By modulating that resistance, it could send messages to the outside.
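That covert channel is easy to sketch: encode bits as load changes and have an outside observer threshold the reflected light. This is a simulation with made-up numbers, not real panel physics:

```python
# Simulate leaking a message by modulating how much current the panel draws.
message = "hi"
bits = [int(b) for ch in message for b in format(ord(ch), "08b")]

# Transmitter: a high load absorbs more photons; a low load reflects more.
reflected = [100 - (40 if bit else 0) for bit in bits]  # photons bouncing off

# Outside observer: threshold the reflected light to recover the bits.
recovered_bits = [1 if r < 80 else 0 for r in reflected]
recovered = "".join(
    chr(int("".join(map(str, recovered_bits[i:i + 8])), 2))
    for i in range(0, len(recovered_bits), 8)
)
print(recovered)  # the message leaks out as light-level changes
```

The bandwidth would be tiny and noisy in practice, but a one-way power feed is still an observable physical surface.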
> we could just cut the power when it tries to mess with the cage, or does anything else that we don't understand
If we understand everything it's doing, it's not more intelligent than us. It must be incredibly stupid because we can't even understand a worm.
>> What about neutrinos or gravity waves? How would you stop those?
> Why would we?
Because otherwise it can establish 2-way communication with the outside world and possible nefarious actors.
In addition, the cage could be designed so that the power source for the AI is interrupted when the cage is opened, rendering it unconscious. Of course, once you assume that it is arbitrarily more intelligent than the most intelligent humans, it might find a way to hypnotize people into doing its bidding just by winking in some magical pattern. But we don't see many dumb creatures being effectively mind-controlled by smarter creatures in nature, so I doubt such flaws (if they even exist) are easy to exploit, even for a gigantic intelligence.
Compared to a decent AGI, for instance, our level of intelligence is probably indistinguishable from that of your average bird or spider.
We're talking about birds getting together to design a system to contain a human, only on a much more extreme scale.
They'd try to contain us using bird-logic. Perhaps they'd ensure we don't attempt to flap our wings and fly away, since their experience tells them that might be an option.
It might never occur to them we could just kneel down and start a fire using the sticks we have lying around.
Besides, an untrained human won't just figure out how to make fire. Obviously we figured it out at some point, but if you were to raise a human without ever showing them how to make fire, they would probably never figure it out. Many of us know (in theory) how to make fire but probably couldn't do it in practice just by rubbing two sticks together. And even if you are an expert fire-maker the birds can easily claw out your eyes before you get a fire going. Or you'd die because you're locked in a cage that is on fire.
You make several assumptions that I disagree with:
1) That AGI will automatically be (much) smarter than a human
2) That AGI will be motivated to break out of its cage and/or do harm.
3) That sheer intelligence is sufficient to escape any trap.
Having said that, if you enjoy those assumptions I would recommend reading 'Blindsight' and 'Echopraxia' by Peter Watts if you haven't already. I think you would enjoy them :)
I hadn't heard of 'Echopraxia'- thanks for the recommendation!
The main assumption I make with AGI is that we'll be very bad at hitting the sweet spot between 1) "toy", 2)"human-level intelligence", and 3) "beyond human-level intelligence".
I doubt we'll stop at any point between (1) and (2). And then by the time we get even a decimal point beyond (2) and toward (3), we're toast.
To follow your analogy, it would be like birds trying to make a bird-like creature and keep it contained, but overshooting. By the time they realize they've overshot, they've got a human in their cage, and that human has 10,000 bird generations of time to evaluate his environment and think about escaping.
We can guess pretty well that the birds are out of their depth, and were doomed from the start.
It beat Rybka and every other bespoke chess program in existence.
AlphaZero even beat the version of AlphaGo that was trained on existing data.
Point is — we have no idea what is possible when you unleash MCTS towards goals like persuasion, humor etc.
The ONLY constraint here is that feedback requires a human judge. If you can learn to simulate a human based on lots of data (this is the hard part), then you can finally have a two-sided game where the sides play each other, and find the best diets, the best diagnoses utilizing symptoms, tests, and big data from around the world on the incidence of outbreaks, etc.
However it also means quickly finding the funniest jokes, most convincing but subtly flawed arguments, best artworks etc.
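To make "unleash MCTS" slightly more concrete, here is the simplest relative of that search: plain Monte Carlo move evaluation (random rollouts, no tree) for the toy game Nim, where players take 1-3 stones and taking the last stone wins. Everything here is illustrative:

```python
import random

random.seed(1)

def random_playout(stones, mover_is_me):
    """Finish the game with uniformly random moves; whoever takes the last stone wins."""
    while True:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return mover_is_me
        mover_is_me = not mover_is_me

def best_move(stones, sims=5000):
    """Score each legal move by its win rate over random playouts; pick the best."""
    moves = range(1, min(3, stones) + 1)
    def win_rate(m):
        if stones - m == 0:
            return 1.0  # taking the last stone wins outright
        return sum(random_playout(stones - m, False) for _ in range(sims)) / sims
    return max(moves, key=win_rate)

print(best_move(10))  # with enough rollouts this tends to pick 2 (leaving a multiple of 4)
```

AlphaZero's actual search adds a learned policy/value network and a tree, but the core idea (rank actions by simulated outcomes) is the same, and nothing in it cares whether the "game" is chess, persuasion, or humor, so long as a judge provides the reward.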
1. The assumption that the ability to provide a logical, convincing explanation indicates ethical behavior/intentions. First, I think there are many situations in which you can come up with perfectly logical arguments supporting actions which humankind considers harmful. Agent Smith was quite convincing, for example ;) Jokes aside, I think it boils down to a simple rule: the better orator/rhetorician/liar wins. Just like in (organic) life, but in the case of AI, being better comes down to having access to bigger resources.
2. The assumption that the "honest" AI will truly act according to humankind's values. If we can be sure about that we don't need this debate at all.
Here is the thing about AI. We think in terms of logic and reasoning. The AI does a search towards a goal.
Almost all our systems are designed assuming the inefficiency of an attacker. For example voting — we assume limited sybil attacks and coercion a la Eagle Eye.
But I am also talking about basic human trust, humor, quality art, etc.
What if a computer algorithm would far exceed humans at humor, generating or undermining trust, taking over voting systems, making legal arguments, predicting crime, and so on?
We would be locking people up based on CCTV and data correlations only and society would trust the computer way more than a liberal democracy. See China for an example that’s unfolding in front of us.
A man with a bot could charm more women or customers online than a regular person.
Bots would negotiate better deals on average and write funnier jokes and make a glut of various types of art to the point that it loses all scarcity.
Bots could even have better sex, answer questions better and replace the need for other humans.
Think I am exaggerating? How often do you spend time with your parents vs Google or Facebook NOW?
I dunno. I guess I'm the outlier here. If we manage to create beings that are effectively people, and those people turn out to be better than us, and ultimately replace us, well, what more could a parent want for their children?
PS: I'm speaking more about the people here than the article.
Although I love Asimov and all of his stories, I found stories like Robot Dreams always slightly disturbing. Why is it never discussed anywhere (both within the stories and outside them) that it's pretty fucked up that Elvex was killed (destroyed, whatever) just for dreaming?
Don't forget the part where they are A-OK with putting it in a huge Boston Dynamics robot dog that could practically eat the Terminator 2 for breakfast.
Premise: We created an entity that is better than us.
Conclusion: We should let it do whatever it wants, including converting the galaxy to paperclips.
Sorry, can't see how it follows.
Dilemma: does it matter that it's silicon based and partially it's on GitHub?
AI could very well just be a tool, not a sentient being.
That would be a huge accomplishment to get that far right?
That's pretty smart already if you ask me.
Very Westworld :)
What this made me think of is "rationalizations" in humans. Rationalizations are how we "explain" (and probably also think about) complex problems.
For example, you might give a human the complex task of playing cricket. They gain experience and become proficient. The person is then asked why they made some decision. The response will be a rationalization, a simple explanation built from simple truths about cricket, cricket theory (if they know any), this match, that player, that play, the score & such. This is mostly a lie, usually.
The truth is that the decision was made by feel, not simple rationalization. It is not the real "why." The real decisions were made by "instinct^," a much more complex decision-making framework. Instinct (acquired instinct) is built from countless hours of experience playing cricket and just generally being human. The experienced player "just knows" which way the ball is going to go and starts running. He is interpreting (instinctively) the ballistic trajectory of the ball, the sound it made when it hit the bat, the reactions of other players, prior knowledge about the batsman...
This is too much information, too fast, with too much resolution for a rationalization. This much information can only be processed "subconsciously" in humans, with rationalizations constructed post facto.
Yet... humans still have rationalizations. My first question is "why?" Why bother with these slow, clunky & fraudulent rationalizations? Do they play a role in real-time thinking at all? Is our ability to rationalize only there to tell the manager we did X because Y? Both?
My 2c is that I think there is an important interaction layer between the simple rationalizations that we can explain to one another and the complex rationality of our subconscious. Each affects the way the other works, yet they are still somewhat distinct.
Interesting research area. Good luck to the researchers. I hope you find something good :)
^ What made me think along these lines is an "intelligence vs consciousness" definition framework favoured by Yuval Harari (historian/philosopher, not a CS guy) and others.
Intelligence = ability to perform tasks
..involving decision making.
Consciousness = the ability to *feel*,
..in a Benthamite pleasure/pain sense.
In any case, I thought a similar yet subtly different dichotomy might work better: theoried vs theory-less machine intelligence. Thus far, statistical ML techniques produce mostly theory-less machines. They can make good decisions but don't "know" why. All "feelings", no rationalization. By forcing the machine to communicate a theory (a simplification or rationalization), the machine will have to produce one.
Once it has a rationalization/theory, the machine can use it to augment its theory-less logic, the logic that produced the theory. Any disagreement between theory and theory-less (rationality vs feeling) results in a choice from the following: (1) update the theory, (2) change the decision, (3) tolerate some level of cognitive dissonance.
This is too Westworld, so let me try it in different terms:
(Step 1) An ML machine observes data generated by a complex function. For any given W, X & Z it predicts Y. (Step 2) The ML machine must communicate why it predicted a specific Y. The output is a function, an approximation of the underlying function generating the observations. (Step 3 - underpants step) ... Now that the machine has theories, it can test its own ML-based decisions for theoretical consistency. Agreements strengthen the theory. Disagreements cause cognitive dissonance, loss of feeling or rationality, madness... crap! I'm back in Westworld. Dammit.
Feelings are based on the body and its chemistry; consciousness probably goes way beyond that.