
Competitive Self-Play - gdb
https://blog.openai.com/competitive-self-play/?
======
jonny_eh
I imagine this is the way we get to "true AI", or AI indistinguishable from
our own. We train it with a simple virtual environment, that we can gradually
increase the complexity of, until it mimics our own. Then we can download the
AI into a robot. Boom, it's that easy :P

One interesting outcome of this type of AI is that no one knows what the
robot's thinking, since no person designed its brain. The brain evolved, just
like ours did (but over a much shorter period of real time).

~~~
fizx
This is one reason I'm hopeful for a not-killer-robots future. I think there's
a chance strong AI will have to evolve over a non-trivial amount of time (as
opposed to an afternoon's super-singularity), and a strong evolutionary
pressure will be "don't scare the humans."

~~~
mistercow
"Don't scare the humans" is an insufficient criterion if you're trying to
prevent an AI apocalypse. It just means that the AI that destroys us won't be
_obviously_ dangerous.

~~~
MarkPNeyer
Today I have already consumed more sugar than most of my distant ancestors
would see in a year. The device I hold in my pocket reminds me that it is time
to perform the action, and so I will perform the action. I sit down about 10
hours a day, and I have never once hunted for an animal I would eat, nor
retrieved food from nature to consume, aside from a few raspberries on the hill
in my parents' back yard when I was a child.

The contours of my life are dictated largely by a curious system called
'capitalism', a distributed computational mechanism whereby large numbers of
agents act with only one goal: to maximize their own utilities. I think
capitalism is a mostly benevolent AI attempting to limit the oppressive
conditions of our reality's scarcity, and to allow billions of us to live on a
planet that can only support so many hunter-gatherers per square meter. Many
suspect it is not benevolent.

The singularity is already upon us; it's unevenly distributed. It started with
the sedentary shift. Agricultural societies dropped the egalitarian nature of
their ancestors, they had lower average health, and they had a tiny elite caste
of nobles presiding over a much larger population of what were essentially
slaves. As hunter-gatherer population increased, fighting was more frequent.
An army of slaves beats a smaller contingent of well-fed free men. This
terrible knowledge was the apple in the garden. What we call our written
history is essentially the singularity playing itself out, as knowledge
accumulates, depends, reflects itself, and expands.

Our lives are already controlled by this embodied, accumulated knowledge.
Capitalism, rule by the head, our tools are knowledge condensed into matter,
and we are controlled by them, living lives as far from our natural
environments as cows in a farm or chickens in tiny cages, towering over the
streets of Hong Kong.

I don't think it's destroying us, any more than we seek to destroy chickens or
cows. Well, sometimes we do. I had a veggie burger for lunch today though.

~~~
ghostbrainalpha
I love this comment.

Sorry I'm not contributing anything substantive, but I wanted you to know
someone noticed this, and it was brilliant.

~~~
MarkPNeyer
Thank you! If you enjoyed the comment, you might like my book on Amazon. It's
free if you have Kindle Unlimited.

[https://www.amazon.com/Mechanics-Emotion-Structure-Possibili...](https://www.amazon.com/Mechanics-Emotion-Structure-Possibility-ebook/dp/B01MDPLI4S/ref=sr_1_1?ie=UTF8&qid=1507919723&sr=8-1&keywords=mechanics+of+emotion)

------
d--b
Mmmh so far, it doesn't look much more compelling than good old genetic
algorithms...

~~~
gcb0
Welcome to the Fad Career! :D

Previously relegated to Online Software Engineers, now Data Scientists can
feel what it is like to have tons of recent undergrads flooding their fields
and coming up with all sorts of "new" ideas that are just the first version of
something that is established in the field for decades!

Next up, Electrical and Firmware Engineers flooding the IoT Fad Career! Just
give it some 5 or 6 years.

~~~
pizza
Make sure to also look out for

    epigenetic
    neuro-
    noo-
    quantum
    cyber-
    negentropic

etc. in the coming years

------
amelius
I've seen much better videos of simulated walking structures, e.g. [1].

[1]
[https://www.youtube.com/watch?v=pgaEE27nsQw](https://www.youtube.com/watch?v=pgaEE27nsQw)

~~~
ehsankia
Also, much better competing AIs (and wittier narration), from back in 1994 [1]

[1]
[https://www.youtube.com/watch?v=JBgG_VSP7f8&t=2m10s](https://www.youtube.com/watch?v=JBgG_VSP7f8&t=2m10s)

------
Danihan
This seems pretty obvious for practicality. The AI can play thousands, or
millions of games in different VMs 24/7 and be exposed to a radically higher
number of simulated circumstances versus the comparatively plodding rate of
genuine interaction.

~~~
aeleos
The main issue with self-play is that unless done very methodically it can
lead to an agent that does not learn what we want it to learn, but instead
games the simulation and basically cheats. It's just another tool being used
to improve models. It can produce really interesting results, especially in
complex games, but it's not a perfect solution.

~~~
gooseus
I agree... I feel like advanced games will always need guidance to alter the
game so that the AIs don't reach some local maximum through an exploit.

This happens all the time when making games for humans and is evident from how
many balancing patches are made to new, highly competitive games (such as any
Blizzard title).

The next logical step would be for another, impartial AI to observe the games
and change the rules and parameters intelligently as they evolve, to guide
the player AI toward the actual goal.

So, I'll just dive sideways right into a religious/philosophical thought based
on the simulation discussion I've been having all over the place:

 _A universe-sized simulation built for a purpose, which requires simulated
intelligence to carry out that purpose, would almost certainly include a God
intelligence to alter parameters and induce suffering/hardship to direct the
simulated intelligence toward that purpose._

~~~
danohu
Earlier iterations are buggier and have poorer dev tools. So the God
intelligence has more need to smite and command the AIs within the game.

After a while the bugs are ironed out, so God can settle back and gently tweak
parameters at a distance.

~~~
visarga
> After a while the bugs are ironed out, so God can settle back and gently
> tweak parameters at a distance.

That explains the hands-off approach God has lately taken with human society.

------
rtpg
I'm having a hard time understanding how the body can stay stable at all. For
the emergent behavior to appear, you would need the AI to control the body
pretty precisely, but if you just had a "random" AI the body would never stay
up straight.

It seems hard to imagine any number of generations getting the body up to
"stand". I would have maybe expected it to crawl on all fours.

~~~
dweekly
> Agents initially receive dense rewards for behaviours that aid exploration
> like standing and moving forward, which are eventually annealed to zero in
> favor of being rewarded for just winning and losing.
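
To make that quote concrete, here is a minimal sketch of what annealing a dense
reward to zero could look like. The quote only says the dense rewards are
"eventually annealed to zero"; the linear schedule, the blending formula, and
the names `annealing_coefficient`, `shaped_reward` and `anneal_steps` are all
assumptions for illustration, not the authors' actual code.

```python
def annealing_coefficient(step: int, anneal_steps: int) -> float:
    """Weight on the dense reward: 1.0 at step 0, falling linearly to 0.0.

    The linear shape is an assumption; the quote does not specify a schedule.
    """
    if anneal_steps <= 0:
        return 0.0
    return max(0.0, 1.0 - step / anneal_steps)


def shaped_reward(dense_reward: float, competition_reward: float,
                  step: int, anneal_steps: int) -> float:
    """Blend the dense exploration reward with the win/lose reward.

    Early in training the agent is paid mostly for standing and moving
    forward; once step >= anneal_steps only the win/lose signal remains.
    """
    alpha = annealing_coefficient(step, anneal_steps)
    return alpha * dense_reward + (1.0 - alpha) * competition_reward


if __name__ == "__main__":
    # Hypothetical numbers: dense reward 1.0 for staying upright,
    # competition reward 10.0 for winning the episode.
    for step in (0, 250, 500, 1000):
        print(step, shaped_reward(1.0, 10.0, step, anneal_steps=500))
```

So a random policy is not expected to stumble into standing by luck alone: in
this sketch it is paid for it directly at first, and that payment fades out as
training goes on.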

------
indescions_2017
It seems like fighting games such as Street Fighter or Tekken are a perfect
fit for Self-Play.

Has anyone at OpenAI attempted to build such an agent? Are there any AI
research platforms designed specifically for player vs player fighting games?
As far as I know, elite human players are still massively dominant, even
though it would make for an exciting matchup. But given the complexity of
actual fighter competition, with combo attacks, power meters, time limits,
etc., an absurdly high number of training variables is required.

I'd actually like to try and take a step back and apply self-play to something
lower dimensional. Perhaps a 2D Tron Light Cycle sim. And see if some truly
unexpected strategies arise ;)
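
A rough sketch of the environment half of that idea, assuming a square grid,
simultaneous moves, and a crash on hitting a border or any trail. Everything
here (`safe_moves`, `random_safe_policy`, `play_episode`, the board size) is
hypothetical, and there is no learning in it: one policy function simply drives
both cycles, which is the slot a self-play learner would plug into.

```python
import random

# Hypothetical move set: name -> (dx, dy) on a square grid.
MOVES = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}


def safe_moves(pos, walls, size):
    """Return the moves from pos that stay on the board and avoid all trails."""
    options = []
    for name, (dx, dy) in MOVES.items():
        x, y = pos[0] + dx, pos[1] + dy
        if 0 <= x < size and 0 <= y < size and (x, y) not in walls:
            options.append(name)
    return options


def random_safe_policy(pos, walls, size, rng):
    """Placeholder policy: pick a random non-crashing move if one exists."""
    options = safe_moves(pos, walls, size)
    return rng.choice(options or list(MOVES))


def play_episode(policy, size=10, seed=0):
    """Play one game with the same policy on both sides.

    Returns the winner's index (0 or 1), or None if both crash together.
    """
    rng = random.Random(seed)
    positions = [(1, size // 2), (size - 2, size // 2)]
    walls = set(positions)
    while True:
        moves = [policy(p, walls, size, rng) for p in positions]
        targets = [(p[0] + MOVES[m][0], p[1] + MOVES[m][1])
                   for p, m in zip(positions, moves)]
        crashed = [not (0 <= x < size and 0 <= y < size) or (x, y) in walls
                   for x, y in targets]
        if targets[0] == targets[1]:
            crashed = [True, True]  # head-on collision
        if all(crashed):
            return None
        if crashed[0]:
            return 1
        if crashed[1]:
            return 0
        walls.update(targets)
        positions = targets


if __name__ == "__main__":
    results = [play_episode(random_safe_policy, seed=s) for s in range(20)]
    print(results)
```

Swapping `random_safe_policy` for a learned policy, and updating it from the
outcomes of games against its own earlier versions, would be the actual
self-play part.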

~~~
logent
Given perfect information, execution, and instant reaction times, fighting
games would be trivial for an AI to win at. There are already some bots that
will do things like auto-block or parry any incoming attack in Tekken and
they're pretty funny to watch (see TOOLASSISTED's videos like this one:
[https://www.youtube.com/watch?v=nNG5iRMdeg0](https://www.youtube.com/watch?v=nNG5iRMdeg0)
)

At high levels of play, where execution largely isn't an issue, it's all about
reading your opponent, conditioning them to play how you want to play, and
exploiting their tendencies to get in your damage. It would be pretty cool to
see a bot learn to play with a human-level reaction time handicap and put them
up against a pro in a long set, though!

~~~
indescions_2017
Actually, your comment just triggered a sense memory for me, of a rudimentary
"classic" AI for a prototype 2D cel-shaded bitmap sprite sheet fighter game I
created more than a decade ago. I think it was in Adobe Flash and ActionScript
3...

And yeah my bot "cheated" extensively ;)

------
fil_a_del_fee_a
Imagine a robot police force, based on this technology. If a suspect were to
attempt to tackle or evade the police, it would react accordingly. This robot
police force would be networked, so all police nationwide would learn from
every suspect encounter. You may have seen movies where the human "outsmarts"
the robot, but when robots are hundreds of steps ahead of humans, how do we
defend ourselves?

------
chamoda
Evolution created consciousness as we experience it after billions of years of
brutal trial and error. The creation of consciousness will endanger mankind,
but from evolution's point of view, it will be a jump to the next level of
evolving: something evolution could not create directly by herself, but which
her best child, mankind, created for her.

------
Nomentatus
Leading to a thought... are dreams how animals engage in competitive
self-play?

~~~
OscarCunningham
I don't think that can be the only purpose of dreams because I would think
that children need to learn a lot more than adults, but they sleep about the
same amount.

------
kiriakasis
Just a reply to a common reply to this kind of thing:

[http://idlewords.com/talks/superintelligence.htm](http://idlewords.com/talks/superintelligence.htm)

------
koliber
At 0:59 in the movie, I noticed another kind of emergent behavior: kicking the
goalie where it hurts after it defends the goal. I would call it
"retribution".

------
cpayne624
Tangentially, I'm so in love with the site design. Shout out to the UI team.

~~~
dag11
Me too! Mousing over the article links[1] reminds me a lot of the modern Apple
TV UI.

[1] [https://blog.openai.com](https://blog.openai.com)

------
gt_
I wonder if the AI was as annoyed with the music choice as I was.

~~~
gdb
Aw, that was my one contribution to the video :)! What kind of music would you
have preferred instead?

~~~
PKop
First, turn the volume down. Then music that is quieter, more ambient, and
_unnoticed_, or none at all.

Like this[0] or any electronic genre (preferably without lyrics).

I also did not like the music. Seems trivial and unnecessary to point out, but
that's the point: if people can't help but "notice" the music and view it as a
distraction, it was a bad choice.

More generally: the cloying, "chipper" stock music of many YouTube videos is
_always_ irritating to me, and never a good choice for any videos in my
opinion.

[0] [https://youtu.be/85bkCmaOh4o?t=109](https://youtu.be/85bkCmaOh4o?t=109)

------
eternalcode
Create an ICO for it and you'll earn millions /s

~~~
__s
So we have a pool of agents, they can send binary blobs, they need to have X
many tokens per day to survive (start it low, let it grow over time), they
need some way to mine a blockchain, allow some way for them to reproduce via
genetic ai & random mutations, add in some misc mechanism of disaster in order
to reward saving for a rainy day, maybe add in a mechanism where agents can
group up to kill other agents, see if they evolve a means of striking deals &
splitting up loot for future attacks

Then eventually add a human console where people can invest in agents, see if
the agent will give out returns, eventually allowing agents to put smart
contracts on ethereum & interact with misc smart contracts..

