
OpenAI Five Benchmark - gdb
https://blog.openai.com/openai-five-benchmark
======
blueish
I'm surprised they felt confident enough to lift a lot of the restrictions so
soon. A lot of them seemed like big game changers that I wouldn't expect the AI
to be able to exploit so easily, especially the introduction of Roshan and
invisibility. They mention implementing some randomization, but it seems like it
would take quite a big commitment in terms of resources to take Roshan when
the AI wouldn't even realize the benefits immediately (namely the Aegis, and the
XP and gold, which it would have to weigh against the loss from farming/taking
towers).

The introduction of the other heroes also comes as a surprise; I wouldn't have
expected them to have the AI utilizing new abilities. They don't mention how
the heroes are picked, other than the AI having a random draft of them (does the
AI pick its composition?).

~~~
eduren
I wonder if the reward mechanisms factor in denying resources from the enemy
team. A lot of the time it makes sense to take Roshan just so the enemy
doesn't get the opportunity to.
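
One way a reward could capture the "deny it to the enemy" motive without a special rule is to make each hero's reward relative to the opponents'. The sketch below is hypothetical (the `zero_sum_rewards` helper and per-hero scalar rewards are assumptions, not something stated in OpenAI's post): each hero's reward has the enemy team's mean reward subtracted, so anything the enemy gains, such as a Roshan kill, lowers your own reward.

```python
def zero_sum_rewards(team_a, team_b):
    """Hypothetical zero-sum shaping: each hero's reward minus the enemy team's mean."""
    mean_a = sum(team_a) / len(team_a)
    mean_b = sum(team_b) / len(team_b)
    # Enemy gains count against you, so denying them is implicitly rewarded.
    return [r - mean_b for r in team_a], [r - mean_a for r in team_b]


# Example: the enemy team collects a large objective reward (e.g. a Roshan kill).
ours, theirs = zero_sum_rewards([0.0] * 5, [2.0] * 5)
print(ours)  # [-2.0, -2.0, -2.0, -2.0, -2.0]
```

With equal team sizes the two teams' shaped rewards always sum to zero, which is what makes "taking Roshan so they can't" worth something on its own.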

~~~
blueish
I think that'd be an even harder behavior for the AI to arrive at, though, since
it would have to recognize those benefits for the other team and their origin,
and then figure out that taking Roshan prevents those benefits. I think that's a
next step after discovering the benefit of taking Roshan.

~~~
Anderkent
The same argument applies to pushing and defending towers, getting rax, etc.
Assuming that the devs did not hardcode rewards for those objectives, the
AI surely already has to 'understand' events that impact the game in the long
term.

------
minimaxir
> We’ve increased the reaction time of OpenAI Five from 80ms to 200ms. This
> reaction time is much closer to human level, though we haven’t seen evidence
> of changes in gameplay as OpenAI Five’s strength comes more from teamwork
> and coordination than reflexes.

This new constraint is interesting. The Super Smash Bros. Melee AI paper noted
that they had to keep reaction times at superhuman levels in order for the
model to converge (albeit DOTA is a bit different from Melee):
[https://arxiv.org/abs/1702.06230](https://arxiv.org/abs/1702.06230)
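
A common way to impose a reaction time on an agent is to feed it observations that are a fixed number of steps old. The sketch below is a hypothetical wrapper, not how Rapid is known to implement the 200ms delay; it assumes the game advances at 30 ticks per second, and `DelayedObservations` and `reaction_delay_ticks` are illustrative names.

```python
from collections import deque


class DelayedObservations:
    """Hypothetical wrapper: the policy sees the observation from `delay_steps` steps ago."""

    def __init__(self, delay_steps, initial_obs):
        # Pre-fill so the first few calls return a valid (stale) observation.
        self.buffer = deque([initial_obs] * (delay_steps + 1), maxlen=delay_steps + 1)

    def push(self, obs):
        # Add the newest observation; the deque drops the oldest automatically.
        self.buffer.append(obs)
        # Hand the policy the oldest observation still buffered.
        return self.buffer[0]


def reaction_delay_ticks(reaction_ms, ticks_per_second=30):
    """Game ticks needed to cover a reaction time, using integer ceiling division."""
    return (reaction_ms * ticks_per_second + 999) // 1000


delay = reaction_delay_ticks(200)  # 6 ticks at the assumed 30 ticks/s
view = DelayedObservations(delay, initial_obs=None)
```

Rounding up keeps the agent from ever reacting faster than the target; a recurrent policy (as suggested further down the thread) can then learn to extrapolate from the stale view.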

~~~
gwern
Yes, but the fact that PPO can learn very long-range strategies with enough
computation, when most would expect it to diverge or fail to learn at all, was
already demonstrated by the original 5x5 DoTA bot. That's probably what's going
on there: it can handle the long-range learning and so does fine at
human-level APM, while the SSBM AI is stuck learning short-term strategies that
rely heavily on simple, fast reactive policies.

~~~
vomjom
There's nothing about PPO that helps it learn long-range strategies. It
primarily lets you take multiple gradient steps on a single batch so you can
converge faster.

In fact, for a single step with no policy lag, it's equivalent to a standard
policy gradient update.
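
The equivalence can be checked numerically from PPO's clipped surrogate objective. This is a generic single-sample sketch, not OpenAI Five's code; `clip_eps=0.2` is the value used in the original PPO paper and is assumed here, and a finite difference stands in for autograd. With no policy lag the ratio is 1, so the clipped objective has the same slope with respect to log pi as the plain policy-gradient surrogate (and therefore the same gradient with respect to the policy parameters). Once the policy has moved past the clip range, the gradient vanishes, which is what makes several steps per batch safe.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one (state, action) sample, to be maximized."""
    ratio = math.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


def vanilla_pg_objective(logp_new, advantage):
    """REINFORCE-style surrogate whose gradient is advantage * grad log pi."""
    return logp_new * advantage


def finite_diff(f, x, h=1e-6):
    """Central-difference derivative with respect to log pi."""
    return (f(x + h) - f(x - h)) / (2 * h)


logp_old, adv = -1.2, 2.5
# No policy lag: both slopes equal the advantage.
print(finite_diff(lambda x: ppo_clip_objective(x, logp_old, adv), logp_old))  # ~2.5
print(finite_diff(lambda x: vanilla_pg_objective(x, adv), logp_old))          # ~2.5
# Policy already moved past 1 + clip_eps: the clipped term blocks the gradient.
print(finite_diff(lambda x: ppo_clip_objective(x, logp_old, adv), logp_old + 0.5))  # 0.0
```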

DeepMind was also able to train a CTF agent with human-level reaction time:
[https://deepmind.com/blog/capture-the-flag/](https://deepmind.com/blog/capture-the-flag/)

I suspect what allows you to train with a human-level reaction time is an RNN,
or compensating for the lag some other way. I'm testing that out right now
with my own SSBM bot:
[https://www.twitch.tv/vomjom](https://www.twitch.tv/vomjom)

~~~
gwern
> There's nothing about PPO that helps it learn long-range strategies.

Exactly. Which is why it's so surprising that it did anyway, despite that and
despite discount rates that don't give any value to rewards more than a minute
or so away.
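
To make "a minute or so" concrete: with a per-step discount factor gamma, a reward t steps ahead is weighted by gamma**t, so its half-life is ln(0.5) / ln(gamma) steps. The values below (gamma = 0.998 and 7.5 decisions per second) are illustrative assumptions, not figures from this thread.

```python
import math


def reward_half_life_seconds(gamma, steps_per_second):
    """Seconds until a reward's discount weight gamma**t falls to 0.5."""
    half_life_steps = math.log(0.5) / math.log(gamma)
    return half_life_steps / steps_per_second


def discount_weight(gamma, seconds, steps_per_second):
    """Weight gamma**t given to a reward `seconds` in the future."""
    return gamma ** (seconds * steps_per_second)


GAMMA, STEPS_PER_SECOND = 0.998, 7.5  # illustrative assumptions
print(reward_half_life_seconds(GAMMA, STEPS_PER_SECOND))  # ~46 seconds
print(discount_weight(GAMMA, 60, STEPS_PER_SECOND))       # ~0.41
print(discount_weight(GAMMA, 300, STEPS_PER_SECOND))      # ~0.011
```

Under these assumptions a reward five minutes out carries about 1% of its face value, which is why learning strategies that pay off over a whole game is surprising.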

> DeepMind was also able to train a CTF agent with human-level reaction time:
> [https://deepmind.com/blog/capture-the-flag/](https://deepmind.com/blog/capture-the-flag/)

Note that the CTF agent is _way_ more complex, featuring multilevel RL and
evolutionary losses, and even DNC in the agents.

------
haeffin
> Because our training system Rapid is very general, we were able to teach
> OpenAI Five many complex skills since June simply by integrating new
> features and randomizations. Many people pointed out that wards and Roshan
> were particularly important to include — and now we’ve done so. We’ve also
> increased the hero pool to 18 heroes. Many commenters thought these
> improvements would take another year.

The linked commenters thought that getting to "real dota" (more than 100
heroes, Captains Mode instead of random, ...) would take another year. So I
don't think it's fair to make that statement.

Edit: Don't get me wrong, I think the improvements are very nice, but pointing
to people saying "these people thought we would need a year, we did it in
under a month!" is not what you should do if you didn't actually do what the
linked people stated.

~~~
sincerely
It seems like the 5 invulnerable couriers restriction is something that will
have a huge influence on how the early game is played and is something that
the humans won't have any experience taking advantage of.

~~~
joefkelley
I don't think it's that complicated to adapt to... everyone should just be
pretty much constantly ferrying out regen and harass more aggressively.

~~~
ufo
And bottles are currently disabled so you can't use the most abusive strat
that 5 couriers would allow.

~~~
acover
I don't think you can bottle ferry anymore

~~~
gsich
you can

~~~
de_watcher
You can't. It was changed around this year.

Courier isn't that important. It's being phased out across the recent patches.
And there is a popular Dota mod with 5 fast invulnerable couriers.

~~~
gsich
Courier is important. Otherwise you are forced to go back to heal or get
items, which will cost you XP and gold and ultimately the game.

Yes there is Turbo, no it's not comparable to regular gameplay.

~~~
Anderkent
Regular dota players learn to play without the courier anyway, because someone
always feeds it away or uses it to ferry themselves a magic stick.

So maybe playing without a courier at all would be more representative of the
pub experience ;)

~~~
gsich
Must be 1k you are talking about.

------
justicezyx
Note: The choice of human opponent team is from people that are vastly better
than the majority of Dota2 players, but still vastly worse than the top-tier
pro teams.

So the next time Elon tweets "OpenAI beats the human players in 5v5",

you'll know that the game has not been broken by AI yet (unlike Go, which
really has been broken by AI).

~~~
hshehehjdjdjd
I think from the trajectory it’s pretty obvious who will be on top in five
years. Whether the intercept is now, a month from now, or half a year away
doesn’t matter all that much.

------
inverse_pi
Does anyone know how random drafting works? If it's truly random, i.e.
randomly picking 5 heroes out of 18 (CM, DP, ES, Gyro, Lich, Lion, Necro, Qop,
Razor, Riki, Nevermore, Slark, Sniper, Sven, Tide, Viper, or WD), then it's
much less about teamwork. What if they end up with Razor, QoP, Nevermore, DP,
Gyro? The problem here is that in Dota almost every hero has a clear position in
the game, much like in soccer. Having both teams randomly pick 5 heroes would
probably ruin the game and make it really difficult for humans (think
covariate shift), whereas the bot is probably trained on this distribution.
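
The "truly random" reading can be sketched as dealing two disjoint five-hero teams from one shared pool. This is an assumption about how the draft might work (the post doesn't spell it out), and the hero names are placeholders; the counts at the end show how many distinct lineups a team could be dealt from an 18-hero pool.

```python
import random
from math import comb


def random_draft(pool, team_size=5, rng=random):
    """Hypothetical random draft: deal two disjoint teams from one shared pool."""
    picks = rng.sample(pool, 2 * team_size)  # sampling without replacement
    return picks[:team_size], picks[team_size:]


# Placeholder names standing in for the 18 heroes.
POOL = [f"hero_{i:02d}" for i in range(18)]

radiant, dire = random_draft(POOL, rng=random.Random(0))
print(radiant, dire)

# Distinct lineups for one team, and ordered Radiant/Dire pairings.
print(comb(18, 5))                # 8568
print(comb(18, 5) * comb(13, 5))  # 11027016
```

A bot trained on this distribution has seen every kind of odd lineup; humans used to role-based drafts mostly haven't.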

~~~
chlvsl
Sorry, is it really necessary to say nevermore instead of "sf"? I understand
some of us have been playing since WC3 days and are used to the old names
(wisp, necrolyte, nevermore) but it's actually longer to pronounce AND more
confusing nowadays.

~~~
gsich
More confusing? No. Ingame voice lines also use the name on occasion.

~~~
sincerely
How could you make the argument that it _isn't_ more confusing to use non-Dota 2
names?

~~~
gsich
Because it is not a non-dota2 name.

------
inertiatic
This will be very interesting to see, despite the vast differences in ruleset
that make this much much less complex than the actual game.

My prediction is that we're very very far away from AI that can beat the top
teams in a 5v5. Amateur teams can easily be beaten simply on the strength of the
mechanics (which are very very strong on the AI, beating even pros), but the
strategy and coordination of the top teams are out of this world.

~~~
jononor
Dare to put a time on that prediction? 1 year, 5?

~~~
ball_of_lint
I would say 18 months from August 5 when they do the stream they will still be
unable to beat a professional team playing the full game with no restrictions.

Right now there are so many key parts of the game missing: Illusions, Summons,
Bottle, Courier, and most of the heroes. The 18 they have chosen are all fairly
straightforward and make drafting simple. I want to see an AI playing Huskar,
IO, and Nature's Prophet. Better, I want to see an AI that can draft and ban.

~~~
unrealhoang
Exactly what I think. The way the pros exploit those heroes requires a lot of
logical deduction, not just game-sense feel and tree search (which current
AI methods are strong at). By the same thinking, if we can combine all three of
them, I think we will be very close to AGI.

------
wetpaws
I really hope OpenAI taught the bots to type "cyka" and "go mid"

~~~
joefkelley
Don't forget safelane pos 1 feeding at minute 6 and typing "gg mid no gank"

------
Analemma_
Fewer restrictions are of course better, but I'm still not impressed without an
actual Captains Mode all-heroes draft. In addition to inverse_pi's comments
about all heroes being vital because heroes have different roles to play, the
draft is both an important part of the game and, I would think, one of the
most difficult parts for an AI: it involves bluffing, mind games, online strategy
adjustment in response to an opponent's actions, and awareness of the current
meta.

The draft isn't everything and it's possible that a sufficiently talented AI
could always lose the draft and still win the game, but that would be a pretty
boring outcome from the perspective of contributing to AI knowledge (just like
it's possible, though unlikely in Dota, that sufficiently good micro could
overwhelm any disadvantage in strategy and tactics if the AI can play at 2000
APM: it would "win", but only in a very boring sense)

~~~
drexlspivey
It can't really have 2000 APM because it observes 450 frames per minute.
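
The bound follows if the agent chooses at most one action per observation. Assuming the game runs at 30 ticks per second and the agent observes every 4th tick (assumptions that reproduce the 450 figure; neither number appears in this thread), the arithmetic is:

```python
def max_actions_per_minute(ticks_per_second=30, frame_skip=4):
    """Upper bound on APM if the agent issues at most one action per observation."""
    return ticks_per_second * 60 // frame_skip


print(max_actions_per_minute())  # 450
```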

~~~
Analemma_
I chose that number sort of arbitrarily to just mean "very very fast", but I
see your point.

~~~
peripitea
Furthermore, they are limiting the reaction time to 200ms, to match good
humans (I suspect some pros are actually faster than that) and remove any
advantage there. So it doesn't have a meaningful advantage over pros in any
mechanical/reaction time sense; it's truly just trying to play the game more
intelligently from what I can tell.

~~~
visarga
Still has the advantage of simultaneously observing thousands of game
variables from the API at a glance.

~~~
peripitea
Ah yeah good point

------
nopinsight
Based on the chart showing the effect of "We're still fixing bugs" in their
last blog post, it looks like they should have the skills 'buffer' to handle
significantly better teams than those they have faced so far.

[https://blog.openai.com/openai-five/](https://blog.openai.com/openai-five/)

"We’re still fixing bugs. The chart shows a training run of the code that
defeated amateur players, compared to a version where we simply fixed a number
of bugs, ..."

Looking at the chart and the fact that they are confident enough to lift
several restrictions, I'd bet on OpenAI Five winning against at least some of
the professional teams at The International. It's even possible they will beat
most teams there.

------
zhynn
I am embarrassed to say that I am confused about what the outputs are from the
"Rapid" RL training system. Do you end up with an executable that then drives
the game inputs/api? Does it produce a "bot script" that is used by the game
to drive the logic? I understand that thousands of CPUs/GPUs are used for the
training, but then what is actually playing the game at the end of the day?

~~~
gdb
(I work on the Dota team at OpenAI.)

The output is a trained neural network!

~~~
tejaswiy
Could you go into some more detail on the actual engineering mechanics? Does
each bot have an instance of the neural net model running on a separate PC? How
often do you feed game state into the net? What's the output of the network
(a bunch of movement / item / spell commands) that is fed in through the game
driver?

~~~
zhynn
Oh, good question, I didn't think of that either. Is there one NN that consumes
the state for each of the bot players and then returns the "next action" for
that bot, or is there a separate NN for each of the bots? And does that NN run
on the LAN machine, or is the LAN machine just running the game code and a
Python agent that mediates between the game code and the NN?

------
QML
What is the purpose of having deep learning run on games like Go (AlphaGo) and
DOTA2, instead of having it train on more general or real-world tasks? Is it
a constraint on the amount of data, since in video games you can easily
generate more?

~~~
levesque
The data generation is indeed one key aspect of it. To train a reinforcement
learning model such as this one, you do need an insane amount of data (they
wrote somewhere that the model played the equivalent of 180 years of Dota per
day).
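
For scale, here is the arithmetic behind "180 years of Dota per day", assuming a 365-day year and a hypothetical average game length of 45 minutes (the game length is an illustrative assumption). The speedup is aggregated over all games running in parallel, not the speed of a single game.

```python
MINUTES_PER_DAY = 24 * 60


def experience_rate(game_years_per_day, game_minutes=45, days_per_year=365):
    """Convert 'years of play per day' into a speedup and a rough games-per-day count."""
    speedup = game_years_per_day * days_per_year  # game-days per wall-clock day, all games combined
    games_per_day = speedup * MINUTES_PER_DAY // game_minutes
    return speedup, games_per_day


print(experience_rate(180))  # (65700, 2102400)
```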

Overall, games are a good playground to test ideas and verify assumptions. The
next step to transfer this type of knowledge to real world problems would be
to build a simulator, train on it using ungodly amounts of computing
resources, and then fine-tune the final model on the real world thing. This
has been done for robot control tasks in the past. But first, you have to
develop and prove that the base learning algorithm works -- and games are nice
for that.

This here is also a good showcase of collaboration learned by RL agents, and
beating pro teams in an esport where prize pools range in the millions of
dollars is an amazing way to convince people.

------
keepingscore
How do 5 DotA players coordinate? They share information via voice?

How does a deep learning algorithm coordinate between 5 heroes? I assume it's
not 5 bots communicating over some channel but one bot acting on 5 heroes?

~~~
supermdguy
Surprisingly, it's 5 completely separate bots:

"OpenAI Five does not contain an explicit communication channel between the
heroes’ neural networks. Teamwork is controlled by a hyperparameter we dubbed
“team spirit”"

\- [https://blog.openai.com/openai-five/](https://blog.openai.com/openai-five/)
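
OpenAI's June post describes team spirit as a value from 0 to 1 weighting how much each hero cares about its own reward versus the average of the team's rewards. A minimal sketch of that idea, assuming a simple linear blend (the function name and exact form are assumptions, not OpenAI's code):

```python
def team_spirit_rewards(individual_rewards, team_spirit):
    """Blend each hero's own reward with the team average, weighted by team_spirit in [0, 1]."""
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must be in [0, 1]")
    team_mean = sum(individual_rewards) / len(individual_rewards)
    # 0 -> purely selfish rewards; 1 -> every hero receives the team average.
    return [(1 - team_spirit) * r + team_spirit * team_mean for r in individual_rewards]


print(team_spirit_rewards([5, 0, 0, 0, 0], 0.5))  # [3.0, 0.5, 0.5, 0.5, 0.5]
```

Coordination then comes from the shared incentive rather than from any message passing between the five networks.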

~~~
sakarisson
So the bots do not communicate directly?

~~~
Anderkent
The bots presumably "learned" that the other heroes act the way that the bot
would have acted were they in that position (i.e. all my allies run the same
algorithm, so I can predict what they would do)

------
renwoshin
How does counter-picking work if it's no longer mirror match? Was this a
separate model?

~~~
Pxl_Buzzard
According to the article the bots are only doing random picks from a pool of
18 heroes for now. I imagine pick/ban will come with later iterations.

~~~
sakarisson
They will apparently be playing Random Draft mode, which normally means that
it works similarly to All Pick, except for the fact that the players pick from
a pool of 50 random heroes. How this will work with the 18-hero pool is
something I don't know.

