
Dota 2 with Large Scale Deep Reinforcement Learning [pdf] - hongzi
https://cdn.openai.com/dota-2.pdf
======
gambler
I'm so tired of trying to read these "deep learning" AI papers that
deliberately obfuscate what they did and didn't do, often by using
deliberately ambiguous terminology, over-explaining the domain, and immediately
flooding you with low-level detail even in high-level descriptions.

Each paper should start with an unambiguous description of:

1. What are the inputs of the model.

2. What are the outputs of the model.

3. What is the overall size of the model. Size, not parameter count.

4. What part of the domain has been manually encoded into the architecture
and what has been learned over the training period.

5. What are the restrictions on the domain compared to real life.

6. How the performance is evaluated.

This should be on the first few pages. I.e. the descriptions of _what the
model does_ should precede the description of how it does it.

~~~
erlend_sh
Also, the following paragraph is very misleading:

> OpenAI Five won 99.4% of over 7000 games.

The players who account for that remaining fraction played repeated rounds
against the AI and eventually started winning more often than not. The AI had
only one strategy (deathball), and once top-skill-tier players learned how to
play against it, they had a >50% chance of winning.

~~~
ionforce
That's not misleading if it's literally true.

~~~
Mirioron
It's misleading because many people will read that and assume that the AI is
nigh-unbeatable by players. But that's only the case against players who
haven't played against it before.

~~~
gcarvalho
I don’t remember anymore, but were those 7000 matches against high-level
players? Even the default bots in Dota 2 can beat some median-and-below
parties, and they’re very, very bad and outdated.

~~~
sorenn111
Yes, OpenAI Five played OG, who won The International (the biggest Dota 2
tournament in the world) two years in a row. OpenAI Five beat them two games
in a row in a best of three.

~~~
josefx
As far as I remember, none of the OpenAI Five games followed the standard
Dota 2 rules of the time: things like one courier per player instead of
players fighting over a single one, and the same heroes on both teams. Even I
can win a game if I get to set the rules beforehand.

------
hongzi
I'm really impressed by the "surgery" operations used to keep reusing models,
as opposed to tossing old models and retraining when some small part of the
game changes. Appendix B has a pretty good deep dive.

~~~
raiman
We've also just released a "Surgery"-specific paper going over the techniques
used to determine which parameters to carry over
([http://learningsys.org/neurips19/assets/papers/19_CameraRead...](http://learningsys.org/neurips19/assets/papers/19_CameraReadySubmission_Set_Based_Surgery.pdf))

------
stargazing
As a long time player and fan of Dota 2, it's really exciting to see a company
like OpenAI taking an interest in the game. I still remember when they
showcased their AI for the first time by beating some of the best players in
the world 1v1. It was an unreal feat at the time.

------
minimaxir
For context, this is the just-released paper from OpenAI about OpenAI Five:
[https://twitter.com/OpenAI/status/1205541134925099008](https://twitter.com/OpenAI/status/1205541134925099008)

------
h0bzii
I'm deeply disappointed in OpenAI for this. I was looking forward to them
playing the full game against the world champions; instead they played a very
limited version of the game, a version the world champions don't normally
play, mind you. And then they called it a win and case closed.

1. They should play the full game without restrictions.

2. They should have the same inputs and outputs as humans: the graphics and
sound card outputs as input, and a direct link to USB as output (let the AI be
a mouse and keyboard driver). I don't think the bot should have any artificial
delay under these circumstances.

------
reroute1
> "By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates
> that self-play reinforcement learning can achieve superhuman performance on a
> difficult task."

Isn't this a bit of a leap, though? It came with massive caveats to the game:

1. Drafted from a 17-hero pool. 17 out of 115??

2. No summons or illusions. Again, this drastically reduces the possibilities
in the game.

3. This AI trained against real players for years, so it has enormous
experience against this type of opponent. The opposite applies to the humans,
who never compete against bots and so have no experience against this type of
opponent. If I recall correctly, the more people played against the bots, the
better the humans performed in successive games.

Winning two games with all these restrictions and caveats is still impressive,
but it feels like they overstate things. Not to mention the flawless mechanics
and communication between the bots...

~~~
dx87
I don't think it's overstating it too much. Even with the restrictions, the
bots are still performing well above the capabilities of the vast majority of
human players. The restriction on illusions and summons was also for the
benefit of the players; the OpenAI team didn't want the bots to win through
flawless micro skills. For point 3, even though the players didn't get to
train against bots, they have the advantage of being able to learn and react
accordingly. Since the bots can only learn when their models are being
trained, they're trivial to beat if you use a novel technique that they
haven't seen before.

~~~
gdxhyrd
If you want the bot to avoid winning through micro, you add delays, cooldowns
on interactions, imperfect clicking, misclicks, etc. at the level of a human
pro.

In any case, simplifying the game is usually done to make training far
cheaper.
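A sketch of what such a handicapping layer could look like, wrapping the bot's intended clicks; all constants (delay range, rate cap, click scatter) are illustrative guesses, not values from the paper:

```python
import random


class HumanizedActions:
    """Wrap a bot's intended clicks with human-like imperfections.

    All constants here are illustrative guesses, not values from the paper:
    a 170-270ms reaction delay, an action-rate cooldown, and Gaussian click
    scatter to simulate imperfect mouse accuracy.
    """

    def __init__(self, reaction_ms=(170, 270), min_gap_ms=133, click_sigma_px=6.0):
        self.reaction_ms = reaction_ms        # random reaction-delay range
        self.min_gap_ms = min_gap_ms          # cooldown between actions (rate cap)
        self.click_sigma_px = click_sigma_px  # std. dev. of click scatter
        self.last_action_ms = float("-inf")

    def apply(self, now_ms, x, y):
        """Return (issue_time_ms, noisy_x, noisy_y), or None if still on cooldown."""
        if now_ms - self.last_action_ms < self.min_gap_ms:
            return None  # enforce the action-rate cap
        self.last_action_ms = now_ms
        delay_ms = random.uniform(*self.reaction_ms)    # simulated reaction time
        noisy_x = random.gauss(x, self.click_sigma_px)  # simulated misclick scatter
        noisy_y = random.gauss(y, self.click_sigma_px)
        return now_ms + delay_ms, noisy_x, noisy_y
```

The point isn't these particular numbers; it's that each imperfection sits in one place where it can be tuned to match measured human pros.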

~~~
john-radio
But the human competitors also require large amounts of training in order to
competently play Dota 2, and their training is not simplified in a similar
way. I realize that "fairness" is not really the point of having humans play
against bots, but doesn't it damage the usefulness of the comparison as a
measure of performance on a human activity?

------
iamjudged
I’ve found various recent complex-strategy-game AI efforts very interesting,
but I always have one key complaint: they don’t properly ground the mechanical
execution of their AI to realistic human levels for comparison. And if you
give the AI an unfair mechanical advantage, your model isn't going to have to
learn nearly as good a strategy. I will say, however, that this is the closest
to realistic I have seen, but it is still lacking.

The two main measurable parameters of performance are:

1. Reaction time

2. Rate/volume of actions (i.e. Actions Per Minute)

And I would argue there should be an additional consideration of some form of:

3. Mouse-click accuracy

I read through the details of the implementation, and they did decently on 1
and 2, but overall they need to do better.

Their reaction times end up as a random draw between 170-270ms. I think raw,
simple visual reaction time for a pro gamer could be ~200ms, BUT that’s just
for a simple “click this button when the light changes” type of test. There
are “complex reaction time” tests where you sometimes click but other times
don’t (e.g. a red or green light), and reaction times in that case are around
~400ms. I think if a pro is in a game situation where they anticipate their
opponent will take some action and are ready to respond immediately, 200ms is
a fair reaction time. But that’s not the usual state throughout a game, and
the bot effectively has that perfect anticipation mindset at all times. So not
crazy, superhuman reactions, but definitely not completely realistic/fair
either.
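A quick Monte Carlo sketch of that asymmetry. The bot side is the uniform 170-270ms draw described above; the human-side distributions (~200ms primed, ~400ms deciding) are my own rough assumptions, not numbers from the paper:

```python
import random

random.seed(0)  # make the simulation repeatable


def bot_reaction_ms():
    # the uniform reaction-time draw described above
    return random.uniform(170, 270)


def human_reaction_ms(anticipating):
    # illustrative assumption: ~200ms mean when primed and waiting for a known
    # cue, ~400ms mean when a decision is involved ("complex reaction time")
    mean = 200 if anticipating else 400
    return random.gauss(mean, 30)


TRIALS = 10_000
bot_wins_unprimed = sum(
    bot_reaction_ms() < human_reaction_ms(anticipating=False) for _ in range(TRIALS)
) / TRIALS
bot_wins_primed = sum(
    bot_reaction_ms() < human_reaction_ms(anticipating=True) for _ in range(TRIALS)
) / TRIALS
print(f"bot faster: {bot_wins_unprimed:.0%} vs unprimed, {bot_wins_primed:.0%} vs primed")
```

Under these assumptions the bot essentially always beats an unprimed human to the punch, while against a primed human it's closer to a coin flip, which is the "perfect anticipation at all times" point in numbers.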

In regard to action rate, they allow the model to take one action every ~133
ms (7.5 actions per second), which translates to 450 APM. The very best pro
gamers are in the 300-350 APM range. And I think a human's actions include
various thoughtless click spamming (which an AI doesn’t need to do), as well
as visual map movement/unit examination that an AI would not need as much of,
given its direct, comprehensive feed of available information. So the
sustained 450 APM seems pretty superhuman to me. BUT Dota 2 is much less of an
APM-intensive game, and certainly sustained APM isn’t as important. And humans
can reach higher APM in important burst moments, whereas this AI is at an
exact fixed rate of 450 APM. So all in all, the APM is maybe fair (at least
close to fair).
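For reference, the unit conversion behind that cap (assuming a hard, uniform limit of 450 actions per minute):

```python
# Convert the 450 APM action-rate cap into an interval between actions.
apm_cap = 450
actions_per_second = apm_cap / 60               # 7.5 actions per second
action_interval_ms = 1000 / actions_per_second  # ~133 ms between actions

pro_sustained_apm = (300, 350)  # rough sustained pro range mentioned above
print(f"one action every {action_interval_ms:.0f} ms; "
      f"pros sustain about {pro_sustained_apm[0]}-{pro_sustained_apm[1]} APM")
```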

The mouse-click accuracy piece, however, is pretty unfair if the AI can make
precise clicks across the screen with no effect on reaction time. This factor
isn’t considered at all by the AI team. I feel they should either add in some
randomization to simulate inaccuracy, or add a delayed reaction time based on
how far the mouse would have to move.
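One standard way to model that distance-dependent cost is Fitts's law, where movement time grows with the log of distance over target width; a sketch with made-up coefficients:

```python
import math


def fitts_movement_time_ms(distance_px, target_width_px, a=50.0, b=150.0):
    """Fitts's law: MT = a + b * log2(2D / W).

    a and b are per-person fitted constants; the values here are made up
    purely for illustration, not measured from any player.
    """
    index_of_difficulty = math.log2(2 * distance_px / target_width_px)
    return a + b * max(index_of_difficulty, 0.0)


# A far, small target should cost far more movement time than a near, big one.
far_small = fitts_movement_time_ms(distance_px=800, target_width_px=20)
near_big = fitts_movement_time_ms(distance_px=100, target_width_px=60)
```

Charging the bot a delay like this per click would penalize long cursor travel and small targets the same way it penalizes humans.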

With all these factors combined, I still feel this is not quite a fair test.
But it’s closer than others I’ve seen, and it’s still a very impressive
overall achievement! I’d love to see them go the small extra distance of
constraining these mechanical performance parameters just a bit more. I feel
that would make a BIG difference in the level of strategy required to beat the
best humans. They’re SOOO close to amazing me!

~~~
orbital-decay
Yeah, the low-level motor and sensory part is what's actually hard to get
right. Current AI is good enough to figure out the sensory stuff, but it still
works with the game input much more directly than humans do. However, for that
to change, it needs a precise biomechanical model of what human players use;
is something like this available at all?

~~~
chii
Create an actuated finger/mouse+keyboard combination that moves at realistic
human speeds (e.g., signal speed and actuation speed). Have the AI output
controls for this device (so the mouse has to be moved, rather than allowing
for the precise x-y coordinate inputs the bot currently has).

