
How to Train Your OpenAI Five - dsr12
https://openai.com/blog/how-to-train-your-openai-five/
======
xemdetia
I'm still kind of disappointed with how much hubbub came out of these games. I
watched most of them and, in general, it felt like mediocre Dota even against
the pro team. I'm not saying I'm amazing, but I wish this article had come out
sooner to put a reasonable contrast against the coverage that was coming out.
While I enjoyed the idea of the drafting phase, it's hard not to look at the
available hero pool and see only two or three reasonable lineups (both of
OG's lineups trying to take advantage of invis heroes were weak in general in
the current version), and I think the drafts shown bear that out. It's hard
not to feel that OpenAI solved the problem of that limited hero pool and
ended up with a team superior to humans who weren't familiar with that
version of the game.

I don't think that's a wrong outcome, but people trying to frame it as OpenAI
beating OG at their own game, along the same lines as go, chess, or checkers,
seems misleading. I think I'm still feeling sensitive to how things turned
out for systems that got onto the press pipeline like Watson. I wish there
had been more celebration of how they were able to create this ensemble
learning environment, so I'm hoping for that more detailed analysis,
especially of how they were able to adapt (and hopefully not throw away) the
learning they did after some of the really large patches.

~~~
hug
There are so many ways in which I feel this argument breaks down.

* OG would absolutely handily win against any non-professionally-contracted dota team under the exact same challenge conditions. (i.e.: OpenAI Five's test was about as good as they could give it at this stage.)

* OG's play within the scope of the game was novel, in that they were deliberately picking strategies that OpenAI Five didn't "understand" (i.e.: creep skipping) and still lost.

* OpenAI's play was novel, even within the constraints of the pool: No human team would pick the 4-core + CM lineup that OpenAI favoured. Even if you played with that limited hero meta, 4-core would likely _not_ end up being a deliberate strategy picked by human teams with any frequency.

* OpenAI's play in all team fights was _demonstrably_ better, playing exactly up to the limits of the heroes they were using and no further. (It is, in fact, OpenAI's teamfight which won them the game.)

* A number of the reductions and restrictions in hero-pool were to allow for OG to have a fighting chance: All heroes with controllable illusions or clones were banned because OpenAI had too much of an advantage in "micro".

If you sum all of those things up, I think it looks painfully obvious that
OpenAI Five, if trained against the full hero pool, would completely and
utterly dominate gameplay at the professional human level. It wouldn't even be
close.

~~~
ctchocula
> A number of the reductions and restrictions in hero-pool were to allow for
> OG to have a fighting chance: All heroes with controllable illusions or
> clones were banned because OpenAI had too much of an advantage in "micro".

This one is not so certain. One of the OG players (Notail) mentioned after the
game that the AI definitely had a few weaknesses such as:

* AI didn't bother checking the trees after the splitpushing hero disappeared from the map.

* AI doesn't handle invisible units well.

* AI has poor warding.

He said he regretted not exploiting these weaknesses more. There are certain
splitpush ("rat Dota") heroes that could potentially have helped OG to the
detriment of the AI, e.g. Antimage, Furion, Morphling, Juggernaut, etc.
Instead, what we saw was OG using a non-traditional splitpush hero (Viper)
from the limited hero pool to try to split push, which failed because the
hero got ganked too often (Viper has no natural escape mechanism, unlike many
of the splitpush heroes above, and he didn't itemize for splitpushing, e.g.
Shadowblade first item) and doesn't push very fast. With those splitpush
heroes and well-placed observer wards, I think a splitpush-based lineup could
possibly take the game.

OpenAI's play was very impressive. I did not see this coming after OpenAI lost
to some of the weaker pro teams at last year's TI. However, I would give OG
even chances of winning a longer series (Bo5 or Bo7) over 5 or 7 days, where
they could learn from OpenAI's playstyle and figure out strategies to exploit
it, similar to how, in longer Dota 2 tournaments, we see strategies that are
successful on Day 1 get countered by the tournament final. Another analogy is
the 1v1 bot that OpenAI released. When it came out, even pro players would
lose to it. Shortly after, however, the combined intelligence of the Dota 2
community discovered an exploit: if you lead the enemy creepwave into the
jungle, the bot glitches out and the human player eventually wins [1].

[1]
[https://www.reddit.com/r/DotA2/comments/6t8qvs/openai_bots_w...](https://www.reddit.com/r/DotA2/comments/6t8qvs/openai_bots_were_defeated_atleast_50_times/dliundl/)

~~~
Gatsky
Not sure I buy the argument that humans can learn to beat an AI by exploiting
its weaknesses. This argument has been made many times. Kasparov said
something similar when he lost to Deep Blue, yet not too long after that chess
engines became unbeatable.

~~~
unrealhoang
That argument only fails once the game is “solved”, and OpenAI, with the
current limited same-kind-of-heroes pool, is far from that. The current hero
pool is mostly about exact execution and coordination, which in and of itself
is amazing nonetheless, but strategic heroes like Ember Spirit, Arc Warden,
and Nature's Prophet are the ones that bring Dota 2 to a much higher
strategic level.

~~~
Gatsky
Well maybe, but so far we have seen that AI improves quickly enough that other
factors are irrelevant.

------
chriskanan
I was making slides to discuss this model in my deep learning class tomorrow.
From what I can find on the model, it isn't _that_ complex --
[https://d4mucfpksywv.cloudfront.net/research-covers/openai-f...](https://d4mucfpksywv.cloudfront.net/research-covers/openai-five/network-architecture.pdf).

The real trick seems to be just massive amounts of self-play. This model had
the human equivalent of 45,000 years of experience playing DOTA 2, with 250
years of experience per day.
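Those figures imply an enormous wall-clock speedup over real time. A quick sanity check on the numbers from the post (the arithmetic here is just illustrative):

```python
# Sanity-check the self-play scale quoted in the blog post.
YEARS_TOTAL = 45_000    # human-equivalent years of Dota 2 experience
YEARS_PER_DAY = 250     # experience accumulated per wall-clock day

# Implied wall-clock duration at that rate: 180 days, roughly 6 months.
training_days = YEARS_TOTAL / YEARS_PER_DAY

# 250 years of play per real day means the distributed system effectively
# runs ~91,000x faster than real time.
speedup = YEARS_PER_DAY * 365
```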

~~~
qlk1123
Outsider here. How do you measure the equivalent experience of a model? Tens
of thousands of years... how is that possible? Is there some timing trick
involved in your training/evaluation phase?

~~~
hug
The game has an in-game clock, and there are regular "timed" events that take
place throughout the game that are basically a hard cap on when you can
realistically win the game. (i.e.: To earn gold & experience, you must kill
'creeps', and waves of creeps spawn every 1 minute in game. You'll need a
certain amount of gold & experience to have the in-game strength to take down
defensive towers.)

An average human game lasts 30 to 45 minutes. If OpenAI simulated 1,000,000
games, that'd be over 28 years of non-stop gameplay to a human.
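The conversion is simple arithmetic. A sketch, using the hypothetical 1,000,000-game figure above and the 30-45 minute range:

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600

def games_to_years(n_games: int, minutes_per_game: float) -> float:
    """Human-equivalent years of non-stop play for n_games of a given length."""
    return n_games * minutes_per_game / MINUTES_PER_YEAR

# A hypothetical 1,000,000 games at 30-45 minutes each:
low = games_to_years(1_000_000, 30)   # ~57 years
high = games_to_years(1_000_000, 45)  # ~86 years
# Both comfortably over the 28-year floor quoted above.
```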

~~~
qlk1123
It makes me wonder: what do you do if a game of interest doesn't have such a
timing mechanism? To put it another way, how do you train agents as fast as
AlphaGo (no timing issues at all) or OpenAI Five (in-game overclocking) if
the environment can only be simulated at the same time scale?

Any thoughts?

------
SmooL
It's not full scale DotA (limited hero pool makes a huge difference imo) but
it's still incredible.

Obviously, this limited rule set is not pure DotA, and so the humans were at
a slight disadvantage.

The results of letting anyone play against it will be the huge test - AI
often fails against "cheese" strategies, and this will be a good test of
whether any exist. I'm excited to see the results.

~~~
asutekku
They said they didn't include all heroes because the AI was already much
better at micromanaging the creeps. Limiting the selection apparently gave
humans a better chance to win.

~~~
andreime
(Edited for more info.) That's incorrect. Each team has strategies and
preferred heroes, and OG won The International (a sort of world championship)
by ignoring trends and sticking to a limited preferred hero pool. Since then,
they've consistently underperformed against other human teams.

So by limiting the number of heroes, humans are clearly at a disadvantage. At
the last International, 110 heroes from the pool were picked and 98 were
banned.

The micromanagement claim seems fair when it comes to illusions.

------
RSZC
There's a line between puff-up marketing and being actively misleading and
OpenAI continually crosses it.

With only 25 heroes allowed (of >100), they were unable to get above 5k
MMR... nowhere near pro level. What are they at with the full hero pool? 3k?

I'm not trying to cast aspersions on their AI work - it's obviously an
incredibly hard problem. But instead of admitting that they failed to make a
competitive team, they instead play a stripped down game that's easy for their
bots' limited strategies, and then publish triumphant marketing releases
claiming mission accomplished.

~~~
anilshanbhag
This is such a pessimistic view of things. As the hero pool expands, the
training challenge increases at a rate of roughly n^10. So going from 16 to
25, we have a ~100x blowup in hero combinations, likely requiring
significantly more compute to train.
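The ~100x figure follows from the n^10 approximation (ten picked heroes, each drawn from a pool of n, ignoring ordering and overlap):

```python
def blowup(n_old: int, n_new: int) -> float:
    """Multiplicative growth in hero combinations under the n^10 approximation."""
    return (n_new / n_old) ** 10

# Growing the pool from 16 to 25 heroes:
growth = blowup(16, 25)  # ~87x, i.e. roughly the ~100x claimed
```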

~~~
RSZC

      OpenAI Five is the first AI to beat the world champions in an esports game, having won two back-to-back games versus the world champion Dota 2 team, OG, at Finals this weekend.
    

That's their opening sentence, and it's actively misleading.

~~~
cthor
It's pretty sad people excuse this. Every single press release OpenAI have
made in the history of this project has been actively misleading.

What they've achieved is actually impressive, too, so the deception is doubly
annoying.

------
b_tterc_p
I think what people don't get when they want to celebrate this is that there
is no dominant strategy for a game like this. To play optimally you need to
adapt your plan according to what your opponent does. If anything, there is
evidence that neither this AI nor the StarCraft one does this. They just
have a world view of what works best and try to do that. And then it kind of
works because the AI is much better at micro.

It would be very ineffective if this AI had to input everything through a
human body somehow. Now to be fair, it would be very difficult to not win on
micro. There’s no real set of constraints I can think of that would make it a
fair match.

Ideally they should probably be shifting to a game where micro just isn't as
big of a differentiator. Or, consider moving to a game where a strategic
player issues commands and human players execute them. Off the top of my head
I can't think of anything that really fits that bill.

Alternatively, SpyParty (an incredible game) could be an excellent candidate,
as it requires deception and a strategic awareness of what the other player
is thinking. And, at a high level of play, micro is practically flawless.

~~~
usgroup
Dominant strategies in the game theoretic sense work regardless of what you
do.

Chess is a strategy game with no micro. So is anything turn based.

~~~
b_tterc_p
and that’s why I said there’s no dominant strategy. It does not exist.

Chess has no micro, but also has no hidden state. You don’t really need any
concept of what your opponent is thinking to play optimally.

~~~
usgroup
Poker ... has dominant strategies. Has hidden state.

------
jonathanhd
I'm intrigued by their “5k” with 25 heroes mention. Are the bots worse
because they have proportionally less training time, or because the extra
hero abilities make it harder to rely on their deathball strat? Considering
how every public performance of OpenAI has shown nothing but relentless
aggression, I'd speculate that it's the latter.

------
leesec
As continues the trend, people constantly write off the next improvement in AI
as "not that big of a deal" or "not actually AI because X" or "underwhelming".
I suspect they'll be doing this all the way up until AGI.

~~~
squeaky-clean
It's underwhelming because the titles are misleading, not because the
achievements aren't impressive. Beating humans by outthinking them while
using human-like reflexes and all game resources available would be like
1000-cool-points, and beating them as things are now is 500-cool-points. Yes,
it's an incredible achievement, but they get you into the article with a
misleading line, so it's to be expected that your hype deflates as you read,
rather than building up your excitement. I just wish they'd be honest about
the shortcomings in their AI.

It's like the joke from Futurama about why humans don't watch the Robot
Blernsball League (basically future-baseball).

Bender: Now Wireless Joe Jackson - there was a blern-hitting machine.

Leela: Exactly! He was a machine designed to hit blerns. Wireless Joe Jackson
was nothing but a programmable bat on wheels.

Bender: Oh, and I suppose Pitch-o-Mat 5000 was just a modified howitzer!

Leela: Yep.

------
usgroup
“In total, the current version of OpenAI Five has consumed 800 petaflop/s-days
and experienced about 45,000 years of Dota self-play over 10 realtime
months...”

Clearly very exciting and a remarkable achievement but does anyone else marvel
all the less when such colossal resources are needed?

Obviously it’s not entirely brute force but it seems much too brute force to
be considered “intelligent”.
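For a sense of that scale, some back-of-the-envelope arithmetic on the quoted figures (taking 10 months as ~304 days, an assumption):

```python
# Back-of-the-envelope on the quoted training scale.
SELF_PLAY_YEARS = 45_000      # human-equivalent years of experience
WALL_CLOCK_DAYS = 10 * 30.4   # ~10 realtime months (assumed ~304 days)
PFLOPS_DAYS = 800             # total compute quoted, in petaflop/s-days

# Average experience per real day and the implied speedup over real time:
years_per_day = SELF_PLAY_YEARS / WALL_CLOCK_DAYS        # ~148 years/day
speedup = SELF_PLAY_YEARS * 365 / WALL_CLOCK_DAYS        # ~54,000x realtime

# Average sustained compute over the whole run:
avg_pflops = PFLOPS_DAYS / WALL_CLOCK_DAYS               # ~2.6 petaflop/s
```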

~~~
Tenoke
Given how many years of brute forcing through the solution space it took for
evolution to pre-train a human brain, I'm guessing we are not intelligent
either.

~~~
usgroup
Well, less time than that for one, and to do a whole lot more, at a far lower
wattage :)

------
acover
Has OpenAI released the code for how they are training the models?

