
The International 2018: Results - legatus
https://blog.openai.com/the-international-2018-results/
======
lawrenceyan
It's clear that there's still a long way to go. Too much manual feature
engineering and hand-tuning is needed right now because of the extremely weak
reward signals of current reinforcement learning algorithms, which cripples
OpenAI Five in the later stages of the game.

The cool thing is that brute-forcing with computational power seems to get us
decently close. I'm optimistic that, with renewed interest in the
reinforcement learning field, algorithmic breakthroughs are only a matter of
time.

~~~
maldeh
_> We don’t believe that the courier change was responsible for the losses. We
think we need more training, bugfixes, and to remove the last pieces of
scripted logic in our model._

It is interesting to consider that manually scripted behavior may have been
responsible for some of the stranger moments in the matchups; perhaps they had
a hardcoded place-ward() routine that produced all of that clumped-together
warding... or maybe even a check-roshan-pit() routine without adequate
training data to learn Roshan's respawn patterns?

~~~
gdb
The item purchasing is still scripted, which means that Five receives wards
whether it wants them or not. One explanation for the ward dumping is that
Five is just trying to free up inventory space.

~~~
scns
Use the backpack? A hard five rarely owns nine items, and wards stack.

------
chibg10
While there are certainly broader insights to be drawn, one tidbit that caught
my interest had to do with Axe.

Between the two games, we saw Axe played by both the human team and the AI
team. When played by the humans, blink-calls were completely shut down by the
AI's superhuman counter-initiations. That made enough sense. When Axe was
played by the AI, though, I don't recall it ever even _attempting_ a blink-
call. I'm curious whether this might be the result of the AI overfitting to
itself -- at AI reaction speeds, blink-calls are not a very useful maneuver,
and thus the AI learns not to perform them.

Against a group of humans though, Axe's blink-call initiations are arguably
the hero's biggest selling point.

We didn't get to see most of the hero pool, but I wonder how much the AI
overfitting to AI playstyles will hinder the bots against humans in the
future.

Of course, the bots have many other issues which loom larger atm imo, but I
was interested enough in this tidbit to point it out.

~~~
emilsedgh
Or maybe the blink-then-call sequence was never tried by the bot during
training, and therefore never got rated.

That's the thing I'm trying to understand. Blink-then-call is a very specific
sequence: you'd have to blink into the middle of several enemy units and then
call them. What are the chances of the bot doing that randomly and thereby
discovering its value?

Especially if the bot learns early on that random blinks are a waste of
resources, or that jumping into the middle of the enemy team is a bad
strategy, the chance of it trying the combo with Axe, and therefore learning
it, gets even lower.
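
A rough back-of-the-envelope version of what I mean, with every number made
up for illustration (this is not OpenAI's actual action space):

    # Chance of stumbling onto blink-then-call under uniform random
    # exploration. All numbers are assumptions, purely illustrative.
    n_actions = 170        # discrete actions available on a given tick
    n_blink_spots = 81     # coarse grid of possible blink landing spots
    good_spots = 2         # spots that put Axe among several enemies
    p_good_blink = (1 / n_actions) * (good_spots / n_blink_spots)
    p_call_followup = 1 / n_actions   # must then pick Berserker's Call
    p_combo = p_good_blink * p_call_followup
    print(f"per-opportunity chance: {p_combo:.1e}")  # ~8.5e-07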

~~~
Kagerjay
Blink-call seems to be a very predictable strategy for Axe: if there are more
than two enemy heroes next to each other, with X teammates in close proximity
to deal supporting DPS, blink-call.
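
Something like this hypothetical script (the ranges, thresholds, and helpers
are rough guesses for illustration, not actual game values):

    import math

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def blink_call_target(axe_pos, enemy_positions, ally_positions,
                          blink_range=1150.0, call_radius=300.0,
                          min_enemies=3, min_allies=2):
        # Try each enemy as a candidate landing spot; fire if enough
        # enemies would be caught and enough allies can follow up.
        for spot in enemy_positions:
            if dist(axe_pos, spot) > blink_range:
                continue
            caught = sum(dist(spot, e) <= call_radius for e in enemy_positions)
            backup = sum(dist(spot, a) <= 900.0 for a in ally_positions)
            if caught >= min_enemies and backup >= min_allies:
                return spot  # blink here, then cast Berserker's Call
        return None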

How much training data were these bots given, though?

------
fermienrico
I wish they’d used a better name for their engine than “Five”. Half the time I
get confused about whether they’re talking about five players, the Five
engine, or the number 5 itself.

Use a unique name like “Galaxy” that doesn’t resemble anything in the game -
spell names, skills, etc. There is a huge amount going on in the game, and
it’s such a heavy cognitive load for an outsider who doesn’t play Dota; it was
annoying to keep checking whether they meant the name of the engine or the
number five in a game of five vs. five. Or Five vs 5!!!? I’m so damn confused.

Same thing here: [https://openai.com/five/](https://openai.com/five/)

Bullet point #2 says “Defeat five of the world’s top professionals

Five will attempt this live at The International in Vancouver’s Rogers Arena
this week!”

It is such a poor choice.

~~~
fireattack
The capitalization is obvious enough for me.

~~~
fermienrico
I was listening to the commentary on Twitch and it was a mess.

------
PhearTheCeal
Here are some insights into how OpenAI fine-tuned the rewards and short-term
actions of the bots:
[https://gist.github.com/dfarhi/66ec9d760ae0c49a5c492c9fae939...](https://gist.github.com/dfarhi/66ec9d760ae0c49a5c492c9fae93984a)

The numbers seem pretty arbitrary to me; that's probably what the blog post is
getting at when it discusses why Five lost.

~~~
habitue
Since these are hyperparameters, some of which are annealed over the entire
training period, and given that the training required ungodly amounts of
computing time, I think it was just impractical for them to fully check
whether the values were set optimally. They probably went with what seemed
good and trusted the deep networks to pick up the slack. (This is total
speculation on my part.)

I do think that if they'd used some more sophisticated RL techniques, perhaps
intrinsic curiosity or some kind of hierarchical task learning, they might
have been able to reduce their training time and tune their hyperparameters a
bit more.
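
For example, a linearly annealed reward weight might look like this (pure
guesswork about the schedule; the names are mine, not OpenAI's):

    # Anneal a shaped-reward weight toward zero so that late in training
    # the sparse win/loss signal dominates. Purely illustrative.
    def annealed_weight(step, total_steps, start=1.0, end=0.0):
        frac = min(step / total_steps, 1.0)
        return start + frac * (end - start)

    for step in (0, 250_000, 500_000, 1_000_000):
        print(step, annealed_weight(step, total_steps=1_000_000))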

------
Leary
One of the clear weaknesses of the current OpenAI Five is its warding, with
wards oftentimes being placed inside the bases. Perhaps the amount of vision
the team has could serve as a short-term reward. Likewise, it currently does
not do much counter-warding with sentries.
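
Something like this, say (the weight and names are hypothetical):

    # Small dense reward for increasing team vision; rewarding the change
    # avoids paying the team forever for vision it already has.
    VISION_WEIGHT = 0.01   # keep tiny so it can't dominate win/loss

    def vision_reward(prev_visible_frac, visible_frac):
        # Fractions of the map revealed to the team on consecutive ticks.
        return VISION_WEIGHT * (visible_frac - prev_visible_frac)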

The game vs. Pain clearly demonstrated how humans can use wards to gain an
information advantage over the bots that otherwise had a great chance of
winning the game.

~~~
x1000
I don't know anything about any of it, but what if OpenAI is "bad" at ward
placement because it plays itself so much that it already knows exactly where
it would ward if it were the other team?

~~~
dx87
The developers talked about the bad warding; it's a combination of the bots
actually being bad at it and not knowing how to manage their inventory, so
they just plant wards to free up inventory space.

------
ufo
I hope they continue the project over the next year. I am really curious if
they will be able to teach the AI to be better at late game strategies and
long term planning. (Item builds might be an interesting challenge in this
area as well)

------
shawn
Someone asked me, "How does OpenAI know the MMR of its bots?"

I don't know. I assume it's similar to how AlphaGo measures its Elo rating.
But the strange thing is, this is hundreds of years of self-play, not a public
pool of humans playing against each other. How does MMR in simulation
translate to MMR in real matches?

Before you point out that an Elo-style rating would work, consider that Dota
MMR is a bit different – every game you win, you get +25; every game you lose,
you get -25. This changes at the very high / very low ends, or if the matchups
are wildly imbalanced, but that's the general setup. Or it was, a few years
ago.
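
For reference, standard Elo scales the update by how surprising the result
was, unlike the flat ±25:

    # Standard Elo: expected score, then a rating update with K-factor.
    def elo_expected(r_a, r_b):
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

    def elo_update(r_a, r_b, score_a, k=32):
        # score_a: 1.0 for a win, 0.5 for a draw, 0.0 for a loss
        return r_a + k * (score_a - elo_expected(r_a, r_b))

    print(elo_update(4000, 7000, 1.0))  # upset win: gains nearly +32
    print(elo_update(7000, 4000, 1.0))  # expected win: gains almost nothing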

Does anyone have a guess?

~~~
gdb
We estimate it based on when we start being evenly-matched against teams with
a given average MMR. (As you might expect, our usual pattern is to lose to a
test team consistently, then start being evenly matched, then consistently
beat them.) The version of this diagram on our Five page also includes bars
indicating when we started and stopped testing against various teams:
[https://openai.com/five/](https://openai.com/five/).

We don't claim this is perfect, which is why we label the chart "estimated
MMR".

~~~
shawn
Ah, that seems like an excellent system. It removes all the guesswork of a
formula-based solution; empirical observations tend to be more accurate.

It might be worth hiring one of the top Dota pros to coach OpenAI's bots: they
could point out each small mistake the bots make that pros wouldn't. That
might make it more tractable to beat the top team next year. (Actual Dota
coaches might be even better for this purpose than the pros, too.)

Looking forward to the outcome! The goal of "beat the top dota team" is one of
those ambitious ones that few companies take on. It's really interesting to
see the incremental progress.

~~~
Buge
How would the pro point out the mistakes to the bot? The bot cannot understand
human language.

~~~
shawn
Not to the bot, but rather writing down the mistakes and deciding (as an AI
programmer) how best to address them.

The solutions would probably take the form of adjusting weights, adding
dimensions, removing or refactoring existing signals, or deciding that extra
training would naturally solve the problem.

------
arenaninja
Progress is much farther along than I expected. At the same time, I wonder how
well the bots adjust to the meta; Dota 2 changes every time a new patch rolls
out. I also wonder if the bots could make some of the more unconventional
picks work. As an example, I love Medusa, and she's one of the least-picked
heroes in pro Dota.

~~~
chibg10
>Progress is much farther along than I expected.

I had thought this as well, but the more I think about it the more I'm
questioning how far along the bots actually are.

They're excellent at teamfighting, no doubt about it. But they seem to be
inferior to humans at nearly everything else. In basically anything that
requires strategic thinking, the bots pale in comparison to even a 3-4k MMR
player. I'm really starting to suspect that the bots have succeeded almost
purely on their tactical abilities, and I see very little reassuring progress
on the strategic side tbh. For example, the high-ground push with no buybacks
on the AI side that ended in a 5-man buyback and teamwipe from the humans was
a glaring strategic error. The warding is bad. The Rosh play is terrible. Item
choices (another enormous strategic element) are still scripted. Nearly all
heroes with abilities that require significant strategic thinking are left out
of the AI's hero pool. Axe was pushing Radiant top without a TP while his base
was getting destroyed and all of his other teammates were dead.

On one hand, it's easy to _feel_ like OpenAI Five is making good progress
because it's legitimately challenging pros. Upon deeper analysis, though, the
bots haven't actually demonstrated ability at more than a 2.5k-MMR level on
anything other than laning, teamfighting, and perhaps defensive map movement.
Given that I haven't seen much evidence that the AI is making progress on the
truly strategic elements of the game, I'm not entirely convinced that what we
are seeing isn't close to a plateau already.

More breakthroughs are certainly possible, but I've seen enough to make me
more skeptical than I was before I'd seen OpenAI Five play with my own eyes.

~~~
rightbyte
I feel the bots' greatest strength is their "aimbot" abilities. They have such
great precision from their "targetEnemy(obj)" calls: they know perfectly
whether they are in range or not, and they are never looking at the other side
of the map.

Without this they would get stomped.

~~~
arenaninja
I think you're right. The mechanics of the game are HUGE: misclicking,
miscasting, mistiming a spell, last-hitting, etc.

------
simonebrunozzi
Question for the OpenAI team: have you ever thought about applying Five to
other games, as it is?

I'd suggest Total War: Warhammer II (it has an interesting competitive scene,
and it has both tactical combat and strategic gameplay). It's very different
from Dota 2, but it would be super intriguing to see how Five performed and
how fast it could learn.

If you built Five the right way, it should be able to learn almost any other
game.

I can also imagine you offering this to gaming companies in the near future,
so that they could provide a decent computer AI instead of the crap they
usually offer now :)

(source: I used to be a videogamer in my teens, and occasionally still play
some strategy games, not as often as I would like to :D)

~~~
ufo
IIRC some of the main reasons OpenAI chose Dota 2 were that it runs on Linux
and has a bot API. Does Total War fit these requirements?

