
OpenAI bots competing against Humans right now - kahlonel
https://www.twitch.tv/openai
======
ufo
In the last game of the series the bots were forced to play an intentionally
terrible hero lineup. The humans finally got to win a game, but more
interesting to me was that the moments when the AI did seemingly crazy things
were much more common.

I wonder if this is an artifact of the training methodology: maybe if your
team is very weak then your choices are also weaker, and reinforcement
learning doesn't work as well?

~~~
tux3
It reminds me of the Go AIs going on tilt when they're far behind.

When the win percentage for Go AIs gets to around 5%, every action it can take
results in a losing game, so it can no longer tell the difference between
normal play and super strange moves.

When every choice is really bad, humans tend to still go with their normal
strategy and wait for their chance to turn things around, but bots assume the
opponent is playing perfectly, so they act like their winrate is going to stay
near zero no matter what they do.
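A toy sketch of that failure mode (my own illustration, not tied to any actual
Go engine): once every move's estimated win rate sits near zero, evaluation
noise dominates the argmax, so the chosen move is effectively random.

```python
import random

def pick_move(values, noise=0.01):
    """Pick the move with the highest (noisily estimated) win rate.

    When the true values are well separated, the noise doesn't matter;
    when they are all ~0, the noise decides the move.
    """
    noisy = [v + random.uniform(-noise, noise) for v in values]
    return max(range(len(values)), key=lambda i: noisy[i])

# Normal position: one move is clearly best -> picked reliably.
normal = [0.55, 0.50, 0.48]
# Lost position: every move ~2% win rate -> choice is essentially random.
lost = [0.021, 0.020, 0.019]
```

Running `pick_move` repeatedly on `normal` always returns move 0, while on
`lost` it cycles through all three moves.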

~~~
nopinsight
If this is true (which it might be, I haven’t studied these systems in
detail), then an obvious fix is to get the AI to randomly train with ‘weaker’
versions of itself (perhaps while the strong instance is handicapped) in
addition to the latest generation.

Several levels of weak opponents should be used, with varying probabilities,
to tune the AI’s robustness against real-world, imperfect competitors.
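A minimal sketch of such an opponent pool. The snapshot labels and
probabilities below are made up for illustration; OpenAI has described a
similar scheme of playing most training games against the current version and
a fraction against past selves.

```python
import random

# Hypothetical opponent pool: (label, sampling probability).
# Older/weaker snapshots get a small but nonzero share of games.
OPPONENT_POOL = [
    ("latest", 0.80),
    ("snapshot_minus_1", 0.10),
    ("snapshot_minus_5", 0.07),
    ("snapshot_minus_20", 0.03),
]

def sample_opponent(pool=OPPONENT_POOL):
    """Pick a training opponent according to the pool's probabilities."""
    labels, weights = zip(*pool)
    return random.choices(labels, weights=weights, k=1)[0]
```

Tuning the weights trades off learning speed (mostly latest-vs-latest games)
against robustness to the weaker, imperfect play the parent comment describes.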

------
kmnc
Very impressed with the bots. The main strategy I see from the bots that
differs from the pro meta is how aggressively they spam abilities. The main
enabler of this is the 5 unkillable couriers. Watching, I do feel like, given
more practice, humans would beat this version of the AI easily. It looks
exploitable, and with many rule limitations still in place it has a ways to
go. It will win in the end though; that much has become obvious.

~~~
skgoa
The bots also have perfect knowledge of enemy hp/mana and relative
positioning through the API. In fact, the bots are gifted perfect
micromanagement through the API. This enables them to do things most humans
wouldn't try, because the risk of messing up and giving the enemy a huge
advantage is too high.

~~~
yvdriess
The API does not give the bots extra information compared to a human player.
The micro- and reaction time edge is also being dulled to being more human-
like. They still have superhuman team fight execution though.

"We’ve increased the reaction time of OpenAI Five from 80ms to 200ms. This
reaction time is much closer to human level, though we haven’t seen evidence
of changes in gameplay as OpenAI Five’s strength comes more from teamwork and
coordination than reflexes."

~~~
Super_Jambo
Part of the advantage in teamwork and coordination is presumably that they
don't have a limitation on what data they can view at once?

Dota & HON had people mod their client to give an optional bigger FOV
resulting in bans for cheating.

I'd assume the bots don't have to specify their screen position, and since
they have no camera to orient, a limitation on this wouldn't be meaningful
anyway. What I'm saying is there's a big difference between 'I see Lion on the
mini-map down there' and 'Lion showed for 1 frame on the other side of the
map, his HP is 324, he has a TP scroll, no boots and a health pot.'

Something I noticed is the bots seem to like range and AoE far more than the
normal human meta. The humans being limited in the distance they can see to
one screen were frequently just failing to appreciate how dangerous 2-3 bots
half a screen away were to them.

Quite a few teamfight wins came from the bots inevitably causing far more
damage to the entire enemy team via heroes like DP & Gyro. But this isn't
really perfect teamfight execution. I'd have really liked to see a mirror
match.

~~~
skgoa
Correct, the bots "see" the entire map. Well, the parts that are not hidden by
'fog of war'.

------
nopinsight
What the cases so far tell us: Once an AI beats humans who have spent 10,000
hours practicing a skill, it is a matter of time before it beats the best
professionals in the field.

Cases where it has already happened: board games such as Chess and Go, Poker,
diagnosis of certain diseases using medical images

Cases where AI is still clearly inferior: video understanding, natural
language _understanding_, motor control esp. of hands and legs, general
medicine, driving

Hard-to-classify cases (AI is better for some instances, worse for others):
image tagging and classification, speech recognition (speech-to-text),
diagnosis of certain other diseases using medical images (which might need
to take into account other information outside of images)

More examples, esp. counterexamples, are welcome.

~~~
badpun
> Poker

This is misleading.

The only variant of Poker where AI beats humans is the heads-up (two-player)
variant, which is the simplest form of poker and is also rarely played. The AI
was (marginally) beating humans there by playing a game-theory-optimal
strategy. For poker games with 3+ players, a single unexploitable GTO strategy
(a Nash equilibrium with a no-loss guarantee) no longer exists, so AIs need to
use more standard techniques (search-based, reinforcement learning, etc.),
which are, at the current state of the art, laughably weak at poker.
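To illustrate the two-player case the parent describes: in a two-player
zero-sum game, a GTO (Nash) mix can be approximated by fictitious play. This
toy example uses matching pennies, not poker, and recovers the 50/50
equilibrium; with 3+ players no single strategy carries the same
unexploitability guarantee.

```python
# Payoff matrix for player 1 (rows) vs player 2 (cols) in matching pennies.
# Player 2's payoff is the negation (zero-sum).
PAYOFF = [[1, -1],
          [-1, 1]]

def fictitious_play(rounds=10000):
    """Each round, both players best-respond to the opponent's
    empirical action frequencies; those frequencies converge to the
    Nash equilibrium mix in two-player zero-sum games."""
    counts1, counts2 = [1, 1], [1, 1]  # observed action counts
    for _ in range(rounds):
        a1 = max((0, 1), key=lambda a: sum(PAYOFF[a][b] * counts2[b]
                                           for b in (0, 1)))
        a2 = max((0, 1), key=lambda b: sum(-PAYOFF[a][b] * counts1[a]
                                           for a in (0, 1)))
        counts1[a1] += 1
        counts2[a2] += 1
    total = sum(counts1)
    return [c / total for c in counts1]  # empirical mix, -> ~[0.5, 0.5]
```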

Not to mention that in poker the actual hierarchy of players' skill is not
100% obvious. You could distinguish at least two areas of skill:

\- play vs. other experts

\- play vs. amateurs/weaker players. Here the goal is not to come out ahead
(which, in the long term, is a given), but to _maximize_ the dollar amount
taken from these players, which is a skill in itself.

~~~
nopinsight
Thanks for the clarification. I do not know much about poker.

The observation applies to a given variant of poker (or any other domain). So
if an AI beats humans with 10000-hour experience in that variant, the best
experts in that specific variant are not far-off targets.

Superficially similar problems might in fact require very different techniques
to solve as your example illustrates.

------
andreyk
A good thing to read to know the basics of the game and what to pay attention
to:
[http://smerity.com/articles/2018/n_things_to_look_out_for_in...](http://smerity.com/articles/2018/n_things_to_look_out_for_in_openai_benchmark.html)

~~~
eerikkivistik
I watched the game against audience members, and the bots seemed to overextend
at times. Also weird courier scouting glitches (which makes sense if all
couriers are currently invulnerable for the purpose of this generation of
bots). Another funny thing was watching them use smokes for no apparent
reason.

~~~
kahlonel
They overextended in the enemy safelane. Maybe killing the enemy carry is #1
on their priority list :D

~~~
eerikkivistik
Yeah, I was thinking more along the lines of the aftermath, after diving –
getting stuck in the forest behind the enemy tower. But you are correct, they
seemed hella determined to get that Slark kill!

~~~
ionwake
Haha, yes I saw that. It seems forest navigation is a strength, though
sometimes they don't search.

------
abhiminator
This is indescribably exciting. Can't wait to see OpenAI Five tear through the
human players just like what the single bot did to 'Dendi' \-- a professional
gamer -- during the International 7 tournament in August last year. [0]

[0]
[https://www.youtube.com/watch?v=wiOopO9jTZw](https://www.youtube.com/watch?v=wiOopO9jTZw)

~~~
TaupeRanger
It's not that exciting and it's very easy to beat humans at games when your
reaction times are far faster and you have access to more information at a
single moment in time. It's like being impressed by a calculator.

~~~
kenning
dota 2 is not as extreme about this as, say, starcraft. in starcraft, most
"strategies" are heavily scripted build orders that almost never deviate from
a handful of openings (similar to opening moves in chess) and a large amount
of winning comes from the ability to quickly give orders to your troops
("micro"). This is an example of a strategy that is not viable without
superhuman reactions:
[https://www.youtube.com/watch?v=IKVFZ28ybQs](https://www.youtube.com/watch?v=IKVFZ28ybQs)
if you directly attack 20 siege tanks with 100 zerglings you will only kill
about two siege tanks, but an AI can kill all the siege tanks with some
zerglings left over.

There's some of this in dota, but there's a cap on the skill level for most
playable characters that pros generally get "close enough" to, and beyond that
the strategic depth comes from area-control decisionmaking. There are over 100
heroes and many of them have really weird abilities, like the possibility of
creating a temporary wall (Earthshaker) or the ability to teleport anywhere on
the map every 20 seconds (Furion). I could be wrong though; maybe the AI is
winning games by playing heroes with long range and perfectly microing them to
harass and prevent the other team from ever getting gold/xp.

~~~
halflings
> in starcraft, most "strategies" are heavily scripted build orders that
> almost never deviate from a handful of openings (similar to opening moves in
> chess) and a large amount of winning comes from the ability to quickly give
> orders to your troops ("micro")

As somebody who plays StarCraft casually (gold/low plat in ladder), this is
not true. It's even less true for pro players. The level of strategy in
StarCraft is impressive, it's really hard to guess in which direction games
will go when two very good players are playing against each other.

Sure, perfect execution when it comes to one strategy (say, mech-heavy Terran)
will give you the largest advantage against your opponent, but failing to
scout appropriately and guess what your opponent is up to means your strategy
is dead. You also have to decide when to attack, how much you're willing to
sacrifice to damage somebody's economy, when you want to focus on economy vs.
building units, ...

The video you sent with zerglings is a gimmick made for fun (it's a hard-coded
AI exploiting the siege tanks' aim logic to divert their fire away from the
zerglings). That would _not_ win you a game, because most likely a pro Terran
would have destroyed your base before that.

~~~
fizx
I wonder how far you'd get with a bot that macros perfectly but also A-moves
2-3 groups.

~~~
halflings
What does "macro perfectly" mean? If it does the same strategy over and over,
you just scout, find its strategy, and go for the counter. Its macro will be
useless if it has the wrong type of unit.

In a way, the built-in AIs "macro perfectly", but they are terrible at
strategy and fighting (because even fights are not just a matter of gimmicks,
you need to split units in a special way, send diversions, attack at the same
time from multiple fronts, etc.)

------
vn0m
Only one LSTM per bot, so there is no "master strategy" neural network.
Interesting: if I'm not wrong, the bots are learning independently.

~~~
currymj
During training they were given access to each other’s reward functions, and
the extent to which they weighted total team reward over their own was
gradually increased.
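That annealed blend is what OpenAI's write-ups call "team spirit". A minimal
sketch of the idea; the function name and exact numbers are my own
illustration, not OpenAI's code:

```python
def blended_reward(own_reward, team_rewards, team_spirit):
    """Blend an agent's own reward with the team's mean reward.

    team_spirit = 0.0 -> purely selfish; 1.0 -> purely team-oriented.
    During training the coefficient is gradually raised toward 1.
    """
    team_mean = sum(team_rewards) / len(team_rewards)
    return (1.0 - team_spirit) * own_reward + team_spirit * team_mean
```

Early in training each bot chases its own kills and farm; as `team_spirit`
rises, sacrificing individual reward for a teamfight win starts to pay off.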

------
lawrenceyan
Expecting a 3-0 slam dunk here. Consistently surprised at the ability of
tried-and-true basic reinforcement learning to complete challenging tasks. I
totally expected some kind of breakthrough to be necessary in the RL field
before a real-time game like Dota 2 could be beaten.

~~~
ionforce
I would love to know why the third match turned sour. I suspect (as an amateur
with no ML background) that that matchup was under-trained.

Like, I could imagine OpenAI getting stuck in a subset of the draft pool that
it trained against, maybe the top 10 of the 18 heroes. And then picking
outside of that meta causes it to fall back on much less robust
training/strategy.

~~~
zackaman
Because the first two matches were so lopsided, the bot lineup was selected by
twitch chat + audience members. We drafted them a pretty terrible lineup, and
from the start the bots estimated their chance of winning to be about 2.9%.

~~~
mellinoe
Indeed. And to explain further: not all hero combinations are equal. Meaning:
you cannot select any arbitrary set of 5 heroes and expect them to perform
well. Different heroes have different strengths and synergies that make them
stronger or weaker depending on the specific teammates and opponents that are
present. This is why drafting is considered such an important (and difficult)
part of the game. In match 3, a purposefully bad team was selected. It would
have been VERY impressive if it had been able to win.

~~~
ionforce
> Meaning: you cannot select any arbitrary set of 5 heroes and expect them to
> perform well.

When I think of AI, I think of something crawling its way out of purposefully
adversarial situations such as this one. I would have loved to see optimal
play from 5 wacky heroes.

I just have this suspicion that what we saw wasn't optimal play for that team
comp.

But of course the matchup itself is a factor too.

------
jlebar
If there are any OpenAI folks hanging out here, I'd be really curious to hear
about the (apparent) tactical pause by the AI in game 2.

~~~
mirashii
It wasn't a bot, it was an observer. A bot would have its name appear.

------
wuschel
Other thread:
[https://news.ycombinator.com/item?id=17692577](https://news.ycombinator.com/item?id=17692577)

------
NegatioN
My thoughts on the AI's performance so far knowing a bit about both DOTA and
machine learning:

\- Item usage for things such as smoke and wards (which were recently added to
their repertoire) is not well captured by the bots yet, and the buying of
wards was confirmed to be a scripted event. It seems hard for them to capture
the long-term sparse reward of these. Smoke might not be needed by a perfect
agent, but wards should be. The developer interview noted that it's not very
clear what the reward for an agent warding should even be.

\- Some of the big advantages OpenAI gains are in team fights, where its
200ms reaction time (upped from 80ms to resemble human reaction time more)
still strikes me as something that tilts into a solid mechanical advantage.
On several occasions OpenAI's Lion managed to disable a human player
performing (what I assume to be) a move which shouldn't be interruptible
(blink --> shift+ctrl ultimate ability on Earthshaker). This could have tilted
teamfights in the human team's favor a few times if it hadn't been stopped by
the "machine-like" reflexes of OpenAI.

\- OpenAI's positioning before team fights is scary. There are very few
openings, and every individual agent is protected by its team.

\- Likewise, the map movement is scary most of the time: recalling just
enough agents back to defend while simultaneously taking out strategic
objectives of the human team. Also, Blitz (caster) has noted earlier OpenAI's
ability to focus on the winnable lanes and sacrifice the others, prioritizing
well (and exploiting map mechanics unknown to most pros until a few years
ago).

\- When the AI takes down a tier 1 tower, they seem to be very quick to take
down the remaining tier 1 towers, instantly capitalizing on their map control
advantage, and expanding it.

Some interesting things / bugs:

\- The Sniper bot throwing multiple spells on the same location right away
(even though the damage doesn't stack), effectively wasting his mana and
cooldowns for no gain.

\- Sniper using his ultimate ability to continuously pressure the lower-HP
characters of the human team. Usually it's used more as a finisher; this
might be an artifact of the AI having access to an unkillable courier that
ferries a lot of healing/mana-regenerating items.

Some additional info learned from the interview of some of the devs:

\- Incentivizing killing Roshan (a boss character in the middle of the map,
which yields a one-time resurrection item for one player after being killed)
is done by varying Roshan's HP down to a really low amount, making sure the AI
experiences the upside of this. Otherwise it would require all 5 agents
gathering there, expending their magical abilities and investing a lot before
actually seeing a reward (which is unlikely to happen).

\- Game length in self-play sessions is above 60 minutes around 1% of the
time.
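The Roshan trick above is a simple reward curriculum: randomize the boss's HP
so that some training games make the kill cheap enough to stumble into. A
sketch under my own assumptions; the real HP range and distribution weren't
given in the interview:

```python
import random

def sample_roshan_hp(max_hp=5500, min_fraction=0.05):
    """Draw Roshan's starting HP for one training game.

    Occasionally very low HP lets the agents experience the kill reward
    without a huge up-front investment; full HP keeps the task realistic.
    (max_hp and the uniform draw are illustrative guesses.)
    """
    return random.uniform(min_fraction * max_hp, max_hp)
```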

~~~
jjjjjjjjjjjjjjj
> Lion managed to disable a human player performing (what I assume to be) a
> move which shouldn't be interruptable

Fogged (the human player) commented on this and said he messed up. If he had
shift-queued the spell, or just used it immediately after blink, it would've
landed [0].

Cancelling an initiation with instant spells (Lion hex, Rubick lift, etc.)
does happen frequently in high-level human play as well, where you
continuously pre-cast the spell, cancel, walk back slightly, repeat, on the
out-of-range initiating enemy, to have the spell interrupt the initiation as
soon as the initiating enemy blinks into range. I do agree that the bots have
a solid
mechanical advantage, just pointing out that this specific scenario does
frequently happen in human play as well (albeit not on every single
initiation).

[0]
[https://www.reddit.com/r/DotA2/comments/94vdpm/openai_hex_wa...](https://www.reddit.com/r/DotA2/comments/94vdpm/openai_hex_was_within_the_200ms_response_time/e3ofipk/)

------
dantheman
Congrats OpenAI on game #1

~~~
std_throwaway
And game #2!

------
ionwake
Hi, does anyone know if OpenAI has released an up-to-date state of learning
for their OpenAI Dota code?

Basically, can I run the same sim on my laptop and watch them play? I can see
some code on GitHub but don't know if the actual neural net data is available
too.

If anyone knows the answer, that would be great, thanks.

~~~
minimaxir
Even if they did release the likely-very-big model, your personal computer is
likely not fast enough to make 2-3 actions per second (x5) and update weights
in real time without a beefy GPU.

~~~
drexlspivey
You don't need to update any weights, the model is already trained.

~~~
lma21
Ignorant here. How do you take an already trained model and execute it
elsewhere? Isn’t the training phase part of the whole (ongoing) simulation?

~~~
sethgecko
The training phase is when you are trying to minimise your loss function by
trying different weights each iteration.

Once that is finished, you have a trained model which you can use by providing
input and getting an output (with the weights frozen).

The training phase is very expensive computationally because you have to
calculate the gradient of your loss function on potentially huge tensors.

The execution phase is not that expensive, and commercial laptops will likely
be able to run the model without any problems.
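A toy illustration of that split, with a one-weight "model" (nothing to do
with OpenAI Five's actual network):

```python
def train(samples, lr=0.01, epochs=500):
    """Expensive phase: adjust the weight by gradient descent on
    squared error over (x, y) pairs."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)**2
            w -= lr * grad
    return w  # the "trained model" is just this frozen number

def run(w, x):
    """Cheap phase: a forward pass with the weight frozen."""
    return w * x

w = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])  # learns y = 2x
```

Shipping a trained model means shipping only the frozen weights plus the
forward pass; no gradients are ever computed on the user's machine.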

------
hackandtrip
Chat deciding heroes can be so cool! We may see new strategies, and the AI
playing outside its safe-lane comfort zone!!

------
rasz
clicked on the stream, title should be changed to "getting destroyed by
Humans"

~~~
c-
Let's just ignore the first two 25-minute games that went in favour of the AI

~~~
jsheard
And that the AI was handicapped in the game it lost, as they let the audience
draft for the AI team and of course they deliberately picked a bad lineup to
see what would happen.

------
bufferoverflow
First match is over, human pros got wrecked, and very fast.

~~~
FartyMcFarter
They were audience members, not pros.

~~~
scottlegrand2
The marketing on this whole effort is strong and the training budget was
insane. That said, until it defeats the best of the best, it's just really
amazing machine learning as opposed to game-changing machine learning, IMO.
Also keep in mind that there's a game-state API here; this is not at all like
learning Atari 2600 games from the screen alone, which would make me fear we
were on the verge of the robot apocalypse, personally.

------
yters
AI winning due to super human abilities is about as impressive as a Counter
Strike bot with perfect aim making headshots every time.

~~~
thomasahle
Check the DeepMind Quake bot: [https://deepmind.com/blog/capture-the-
flag/](https://deepmind.com/blog/capture-the-flag/)

Even when reducing tagging accuracy to below human level, they still performed
better.

