
OpenAI’s Dota 2 defeat is still a win for artificial intelligence - dsr12
https://www.theverge.com/2018/8/28/17787610/openai-dota-2-bots-ai-lost-international-reinforcement-learning
======
kenhwang
As someone who played a bunch of Dota2 (and was present at TI where this
happened), the bots were a joke -- the humans that played against them
were not only sandbagging and screwing around, but also quickly figured out
the bots' preferred strategy (deathball) and came up with an effective
counterstrategy (split push) that the bots responded very poorly to.

The bots excelled at things you expected computers to be good at, consistently
quick reactions/mechanical skill and coordinated skirmishes. The bots fell
flat on strategy, optimal resource usage (short and long term), and most
importantly, coordinated decision-making.

So as a result, they dominated the early game, which has always been heavily
mechanics favored and absolutely fell apart mid/late-game when decision-making
mattered (and often made decisions that benefited early game economy at the
cost of mid/late game).

Computers being good at mechanics is not an impressive result.

~~~
whymauri
Calling them a 'joke' is pretty extreme. Did you watch the benchmark? They
were pretty much steamrolling the caster team.

With all due respect, the professionals who have played against the bot have
been pretty impressed with the previous iteration. The major roadblocks at TI
were
probably that the bot did not efficiently learn courier mechanics and that the
draft was not done by the bot.

~~~
kenhwang
The caster team played against the iteration of the bot that was trained on a
different version of the game, one very unlike the normal game (multiple
invincible couriers).

Multiple invincible couriers make the game very similar to the 1v1 show match
last year, where the strategy of just out-living the opponent wins. The casters
have never played that version of the game before, and wouldn't know to use
that strategy.

The more interesting part was, the bots attempted to use that very same
strategy in the single-vulnerable normal version of the game which was
drastically less effective; you could only ferry out consumables at 1/5th the
rate as before and the courier could be killed so it wasn't a riskless
strategy like it was before.

~~~
whymauri
I mean, yeah, this version of the bot had drastically less time to train: only
6 days, compared to a few weeks for that version. I know how the game works; I
mentioned the couriers already.

I 100% disagree that the invincible couriers make the game "very similar" to
the 1v1. There are many more layers of engineering that went into that version.
The bots had to learn drafting, cooperation, highground pushing, Roshan,
vision, invisibility, etc.

I'm amazed you can trivialize all this as a 'joke'.

~~~
alper111
If you look at network structure, it acts as one agent, not five. So, free
coordination. (See: [https://t.co/GPKHPsIu1C](https://t.co/GPKHPsIu1C))

In my opinion, what I see is a very good player who knows how to chain stuns
precisely, without any strategic depth. If you claim to have built an AI
system that you ultimately want to evolve into AGI, you at least expect
some sort of strategic decision-making at the macro level. Though since it has
almost perfect micro, it can easily outplay most teams. So yeah, with
that expectation, I see this as a joke too.

P.S. The model is trained with 128k CPUs and 256 GPUs. It can play 180
years' worth of games in a day. Think about it.

~~~
whymauri
It's five independent agents. The article on OpenAI's website and the network
structure both say this. I'll zoom in since it's a complicated structure.

It's the first line of the article: [https://blog.openai.com/openai-
five/](https://blog.openai.com/openai-five/)

>Our team of five neural networks,

They use a hyperparameter called team spirit to cooperate. I don't think the
goal of this is AGI at all, so I don't see why people are making that leap.
But sure, for the geniuses of HN this must clearly be trivial.

~~~
backpropaganda
It's not independent agents. The neural networks have the same input, share
weights, and also share some activations. With that much sharing, it's better
to think of it as one neural network with output heads for all 5 players. So
coordination is free. In fact, "coordination" makes as little sense here as
saying that multiple neurons in a neural network are cooperating, or that the
two legs of a humanoid are cooperating to walk. Further, there is no game
being played between heroes of the same team; they literally have the same
objective. The "coordination" buzzword is just another attempt by OpenAI to
confuse and mislead readers, and give a false sense of their progress.
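The "one network with five heads" framing in this comment can be illustrated with a toy forward pass. All sizes and the single-layer trunk here are invented for illustration; the real model is far larger (and, per OpenAI's description, closer to five weight-sharing LSTM replicas):

```python
import numpy as np

# Toy "one shared trunk, five per-hero heads" network, as a sketch of the
# framing above. Dimensions are made up; this is not OpenAI Five's model.
rng = np.random.default_rng(0)
obs_dim, hidden_dim, action_dim, n_heroes = 8, 16, 4, 5

W_shared = rng.normal(size=(hidden_dim, obs_dim))              # shared trunk
W_heads = rng.normal(size=(n_heroes, action_dim, hidden_dim))  # per-hero heads

def forward(observation):
    """One shared representation feeds all five per-hero action heads."""
    hidden = np.tanh(W_shared @ observation)
    return [W_heads[i] @ hidden for i in range(n_heroes)]

# Five action-score vectors come out of a single shared computation,
# which is why "coordination" between them is free under this framing.
actions = forward(rng.normal(size=obs_dim))
```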

~~~
whymauri
They cannot share the same inputs unless the team spirit hyperparameter is
exactly 1, which it is not. You are partially correct in that the agents
consume the rewards of the four other agents, but weighted according to the
team spirit hyperparameter.

~~~
backpropaganda
The team spirit hyperparameter is a crutch they've introduced themselves.
Ideally it should be one. In Dota there is only one objective for the entire
team and that's to win the game. The fact that they shape rewards is an
implementation detail and doesn't change the fact that Dota 2 does not require
cooperation, because there's no cooperation game being played. It's a purely
zero-sum adversarial game being played between two teams.
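For concreteness, the reward shaping being argued about can be sketched as below. The exact mixing formula is an assumption based on OpenAI's public description of "team spirit" (a weight interpolating between an agent's own reward and the team average), not their actual code:

```python
def blend_rewards(individual_rewards, tau):
    """Blend each agent's reward with the team mean by team spirit tau.

    tau = 0 -> fully selfish agents; tau = 1 -> one shared team reward.
    """
    mean_r = sum(individual_rewards) / len(individual_rewards)
    return [(1 - tau) * r + tau * mean_r for r in individual_rewards]

# With tau = 1, a kill credited to one hero is shared equally by all five --
# the "ideally it should be one" case argued above.
shared = blend_rewards([1.0, 0.0, 0.0, 0.0, 0.0], 1.0)  # [0.2, 0.2, 0.2, 0.2, 0.2]
```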

------
cowmoo728
Isn't it the case that many behaviors had to be specifically encouraged or
coded into the bots to get them to work at all? They learned an impressive
amount on their own after being pushed in the right direction, but this is far
from the pure, from-scratch reinforcement learning approach this article
suggests.

This article also really understates how clowny the bots were playing. They
appeared confused and aimless for much of the game, until they found a clear
objective like an enemy hero that moved too close to them, or they had a
numbers advantage that allowed them to pressure an enemy building. They were
brutal at closing gaps and pursuing enemies to kill without hesitation, but
didn't perform well at all beyond that.

This isn't to take away from the accomplishments of OpenAI - to get to this
stage after 18 months of work is an impressive engineering feat. It's just
telling that the bots have only learned one single strategy, don't even
execute that very well, and still have to play with hero restrictions that
rule out counters to that specific strategy of aggressive skirmishing and
pushing.

~~~
mywittyname
> It's just telling that the bots have only learned one single strategy, don't
> even execute that very well, and still have to play with hero restrictions
> that rule out counters to that specific strategy of aggressive skirmishing
> and pushing.

I suspect this has to do with the fact that the bots were designed to be agents
with no leadership. As you say, the bots performed very well in skirmishes,
they lost because they didn't really know what to do beyond that.

They might fare better if one of the bots could act as a team captain and
provide high level instructions to the team, i.e., assign people to harvest,
defend, or push.

It seems like that's what the human teams have over the bots -- somebody
telling the individual team members what to do.

~~~
Notorious_BLT
Generally, captains in Dota teams don't give specific instructions on what to
do when not making specific plays. Carry players will default to returning to
farming as efficiently as possible given the current state of the map. Support
players will default to placing wards for map vision, stacking camps for the
carry to farm, and searching for potential kills. But there definitely is a
need for a shot-caller, and OpenAI has no idea how to play that role.

OpenAI 5 had a number of glaring flaws in what it learned. It seemed to have
never come to the conclusion that certain characters would have better
outcomes if they focused on certain activities. Every character farmed; any
character would grab the Aegis (reward item) after killing Roshan (usually
this is reserved for a character likely to be in the middle of the fray,
dealing a lot of damage).

Another problem was that, until the last few weeks before its matches against
pro players, they had allowed both teams to have 5 couriers. This allowed
OpenAI 5 to keep up relentless pressure by constantly bringing themselves
consumable health regeneration items. This was so unlike a regular game of
Dota that the community (including their opponents) complained because
normally the courier's time is a valuable resource, and the courier can
normally be killed if it is used too close to enemies. With an endless supply
of healing items with no risk involved, it didn't even resemble a game of
Dota. They did away with this for TI, and it revealed a serious weakness in
their relentless-aggression strategy.

I get the impression most of OpenAI's games vs itself ended very quickly, with
one team making a mistake during the relentless-aggression early game pushes.
As a result, it seemed to have no idea what to do differently as the game went
on.

~~~
dropit_sphere
>Every character farmed,

I'll own up to wanting to chalk this up to "AI wisdom." The feed-the-carry
strategy has always seemed like _precisely the same kind_ of premature
optimization that makes AI often easy to beat.

------
FartyMcFarter
Good article!

> First, OpenAI could have created a vision system to read the pixels and
> retrieve the same information that the bot API provides. (The main reason it
> didn’t is that it would have been incredibly resource-intensive.)

Mmmm, no. Reading the pixels would not retrieve the same information that the
bot API provides, because the screen does not contain all that information at
any given time.

Any Dota 2 player will know that they don't have access to all this data
simultaneously:

[https://s3-us-west-2.amazonaws.com/openai-
assets/dota_benchm...](https://s3-us-west-2.amazonaws.com/openai-
assets/dota_benchmark_results/network_diagram_08_06_2018.pdf)

~~~
tlb
It would have to remember the data that is off-screen. It's quite possible to
train a recurrent network to do this. Some Atari games require this to play
well; Pac-Man, for example, requires remembering where the ghosts are when
they're blinking.

More complicated is the logic to know what to look at when, by keeping track
of which parts of the state might be important and out of date, and
controlling the viewpoint to update it. I haven't seen a good example of this.
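The "remember what's off-screen" idea can be sketched with a toy, hand-written memory update. No learning is involved here (a real system would train a recurrent network to do this end-to-end), and `update_memory`, its decay factor, and the dict layout are all invented for illustration:

```python
def update_memory(memory, observation, visible, decay=0.9):
    """Carry last-seen values for entities; decay confidence while unseen.

    memory maps entity -> (last_seen_value, confidence); observation holds
    current values for entities flagged True in `visible`.
    """
    new_memory = {}
    for key in set(memory) | set(observation):
        if visible.get(key, False):
            new_memory[key] = (observation[key], 1.0)  # fresh sighting
        elif key in memory:
            value, conf = memory[key]
            new_memory[key] = (value, conf * decay)    # stale estimate
    return new_memory
```

A trained recurrent network learns a soft version of this bookkeeping from data; the harder part mentioned above — deciding where to look in order to refresh stale state — is an active-perception problem this sketch doesn't touch.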

~~~
FartyMcFarter
True... well, almost, since data from previous frames becomes stale with time.

I agree that learning how to retrieve the info would be another rather tough
problem (on top of all the tough problems the current incarnation of OpenAI's
bots already need to solve).

------
guskel
Humans still beat machines at most of the Atari 2600 games.

[https://github.com/cshenton/atari-
leaderboard](https://github.com/cshenton/atari-leaderboard)

And that list is incomplete if you want to include human high scores set on
real hardware, which it omits. This is even after the reinforcement
learning algorithms have been given orders of magnitude more training time
than humans. Furthermore, much of the machine performance over humans can be
attributed to better reaction time.

~~~
yazr
You are incorrect.

IIRC, you can easily build a super-human Atari bot. Just add some offline
planning (e.g. the paper by Guo), or add manual rewards or features.

No one is doing this, since there is no point. We want to study Atari WITHOUT
these "cheats", so that we can then apply these algos in more complex
situations.

~~~
daveguy
I think the OP was using "machines" to mean reinforcement learning (or
whatever the best general learning algorithm is). It doesn't count if you
program a solution for each individual game. These are learning algorithm
benchmarks, not programmer benchmarks.

For generalized reinforcement learning algorithms, humans almost always beat
the "machine". The only cases where the machine wins are ones that come down
to 100% hand-eye coordination and endurance.

All ML algorithms are terrible at figuring out how to plan de novo. So
anything that requires multiple contexts or planning is a fail.

~~~
guskel
Yes, my intent was referring to reinforcement learning algorithms. Writing a
handcrafted program to beat an arbitrary Atari game is trivial.

Yet on the other hand, it is apparent that reinforcement learning algorithms
have surpassed humans in board games such as Go and Chess.

------
ir193
_Instead of coding the bots with the rules of Dota 2, they’re thrown into the
game and left to figure things out for themselves_

I saw a strange behaviour. In the second game, I saw Witch Doctor use
"Maledict" on neutral creeps, but this skill only affects enemy heroes. This
only wastes mana, puts the skill on cooldown, and has no benefit. How can the
AI learn that?

~~~
SolarNet
Because neural networks are noisy, they make small mistakes and
misunderstandings, like humans do. The AI's concept of an enemy obviously
doesn't have as strong a distinction between "hero" and "creep" (and, for
that matter, "enemy" and "neutral") as those of us who have read the game rules
might have, _because_ of your quoted statement.

As a further analogy I often see humans make ridiculous and insane leaps of
logic (how assignment works when you first learn programming; free speech
applies to things besides governments; etc.), and we run on the same style of
hardware.

To an extent this demonstrates how far we are from generalized AI. All the
OpenAI system is, is a very specialized animal that's lived through millions
of simulated generations very quickly. It doesn't understand language or
concepts - we can't even begin to tell it the rules of the game; only through
simulated evolution can we teach it - or even have the self-awareness many
animals do. It's likely closer to the nervous system of an insect than it is
to the brain of a dog.

------
YeGoblynQueenne
>> Such a thing does not exist because machines think like humans in the same
way that planes fly like birds.

Could we please lay this empty platitude to rest? "Planes don't fly by
flapping their wings like birds, so why should computers think like humans?"

Well, except that we were only able to build flying machines (rather than
floating ones) when we figured out _why_ birds can fly [1].

And then, the whole of our computer science is based on the idea that a
computer is, actually, the same kind of system as a human mind- a
computational device, a machine that can compute everything and anything that
can be computed. This is the deep insight that informs the ambition to create
artificial intelligence: that brains are a kind of computer, computers are a
kind of brain, and they can both compute the same kind of program, in other
words- intelligence.

Though we may not know how human minds work exactly and therefore we can't
readily copy them, it is thanks to Turing's and Church's insight that we even
have computers today. And so, in a very real sense, we can only ever create
thinking machines that do intelligence in the same way that humans do
intelligence [2].

_______________________

[1] I mean, a boomerang is really an airfoil, so I guess Australian Aborigines
had figured out flight long before the Wright brothers, but I guess they didn't
need flying vehicles back then?

[2] Unless there is a paradigm shift and we figure out a better way to do it
etc etc disclaimer disclaimer.

~~~
jimbokun
Brains are not computers, and brains were not the model for computers.

Formal logic and the Turing machine were the model for computers.

Turing and Church did not invent computers based on their careful study of
human biology and physiology.

~~~
YeGoblynQueenne
The point is that human minds and computers are both computational devices of
equivalent power and that we believe that creating artificial intelligence is
possible because intelligence is a program that can be computed by a Turing
machine just as it is computed by a human mind.

Obviously, the Church-Turing thesis didn't have anything to do with brains. It
does, however, have very important implications for our ambition to create
human-like artificial minds.

~~~
Isinlor
There may be an issue with running a "brain" program on non-brain hardware,
depending on what the program is. We don't know what the program is, so it's
hard to speculate, but it is possible that the brain program is not well
suited to silicon-based architectures. It may turn out that low-level physical
and chemical processes that are crucial to brain functioning are prohibitively
expensive to simulate. Brains, after all, operate in the perfect simulator,
and evolution had no reason to avoid expensive-to-simulate solutions. I would
guess that this is rather unlikely, but it is possible, since even a single
cell's chemistry - indeed, even a single organelle's chemistry - is complex.
Simulating processes involving RNA may turn out to be prohibitively expensive.

Also, I'm sure that theoretically optimal "intelligence program" on silicon
processors is vastly different from organic ones. They may share a lot, but
the constraints are just so vastly different including the most important ones
like energy efficiency or space availability.

~~~
YeGoblynQueenne
>> There may be an issue with running a "brain" program on a non-brain
hardware depending on what the program is.

That's not impossible! For example, it's easy to find algorithms that a
computer can carry out without error that a human mind would really struggle
with. Although this is a case of computational resources, rather than the
expressive power of the computational apparatus, it's still the case that it's
not always possible to find programs that both humans and computers can
compute _in practice_.

So it may even be that, while computers can efficiently run some programs that
humans can't, the reverse also holds, and computers can't efficiently run the
programs that humans can.

In which case of course, either the entire AI enterprise is doomed to failure,
or we get lucky and there is some other way to do intelligence that is
available to computers but not humans. Who knows!

------
YeGoblynQueenne
>> The more important question might be: can we ever have a fair fight between
humans and machines?

Fair or not fair, there is no API to the real world. If we are going to create
machines that can learn and think outside of simulations, however complex they
may be, these machines will need a way to interact with the physical
environment.

Gary Marcus is on the right track here. Building systems that can "handle the
complexity of the real world", as per OpenAI's stated goal according to the
article, is incredibly ambitious and pointing out the vast distance separating
the current state-of-the-art from that lofty goal is, well, only fair.

I mean, if you think about it, back in the '70s, in the original AI Winter,
one of the big criticisms of AI research was that it languished in simulated
environments like blocks world and didn't perform nearly as well in the real
world. And here we are today, celebrating a bright step on the path to
conquering yet another simulation.

~~~
piyh
Our simulations have gotten way better in 40 years. You could teach an AI the
basics of driving or flying in a simulation, and that would work as a baseline
for the real world.

~~~
YeGoblynQueenne
Unfortunately, modern, statistical machine-learning based AI is extremely bad
at generalising from one environment or context to another. And training on
some task in a simulation and then performing the same task in the real world
requires a very strong ability to generalise. That is because it is extremely
expensive to simulate the real world with any fidelity, and therefore every
simulation is "cutting corners" - and really, really big corners at that. This
gap between reality and simulation must be covered by generalisation, but our
machine learning systems generalise too poorly.

As a result, training in simulated environments doesn't help handle the full
complexity of the real world. Even if we had robot bodies that could move as
freely and manipulate objects with as much dexterity as they can do in
simulations.

~~~
Isinlor
Well, for very simple, repetitive tasks it does work:
[https://blog.openai.com/generalizing-from-
simulation/](https://blog.openai.com/generalizing-from-simulation/)

But you are right that it will fail on anything that is hard to simulate. And
a lot of trivial things are very hard to simulate.

Tesla recently failed at things like plugging two cables together. Picking
stuff like bags at Amazon warehouses may be another example.

~~~
YeGoblynQueenne
It's even hard to tell whether that robot hand pushing the puck is actually
"generalising" as the article claims. Maybe it is; maybe the task is set up
in a way that makes it possible for the robot hand to push the puck around
even when it's on a bag of crisps. It's very difficult to know for sure.

------
aeosynth
Anyone know what the 'undiscovered mechanic' is?

> At least one previously undiscovered game mechanic, which allows players to
> recharge a certain weapon quickly by staying out of range of the enemy, has
> been discovered by the bots and passed on to humans

I assume they're talking about blink dagger, and something more advanced than
"don't take damage"?

~~~
baobolus
that's a bit misleading.

it was discovered that if you stay out of vision and cast raze, the other
player does not get stick charges. that was the 1v1 shadow fiend bot a year
ago though.

~~~
newsoundwave
If that's the case, that's incredibly misleading. Especially because that's
not undiscovered by human players, just by the OpenAI team, if that's what it
is: [https://blog.openai.com/more-on-dota-2/](https://blog.openai.com/more-on-
dota-2/)

"Sumail pointed out that the bot had learned to cast razes out of the enemy’s
vision. This was due to a mechanic we hadn’t known about: abilities cast
outside of the enemy’s vision prevent the enemy from gaining a wand charge."

That's not an undiscovered mechanic in the world of Dota - that's been known
for a while and at least documented since 2015
[https://dota2.gamepedia.com/index.php?title=Magic_Stick&oldi...](https://dota2.gamepedia.com/index.php?title=Magic_Stick&oldid=896457).

I wouldn't be surprised if it was known before then, I certainly remember this
from a while back.

Again, if that's not what it was, then I take back what I said, but if it was,
I do think that statement is misleading as written.

------
DrNuke
These are the latest wargames instalments (you know, the Matthew Broderick
kool-aid from his 1983 film and so on), and still in their infancy at
optimising local, transient tactics behind the overall goal. AIs playing Sid
Meier's Civilization against themselves would be fascinating enough, and
possibly useful for transfer learning here? At the very least, the weakest
links in real mid-game situations arising from learned plans would be
discovered. Apart from the military uses of such studies (and the many other
techniques giant corps masquerade behind emoticons and facial recognition
apps), why not let the latest AIs play against climate change, for example? It
would be much more fun in general, and possibly able to highlight the
contributions of many impending agents humans can't yet fathom.

------
YeGoblynQueenne
>> Already, the training infrastructure used to teach the OpenAI Five — a
system called Rapid — is being turned to other projects. OpenAI has used it to
teach robot hands to manipulate objects with new levels of human-like
dexterity, for example. As always with AI, there are limitations, and Rapid
isn’t some do-everything algorithm.

Oh, yes, indeed, there are limitations. The robot hand in question can only
manipulate cubes and then only a specific kind of cube with standard
dimensions, as far as it's possible to tell from all the demonstrations
publicised by OpenAI. And, I'm guessing, if they had a robot hand that could
work a yo-yo, they wouldn't hesitate to show it.

------
chaoz_
Typical title in this AI-hype time period. A group of people came up with a
model and were able to compete using nearly unlimited training data. Wow.

I don't think a match win would really mean anything, except from a marketing
perspective.

~~~
kamens
Sometimes it feels like there's a secret competition to see who can author the
next infamous "No wireless. Less space than a nomad. Lame."-style comment.

~~~
chongli
I saw that comment when it was first posted on Slashdot all those years ago.
The original iPod was overpriced and it only worked on Macs with FireWire. It
really didn't take off until they had a USB model working on Windows at a more
reasonable price.

------
knbknb
Was the full list of heroes supported in these showmatches? Or was this also
restricted to a subset of, say, 15 out of 150 (or whatever that number is)?

The announcers/casters were a bit ambiguous about this; AFAIK they said
something like "the teams agreed on a predefined, balanced list of heroes
before the game...", and there was no drafting.

(I'm not a native speaker, and I don't know much about Dota 2 rules.)

~~~
Master_Odin
It was limited to a pool of 18 heroes still.

------
fabiofzero
The headline reads suspiciously like the legendary "Bitcoin has crashed:
here's why this is a good thing"

------
acroback
How is this monstrosity going to understand that it is not just practice that
makes humans good at teamwork in Dota 2?

AI has a long way to go before it can defeat humans in a complex game like
Dota 2.

~~~
ggggtez
Well the point is that practice actually is enough. Humans have a tendency to
think they are special because they can do X. Then a robot does X, and
suddenly the goal posts move. Humans are just specialized neural networks plus
meat.

~~~
yters
Once AI can move the goalposts, then I'll be worried.

