
Grandmaster level in StarCraft II using multi-agent reinforcement learning - hongzi
https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning
======
jplayer01
Honestly, the article needs to be replaced with
[https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning](https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning) which actually goes
into some technical detail. Nature.com's article is purely for laypersons and,
imo, not particularly useful for HN's crowd because of how little insight it
gives.

It also provides the paper and an archive of all of the AI's matches for
anybody who wants to take a closer look. These can be viewed with the free
version of SC2 (afaik).

Further links:

[https://doi.org/10.1038/s41586-019-1724-z](https://doi.org/10.1038/s41586-019-1724-z)
(supplementary data available here in json form)

[https://rdcu.be/bVI7G](https://rdcu.be/bVI7G) (public paper)

[https://deepmind.com/research/open-source/alphastar-resources](https://deepmind.com/research/open-source/alphastar-resources)
(replays)

[https://www.youtube.com/playlist?list=PLtFBLTxDxWOSrWZ8krQt6eDNXTpG67Xpf](https://www.youtube.com/playlist?list=PLtFBLTxDxWOSrWZ8krQt6eDNXTpG67Xpf)
(list of older AlphaStar matches cast by an SC2 player)

[https://www.youtube.com/watch?v=l82wBa3UoZU](https://www.youtube.com/watch?v=l82wBa3UoZU)
(one of the newer matches cast by another player)

[https://old.reddit.com/r/starcraft/comments/dpaunw/deepminds_alphastar_ai_has_achieved/f5u9fx8/](https://old.reddit.com/r/starcraft/comments/dpaunw/deepminds_alphastar_ai_has_achieved/f5u9fx8/)
(win/loss rates across all the played matches by race, includes apm)

~~~
dang
This comment was originally posted on
[https://news.ycombinator.com/item?id=21408024](https://news.ycombinator.com/item?id=21408024),
but we've merged it into the earlier submission, which used the link you
mentioned.

~~~
jplayer01
Neat, thanks.

------
interblag
This is a really interesting one to digest. As with previous announcements
about AlphaStar, much of the feedback (here and elsewhere) is about the
fundamental challenge of assessing human vs. machine in an RTS. These points
are very valid - stepping back however, this still feels like a pretty
incredible accomplishment.

I'm a gold league SC2 player, so maybe in the 30th-50th percentile. Three
years ago, when DeepMind started this project (and after nearly two decades of
research into SC/SC2) I could probably have beaten the best non-cheating AI.
Now, after 3 years, this AI is playing at a Grandmaster level, under at least
a reasonable approach to fairness. By comparison, according to the AlphaGo
paper [1] the best Go AIs prior to AlphaGo were playing at a 6 Dan level,
which looks to be somewhere in the 90-98th percentile [2].

The speed at which AlphaStar overtook previous AIs seems to me to be nearly
unprecedented in AI research. This is like if the world's best chess AI had
gone from losing high school tournaments to being competitive with Kasparov in
less than 3 years. Valid criticisms aside, this feels like an incredible
achievement.

[1]
[https://www.nature.com/articles/nature16961.pdf](https://www.nature.com/articles/nature16961.pdf)
[2]
[https://senseis.xmp.net/?KGSRankHistogram](https://senseis.xmp.net/?KGSRankHistogram)

~~~
jrx
> This is like if the world's best chess AI had gone from losing high school
> tournaments to being competitive with Kasparov in less than 3 years.

I don't think it's like that at all. At a high level, there is no "chess
AI", "go AI", "image classification AI" or "dexterous manipulation AI". These
are all sides of the same coin, one that gets significantly better every year.
Adding support for a new game or new "environment" to an existing deep-learning
backbone still requires a bit of engineering work and a few creative
tricks to unlock the best possible performance, but the underlying
fundamentals are already there and are getting better and better understood.

There is a reason why progress in AI is so hard to measure. Any time the next
task is solved, there is a crowd saying it's not "real AI" and that
scientists are solving "toy problems". Both statements are totally true. But
the underlying substance is that each of these toy problems is of increasing
complexity and brings us closer and closer to solving the "real problems",
which are mostly so undeniably complex that we couldn't attack them upfront.
Still, the speed of progress in the field of AI research is staggering and
it's hard to keep up with it even for professional researchers who spend all
their waking hours working on these things.

6 years ago we were able to solve some Atari games from pixels. Today, that
feels like a trivial exercise compared to modern techniques. With billions of
dollars of investment pouring in and a steady supply of fresh talent, it is very
hard to predict what the pace of research will be in the coming years. It is
entirely possible we'll encounter a wall we won't be able to overcome for a
very long time. It is also possible that we won't, and in that case we're in
for a very interesting next few decades.

~~~
dmreedy
> On the high level, there is no "chess AI", "go AI", "image classification
> AI" and "dexterous manipulation AI". These are all sides of the same coin,
> that gets significantly better every year.

On a practical level, this is not true. There are different algorithms,
different architectures, different hyperparameters required for each of these
problems, and often for each subdomain within each of these problems, and
often for each specific _instance_ of these problems. It's difficult to draw
any kind of holistic picture that combines all of the individual advances in
each of these problem instances; _that's_ why progress in AI is so hard to
measure, and why a statement like "each of these toy problems...brings us
closer and closer to solving the 'real problems'" is probably a bit too
coarse-grained to be fair as well.

~~~
tialaramex
Are you writing this from last century?

Deepmind's best-in-class chess and Go AIs are the same code (AlphaZero) just
given respectively rules and game state input for either chess or Go and then
allowed to train on the target game.

One of the fun works in progress in this space is teaching AIs to play a suite
of 80s video games. Getting quite good at several games where the idea is to
go right and not die is pretty easy these days, but DeepMind's work can handle
a broader variety, only coming badly unstuck on games where it's hard to discern
your progress at all without some meta-knowledge.

~~~
dmreedy
I don't mean to imply AlphaZero is not impressive; it surely is. Nor do I mean
to imply that any of these advances aren't impressive. I do mean to imply that
"closed-world games with well-defined rules" is a relatively small subdomain
of problems. And that BERT looks pretty different from AlphaZero.

~~~
tialaramex
The post you disputed pointed out that there aren't separate AIs needed for
things like Go or chess. Because there aren't (any more): the DeepMind work
showed that you can generalize to learn all games in this class the same
way.

You claimed that "different architectures" are needed. Not true. And further,
you claimed this is true even for "each subdomain". That would have been a
fair point in 1989. Traditional chess AIs approach the opening very
differently, for example, relying on fixed "books" of known good openings. But
AlphaZero, since it is a generalist, doesn't do this; it plays every part of a
match the same way.

Now you've gone from asserting that Chess and Go need separate AIs to claiming
that since BERT and AlphaZero are different software it makes your point.
Humans pretty clearly don't have a single structure that's doing all the work
in both playing Go (AlphaZero) and understanding English (BERT) either - so
that's a pretty bold bit of goalpost moving.

------
Strom
> _Agents were capped at a max of 22 agent actions per 5 seconds, where one
> agent action corresponds to a selection, an ability and a target unit or
> point, which counts as up to 3 actions towards the in-game APM counter.
> Moving the camera also counts as an agent action, despite not being counted
> towards APM._

I'm happy to see that they've greatly improved the APM cap. During the earlier
showmatch they had an extremely naive cap on the average APM over the whole
game, which meant that the AI could take it easy for most of the time and then
hit peaks of 2000 APM during battle sequences. The previous cap was also based
on the in-game APM counter, which doesn't count some things like moving the
camera; that is now addressed. The current state sounds a lot better.
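For concreteness, here's a back-of-the-envelope sketch of what the quoted cap implies. This is illustrative arithmetic using only the numbers from the quote, not anything from DeepMind's actual code:

```python
# Rough arithmetic implied by the quoted action cap.
AGENT_ACTIONS_PER_WINDOW = 22   # "capped at a max of 22 agent actions..."
WINDOW_SECONDS = 5              # "...per 5 seconds"
MAX_APM_PER_AGENT_ACTION = 3    # one agent action counts as up to 3 in-game actions

agent_actions_per_minute = AGENT_ACTIONS_PER_WINDOW * (60 // WINDOW_SECONDS)
max_ingame_apm = agent_actions_per_minute * MAX_APM_PER_AGENT_ACTION

print(agent_actions_per_minute)  # 264 agent actions per minute
print(max_ingame_apm)            # up to 792 on the in-game APM counter
```

So even under the cap, the in-game counter can read far higher than the agent-action rate, which is worth keeping in mind when comparing the AI's APM to human numbers.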

However, it still seems superhuman in mechanical, non-strategy ways: a
human can misclick (click too early or late while moving the mouse, or miss
the target with the cursor completely), double-click accidentally, etc.
These end up being very costly mistakes against an AI that makes zero mechanical
mistakes, which in turn means the AI can win with an inferior strategy due to the
extra gains from its superior mechanical accuracy. In other words, this
artificial intelligence is still under significant artificial motor skills
welfare, and thus even if it starts beating pro players we shouldn't be too
quick to talk about how it's the intelligence part that got it the win.

All of that said, I'm liking the progress and am excited to see what they
achieve with this next. Would love another showmatch against pro players.

~~~
Erlich_Bachman
Yes, those might have some impact, but it is clear that the progress is there,
with this new APM cap and camera movement etc.

You can also see in replays that the AI often makes mechanical mistakes:
missing spells, missing units, even ordering the wrong units from outside the
screen. If its win rate were conditioned in any strong way on its sheer
mechanical ability, it surely would have learned not to make such mistakes.
Since the mistakes are clearly there, it seems that its power comes primarily
from somewhere else, likely from its ability to choose the right actions, not
from the mechanical power of executing them perfectly.

Thus it is not likely that "this artificial intelligence is still under
significant artificial motor skills welfare". They consulted top players
before they set this AI up, and if those details were important, they would
have baked them into the list of limitations already.

~~~
Strom
The mistakes it makes are due to bad decisions. There have been no claims by
DeepMind that they have some sort of chaos engineering [1] going on where the
AI decides one thing and then the output system actually does another thing.

Also, I think you overestimate the AI/IT knowledge of the top players
they're consulting. I have great respect for them, but they're not
renaissance men [2] who both play 10 hours of StarCraft per day and also know
the subtleties of how computers work, not to mention bleeding-edge AI. You can
watch the previous showmatch [3] to see how the DeepMind people lack knowledge
of StarCraft, how the pro players they're consulting lack understanding of the
AI, and how both are learning new things live on air. It's obvious that their
cooperation is bearing fruit as they spend more time talking to each other, as
evidenced by the new APM limits. However I'm willing to bet that they would
reach many more good conclusions if they just continue working together.

\--

[1]
[https://en.wikipedia.org/wiki/Chaos_engineering](https://en.wikipedia.org/wiki/Chaos_engineering)

[2] The problem with both parties (the pros & deepmind) is that they're so
overspecialized. I'm nowhere near as good as them at their respective fields,
but I am a professional programmer and diamond in StarCraft II. In addition
I've built StarCraft II AI myself, although with different goals related to
finding optimal strategies.

[3]
[https://www.twitch.tv/videos/369062832](https://www.twitch.tv/videos/369062832)

~~~
halflings
> [2] The problem with both parties (the pros & deepmind) is that they're so
> overspecialized. I'm nowhere near as good as them at their respective fields,
> but I am a professional programmer and diamond in StarCraft II. In addition
> I've built StarCraft II AI myself, although with different goals related to
> finding optimal strategies.

So many things wrong with this comment.

You're nowhere near a professional StarCraft player if you're in Diamond
league (I play casually and I'm on high plat, bordering Diamond), and Oriol
Vinyals, the lead research scientist behind this project, is one of the most
renowned scientists in the field _and_ used to play StarCraft at a
professional level. They also said that other employees at DeepMind are at
Masters level and helped test AlphaStar.

~~~
Strom
You quoted me saying that I'm not as good as them and then you say that my
statement is wrong because I'm not as good as them. We seem to be in
agreement. Maybe you misread? Anyway, that was just a footnote to point out
my generalist nature, because I think that's a fundamental reason why I
immediately see these things while it takes them time to realize. Also, to
be even more fair towards them, my generalist knowledge around computer
controlled gaming goes way beyond StarCraft II as I've written bots for many
different games.

My main argument revolves around APM. Oriol Vinyals might be great, but I've
also seen him state on video in 2019 [1] that restricting the AI based on
average APM over the whole game is reasonable. He has blind spots that someone
like me can immediately spot.

\--

[1]
[https://www.twitch.tv/videos/369062832](https://www.twitch.tv/videos/369062832)

------
bsaul
Has anyone read the actual paper? This summary really makes it look like
"mission accomplished", but it was much, much more interesting than that. We
saw the AI do "obviously stupid things", and we also saw it improve a lot in
the middle of the trial, as many YouTubers showed.

The AI was also much more interesting when playing the Protoss race, where it
really felt like it was responding to the opponent's actions; with the other
races, not so much.

But most surprising is that it didn't make any "breathtaking" moves or
actions, as opposed to AlphaGo. Not a single game made pro players realize
something new about the game. That's really embarrassing, because it suggests
the AI was just able to correctly reproduce existing strategies and build
orders that it probably "learned" from the pro games in its training sample.

I was really hoping for a more interesting report, honestly explaining the
shortcomings of the techniques used and giving hints about the obvious
blunders, as well as a roadmap for a second round, this time aiming at beating
the very top players.

~~~
papreclip
IMO it's a testament to how games like SC are collectively and thoroughly
"solved" by the community, and how the games aren't that complex after all. I
never followed SC or SC2, but my observation of the pro scene and competitive
ladder for Warcraft 3 was that cookie cutter strats dominated. Pro players
were typically those who executed best, not those who innovated best.

Personally I felt disappointed by the fact that real-time strategy played a
pretty minor role in RTS games. If you left the well-beaten path of cookie
cutting and tried something new you invariably gave up a pretty obvious
advantage to do so.

~~~
JamesBarney
There is a lot of strategy in RTSes, but AlphaStar doesn't need to exploit any
of it because it can win on mechanics. It has perfect macro and perfect
micro, which lets it beat players without having to learn strategies or
tactics.

It's kind of like competing against a gorilla in boxing chess. Just because a
gorilla is dominant doesn't mean boxing chess doesn't require chess skills,
only that a gorilla doesn't need them.

~~~
DuskStar
Are you sure it has perfect micro? Can you even execute perfect micro at 30
actions per 5 seconds?

~~~
confidantlake
"Perfect", no; much better than any human, yes. 30 actions per 5 seconds is
360 actions per minute. The very top players might have similar stats, but the
similarities end there. A lot of human actions are mindless spam clicking, and
those 300+ actions would contain several misclicks. The AI would never
misclick. It can also do things like pull back every weakened unit before it
dies, much more accurately than a human. A human knows to do these things but
has much less mechanical skill than the AI.

------
nightcracker
> After 50 games, however, DeepMind hit a snag. Some players had noticed that
> three user accounts on the Battle.net gaming platform had played the exact
> same number of StarCraft II games over a similar time frame — the three
> accounts that AlphaStar was secretly using. When watching replays of these
> matches, players noticed that the account owner was performing actions that
> would be extremely difficult, if not impossible, for a human. In response,
> DeepMind began using a number of tricks to keep the trial blind and stop
> players spotting AlphaStar, such as switching accounts regularly.

So instead of fixing the unfairness... they tried to hide it?

~~~
floitsch
These extremely difficult/impossible things didn't really give it an advantage.
For example, AlphaStar would sometimes click on an object at the border of the
screen. For humans that would be almost impossible, because the screen scrolls
when the mouse approaches the border.

Similarly, AlphaStar didn't play with group hotkeys, but used a different
technique. However, in none of the analyses did people notice anything that
would give AlphaStar an unfair advantage.

~~~
gwd
> These extremely difficult/impossible things didn't really give an advantage.

One of the videos I watched compared APM (Actions Per Minute) with EPM
(Effective actions Per Minute). AlphaStar always has them nearly identical,
which would be (according to him) basically impossible for humans.

~~~
Mirioron
Of course it's impossible for humans. Humans are used to clicking multiple
times to perform an action. Some actions you want to take in the game are
_very_ important, and failing to do them correctly can mean a lost game, for
example moving units properly in combat. This means it's better to spam-click
such an action a few times to make sure the button press registers, because
presses can fail due to hardware, software, and, most often, coordination
errors. As a result, essentially every player has a much higher APM than EPM,
but it's really EPM that counts; APM itself is mostly irrelevant.

~~~
skybrian
Hmm, seems like it might be interesting to add a slight error rate for AI
clicking and see how it handles it?

~~~
doctorpangloss
You only need more dexterity than your opponent, and only for a couple of
seconds, to decisively win a match in SC2.

Error rates, EPM and APM are all red herrings.

------
hoseja
There has always been the issue of interface when playing videogames AI vs
human. Either give the human a brain-computer interface or give the AI a
mouse, keyboard, monitor, robot hands and a camera. Anything else seems
inherently unfair.

~~~
username90
They are testing the AI's ability to do tactics and strategy, not motor and
visual tasks. It has a built-in delay and needs to move around the map to
gather information just like a human, so it doesn't have any significant
unfair advantages.

Edit: Note that the version they sent out to the ladder had a significantly
larger delay and significantly lower APM, and, unlike the first iteration,
didn't get any information not visible on the screen.

~~~
jamra
However, the computer can macro while in a fight, which is something people
can't do. We would miss the fight and lose the game. It's not just a tactics
game; attention span and where your eyes are matter.

~~~
goalieca
It's so bad that below Masters, people shouldn't really be doing too much
micro during a fight, because they'll lose out on macro. Even in pro matches
you see attention-span issues, and sometimes avoiding too much micro in order
to win. Really good micro is going to win you any battle.

~~~
jamra
I'm in Diamond 2 and I definitely micro in battles. The difference between
Platinum 1 and Diamond 3 is largely macro, but it very soon becomes micro as
well.

------
xorfish
[https://www.youtube.com/watch?v=3HqwCrDBdTE](https://www.youtube.com/watch?v=3HqwCrDBdTE)

Here is a video of someone finding out that he played against AlphaStar. (He
won.)

------
sgillen
I see a lot of comments downplaying this achievement, saying it's not
impressive, or that it won't be impressive until X condition is met.

I welcome skepticism and criticism for this sort of thing, and think most of
what I've seen here is well founded. But I would like to take a second to
explain why I think this, and really all the progress in this area, is such an
impressive achievement to me.

Let me try to frame this from the computer's perspective. Let's assume a
resolution of 1024x780. I'm not sure what size frames they actually feed their
agent, but that's not important to the discussion; the point is it's a big
image, and according to the article this agent is learning from pixels. So
you, the computer, are given, let's say, 1024*780 = 798720 numbers to look at.
You then choose a number between 0 and 798720 (or the crazy 10^26 number the
article gives as the possible number of actions at each frame) as your action
for that frame, and then you get another 798720 numbers to look at. After the
round is over (on average 20 minutes; if you make a decision every frame at 60
fps, that's 20x60x60 = 72000 rounds), you get one number telling you how well
you did. You repeat the process and get a new number. It's higher this time!
But what was the cause? Was it that click you made on frame 22456? Or maybe
that unlikely move you made on frame 4567?

Obviously I'm oversimplifying here, and the numbers are probably wrong, but I
still think this gives the right idea of the kind of task we (as a
society/community/whatever) have somehow gotten a computer to solve. Computers
are DUMB; the fact that one can play this game at all, let alone at a high
level, is still a minor miracle to me.
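To put rough numbers on that credit-assignment problem, using the same admittedly made-up figures as above (these are the comment's illustrative numbers, not AlphaStar's actual interface dimensions):

```python
# Scale of the sparse-reward learning problem sketched above
# (illustrative numbers only).
WIDTH, HEIGHT = 1024, 780      # assumed frame size, as in the comment
FPS = 60                       # assume one decision per frame at 60 fps
EPISODE_MINUTES = 20           # an average game length

pixels_per_frame = WIDTH * HEIGHT
decisions_per_episode = EPISODE_MINUTES * 60 * FPS
pixel_values_per_episode = pixels_per_frame * decisions_per_episode
rewards_per_episode = 1        # a single win/loss signal at the very end

print(pixels_per_frame)          # 798720 numbers to look at per frame
print(decisions_per_episode)     # 72000 decisions before any feedback
print(pixel_values_per_episode)  # 57507840000 input values per single reward
```

Tens of billions of input values and tens of thousands of decisions, all graded by one number at the end; that's the haystack the learning algorithm has to assign credit in.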

~~~
timeattack
I think that downplaying the achievement is more of a defensive mechanism that
exists to protect the illusion of human brain superiority.

~~~
TaupeRanger
Not at all. We've known machines are better than humans at machine-like tasks
for hundreds of years. IMO responses like yours are defense mechanisms of AI
researchers/enthusiasts who want to protect the illusion that they are
creating things that do anything remotely resembling what the human brain
does.

~~~
drchewbacca
Kasparov comments on chess computers in an interview with Thierry Paunin on
pages 4-5 of issue 55 of Jeux & Stratégie (published in 1989):

‘Question: ... Two top grandmasters have gone down to chess computers:
Portisch against “Leonardo” and Larsen against “Deep Thought”. It is well
known that you have strong views on this subject. Will a computer be world
champion, one day ...?

Kasparov: Ridiculous! A machine will always remain a machine, that is to say a
tool to help the player work and prepare. Never shall I be beaten by a
machine! Never will a program be invented which surpasses human intelligence.
And when I say intelligence, I also mean intuition and imagination. Can you
see a machine writing a novel or poetry? Better still, can you imagine a
machine conducting this interview instead of you? With me replying to its
questions?’

~~~
TaupeRanger
The fact that Kasparov was obviously wrong about a computer's ability to solve
a concrete optimization problem better than him says nothing of value
whatsoever and essentially proves my original point. We already knew machines
were better than humans at these kinds of tasks, but people (like Kasparov)
who didn't understand what computers were capable of made wrong statements.

------
Nimitz14
The title seems to contradict the subtitle:

> Google AI beats top human players at strategy game StarCraft II

vs

> DeepMind’s AlphaStar beat all but the very best humans at the fast-paced
> sci-fi video game.

~~~
diggan
Yeah, the subtitle seems much better. My impression before reading was that it
had beaten the top players, but reading the article makes it clear it beats
everyone BUT the top players...

Also, this part of the article seems a bit weird:

> The AI wasn’t able to beat the best player in the world, as AIs have in
> chess and Go, but DeepMind considers its benchmark met, and says it has
> completed the StarCraft II challenge.

So they didn't manage to beat the best players but consider the challenge
complete anyway? I thought the goal was to build something that could do
things better than humans.

~~~
Nimitz14
They're framing it positively to distract from the fact that they can't do it.
The reality is they've spent millions of dollars on compute and their agent is
still terrible at strategy (and sometimes tactics too); they probably decided
it's best to stop now before sinking even more money into it.

~~~
TaupeRanger
You're getting downvoted for tone probably, but I think you're right overall.
They simply can't "win" this game in the same way they "won" Go.

------
king-rat
Not being robust to strategies it hasn't seen before is a serious shortcoming
in a real-time strategy game. It also points to an interesting flaw in how
this model is trained: in the millions of games it plays against itself, how
do you ensure that it tries every viable (and some inviable) strategy? Sure,
it couldn't best Serral, but I wonder how it would fare against Has, a player
known for some pretty off-the-wall builds.

~~~
jplayer01
> Not being robust to strategies it hasn't seen before is a serious
> shortcoming in a real time strategy game.

I'm not sure if you've actually played or followed competitive SC2. This is
absolutely _normal_. Players will pull something completely unexpected out of
a hat and win. The losing player will learn from it in future games. That's
just how it goes. Unexpected strategies are really _hard_ to counter when
you've never seen them before, and they're often employed to directly counter
what you're doing right then and there. So you've been countered and dealt a
devastating blow, which means figuring out how to come back from that can be
hard to impossible. It's a thoroughly human failing in every way. I'd be more
concerned if the AI wasn't able to learn to counter it in future matches.

~~~
Nimitz14
I don't think you understood his point. A player doesn't _need_ to have seen
a strategy before to react correctly to it. Sometimes a player will pull
something completely unexpected out of a hat and lose, because the other guy
reacted correctly thanks to his game experience. If you watch any high-GM
player's stream, half the time his reaction to what his (worse) opponent is
doing is "WTF is this?" as he then proceeds to crush it. AlphaStar cannot do
that.

~~~
jplayer01
I do understand his point. There are just as many examples of people being so
overwhelmed by a new strategy that they don't know how to respond to it and
lose, and then learning how to deal with it in later matches. Also, downvoting
me for disagreeing is a dick move.

~~~
TaupeRanger
And yet, in Go, the AI adapted to any "weird strategy" and won regardless. The
fact that AlphaStar can't is an obvious weakness.

~~~
empath75
I would say it's a weakness, but I think it's one shared by most players. A
'weird' Go strategy isn't going to involve new kinds of pieces on the board or
someone developing a win condition at a place on the board you can't see.

You can easily lose a StarCraft game if someone is doing something novel and
you don't happen to scout the right place on the map soon enough.

------
narrator
We need to create an open source organization to build these brains at home
because: "Recomputing the AlphaGo Zero weights will take about 1700 years on
commodity hardware."[1]

I see these huge AI brains creating a new class divide between people who have
access to these new AI brains and those who don't. The mission of open source
has always been to break down these barriers to empowerment with technology.
Thus, this is a great area for open source innovation.

[1] [https://github.com/leela-zero/leela-zero](https://github.com/leela-zero/leela-zero)

~~~
sanxiyn
AlphaGo Zero was extremely computationally inefficient because, in an (IMO
misguided) attempt to be "general", DeepMind avoided even slightly
specializing the training to Go. KataGo optimized the training for Go,
slightly, and obtained a 100x speedup, and that was low-hanging fruit.

~~~
narrator
1700 years sped up 100x is still 17 years, though. Even so, for the models
that actually need it, Google and others with huge GPU farms will have an
enormous advantage over a single individual, and only a large community can
have any hope of challenging their ability to dominate.

------
yters
What would be interesting is to limit the AI's processing speed to human
capacity, which is something like 60 bits per second.

In all these AI vs. human games I see, it is really apples to oranges, because
the human consumes vastly fewer resources and compute cycles to perform at the
same level as the AI. And when I say 'vast' I mean Vast. There is something
like a quintillion-factor difference between the AI and the human.

There is no way the AI is even comparable to the human. To be comparable, we'd
have to parallelize the game and lock millions of humans onto the game full
time for millions of years.

At the end of the day, the AI is just a more sophisticated lookup table. There
is as yet no AI analogous to human play.

~~~
whichquestion
It would be interesting to see if there are any meta-reports on research into
resource-constrained AI. Has anyone explored whether an AI can improve its own
algorithm while being heavily resource-constrained?

Is the ability to use fewer resources a limit of our current hardware? Do we
see the current trajectory of hardware performance improvements reducing the
amount of resources an AI needs to perform a task?

Are our algorithms simply not well tuned for small resource budgets? Could we
build better algorithms for resource-deficient environments?

~~~
yters
Great questions. From what I can tell, AI is just as resource hungry as it
ever was, if not more. The only difference is now we have more resources to
throw at the traditional algorithms.

------
plopz
> placing within the top 0.15% of the region's 90,000 players

> 61 wins out of 90 games against high-ranking players

This doesn't seem to be quite as commanding as it was in Go. Do we know what
MMR it reached or if it consistently beat players like Serral?

~~~
CydeWeys
It seems like it's only a matter of time, though.

StarCraft II also has a rock-paper-scissors nature, so you wouldn't expect
even a perfect player to win 100% of the time. Some strategies are hard
counters to other strategies, and because of the imperfect-information nature
of the game, by the time you scout your opponent and see what they're doing,
it may be too late to shift and deal with it.

~~~
Erlich_Bachman
> by the time you scout your opponent and see what they're doing, it may be
> too late to shift and deal with it.

Unless you're Serral and your observers magically come out of nowhere and
cover every single pixel of your base.

~~~
Erlich_Bachman
This is what happens when you try to surprise Serral:
[https://www.youtube.com/watch?v=HTMIh9wzDjo](https://www.youtube.com/watch?v=HTMIh9wzDjo)

------
sjunlee
I won’t agree until they beat South Korean players at Starcraft I.

~~~
ajudson
They didn't even beat South Koreans at Starcraft 2 in the article

------
hartator
Instead of capping the APM rate, maybe they could slow down the game for the
human players? At 0.25x or 0.5x speed, it would be harder for the AI to
out-micro. A couple of micro tricks the AI uses, like staying at 7 range when
the enemy can only fire at 6 range, are still a big reason why it is so
effective.

------
cjbprime
I hope they don't stop here without doing a world champion challenge. It looks
like they're still significantly below professional player strength.

~~~
sanxiyn
They said they will stop here.

------
gambler
From the paper:

 _> Humans play StarCraft through a screen that displays only part of the map,
along with a high-level view of the entire map, to e.g. avoid information
overload. The agent interacts with the game through a similar camera-like
interface_

What exactly does that mean? Does it or does it not play by operating purely
on the image data human players would see on the screen?

How much of the system's interaction with the game's interface is learned, as
opposed to hand-crafted and filtered through APIs?

It's amazing that most people here seem to think the system's ranking in a
computer game is more important than its ability to learn from and interact
with unstructured data.

~~~
ionforce
Everything about AlphaStar is typed and discrete. It has perfect inputs
because it uses an API (and does not read pixel data).

Human limitations that AlphaStar shares:

\- Data that requires the camera to see (e.g. enemy location, enemy HP)

\- Inability to examine/target cloaked units

Possibly unfair, super-human things AlphaStar has access to:

\- Instantaneous awareness of cloaked units

\- Knowledge of things humans need to infer/click (e.g. upgrades)

\- Global map awareness of unit positions (taking into account fog of war)

Definitely unfair:

\- Can select arbitrary collections of units, including outside of camera view
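A minimal Python sketch of what this kind of typed, discrete interface might look like, in contrast to raw pixels. All class, field, and function names here are invented for illustration; this is not the actual PySC2 or AlphaStar schema:

```python
from dataclasses import dataclass

@dataclass
class UnitObservation:
    unit_type: int          # exact type id -- no visual recognition needed
    x: float
    y: float
    hp: int
    is_cloaked: bool        # exposed directly once detected
    visible_to_camera: bool

@dataclass
class Observation:
    units: list             # every unit the agent is allowed to know about
    minerals: int
    upgrades: frozenset     # e.g. {"zerg_melee_1"} -- humans must infer these

def select_units(obs, predicate):
    """Arbitrary selection: an API agent can pick any subset of its units,
    including ones outside the current camera view -- a human cannot."""
    return [u for u in obs.units if predicate(u)]

# Example: select every off-camera unit in a single "action".
obs = Observation(
    units=[UnitObservation(1, 10.0, 20.0, 45, False, True),
           UnitObservation(1, 90.0, 80.0, 45, False, False)],
    minerals=500,
    upgrades=frozenset(),
)
off_camera = select_units(obs, lambda u: not u.visible_to_camera)
```

The last line is exactly the "definitely unfair" item above: a selection predicate over typed unit records ignores the camera entirely.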

~~~
callmekit
Do you know if it can target a particular unit from a clump of air units? If
yes, then there is another "unfair" thing.

Also, I wonder how their "camera-like interface" handles tactics like flying
a building above units to make them harder to target.

~~~
ionforce
Yeah. I think watching it split a line of Stalkers into two flanks that isn't
done with a rectangular selection is crazy unfair.

I think you're asking whether it suffers from occlusion during selection.
Based on how APIs typically work, I would say it does not. So yeah, that's
another thing humans can't do: AlphaStar could hypothetically stack units into
a single mass that a human player couldn't target discretely.

------
raldu
Link to all game replays: [https://deepmind.com/research/open-
source/alphastar-resource...](https://deepmind.com/research/open-
source/alphastar-resources)

------
jcampbell1
I know this is an endless discussion about fairness, but they handicapped it
to have an EAPM capped at the peak of the world's best players. The result is
that it could beat all but the top players. While its mechanics of not using
hotkeys were strange, I never saw it do anything the elite players couldn't
do.

It is worth watching the match between AlphaStar and Serral, who is the
current best player. Serral beats it like it's a walk in the park.

[https://m.youtube.com/watch?v=_BOp10v8kuM](https://m.youtube.com/watch?v=_BOp10v8kuM)

------
hervature
Having a hard time parsing what the action space is here.

The paper claims: AlphaStar’s action space is defined as a set of functions
with typed arguments

Looking at citation 7, it seems like they are structuring the action space as
(first pick high-level action) -> (pick argument 1 for action) -> ... -> (pick
argument n for action). If this is the case, it seems like "cheating" to call
this AI, since humans have completely picked out the actions. That is, the
achievement here is this: given what humans consider useful actions, AlphaStar
can play at a grandmaster level.
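To make that hierarchy concrete, here's a toy sketch of an action space defined as "a set of functions with typed arguments", where the agent first picks a function head and then fills in each argument in turn. The action names and argument domains are invented for illustration; they are not AlphaStar's actual action set:

```python
import random

# Each action is a "function" with a list of (argument_name, domain) pairs.
ACTION_SPECS = {
    "move":   [("unit_id", range(8)), ("x", range(64)), ("y", range(64))],
    "attack": [("unit_id", range(8)), ("target_id", range(8))],
    "build":  [("structure", range(16)), ("x", range(64)), ("y", range(64))],
}

def sample_action(rng):
    # Step 1: choose the high-level function head.
    name = rng.choice(sorted(ACTION_SPECS))
    # Steps 2..n: choose each typed argument, conditioned on the function.
    args = {arg: rng.choice(list(domain))
            for arg, domain in ACTION_SPECS[name]}
    return name, args

rng = random.Random(0)
name, args = sample_action(rng)
```

A uniform sampler stands in here for the learned policy; the point is only that the hierarchy (function first, then arguments) is fixed by humans in advance.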

The achievement here is mostly engineering, in my opinion, one that extends
far beyond the 40-ish people listed on the paper. Probably an effort of over
1,000 people. From casually looking over the paper, there is nothing
significantly different from AlphaZero or previous art. Again, the achievement
here is listed under the infrastructure section of the paper.

In summary, this is a great step forward but now we need to start developing
techniques to learn these action space hierarchies instead of throwing more
power at increasingly difficult games.

~~~
sanxiyn
It is extremely different from AlphaZero... In fact, they heavily rely on
human knowledge, which is the opposite of AlphaZero. To quote the paper, "We
found our use of human data to be critical in achieving good performance with
reinforcement learning".

~~~
hervature
Ok, you’re right. I should’ve said AlphaGo. But that in itself shows what I
mean that this is almost a step backwards.

~~~
sanxiyn
AlphaZero was miraculously good, almost to the point of straining credibility.
AlphaGo and AlphaStar are more like normal advances. They are mostly
engineering, although theoretical contributions are not trivial. (Using
reinforcement learning for value network in case of AlphaGo, and multi-agent
self-play setup in case of AlphaStar, since straight self-play doesn't work.)

------
altmind
There's a great video highlighting irregularities in AlphaStar's MMR
calculations and Blizzard's matchmaking. Notably, AlphaStar, playing Protoss,
on average played against people 700 MMR below it, and victories over much
weaker enemies gave an unexpectedly huge boost to its calculated MMR. When
matched only with >6000 MMR players, AlphaStar Final had 6 wins and 15 losses.

There is a problem with the MMR calculation used for AlphaStar. More
specifically, there is a matchmaking problem: AlphaStar did not get matched
against equal-MMR opponents, and most of its Protoss games were against much
lower-MMR players, skewing the data for MMR calculations.

The example given in the video: you won 10 times against 5100-rated players
and lost once against a 7200. The calculated MMR would probably be around
6300. The problem is you won't be matched like this in a real game. When
matched mostly against weaker enemies, MMR calculations have a lot of
uncertainty.
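The example above can be sanity-checked with a quick maximum-likelihood sketch using a standard Elo-style win-probability model. The K-factor-free grid search and the 400-point logistic scale are standard Elo conventions, not Blizzard's actual MMR formula:

```python
import math

def expected_score(r_a, r_b):
    """Standard Elo win probability of a player rated r_a vs r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def log_likelihood(rating, results):
    """Log-likelihood of observed (opponent_rating, won) results."""
    ll = 0.0
    for opp, won in results:
        p = expected_score(rating, opp)
        ll += math.log(p if won else 1.0 - p)
    return ll

# 10 wins vs ~5100-rated players, 1 loss vs a 7200-rated player.
results = [(5100, True)] * 10 + [(7200, False)]

# Grid-search the maximum-likelihood rating; it lands in the mid-6000s,
# close to the ~6300 figure from the video -- yet the likelihood surface
# is nearly flat there, which is the uncertainty being described.
best = max(range(5000, 7500, 10), key=lambda r: log_likelihood(r, results))
```

Wins against opponents 1000+ points weaker carry almost no information (the model already predicts them at >99%), so the single loss dominates the estimate.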

The video, unfortunately, is in Russian:
[https://youtu.be/mpAUufSzaUo?t=1323](https://youtu.be/mpAUufSzaUo?t=1323)

------
ogig
So they added exploiter agents showing the AI what strats are in the meta.
When the meta changes, will it still perform? Maybe I'm reading it wrong, but
it seems much less impressive with the exploiter agents in the mix.

~~~
aketchum
My understanding is that the exploiter agents are not based on the meta. My
take is that the exploiter agents have a different goal than the main agent.
While the main agent is trying to develop a strategy to win against as many
opponents as possible, the exploiters are focused on finding new ways to beat
just the main agent. It seems similar to the inspiration behind GANs.

~~~
ogig
I think you are right, from the paper:

>During training we used three agent types that differ only in the
distribution of opponents they train against, when they are snapshotted to
create a new player, and the probability of resetting to the supervised
parameters.

So exploiter agents aren't fed a specific strat, instead they discover the
weak spots in the same way as the main agent tries to win. The GANs similarity
is there.
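A toy sketch of the opponent distributions the quoted passage describes: the agent types differ mainly in whom they train against. This is schematic, not DeepMind's actual league code, and `train` here would be a stand-in for a real RL update:

```python
import random

def pick_opponent(agent_type, main_agent, league, rng):
    """Return a training opponent according to the agent type."""
    if agent_type == "main_exploiter":
        # Exploiters focus on finding weaknesses in the current main
        # agent -- the GAN-like adversarial pressure.
        return main_agent
    if agent_type == "league_exploiter":
        # League exploiters probe the whole population of past players.
        return rng.choice(league)
    # The main agent trains against a mixture of itself and the league,
    # so it must stay robust to strategies the exploiters discover.
    return rng.choice(league + [main_agent])

rng = random.Random(0)
league = ["main-v0", "exploiter-v0", "exploiter-v1"]
opponents = [pick_opponent("main_exploiter", "main-current", league, rng)
             for _ in range(5)]
```

The exploiter never samples from the league at all, which is why it isn't "fed a strat": its only training signal is whatever beats the current main agent.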

------
readparryrepost
Love the work done on this. SC2 is such a 'human' game that I'm impressed it
can be so successful, but I'm wondering whether the model takes in the raw
graphical information or some kind of full map representation. A machine
representation might feel like cheating just a little, as it's not quite
competing on the same playing field as a human player.

~~~
sanxiyn
It does not use raw graphical information; the information is machine-readable.

------
Razengan
1\. Create a realistic military combat game with AI that learns from thousands
of players and builds on that knowledge across multiple games/matches.

2\. Load that AI into real-world Terminators.

~~~
uname_a
Yes, this was exactly my reaction: this work seems to be depressingly
applicable to real-world warmaking, yet we are only focused on the technical
successes, and not on the systemic dangers that we are heedlessly creating.

Assuming that everyone looking at this work is a friend who will not turn it
to military advantage is extraordinarily naive on our part.

------
jdpigeon
I'm being lazy and don't want to dig, but does anyone know which of the
AlphaStar networks (Protoss, Terran, Zerg) is best at beating the others?

~~~
sanxiyn
They didn't publish cross-race self-play statistics, but AlphaStar's MMR is
6275 for Protoss, 6048 for Terran, and 5835 for Zerg. So it is best at
Protoss, and then Terran and Zerg, in that order.

------
angel_j
I don't think folks realize how much of a game changer AlphaStar is. If it can
do the same "job" as those high-level players, it can probably do any job that
doesn't require lifting. It could be playing all the traffic lights in a city.
It could manage a fleet of factory drones. It could coach humans engaged in
sport or business. It could probably figure out spreadsheets. Drill down, and
perhaps it can play with chemical synthesis and quantum theories, or drive a
truck.

~~~
empath75
Or control a fleet of armed drones, more likely. I’d be surprised if darpa,
etc aren’t investigating applications like that right now.

~~~
83457
Isn't it likely they have been doing that for decades?

------
Trav5
What do you guys think about having the ai communicate to a person and having
them execute the moves?

~~~
gpm
Having played StarCraft: the idea doesn't work. StarCraft happens too quickly,
with too much precision and too low a tolerance for latency, to wait for a
human to absorb, understand, and execute instructions.

Having programmed simple NNs: the idea doesn't work. The amount of time you'd
have to spend having humans execute instructions during training would be
astronomical. They were running 16,000 games simultaneously for 44 days, most
likely at some multiple of how fast the game normally runs. Moreover, you lose
out on supervised learning, because we don't have data sets of "the human told
the other human to do this"; we only have data sets (almost a million games
large) of "the human did this".

------
Madmallard
Time to do League of Legends next. I wonder how they would model teammates
behavior?

------
Mikeb85
As impressive as this is, APM matters a lot in an RTS such as Starcraft and
computers have a massive advantage here. I'd like to see them tackle an RTS
like Civ where APM doesn't matter whatsoever, only decision making.

~~~
nullbyte
Interestingly, AlphaStar has very low APM compared to human players. This is
because humans do a lot of insignificant actions when button mashing, whereas
AlphaStar is extremely precise with its moves.

That's why when you compare APM graphs between Alpha and a human player, the
human almost always ranks higher in actions-per-minute.

~~~
Mikeb85
It's still a high APM game. Reducing redundant button presses still doesn't
change the fact that APM is very significant and that being able to process
information and act quickly is more important than pure strategy.

~~~
nullbyte
That's the difference between APM vs EPM in Starcraft.
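For anyone unfamiliar with the distinction, here's a toy sketch of how EPM ("effective" actions per minute) might be computed from an action log by filtering out rapid repeats of the same command, i.e. the button-mashing described above. The 0.5 s repeat window is an arbitrary choice for this illustration, not SC2's actual definition:

```python
def apm(actions, duration_minutes):
    """Raw actions per minute: every input counts."""
    return len(actions) / duration_minutes

def epm(actions, duration_minutes, repeat_window=0.5):
    """Effective actions per minute: drop rapid repeats of a command."""
    effective = []
    last_seen = {}  # command -> timestamp of its last occurrence
    for t, cmd in actions:
        if cmd not in last_seen or t - last_seen[cmd] > repeat_window:
            effective.append((t, cmd))
        last_seen[cmd] = t
    return len(effective) / duration_minutes

# A human spamming "select army" five times in half a second registers
# five actions but only one effective action.
actions = [(0.0, "select_army"), (0.1, "select_army"), (0.2, "select_army"),
           (0.3, "select_army"), (0.4, "select_army"), (2.0, "attack_move")]
```

Under this filter, the log above has an APM of 6 but an EPM of 2, which is the gap the parent comments are pointing at: AlphaStar's APM is nearly all effective.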

------
tossAfterUsing
yeah, but can it shout political messages on a livecast?

------
hamilyon2
Replays are in replays.zip in supplementary data

~~~
dpcx
Is there a link to that? A video of this would be interesting to watch.

~~~
casefields
[https://deepmind.com/research/open-source/alphastar-
resource...](https://deepmind.com/research/open-source/alphastar-resources)

------
nullbyte
But can it beat Serral?

~~~
vkou
Not at the moment.

------
imvetri
Why wouldn't they solve more impactful problems rather than playing games?

~~~
linuxftw
Because that's all they're capable of doing with it. Games can be easily
repeated, have clear outcomes. There's no need for nuanced thinking or actual
problem-solving. Just approximate the current scenario against previously
successful strategies and pursue that strategy.

A real challenge would be to invent a new game, have both a human and an "AI"
read the rules for the first time, and then compete in their first game
together. Would the "AI" be able to formulate a winning strategy without
historical data? If not, then it's not "AI."

~~~
sinuhe69
There is no deep learning without massive data. What do you expect? :D

~~~
linuxftw
I expect less hype and more substance. I don't think computers beating people
at video games (or really, any games) is particularly novel at this point, nor
do I think it qualifies as "AI".

------
cryptofits
I wonder which game is more complex, StarCraft or DOTA.

~~~
tim58
The complexity of DOTA comes, in large part, from teamwork.

------
CzarnyZiutek
calculator beats top human at multiplication (1957)

AI is just a f.... calculator.

~~~
Sohcahtoa82
I learned how to use a calculator in elementary school; that doesn't mean I
knew how to solve calculus problems.

AI needs to know what to calculate. According to the article, at any step in
the game, there are 10^26 possible actions to take. Good luck calculating
which ones to try via brute force.
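Quick back-of-the-envelope arithmetic on why brute force is hopeless with the ~10^26 figure quoted above (the 10^18 evaluations/second is a deliberately generous assumption, roughly a full exascale machine doing one action evaluation per flop):

```python
ACTIONS_PER_STEP = 10 ** 26   # legal actions per step, per the article
EVALS_PER_SECOND = 10 ** 18   # generous: an exascale machine
SECONDS_PER_YEAR = 3.15e7

# Merely enumerating a single step's actions once takes years...
one_step_years = ACTIONS_PER_STEP / EVALS_PER_SECOND / SECONDS_PER_YEAR

# ...and a two-step lookahead squares the count, blowing past any
# conceivable compute budget (on the order of 1e26 years).
two_step_years = ACTIONS_PER_STEP ** 2 / EVALS_PER_SECOND / SECONDS_PER_YEAR
```

So even before considering the game's length (thousands of steps), exhaustive search is off the table; the policy has to generalize rather than enumerate.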

------
pmoriarty
Starcraft is a relatively simple game (which is one of the main reasons for
its enormous popularity), so seeing an AI do well at it is not particularly
impressive.

I'd have been far more impressed had an AI beaten some of the best wargamers
at a complex wargame, or had it beaten some of the best text-adventure game
players at novel text adventure games neither of them had played before.

The former would be difficult because of the enormous search space that
results from the large variety of units, possible actions, and board sizes in
complex wargames; the latter because of the creativity required to do well at
unfamiliar text-adventure games (which are hugely varied and can't be boiled
down to any simple set of rules).[1]

Text adventure games would present a further difficulty in that not only do
they not have very many players compared to games like chess, go, or
Starcraft, but their players aren't normally ranked relative to one another,
and there's no enormous record of human-played games to draw on. Further, in
text adventures sometimes there are clear winning conditions and sometimes
not, and sometimes clear signs of progress and sometimes not. It's also hard
to objectively rate a text adventure game's difficulty. Without these features
researchers would have a much harder time training an AI to do well against
humans.

Another game which I briefly thought about nominating as a challenge for AIs
was Factorio, due to its enormous complexity. However, when I thought about it
for a little while, I realized that this is a game where computers would find
it easy to beat humans, because they'd do a much better and faster job at
number crunching and optimizing. Factorio is really more of a game where a
human almost tries to emulate a computer, much like programmers do when
they're mentally evaluating or stepping through an algorithm or trying to
optimize something.

Of course, seeing an AI beat humans at any of these games would still make the
front page of HN, but most gamers (never mind most non-gamers) would probably
just say "What?" ... and that's probably yet another reason for the
researchers choosing Starcraft. It's relatively easy, low-hanging publicity
fruit, where they could claim success and a relatively large number of people
would kind of understand why it was an accomplishment.

[1] - Yes, I'm aware of the recent HN post about fuzzing done on the
Z-machines, but by the author's own admission that fuzzer cheated by having
access to the game's word dictionary, and it also cheated by having access to
the game's internal state -- neither of which a human player would have access
to. Also, many Z-machine games are ancient and relatively simple and
straightforward compared to more modern games.

~~~
Erlich_Bachman
> Starcraft is a relatively simple game (which is one of the main reasons for
> its enormous popularity)

In the most practical uses of the word "simple", especially in this context
(number of possible actions, search space, and whatnot), your statement is
just incorrect.

------
ryanmercer
Am I the only one that doesn't care about _software_ beating humans at video
games?

I'm sure there's some sort of useful learning being done here by the people
that created the _software_ that might some day help them create _software_
that can better predict what is needed for a specific application but to me it
just feels like entities like OpenAI and Google researching this _predictive
software_ are just wasting obscene amounts of money.

How about train the stuff on better typed OCR? Better handwriting OCR? This
will have actual commercial application.

Why not make something for grading papers: English papers, math homework,
etc.? Start at a first-grade level, and as you train the _software_ up, move
on to higher levels of education. My fiancée is a high school teacher; she
currently only teaches math, but has previously taught math and English. She
sits there grading papers off the clock while watching television in the
evenings and on the weekends... MANY high school teachers are in this
situation. Think of how much free time could be reclaimed by training this
instead of teaching software how to beat humans in video games!!!

This would even help teachers have more time during school hours to help
struggling students: if you aren't trying to grade papers in class while
students are doing work, you free up time you could spend actively assisting
one or more students. Instead, these "AI" researchers keep training software
to be the best at video games... _facepalm_

Then take that and apply it to something like my job. I clear international
freight through customs for a living. I look at paperwork all day and have to
determine what tariff number I should use for something (cell phone
8517120050, silver ring over 1.50 USD 7113115000), classifying every single
line on an invoice using familiarity with the tariff schedule, the
description(s) on the invoice, any other supporting documentation, and product
databases we keep on file for customers who have paid for them. My employer
does thousands of these a day, and we have to keep everything for several
years (5 IIRC) in the event CBP or any other OGA wants to see it during that
time period.

So take a huge, pre-existing data set like that, identify which shipments
were classified correctly and which were not, and then let the _software_ have
a stab at doing it with that better OCR you created.

Then, you know, actually get rid of (or drastically reduce) soul crushing,
mind-numbing, highly repetitive digital paperwork jobs like mine.

------
mc32
How soon is this put into the hands of battlefield commanders? Not presuming
it’s a bad thing, but I wonder when we’ll hear the outcome of a battle was
aided by this kind of intelligence.

~~~
dmichulke
I assume obtaining a decent-sized training dataset is gonna be a bloody
business.

~~~
xrisk
Isn't this trained via playing against itself? I don't think there's a dataset
involved.

~~~
Voloskaya
AlphaStar is first trained in a supervised way using the publicly available
replays from SCII matches.

Even if you just consider self-play, I don't see how you would create a
simulation that is realistic enough so that you could generalize afterwards to
reality. RL today has an extremely hard time dealing with different
distributions.

You can see the amount of work OpenAI [1] had to do to go from a virtual
hand+Rubik's cube to real ones, even though you can make a very accurate
simulation much more easily.

[1] [https://openai.com/blog/solving-rubiks-
cube/](https://openai.com/blog/solving-rubiks-cube/)

~~~
xrisk
Ah, ok. I've read about OpenAI Five (which plays Dota 2); as far as I
understand, it's fully trained with self-play.

