Grandmaster level in StarCraft II using multi-agent reinforcement learning (deepmind.com)
355 points by hongzi on Oct 31, 2019 | 311 comments

Honestly, the article needs to be replaced with https://deepmind.com/blog/article/AlphaStar-Grandmaster-leve... which actually goes into some technical detail. Nature.com's article is purely for laypersons and, imo, not particularly useful for HN's crowd because of how little insight it gives.

It also provides the paper and an archive of all of the AI's matches for anybody who wants to take a closer look. These can be viewed with the free version of SC2 (afaik).

Further links:

https://doi.org/10.1038/s41586-019-1724-z (supplementary data available here in json form)

https://rdcu.be/bVI7G (public paper)

https://deepmind.com/research/open-source/alphastar-resource... (replays)

https://www.youtube.com/playlist?list=PLtFBLTxDxWOSrWZ8krQt6... (list of older AlphaStar matches cast by an SC2 player)

https://www.youtube.com/watch?v=l82wBa3UoZU (one of the newer matches cast by another player)

https://old.reddit.com/r/starcraft/comments/dpaunw/deepminds... (win/loss rates across all the played matches by race, includes apm)

This comment was originally posted on https://news.ycombinator.com/item?id=21408024, but we've merged it into the earlier submission, which used the link you mentioned.

Neat, thanks.


For those who wonder, the Nature article is here: https://www.nature.com/articles/d41586-019-03298-6

It's about the original source, per the guidelines.

This is a really interesting one to digest. As with previous announcements about AlphaStar, much of the feedback (here and elsewhere) is about the fundamental challenge of assessing human vs. machine in an RTS. These points are very valid - stepping back however, this still feels like a pretty incredible accomplishment.

I'm a gold league SC2 player, so maybe in the 30th-50th percentile. Three years ago, when DeepMind started this project (and after nearly two decades of research into SC/SC2) I could probably have beaten the best non-cheating AI. Now, after 3 years, this AI is playing at a Grandmaster level, under at least a reasonable approach to fairness. By comparison, according to the AlphaGo paper [1] the best Go AIs prior to AlphaGo were playing at a 6 Dan level, which looks to be somewhere in the 90-98th percentile [2].

The speed at which AlphaStar overtook previous AIs seems to me to be nearly unprecedented in AI research. This is like if the world's best chess AI had gone from losing high school tournaments to being competitive with Kasparov in less than 3 years. Valid criticisms aside, this feels like an incredible achievement.

[1] https://www.nature.com/articles/nature16961.pdf [2] https://senseis.xmp.net/?KGSRankHistogram

>much of the feedback (here and elsewhere) is about the fundamental challenge of assessing human vs. machine in an RTS

It's amazing that most people here don't understand that AI performance in any one computer game relative to humans is largely irrelevant. A system that can play many games at a mediocre level, but does it without any hand-holding, clever APIs or architecture adaptation is infinitely more impressive than a system that can beat everyone in a specific game with all those things applied.

Remember, most humans are completely mediocre at Chess, Go or StarCraft.

Most of the approaches used are re-usable, which is a big part of why we can develop new AIs for games faster than ever. You can take the algorithm(s) that was used in one game and use it in another.

Yes, a human is still needed to decide which approach to use, but we are slowly approaching a world where building an AI becomes easier and faster. It will become absolutely irrelevant that a single AI can not play all games, because whenever a new game comes out, someone will be able to build a superhuman AI on it within a week/month.

The brains of an AI are also transferrable. Built a superhuman AI? Send it to a friend! Compare it to a human, who would need to spend enormous amounts of time to transfer their game-brain to someone else. If you want, you can bundle all those AIs into one and pretend it can play any game.

"It's amazing that most people here don't understand that human performance in all computer games relative to AIs is largely irrelevant"

>> The brains of an AI are also transferrable. Built a superhuman AI? Send it to a friend!

So far I haven't seen Google sending their AIs to a friend.

What do you mean? Basically all Neural-network models (from Google and others) are freely available. And to top that you can also get pre-trained weights for the models for free.



I mean the Alpha* family: AlphaGo, AlphaStar etc. I don't think they're sharing their analytics models either, or their DMT models, etc.

"All" neural network models is a stretch. Some researchers, including some that work in the industry, do release their models. The majority don't.

With transfer learning, you can download some of the world's most advanced pre-trained NLP models and adapt them to a specific data set.


They are putting ML inference on the Pixel phones.

> The speed at which AlphaStar overtook previous AIs seems to me to be nearly unprecedented in AI research.

Is it not simply the case that, before AlphaStar, very little money and effort was being put into developing AIs for Starcraft 2?

There's definitely some truth to that. But looking back on arXiv there are papers going back years, including from Harvard/UofT [1], Tencent [2], Facebook [3], etc. There has also been an SC1 tournament running since at least 2011 [4]

I'm sure that the AlphaStar effort absolutely dwarfed everything that came before in terms of investment (as I'm sure AlphaGo did for Go as well) but based on my reading I'd also bet that Starcraft has likely seen the most continued AI research effort of any imperfect information, real-time game over the last decade - I think that's also part of what made it appealing for DeepMind over Dota 2 or any of the other options in this class of game.

[1] https://arxiv.org/pdf/1909.02682.pdf [2] https://arxiv.org/pdf/1907.09467.pdf [3] https://arxiv.org/pdf/1906.12266.pdf [4] https://liquipedia.net/starcraft/SSCAIT

What makes this amazing isn't specific to StarCraft 2. AI in strategy games has been really lackluster. I can't think of a single example of a strategy game where an AI was competitive against experienced players due to strategy and tactics, rather than inhuman speed, accuracy or cheating. So it's not just about AI in StarCraft 2, but rather AI in essentially any (strategy) game. Now we have an example of an AI that can compete with real players.

There is a genuine advance here, but keep in mind that when an AI is developed by the game developers, they're not necessarily playing to win, but to make the AI fun to beat, and without using too much computer power, which would make the game slower. Also, due to commercial pressures it's tough to put a lot of effort into the AI for a game that's still being changed to make it more fun.

Even now, I wouldn't really expect to see most game developers start using the latest machine learning techniques to build their own AI's. Maybe the really successful games with competitive leagues might be interested?

Hi Brian,

I really hope this will change, though. Skilled AIs could make so many games more interesting and better.

Many online games really suffer from bad multiplayer experiences. Players that drop in the middle of a game, ...

If Overwatch, Destiny, ... had really good AIs, that would make the game more attractive. If a player drops, they could replace it with similarly skilled AI. Instead of waiting for 12 players of the same skill-level, they could just play with some AIs.

They could even make it such, that humans win 70% of the time, and let the AIs just lose more often.

Yep, I'd like to see it too. If one popular game really did single-player with AI opponents well, maybe it would start a trend?

AI development probably needs to be made easier, somehow. Game developers have a lot to do.

>There is a genuine advance here, but keep in mind that when an AI is developed by the game developers, they're not necessarily playing to win, but to make the AI fun to beat, and without using too much computer power, which would make the game slower.

This point is being brought up a lot, but I don't really buy it. Yes, there have been instances where the AI being too good discouraged players from playing the game as much, but this almost always happens due to the AI having some inhuman advantage that a person could not really replicate. I think players do want the AI to pose a challenge - that's what a lot of casual PvP games are about. Humans are (were?) the only ones that could offer a fair match against another human.

I will absolutely grant you the computational power point though. However, as hardware advances, this cost will become more and more acceptable.

I don't expect to see widespread adoption of this in commercial games within a decade, but I think that in 2 or 3 decades this will be the norm. In fact, I would bet that once we can make an AI that is good at a game, we can also make it weaker. A game could estimate a player's skill rating silently and then adjust the AI's strength to give the player a difficult/fun time.
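To make the "estimate skill silently, then adjust the AI" idea concrete, here's a minimal sketch. It assumes an Elo-style rating model, which is my choice for illustration and not something any game mentioned here is known to use. It also assumes the AI's strength can be dialled to a chosen rating. The function names, the starting rating, the match results and the 70% target are all made up.

```python
import math

TARGET_WIN_RATE = 0.7  # example value only: the player should win about 70% of games


def expected_score(player_rating, ai_rating):
    """Elo model: probability that the player beats the AI."""
    return 1.0 / (1.0 + 10 ** ((ai_rating - player_rating) / 400.0))


def update_rating(player_rating, ai_rating, player_won, k=32.0):
    """Standard Elo update of the hidden player rating after one game."""
    outcome = 1.0 if player_won else 0.0
    return player_rating + k * (outcome - expected_score(player_rating, ai_rating))


def ai_rating_for_target(player_rating, target_win_rate):
    """AI rating at which the player is expected to win `target_win_rate` of games."""
    # Invert expected_score: the rating gap is 400 * log10(p / (1 - p)).
    gap = 400.0 * math.log10(target_win_rate / (1.0 - target_win_rate))
    return player_rating - gap


player = 1500.0  # hypothetical starting estimate of the player's skill
ai = ai_rating_for_target(player, TARGET_WIN_RATE)
for won in [True, True, False, True]:  # made-up match results
    player = update_rating(player, ai, won)
    ai = ai_rating_for_target(player, TARGET_WIN_RATE)  # retune the AI after each game
print(round(player, 1), round(ai, 1))
```

The hard part in practice is the step this sketch skips: actually producing an AI that plays at a given rating in a way that still feels fair.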

Of course, it could just be that AlphaStar is able to play well against humans, because players treat it like a human. Maybe the AI still has standard game AI-like weaknesses that can be exploited if people know that they're playing against an AI. Eg some Diamond league player went mass ravens and kicked AlphaStar's ass. The AI would have to learn how to deal with stuff like this on the fly and I don't think we're there yet.

I think part of the problem is that every videogame is a parlor trick in a sense. Unless you are a tournament player or someone who enjoys a really hard challenge, the bulk of the customers who buy the game just want to enjoy the illusion of a challenge. You don't really need a complex AI directing the hordes of enemies you want to defeat; you just want the illusion of battling enemies and emerging the victor, and how the game accomplishes this is not that relevant... unless the trick is so blatant that it breaks the suspension of disbelief, of course.

Why does this matter? Because the trick is often less effort than the real deal. The point of videogames is to entertain you, and the goal of game publishers is to turn a profit while doing so. Obviously there are some exceptions, and indie/free games might have different goals (we all know indie games that can be punishing, like Dwarf Fortress).

I remember with the original Left 4 Dead by Valve, they claimed there was an "AI Director" that modified the map according to how well you were doing, "adapting" to how you were playing. In practice it amounted to closing some sections of the map according to how easily you were killing zombies without dying. So this wasn't really AI but a glorified IF-THEN-ELSE trick, but what does it matter? It served its purpose. Do we really need something more complex (and expensive!) when killing zombies in L4D?

While there are certainly plenty of parlor tricks and other illusions, I think in the best case, a game can contain a finely tuned learning experience, where by training you actually do get better at something. (Often something unimportant, but still fun.)

Throwing people into the deep end of the pool usually doesn't work so well. Scaling down the difficulty helps, but really what you want is challenges that teach you something. And you don't really need AI for that.

Last year I started learning to play accordion out of a method book. It has a bunch of songs ordered from easy to hard, and they're also chosen to teach certain skills. It reminded me quite a bit of good level design.

I agree with everything you said :)

To extend your point, it is also clear from analytics in Steam & mobile gaming that games have massive amounts of player dropout. It is pretty rare, even in relatively short single player campaigns, to see 30% of people who bought the game finish it. So effort making the AI even better is always going to only be for an extremely small niche of players that play a game long enough to exhaust the limits of simple AI.

Just look at the achievements for Left 4 Dead. Only 40% of all players have got the achievement for "kill one of each Uncommon type" which is trivial to get if you play for a few hours.

Steam numbers are massively inflated by the tons of game purchases that come through sales, bundles, and giveaways and are never played. I have something like 600 games in my Steam library, and I would bet that the median playtime would be 0.0 hours...

Just today I saw something that looked interesting, and it was 85% off because of a Halloween sale. So I grabbed it for a couple bucks. Free time being such a rare commodity, there's a solid chance I never even install it... and then there's always Stellaris or Mount & Blade or Medieval Total War that I could happily play until the heat death of the universe, competing for my attention.

Isn't that just proving my point?

Most people who buy your game aren't going to invest even 10 hours in it or come close to maxing out the AI because they only picked it up because it was 85% off during a Steam sale.

Maybe you'll be able to figure out from Early Access numbers how popular your game is and whether you need to invest in super-awesome AI. But it seems like for most games it would be a waste of effort. More likely it would be retrofitted into an already popular game in a DLC or future patch or something.

>Maybe you'll be able to figure out from Early Access numbers how popular your game is and whether you need to invest in super-awesome AI. But it seems like for most games it would be a waste of effort.

Maybe the reason people quit your game is because of the AI. What made me stop playing Heroes of Might and Magic 6, Civilization 6, and Total War Three Kingdoms is the poor AI. It takes the fun out of the game, because at lower difficulties the AI puts up no challenge and at higher difficulties you're just exploiting the AI the same way over and over again to keep up.

You might think that that's fine, because you already got the money, but then you'd be looking at it from the perspective of singular games. But studios usually don't just release one game and then disappear. After my experience with HoMM 6, I had zero interest in HoMM 7. 2K Games could release Civilization 7 tomorrow, but without them showing that the game's AI is much better than in civ6, I probably wouldn't pick it up. Yes, the Civilization series will still have players, but it could easily just fizzle out like many other franchises and even genres.

You also can't add this in as a DLC, because people make up their mind about a game near the start. This only works if the game already has longevity, but at that point the game is already popular, so does a better AI at that point make a difference?

It doesn't matter how short or long the singleplayer campaign is. AI in games is so simple and rudimentary that you'll have it figured out in most games in less than an hour. You might not figure out the most effective ways to exploit it, but usually you'll figure out something. This alone can make the experience boring, especially when your game is heavily focused on playing against the AI.

>You don't really need a complex AI directing the hordes of enemies you want to defeat; you just want the illusion of battling enemies and emerging the victor, and how the game accomplishes this is not that relevant...

Yes, and the current game AIs are completely and utterly incompetent at it. AI in video games only provides a challenge when the mechanics of the game are challenging or the AI is cheating.

>Do we really need something more complex (and expensive!) when killing zombies in L4D?

Not in L4D, because you don't play the game to kill zombies. You play the game to play with other people. If L4D didn't have coop then the game would likely not have gained popularity.

You're right that players want a challenge that they can overcome, but current AI in video games does not offer that. The AI always ends up being so weak that play against the AI cannot be the focus of the game. The focus has to be in mechanical play, puzzles, story, atmosphere, multiplayer or something else. The only way AI keeps up is with inhuman mechanical play or cheating. Neither feels fun to play against, because the counter to it is to figure out how to exploit the AI, after which the game becomes more or less trivial.

> Yes, and the current game AIs are completely and utterly incompetent at it. AI in video games only provides a challenge when the mechanics of the game are challenging or the AI is cheating.

I wouldn't say "incompetent", but yes, the mechanics of the game must be challenging and/or (often) the AI must cheat. I'm saying this is enough most of the time; there's really no incentive to build a better AI for videogames because (my hypothesis) most gamers don't care. The illusion of challenge is enough, most of the time. The goal of videogames is to entertain, not to really challenge (again, with the exception of tournaments and competitive gamers, who are a niche within a niche).

You mentioned Heroes of Might & Magic in another comment. I played the heck out of HoMM2, some of (I think) HoMM3 and got bored with the rest. But I wasn't bored with the AI; I got bored because it was the same formula again and again. The genre itself soon felt like a tired formula. Note that the "AI" -- cheating or not -- of HoMM felt horribly difficult to me. I lost battles more often than I won; the only trick that reliably worked was to start the fight with a lot more troops than the computer enemy, and that's not much of a trick! Not a lot of places to learn effective tactics back then either, maybe the hellpit that was GameFAQs?

Well, yes, you need to ramp up the difficulty in a way that's fun.

The article describes "exploiter agents that focus on helping the main agent grow stronger" as key to their approach. That seems promising? Maybe it could be used to make bosses with specific strengths and weaknesses where part of the game is figuring out how to beat them.

Is this AI better due to strategy and tactics? I am a low-level player but a fairly consistent spectator, and in the videos I have seen I didn’t spot anything that looks like superior strategy or tactics. Just insanely good mechanics (which is still awesome!)

If you watch the AlphaStar games, inhuman speed and accuracy are a big part of why it is so successful.

I am not sure what you mean by 'strategy' here, don't Chess, Go, and No-Limit Hold'em fall under those categories?

They don't really count as the computer game genre "strategy games". Computer strategy games tend to have many orders of magnitude more complex rules and state than Chess, Go or poker, so the game-play is very different and making an AI for them is also very different.

The video game examples they give are what's called Real Time Strategy which is quite different from the games you described in which players explicitly take turns.

Chess and Go are already quite different from NLH, because they are purely about strategy. The best strategy in Chess either wins or leads to a draw if played by both sides every game. In NLH an optimal strategy just breaks even (ignoring the rake) against other optimal players and makes money on average against anybody else. But over even a few hundred hands you can't tell.

That depends on what you consider 'very little'. SC has had annual computer tournaments and research was published on it routinely. Few computer games see nearly as much money and effort put into developing AIs for them, yet the curve of progress was not exactly impressive.

Well, I don’t have any idea of absolute amounts of money, but I’d be more interested in relative amounts of money before and after DeepMind.

Starcraft is an incredibly complex game. 10^26 possible moves at any point (you can click/drag anywhere on the screen, pressing a keyboard button as you do so), imperfect information, real-time constraints, etc.

That pales in comparison to starcraft at a higher resolution, which has even more points to click on screen! And even that is nothing compared to playing horseshoes in the physical world. There are an infinite number of moves at any given moment! And an infinite number of moments! Horseshoes is clearly the most complex game of all.

Planck would like to say something to you about lengths.

Which is why the most complex game of all is actually intergalactic horseshoes.

> Starcraft is an incredibly complex game.

Even basic strategies will win if they’re done faster. APM (actions per minute) is a very significant factor in who is winning. Apparently they limited their AI player to 264 APM but that’s still incredibly high and done with machine level consistency. That’s almost 4.5 actions per second!! I know there are human players at, and probably above, that level but that really allows for basic strategies to win out.

> Even basic strategies will win if they’re done faster.

This isn't really true. Basic strategies done faster still lose miserably to humans. We can see this from the long history of SC Brood War AI tournaments, where they have a human play the best AI at the end (and the human always wins).

Faster helps, but better strategy helps more. Maybe we can't say the AI is doing as well as the top humans at strategy, since it's faster, but we can say that it's doing pretty well.

AIs that ship with games (even on the hardest difficulty) are made to be beatable. Of course pros beat them.

You misunderstand me, I'm not talking about the AI that shipped with Brood War. I'm talking about third party AIs made via BWAPI competing against each other in the various StarCraft AI tournaments, such as

    AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE)
    IEEE Conference on Computational Intelligence and Games (CIG)
    Student StarCraft AI (SSCAI) Tournament
    BWAPI Bots Ladder

Ah, thank you for the clarification, I did indeed misunderstand.

And the bot also can parse the entire screen in .03 seconds and then jump to a new area of the map. No human can monitor the entire map like the bot can.

If they want to model the constraints of humans, they probably need to create some “attention” system where the AI needs to choose where to invest its finite attention resources. For example, the AI could choose to focus more on the minimap, which would give it faster reaction times when moving around the map, but would slow its reactions to things on the main screen.

A great example would be seeing the faint image of cloaked units. How does that work with the AI? Can it just parse the current screen and instantly see any cloaked unit? I can imagine an attention system where more attention dedicated to part of the screen would increase the probability of noticing a cloaked unit there.
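A rough sketch of what such an attention budget could look like. Everything here is hypothetical: the region names, the linear delay model, the per-second notice rate and all the numbers are invented for illustration, and none of it describes how AlphaStar actually perceives the game.

```python
def allocate(weights):
    """Normalise raw focus weights so the attention shares sum to 1."""
    total = sum(weights.values())
    return {region: w / total for region, w in weights.items()}


def reaction_delay(share, fastest=0.2, slowest=2.0):
    """Reaction delay in seconds: more attention on a region means a shorter delay."""
    return slowest - (slowest - fastest) * share


def notice_probability(share, seconds, rate_at_full_attention=0.5):
    """Chance of spotting a cloaked unit in a region within `seconds`."""
    # Each second is treated as an independent chance, scaled by the attention share.
    per_second = rate_at_full_attention * share
    return 1.0 - (1.0 - per_second) ** seconds


# The agent decides to watch the minimap three times as closely as the main screen.
focus = allocate({"minimap": 3, "main_screen": 1})
for region, share in focus.items():
    print(region, round(reaction_delay(share), 2), round(notice_probability(share, 3), 2))
```

The point of a design like this is that the agent has to pay for awareness: watching the minimap closely makes it slower and less observant on the main screen, and vice versa.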

Sometimes these critiques seem to be complaining that the ai is too intelligent. That’s sort of the point isn’t it.

But when the machines start competing with humans it should be fair?

>264 APM

Actually 792 APM. They limited their AI to 264 "things" (choose unit+ability+target or change view) per minute. Each "thing" can be counted as between 0-3 actions by starcraft. So it really has a peak APM limit of 792.

From the article:

>Agents were capped at a max of 22 agent actions per 5 seconds, where one agent action corresponds to a selection, an ability and a target unit or point, which counts as up to 3 actions towards the in-game APM counter. Moving the camera also counts as an agent action, despite not being counted towards APM.
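The arithmetic behind the 264 and 792 figures, plus one way a cap like this could be enforced, can be sketched as below. The sliding-window limiter is an assumption for illustration: the quoted passage gives the numbers but doesn't say how the cap was implemented, and the class and constant names are made up.

```python
from collections import deque

WINDOW_SECONDS = 5.0          # from the quoted cap
MAX_ACTIONS_PER_WINDOW = 22   # from the quoted cap
MAX_APM_PER_AGENT_ACTION = 3  # one agent action counts as up to 3 in-game actions

# Agent actions per minute implied by the cap: 22 * (60 / 5) = 264.
agent_actions_per_minute = MAX_ACTIONS_PER_WINDOW * 60 / WINDOW_SECONDS
# Upper bound on what the in-game APM counter could show: 264 * 3 = 792.
peak_in_game_apm = agent_actions_per_minute * MAX_APM_PER_AGENT_ACTION


class SlidingWindowCap:
    """Hypothetical limiter: at most `limit` actions in any `window` seconds."""

    def __init__(self, limit=MAX_ACTIONS_PER_WINDOW, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # times of recently accepted actions

    def try_act(self, now):
        """Return True and record the action if the cap allows it at time `now`."""
        # Forget actions that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


cap = SlidingWindowCap()
# An agent that tries to act every 10 ms for one second gets only 22 actions through.
burst = sum(cap.try_act(i * 0.01) for i in range(100))
print(agent_actions_per_minute, peak_in_game_apm, burst)
```

Note that a window cap of this kind still lets all 22 actions land within a fraction of a second, so the instantaneous rate during a fight can be far higher than the per-minute figure suggests.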

But from replays it has an average APM of 248 at the highest, so nothing crazy going on. Plenty of pros have peak APMs during fights that are higher than 800.

264 is not that high. People (if Serral is not a cyborg) can achieve even 1000 APM; and ~300 EPM average for a game.

Those aren't meaningful actions, though, are they? That's 17 actions a second.

Yes and no. There are many repeat actions in SC but at peak they can definitely reach that high. Microing units while simultaneously performing larger macro actions is a staple of any decent player and is really obvious at the higher levels. I personally play on an average of 100-120 APM for the game but regularly see 200+ in Platinum/Diamond.

Yes it is, but you can prune the decision space significantly by discarding bogus choices, or more concretely by modelling higher level goals and then compiling them into actual moves.

> This is like if the world's best chess AI had gone from losing high school tournaments to being competitive with Kasparov in less than 3 years.

I don't think it's like that at all. On the high level, there is no "chess AI", "go AI", "image classification AI" and "dexterous manipulation AI". These are all sides of the same coin, that gets significantly better every year. Adding support for a new game or new "environment" to an existing deep-learning-based backbone still requires a bit of engineering work and a few creative tricks to unlock the best possible performance, but the underlying fundamentals are already there and are getting better and better understood.

There is a reason why the progress in AI is so hard to measure. Anytime a next task is solved, there is a crowd saying it's not a "real AI" and that scientists are solving "toy problems". Both statements are totally true. But the underlying substance is that each of these toy problems is of increasing complexity and brings us closer and closer to solving the "real problems", which are mostly so undeniably complex that we couldn't attack them upfront. Still, the speed of progress in the field of AI research is staggering and it's hard to keep up with it even for professional researchers who spend all their waking hours working on these things.

6 years ago we were able to solve some Atari games from pixels. Today, that feels like a trivial exercise compared to modern techniques. With billions of dollars of investment pouring in and steady supply of fresh talent, it is very hard to predict what the pace of research will be in the coming years. It is entirely possible we'll encounter a wall we won't be able to overcome for a very long time. It is also possible that we won't, and in that case we're in for a very interesting next few decades.

> On the high level, there is no "chess AI", "go AI", "image classification AI" and "dexterous manipulation AI". These are all sides of the same coin, that gets significantly better every year.

On a practical level, this is not true. There are different algorithms, different architectures, different hyperparameters required for each of these problems, and often for each subdomain within each of these problems, and often for each specific instance of these problems. It's difficult to draw any kind of holistic picture that combines all of the individual advances in each of these problem instances; that's why progress in AI is so hard to measure, and why a statement like "each of these toy problems...brings us closer and closer to solving the 'real problems'" is probably a bit too coarse-grained to be fair as well.

Are you writing this from last century?

Deepmind's best-in-class chess and Go AIs are the same code (AlphaZero) just given respectively rules and game state input for either chess or Go and then allowed to train on the target game.

One of the fun works in progress in this space is teaching AIs to play a suite of 80s video games. Getting quite good at several games where the idea is to go right and not die is pretty easy these days, but Deepmind's work can do a broader variety only coming badly unstuck on games where it's hard to discern your progress at all without some meta-knowledge.

I don't mean to imply AlphaZero is not impressive; it surely is. Nor do I mean to imply that any of these advances aren't impressive. I do mean to imply that "closed-world games with well-defined rules" is a relatively small subdomain of problems. And that BERT looks pretty different from AlphaZero.

The post you disputed pointed out that there aren't separate AIs needed for things like Go or Chess. Because there aren't (any more): the Deepmind work showed that you can just generalize to learn all games in this class the same way.

You claimed that "different architectures" are needed. Not true. And further you claimed this is true even for "each subdomain". This would have been a fair point in 1989. Traditional chess AIs approach the opening very differently for example, relying on fixed "books" of known good openings. But AlphaZero since it is a generalist doesn't do this, it plays every part of a match the same way.

Now you've gone from asserting that Chess and Go need separate AIs to claiming that since BERT and AlphaZero are different software it makes your point. Humans pretty clearly don't have a single structure that's doing all the work in both playing Go (AlphaZero) and understanding English (BERT) either - so that's a pretty bold bit of goalpost moving.

First of all, nice comment! That said,

> Anytime a next task is solved, there is a crowd saying it's not a "real AI" and that scientists are solving "toy problems". Both statements are totally true. But the underlying substance is that each of these toy problems is of increasing complexity and brings us closer and closer to solving the "real problems"

I wonder if this is true. This belief may seem like common sense, but it's not obvious to me that domain-specific problems must generalize to General AI ("real problems") or even bring us closer to it. That is, it's not evidently true that many small problems will eventually lead to a general solver of everything (or to human-like intelligence). Or to say it in yet another way, it's not obvious to me that human-like intelligence is the sum of many small-problem-intelligences.

Again, common sense may lead us to believe this, and maybe it's true! But I think this conclusion is far from scientifically evident.

The key thing you're missing is transfer learning. Instead of starting from scratch, you start with a model that was trained to do something and then train it to do something else. It takes much less time and labeled data to get the model to do something similar.

You can even interleave the training for the second task with a few training rounds for the first task to maintain proficiency. There's a group that's using this sort of technique to make a general "plays videogames" AI. I couldn't find a good link from my phone, but here's a less good link about something similar: https://towardsdatascience.com/everything-you-need-to-know-a...
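A toy illustration of the warm-start idea, under heavy assumptions: a one-variable linear model stands in for a neural network, and two nearly identical lines stand in for "similar tasks". The data, learning rate and tolerance are made up, and interleaved rehearsal of the first task is left out to keep it short.

```python
def train(xs, ys, w, b, lr=0.1, tol=1e-6, max_steps=10_000):
    """Full-batch gradient descent on y = w*x + b; returns (w, b, steps used)."""
    n = len(xs)
    for step in range(max_steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / (2 * n)
        if loss < tol:
            return w, b, step
        # Gradient of the halved mean squared error with respect to w and b.
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b, max_steps


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
task_a = [2.0 * x + 1.0 for x in xs]  # "source" task
task_b = [2.0 * x + 1.5 for x in xs]  # similar "target" task

w_a, b_a, _ = train(xs, task_a, 0.0, 0.0)           # pre-train on task A
_, _, steps_transfer = train(xs, task_b, w_a, b_a)  # fine-tune on B from A's weights
_, _, steps_scratch = train(xs, task_b, 0.0, 0.0)   # learn B from scratch
print(steps_transfer, steps_scratch)
```

Starting from the weights learned on task A, the model reaches the same loss on task B in fewer steps than it does from zero. Whether that carries over to tasks as different as two video games is exactly what's being debated in this thread.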

>> On the high level, there is no "chess AI", "go AI", "image classification AI" and "dexterous manipulation AI".

As another poster said these are all tasks performed by different systems. For chess and Go AI it's Deep Reinforcement Learning with Monte Carlo Tree Search. For image recognition it's Convolutional Neural Networks. Importantly, these systems are very task-specific. You won't find anyone trying to beat humans at games using CNNs, for example, or using Deep-RL to do text recognition. Far from "a few creative tricks" these are systems that are fundamentally different and are not known to generalise outside their very limited domains. They're one-trick ponies.

The OpenAI paper on "dexterous manipulation" reported learning to manipulate one cube, the same cube, always, after spending a considerable amount of resources on the task. It was a disappointing result that really shouldn't be grouped with CNNs and Deep-RL for game playing. The level of achievement does not compare well.

>> Anytime a next task is solved, there is a crowd saying it's not a "real AI" and that scientists are solving "toy problems".

This used to be the case a decade or more ago. In the last few years the opposite is true. The press is certainly very eager to report every big success of "AI"- by which of course is meant deep learning.

>> 6 years ago we were able to solve some Atari games from pixels. Today, that feels like a trivial exercise compared to modern techniques

6 years ago DeepMind showed superhuman performance in seven Atari games with Deep-RL (DeepQN in particular): Beam Rider, Breakout, Enduro, Pong, Q*bert, Seaquest and Space Invaders. Since then more Atari games have been "beaten" in the same sense, but many still remain. I'm afraid I can't find references to this but I've seen slides from DeepMind people a few times and there is always a curve with a few games at the top and most games at the bottom, below human performance. There are some games that are notorious for being very difficult to solve with Deep-RL, like Montezuma's Revenge, which was claimed to be solved by Uber a couple of years ago; however, this was done using imitation learning, which means watching a human play. The result is nothing like the result in Go, which remains the crowning achievement of Deep-RL (and its best buddy, MCTS).

Bottom line: Atari games remain anything but a trivial exercise.

And the architectures that play Atari do not perform as well in Go or chess, say. You are mistaken that it's simple to train the same system to do all of those things. The AlphaZero system that played Go, chess and shogi well enough to beat its predecessor (you will excuse me that I don't remember which incarnation of Alpha-x it was) had an architecture fine-tuned to a chessboard and pieces with discrete moves, so it would not be possible to reuse it to play Starcraft, say, or even tic-tac-toe. The cost to train AlphaZero is also very high, in the hundreds of thousands of dollars.

> The speed at which AlphaStar overtook previous AIs seems to me to be nearly unprecedented in AI research

In pretty much any field, top performing humans are at the physical limitation level, you will not see any sort of breakthrough, just incremental improvement.

Machines on the other side, can be scaled arbitrarily. Once you've built a small crane, you can build ever increasing ones, it's just a function of money and interest.

Some say that intelligence is not like that, that it can't be scaled arbitrarily. But the burden of proof is on them, especially given the consequences if it can.

>Machines on the other side, can be scaled arbitrarily. Once you've built a small crane, you can build ever increasing ones, it's just a function of money and interest.

It doesn't matter how much money or interest we have, but right now it isn't technically feasible to build a 36000 km tall crane (also known as a space elevator). Humanity simply couldn't get it done even if we poured all our current resources into that project. It isn't physically impossible, but such a behemoth has requirements that current materials science cannot meet. Building tall cranes generates useful know-how for a space elevator, but it's a really different problem, not just a matter of resources.

Following the analogy, a general purpose AI simply isn't a bigger Deep Blue or AlphaGo; it's probably something different that requires knowledge that we currently don't have. Sure, building Deep Blue and AlphaGo most likely generated part of that knowledge, but that doesn't mean we have everything we need.

You can argue that it is also simply money and interest, and that's true, but it's not the same as building a 10 meter crane and then a 20 meter crane.

> Following the analogy, a general purpose AI simply isn't a bigger Deep Blue or AlphaGo; it's probably something different that requires knowledge that we currently don't have.

I agree with this, but it isn't clear to me that a general AI will have a significantly different impact on society than a world where task-specific well performing AIs are easy for anyone to develop.

Sure, a general AI has a set of properties that are really fascinating to discuss and debate (including what is consciousness and whether AIs should be given rights), and perhaps a general AI is required for doomsday computers-taking-over scenarios, but the impacts that AI will have on our economy and politics don't require general AI.

You're right. Of course, we're not yet in a world where _task-specific well performing AIs are easy for anyone to develop_, but we're getting there and it looks feasible. A 500 meter crane.

However, I disagree that the impact is the same. I think that the difference between a lot of specific jobs being automated and the world being run by a MULTIVAC world optimizer is pretty big.

> Agents were capped at a max of 22 agent actions per 5 seconds, where one agent action corresponds to a selection, an ability and a target unit or point, which counts as up to 3 actions towards the in-game APM counter. Moving the camera also counts as an agent action, despite not being counted towards APM.
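
For a sense of scale, here is a minimal sketch of what a cap like this could look like, plus the arithmetic it implies. The quote doesn't say whether the window slides or is fixed, so the sliding window here is an assumption, and the class name `ActionWindowCap` is made up.

```python
from collections import deque


class ActionWindowCap:
    """Allow at most `max_actions` agent actions in any `window` seconds.

    Assumption: a sliding window. The quoted text only gives the numbers.
    """

    def __init__(self, max_actions=22, window=5.0):
        self.max_actions = max_actions
        self.window = window
        self.times = deque()  # timestamps of recently accepted actions

    def try_act(self, now):
        # Forget actions that have fallen out of the window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        if len(self.times) < self.max_actions:
            self.times.append(now)
            return True
        return False  # over the cap, the action is refused


# Arithmetic from the quoted numbers:
agent_actions_per_minute = 22 * (60 / 5)        # 264 agent actions per minute
max_ingame_apm = agent_actions_per_minute * 3   # up to 792 on the in-game counter
```

So the cap still allows bursts that show up as several hundred APM in-game, just not for longer than a few seconds' worth of budget.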

I'm happy to see that they've greatly improved the APM cap. During the earlier showmatch they had an extremely naive cap on the average APM during the whole game. Which meant that the AI could take it easy for most of the time and then during battle sequences hit peaks of 2000 APM. Also the previous cap was based on the in-game APM counter, which doesn't count some things like moving the camera, which is now addressed. The current state sounds a lot better.
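
To illustrate why a whole-game average is such a weak cap, here is a small sketch with made-up numbers (not taken from the showmatch): an agent that idles at one action every two seconds and then dumps 150 actions into a single 5 second fight.

```python
def average_apm(times, game_seconds):
    """APM averaged over the whole game."""
    return len(times) * 60 / game_seconds


def peak_apm(times, window=5.0):
    """Highest APM in any `window`-second span that starts at an action."""
    times = sorted(times)
    best, j = 0, 0
    for i, t in enumerate(times):
        # Move j to the first action outside [t, t + window).
        while j < len(times) and times[j] < t + window:
            j += 1
        best = max(best, j - i)
    return best * 60 / window


# Hypothetical 10 minute game: slow background clicking plus one burst.
idle = [2.0 * i for i in range(300)]
burst = [300.0 + i * (5.0 / 150) for i in range(150)]
times = idle + burst

avg = average_apm(times, game_seconds=600)   # looks harmless
peak = peak_apm(times)                       # far beyond any human
```

With these numbers the average comes out at 45 APM while the peak is above 1800, which is the kind of gap the old cap couldn't see.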

However it seems still superhuman in mechanical non-strategy ways, e.g. a human can misclick (click too early/late when moving the mouse, or just miss the target with the cursor completely) or do accidental double clicks etc. These end up being very costly mistakes against a zero mechanical mistake AI. Which in turn means that the AI can win with an inferior strategy due to the extra gains it has via superior mechanical accuracy. In other words this artificial intelligence is still under significant artificial motor skills welfare, and thus even if it starts beating pro players we shouldn't be too quick to talk about how it's the intelligence part that got it the win.

All of that said, I'm liking the progress and am excited to see what they achieve with this next. Would love another showmatch against pro players.

Yes, those might have some impact, but it is clear that the progress is there, with this new APM cap and camera movement etc.

You can also see in replays that the AI often makes mechanical mistakes, missing spells, missing units, even ordering wrong units from outside the screen - so it surely seems that if its win rate was conditioned in any strong way on its sheer mechanical ability, it would have learned to not make any such mistakes. Since the mistakes are clearly there - it seems that its power comes primarily from somewhere else, likely from the AI's ability to choose the right actions, not from mechanical power of executing them perfectly.

Thus it is not likely that "this artificial intelligence is still under significant artificial motor skills welfare". They have consulted top players before they set this AI up, and if those details were important, they would have baked them into the limitations list already.

The mistakes it makes are due to bad decisions. There have been no claims by DeepMind that they have some sort of chaos engineering [1] going on where the AI decides one thing and then the output system actually does another thing.

Also I think you overestimate the AI/IT knowledge of these top players that they're consulting. I have great respect towards them, but they're not renaissance men [2] who both play 10 hours of StarCraft per day and also know the subtleties of how computers work, not to mention bleeding edge AI. You can watch the previous showmatch [3] to see how DeepMind people lack knowledge of StarCraft and how the pro players they're consulting lack understanding of the AI and both are learning new things live on air. It's obvious that their cooperation is bearing fruit as they spend more time talking to each other, as evidenced by the new APM limits. However I'm willing to bet that they would reach many more good conclusions if they just continue working together.


[1] https://en.wikipedia.org/wiki/Chaos_engineering

[2] The problem with both parties (the pros & deepmind) is that they're so overspecialized. I'm nowhere near as good as them at their respective fields, but I am a professional programmer and diamond in StarCraft II. In addition I've built StarCraft II AI myself, although with different goals related to finding optimal strategies.

[3] https://www.twitch.tv/videos/369062832

You're not giving them enough credit. Oriol Vinyals (head of this project) is both one of the leading AI researchers + was highest ranked StarCraft player in Spain back in high school.

Certainly the game is different, meta has changed, etc. But he's definitely not an amateur and is probably in the top percentile of players.

>[2] The problem with both parties (the pros & deepmind) is that they're so overspecialized. I'm nowhere near as good as them at their respective fields, but I am a professional programmer and diamond in StarCraft II. In addition I've built StarCraft II AI myself, although with different goals related to finding optimal strategies.

So many things wrong with this comment.

You're nowhere near a professional StarCraft player if you're in Diamond league (I play casually and I'm on high plat, bordering Diamond), and Oriol Vinyals, the lead research scientist behind this project, is one of the most renowned scientists in the field and used to play StarCraft at a professional level. They also said that other employees at DeepMind are at Masters level, and helped test the AlphaStar.

You quoted me saying that I'm not as good as them and then you say that my statement is wrong because I'm not as good as them. We seem to be in agreement. Maybe you misread? Anyway that was just a footnote to point towards my generalist nature, because I think that's a fundamental reason why I immediately see these things while for them it takes time to realize. Also to be even more fair towards them, my generalist knowledge around computer controlled gaming goes way beyond StarCraft II as I've written bots for many different games.

My main argument revolves around APM. Oriol Vinyals might be great, but I've also seen him make a statement on video in 2019 [1] how restricting the AI based on average APM during the whole game is reasonable. He has blindspots that someone like me can immediately spot.


[1] https://www.twitch.tv/videos/369062832

These live on air admissions of prior ignorance that you mention may be for illustrative purposes, a PR gag to explain to each other the difficult approaches they had taken in development. It's like late night shows.

You must be one of the people who think reality TV programs are really real.

Why don’t they ever just have a virtual mouse api or whatever, with some lag and some jitter just to make it an actual apples to apples comparison?

The AI is supposed to call this API. Then actions per minute would be irrelevant.
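
Something like this is what I have in mind, as a rough sketch only. Nothing here is DeepMind's or Blizzard's API; the class `VirtualMouse`, the 200 ms lag and the 3 pixel jitter are all placeholder assumptions.

```python
import random


class VirtualMouse:
    """Hypothetical input layer: delays each click and perturbs its position."""

    def __init__(self, lag_s=0.2, jitter_px=3.0, seed=None):
        self.lag_s = lag_s            # assumed human-like reaction delay
        self.jitter_px = jitter_px    # assumed std dev of aiming error, in pixels
        self.rng = random.Random(seed)

    def click(self, now, x, y):
        """Return (time the game sees the click, actual x, actual y)."""
        dx = self.rng.gauss(0.0, self.jitter_px)
        dy = self.rng.gauss(0.0, self.jitter_px)
        return (now + self.lag_s, x + dx, y + dy)


# The agent asks for a click at (100, 200); the game gets a late, slightly off one.
mouse = VirtualMouse(seed=1)
seen_at, actual_x, actual_y = mouse.click(now=10.0, x=100, y=200)
```

Small targets would then occasionally be missed, and the agent would have to learn to live with that the way humans do.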

I'm just guessing but I think it would be easier to start from a perfect system and progressively work towards an accurate imperfect system rather than the other way of guessing an imperfect system and working back towards an accurate imperfect system.

Easier but that’s cheating LMAO

Grandmaster level players make those kinds of mistakes extremely rarely and if they’re the only difference between winning and losing, the ai is an extraordinary achievement.

It's not exactly the same as a showmatch, but you can find lots of YouTube videos analyzing replays of AlphaStar playing on the ladder. For example, this game against 2018 world champion Serral: https://www.youtube.com/watch?v=_BOp10v8kuM

And even with limited APM, the response time can still be superhuman. Bit disappointed that they didn't add some human-level latency to the agents' actions.

How much do misclicks or accidental double clicks matter at the pro level?

A lot. In the same way that errors reduce data bandwidth, misclicks reduce effective APM in hectic situations. Classic examples are moving marines into banelings or units into disruptor shots.

Sorry, I asked that wrong. Clearly misclicks matter at the pro level, as pros are going to exploit their competitors' mistakes ruthlessly. What I should have asked is, at the pro level, how often do misclicks affect the outcome of a match?

Every game of Starcraft is decided by an extremely large amount of small events. It is very rare that a single event or misclick is responsible for the outcome of a game, but it happens occasionally.

Since players take so many actions during the course of the game (well into the tens of thousands), inevitably some clicks will be sub-optimal, and they all have a tiny impact on the outcome. Some professional players do specific exercises to improve their clicking accuracy in order to gain efficiency by reducing misclicks, but generally clicking accuracy is not considered a big factor compared to raw speed. Most players try to attain the highest possible clicking speed while maintaining an accuracy level that is "good enough".

from what I've heard from a starcraft player, what you've just described is just how it is. You don't need good strategies if you have superior techniques (apm & precision). It's not really a game of intelligence (after you've got the basics down).

an SC robot with alphastar brain--learning to use a mouse for a start--would probably only shift the blame to those super human materials. They should put alpha* into a human! That's where it's going anyway, isn't it?

Has anyone read the actual paper ? This summary really makes it look like "mission accomplished", but this was much much more interesting than that. We saw AI do "obviously stupid things", and we also saw them improve a lot in the middle of the trial, as many youtubers showed.

AI was also much more interesting when playing the protoss race, and really felt like it was responding to the opponent's actions, and the other races not so much.

But most surprising is that it didn't make any "breathtaking" moves or actions, as opposed to AlphaGO. Actually not a single game made pro player realize something new about the game. Which is really embarrassing, because it suggests that the AI just was able to correctly reproduce existing strategies and build orders, that it probably "learned" from existing pro games in the training sample.

I was really hoping for a more interesting report, honestly explaining the shortcomings of the techniques used and giving hints for the obvious blunders. As well as a roadmap for a second round, this time aiming at beating the very top players.

IMO it's a testament to how games like SC are collectively and thoroughly "solved" by the community, and how the games aren't that complex after all. I never followed SC or SC2, but my observation of the pro scene and competitive ladder for Warcraft 3 was that cookie cutter strats dominated. Pro players were typically those who executed best, not those who innovated best.

Personally I felt disappointed by the fact that real-time strategy played a pretty minor role in RTS games. If you left the well-beaten path of cookie cutting and tried something new you invariably gave up a pretty obvious advantage to do so.

Former masters Zerg here. You are partially correct, but actual high level play involves a great deal of small variations to these cookie cutter strats, and these small variations lead to a constantly evolving metagame. A good example was the kespa league during hots where innovation and strategy played a big role.

It’s just that a lot of that strategy is irrelevant unless you have amazing mechanics - and the difficulty of those mechanics means that you can’t as easily think, plan, or adapt in-game (because your brain is busy)

A big part of the game is thinking about “how do I win against somebody who does that...” between games

There is a lot of strategy in RTS's but alpha star doesn't need to exploit any of it because it can win on mechanics. It has perfect macro, and perfect micro, which allows it to beat players without having to learn strategies or tactics.

It's kind of like competing against a gorilla in boxing chess. Just because a gorilla is dominant doesn't mean boxing chess doesn't require chess skills, only that a gorilla doesn't need them.

Are you sure it has perfect micro? Can you even execute perfect micro on 30 actions per 5 seconds?

"Perfect" no, much better than any human yes. 30 actions per 5 seconds is 360 actions per minute. The very top players might have similar stats, but the similarities would end there. A lot of human actions are mindless spam clicking and those 300+ actions would contain several misclicks. The ai would never misclick. They can also do things like pull back every weakened unit before it died much more accurately than a human. A human knows to do these things but it has much less mechanical skill than the ai.

Cookie cutter strats or the metagame changes over time. Knowing what strategies you are likely to face influences what strategy you are likely to try. The meta is different across regions. The europeans play differently to the americans who play differently to the koreans. Some people dedicate themselves to different playstyles and your 'cookie cutter' strat will have to grow with everyone else's. They patch the game often to update the balance.

Designing a game that is a fun and balanced experience whilst also encouraging true innovation is a difficult problem. Every game ever has this problem, putting thousands of minds on a problem and letting them confer about their results usually solves all the low-hanging fruit in about a week. Creating a game where you only play to share a new idea makes each match feel like a dice roll.

> my observation of the pro scene and competitive ladder for Warcraft 3 was that cookie cutter strats dominated. Pro players were typically those who executed best, not those who innovated best.

Warcraft 3 and broodwar were much more micro-oriented than sc2, because the unit ai was much worse. For example, if you let dragoons walk to their target on their own, they spend half the time fighting broken pathfinding while the enemy shoots at them :) This almost never happens in sc2, and microing units below master league is usually detrimental to the game.

I think the micro vs macro is not the full image, and sc2 lets player express themselves on what i would call mediumgame (something that is between buildorders and microing - like deciding when to expand, what tech-switch to make, how to position your army in between fights, etc).

There's lots of decisionmaking that isn't APM-limited but thinking-time limited in SC2, and it's not as codified as buildorders. But you can certainly simplify the games to "archon immortal vs bio terran", and then it's very repetitive.

It's a bit more interesting when people play several matches against each other. They are forced to switch tactics.

Raw, spur of the moment, innovation probably isn't a good idea generally, as you'd be doing something with no practice. People do have a fairly big bag of tricks from older tactics that are less common these days.

because the real skill is in the micro

> Actually not a single game made pro player realize something new about the game.

People are now over saturating their mineral line (making more probes than before), so I don't think that's true.

There were other cases of this such as 3 rather than 2 early air units. The AI seemed to anticipate losses in a way that is almost more rational than humans.

I remember it also built like 8 observers in one match, supposedly because the AI was so good at sniping them (and it trained on itself).

This is wrong. See posts here from ptitdrogo: https://www.reddit.com/r/starcraft/comments/d4n3tw/alphastar...

> People oversaturated in wol and hots because you didn't expand a lot in these games and bases have a lot less minerals in LOTV, not because of some kind of lost knowledge like some comments seem to think here.


> Meanwhile alphastar was going 2 gate robo staying on one base making a fuck ton of probes to take a super late natural, it was bad, and anybody calling it the future really bothered me.

Alphastar has not introduced anything new. Its play is strategically poor.

Can confirm. I did experiments and math about this years ago- overtraining workers is only good if you know you are about to get severely harassed and plan to lose a bunch


Is it because they expect to lose workers to harassment? Or is it so that they can saturate new expansions quicker?

In terms of total mineral output oversaturating would seem to be disadvantageous since being able to get more workers on the new mineral line is offset by the cost of not expanding sooner and then being able to build from both bases.

I'm interested to understand why Alpha did this since worker production seems like one of the most solved and optimized parts of the games and not where you'd expect innovation

Most people speculate it's to account for losses from harassment where it somehow finds easier to overprobe than bother with more defense.

> But most surprising is that it didn't make any "breathtaking" moves or actions, as opposed to AlphaGO.

I don't play any competitive StarCraft so my view might be moot but I was surprised at the number of siege tanks it uses. It made me wonder if there's some critical advantage to having so many tanks stacked up in a line so that the splash is spread out.

Also I find it weird that it did not build any marauders at all, in any of the games I've seen.

And just in general, no sexy units. No battlecruisers, no infestors, no swarm hosts...

>Also I find it weird that it did not build any marauders at all, in any of the games I've seen.

This is off-topic, but I was thinking about this the other day - couldn't AI be used to balance a competitive game in this sense? Imagine that the AI becomes so good that human players very rarely win against the AI in a best of 7 series. Then we find out that AI doesn't ever build a specific unit. That would be a pretty strong indicator that said unit isn't good enough compared to the rest.

You wouldn't be the first person to say this. But it only proves that the unit isn't useful __ as piloted by AlphaStar __.

Based on the initial AlphaStar against TLO/Mana, you would think that Stalkers are insanely OP and the only thing worth building.

I (and others) have wondered that if you continue to lower AlphaStar's APM, you would see a diversity of units. The money would be in where AlphaStar decides to spend its really tight APM budget. Is it worth casting that Psionic Storm?

I personally feel like it has an insane micro advantage by being able to select arbitrary units on the battlefield, as opposed to dragged squares or control groups. But I'm not a pro gamer, so I don't know what that feels like.

You can also do this directly with metrics on human games. It’s something that I know wizards of the coast does with Magic the Gathering decklists, just counts the copies of cards and keeps track of the “conversion rate” of them (how likely is this deck to pass some threshold, like have a positive win percentage, given that it has this card). They are doing it primarily to spot ban targets (the paper aspect of cards means you can buff without printing a new set of cards), but it is the same idea.
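
Roughly, the bookkeeping would look like the sketch below. This is only my reading of the idea, not Wizards of the Coast's actual pipeline; the function name, the data layout and the threshold flag are assumptions.

```python
from collections import defaultdict


def conversion_rates(decks):
    """decks: iterable of (cards, passed) where `cards` is a collection of card
    names and `passed` says whether the deck cleared the chosen threshold
    (for example, a positive win percentage)."""
    seen = defaultdict(int)     # decks containing the card
    passed = defaultdict(int)   # of those, decks that cleared the threshold
    for cards, ok in decks:
        for card in set(cards):  # count each card once per deck
            seen[card] += 1
            passed[card] += bool(ok)
    return {card: passed[card] / seen[card] for card in seen}


# Made-up decklists for illustration.
rates = conversion_rates([
    ({"A", "B"}, True),
    ({"A"}, False),
    ({"B", "C"}, True),
])
```

For SC2 you would swap cards for units and decks for games, with the same caveat as above: a low rate only tells you how the unit fares in the hands of whoever was playing.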

This sounds like it was posted before the link was changed, but at least one pro was impressed by its novel style:

> AlphaStar is an intriguing and unorthodox player – one with the reflexes and speed of the best pros but strategies and a style that are entirely its own. The way AlphaStar was trained, with agents competing against each other in a league, has resulted in gameplay that’s unimaginably unusual; it really makes you question how much of StarCraft’s diverse possibilities pro players have really explored.

- Diego "Kelazhur" Schwimer, professional StarCraft II player

yes indeed, it seems the most recent batch of games brings interesting new things

> Actually not a single game made pro player realize something new about the game. Which is really embarrassing, because it suggests that the AI just was able to correctly reproduce existing strategies and build orders, that it probably "learned" from existing pro games in the training sample.

Why is it embarrassing if the AI behaved like a collection of all the knowledge of existing pro players?

Obviously it'd be more interesting if we also learned something new about the game. Or AI, or both.


I watched the first Dota 2 OpenAI 5v5 show at TI8, and they did have different actions, like which spells were used to initiate, how some spells are used, etc.

But that AI was very limited in Hero pool.

> After 50 games, however, DeepMind hit a snag. Some players had noticed that three user accounts on the Battle.net gaming platform had played the exact same number of StarCraft II games over a similar time frame — the three accounts that AlphaStar was secretly using. When watching replays of these matches, players noticed that the account owner was performing actions that would be extremely difficult, if not impossible, for a human. In response, DeepMind began using a number of tricks to keep the trial blind and stop players spotting AlphaStar, such as switching accounts regularly.

So instead of fixing the unfairness... they tried to hide it?

Funny how they put it as:

> When watching replays of these matches, players noticed that the account owner was performing actions that would be extremely difficult, if not impossible, for a human.

framing it like the agent was doing some sort of insane play, when the reality is the main evidence of a player being alphastar (other than its garbage decision making) was the fact that it wasn't using hotkeys!

The correct terminology would be "control groups".

The configuration of control groups is visible in replays; AlphaStar's replay data does not have any control groups in it.

Though, via its API, it is able to select arbitrary groups of units from anywhere on the map.

Personally, I would have loved to see it work through control group management because I think that is something that is important and steals attention away from humans. But probably to the researchers, it is just annoying data to model that doesn't get to the "core" of StarCraft.

These extremely difficult/impossible things didn't really give an advantage. For example, AlphaStar would sometimes click on an object at the border of the screen. For humans that would be almost impossible, because the screen would scroll when the mouse approaches the border.

Similarly, AlphaStar would not play with group hotkeys, but use a different technique. However, in none of the analyses, people noticed things that would give AlphaStar an unfair advantage.

> These extremely difficult/impossible things didn't really give an advantage.

One of the videos I watched compared APM (Actions Per Minute) with EPM (Effective actions Per Minute). AlphaStar always has them nearly identical, which would be (according to him) basically impossible for humans.

Of course it's impossible for humans. Humans are used to clicking multiple times to do an action. Some actions you want to take in the game are very important, failure to do them correctly can mean a lost game, for example moving units in combat properly. This means that it's better to spam click that action a few times to make sure that the button press is registered properly, because it is possible for button presses to fail due to hardware, software, and most often a coordination error. This means that essentially every player has a much higher APM than EPM, but it's really EPM that counts. APM itself is mostly irrelevant.
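
A toy version of the APM/EPM distinction, for illustration. The game's own EPM counter has its own rules that I'm not reproducing here; treating "same command repeated within half a second" as spam is purely my assumption.

```python
def apm_and_epm(actions, game_seconds, repeat_window=0.5):
    """actions: list of (time_s, command), sorted by time.

    An action counts as effective unless it repeats the previous command
    within `repeat_window` seconds (a crude stand-in for spam clicking).
    """
    effective = 0
    last_cmd, last_t = None, None
    for t, cmd in actions:
        is_spam = cmd == last_cmd and t - last_t < repeat_window
        if not is_spam:
            effective += 1
        last_cmd, last_t = cmd, t
    minutes = game_seconds / 60
    return len(actions) / minutes, effective / minutes


# A human-style log: the move order is clicked three times to be safe.
log = [(0.0, "move"), (0.1, "move"), (0.2, "move"), (1.0, "attack"), (2.0, "move")]
apm, epm = apm_and_epm(log, game_seconds=60)
```

Here the spam-clicked log gives 5 APM but only 3 EPM. An agent that never repeats a click would have the two numbers equal, which matches what the video pointed out.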

Hmm, seems like it might be interesting to add a slight error rate for AI clicking and see how it handles it?

You only need more dexterity than your opponent, and only for a couple seconds, to decisively win a match in SC2.

Error rates, EPM and APM are all red herrings.

Right, one example I remember from watching the games back in January was stalker micro. The high EPM of the AI allowed it to micro them picture perfectly and blink every single one away right before it died. In doing so it won a battle against impossible odds, and the announcers even commented on how it was way beyond the capabilities of the best players.

That exact strat is the reason they implemented the APM limit, it was a cheese strat only a machine could implement.

What you said is a meaningful advantage.

Scrolling screen is a major action to gather info and control units.

If one can control near edge units without scrolling it gives more stable view and lower chance of making mistakes.

There's an in-game option to disable scrolling the screen with the mouse and doing it with only the keyboard.

> For humans that would be almost impossible, because the screen would scroll when the mouse approaches the border.

That humans cannot reliably perform these actions because of the limitations of our corporeal form means that Alphastar has an advantage over a human player. Limiting APM isn't enough.

The question isn't whether the AI still has advantages, but whether it's an unfair one. The current restrictions were implemented with input from TLO, and seem to work quite nicely.

One could, of course, add more restrictions (like not noticing things on the minimap all the time...), but that's not what it's about anymore. At this point, micro is comparable to humans, and we can start to compare macro and strategy.

Fwiw, when I watched some of the replays, I was disappointed by AlphaStar. It's a very consistent player with few mistakes, but it isn't very reactive and definitely not inventive. Instead of switching tactics when things don't work out, it generally continues with the chosen strategy. Often, that's enough: a well executed strategy with few errors often wins, even if it wasn't optimal.

> a well executed strategy with few errors often wins, even if it wasn't optimal

I think this hits the chord with why a lot of people are talking about this bot as cheating more than they are being outplayed.

It's like fighting an aimbot in an FPS. They aren't outsmarting you, but the mechanical consistency is inhuman and you are unable to force mistakes. At best it'll feel like playing someone on their best day.

The heart of gaming, and why StarCraft 2 in particular is popular, is finding ways to play around each other's mistakes.

Honestly, they should reduce Micro to the point that humans can bully it a little bit. How does the AI handle finding new ways to trade units to its advantage against a mechanically stronger opponent? How does it try to bait the human into having to take fights with unfavorable units / strategy.. as opposed to beating through them with mechanic mastery

The issue with limiting it past what humans can do is that in the self play training they'd never encounter what humans could do. The benefits of trying to make the limitations somewhat human-like is that the self play optimizes around roughly what it will encounter when it goes against humans

Maybe make self-play asymmetrical so sides in the game would have different sets of limitations? So the main agent would have subhuman micro and exploiter agents would have superhuman micro.

I don't think the comparison with aimbots is fair. Aimbots are more like a perfect micro. In the beginning (when they did their first show match), Alphastar was insane at that. Now the micro is not giving an advantage anymore.

Where alphastar shines, is in making a strategy work. It won't attack into something that looks dangerous. It won't forget to build more units. It won't be distracted on two fronts. (That said: Alphastar did some really stupid mistakes as well.)

I agree with you, though: it would be much more interesting if AlphaStar were actually handicapped in micro (at least at a later point in the game). Then it would really need to find interesting strategies and counter opposing units with more cost-efficient ones.

>Now the micro is not giving an advantage anymore.

I guess I need to learn more about the details here. Robotic consistency is a significant advantage, so unless it is handicapped I don't think the result is much of a leap from what we learned by it being insane at micro.

You can feel consistency in your opponent and it is extremely intimidating. Usually, players are consistent at different things; my micro might be much better with marines than tanks, for example. To have consistent human-level micro across all units is already a huge advantage.

I agree the aimbot comparison is excessive, but the hyperbole was good for getting the point across.

You could take an FPS AI with an aimbot and give it 220 ms of input lag to say it doesn't have an advantage over players anymore, but it's still going to climb to the top of the ladder system because it isn't going to miss. It is either advantaged or disadvantaged at aiming based on that one number.

I think it's less about fairness and more about what's interesting? When the AI finds an advantage that makes its gameplay kind of boring, nerf it somehow to see what else it can do.

It's an AI system, of course it has some advantage, that's why we are building them.

The advantage here is due to artifice and not necessarily generalizable.

To play against alphastar, you have to opt-in. IIRC it states Alphastar will hide itself.

One of the things people noticed in replays was the lack of control groups and, in the case of Zerg, the ability to select larvae directly, which no player ever does. It could have been as simple as removing these quirks.

Nobody selects larvae directly? I guess I am a computer then. Way worse than AlphaStar tho.

Ha! So I always did that in Brood War, but in SC2, I put each hatchery on a hotkey (0, 9, 8, etc.) and then just pick the units I want to queue up. No need to select individual larvae any more.

I like to map tilde, tab, and caps lock to 0, 9, and 8 so that my left hand doesn't have to reach as far.

New players (like when I first played) are likely to select individual larva before they do tutorials or learn the hotkeys, but you're right that high level players would almost never select one directly. Maybe after a hatchery has died and there are still larva remaining?

No, it doesn't select the larvae by dragging, it just selects them instantly from the other side of the map. I'd guess it either has a hotkey for the hatcheries and presses it and the larvae key instantly without it showing in the replay, or it has a cheat ability to instantly select larvae.

Its API allows it to select units arbitrarily from anywhere (regardless of camera). Camera vision only controls the specificity of data it sees for on-screen units.

Oh, interesting. Thanks for clarifying.

For the type of beast AlphaStar is, they do not sound like simple tasks at all.

I'm not sure this is unfairness. IIRC they put in a fair bit of effort to put it on a level playing field with humans by limiting APM and not allowing it to observe multiple areas simultaneously by spam moving the camera.

It might have some minor unfair advantage in terms of being able to click with pixel-perfect accuracy, but it's marginal, and from watching this project evolve, it's pretty clear that the strategic planning aspect of AlphaStar has indeed become better than humans.

"Agents were capped at a max of 22 agent actions per 5 seconds, where one agent action corresponds to a selection, an ability and a target unit or point, which counts as up to 3 actions towards the in-game APM counter. Moving the camera also counts as an agent action, despite not being counted towards APM."
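The cap quoted above can be read as a rate limit on agent actions. A minimal sketch, assuming a sliding 5-second window (the quote doesn't say whether the window slides or is fixed); `ActionLimiter` and `try_act` are hypothetical names for illustration, not anything from DeepMind's code:

```python
from collections import deque


class ActionLimiter:
    """Allow at most `max_actions` agent actions in any `window` seconds."""

    def __init__(self, max_actions=22, window=5.0):
        self.max_actions = max_actions
        self.window = window
        self.times = deque()  # timestamps of recently allowed actions

    def try_act(self, now):
        """Return True and record the action if it fits under the cap."""
        # Drop timestamps that have fallen out of the window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        if len(self.times) < self.max_actions:
            self.times.append(now)
            return True
        return False


limiter = ActionLimiter()
# Try to act every 0.1 s for just under 5 s: 50 attempts in total.
allowed = sum(limiter.try_act(i * 0.1) for i in range(50))
# Sustained agent actions per minute under the cap.
agent_actions_per_minute = 22 * (60 / 5)
# Upper bound on in-game APM if every agent action counted as 3 actions.
peak_apm = agent_actions_per_minute * 3
print(allowed, agent_actions_per_minute, peak_apm)
```

Under these assumptions only 22 of the 50 attempts go through, and the cap works out to at most 264 agent actions per minute, or up to 792 on the in-game APM counter if every action counted three times.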

This is worse micro than top human players.

Wasn't blink stalker micro pretty central to its playstyle as protoss, the race it was best at? Why would you say that is minor and marginal?

It can no longer do micro like that, I have watched most of its games several times and it never did any micro feat like that again. On the contrary its micro was often lacking compared to humans, it won mostly through very strong timing attacks and uncanny ability to pick when to fight.

> It can no longer do micro like that

Is that supposed to be an intention in design? I'd figure an AI could easily outplay a human if that weren't the case, given its inherent advantages (e.g. better accuracy, the ability to instantaneously prioritize which units to blink in/out).

For example, [this](https://youtu.be/pETcAm82vXU?t=322) is one game where an AI could perform even better using such tactics.

EDIT: another commenter mentioned it would make the game too unfair, so it was an intention in design.

Just as an example of how extremely unfair it would end up being, flawless AI micro looks like this: https://www.youtube.com/watch?v=IKVFZ28ybQs

100 zerglings vs. 20 siege tanks. Without insane micro the zerglings barely kill 2 siege tanks. With insane micro the 100 zerglings mop up the whole army with ease.

It's fascinating & fun to watch, but if your goal is to make an AI that can out-think a human it's super not useful, either.

That was from the version of AlphaStar that they demo'd in the earlier show matches. From what I gathered in the articles over the last few days, they took in a lot of feedback from that and adjusted this to make strategies like that inhuman blink micro less possible...

"... the strategic planning aspect of alphastar has indeed become better than humans."

This is not true. Top-level human players take into account who their opponent is, what builds they've used in recent games, their proclivities, strengths, and weaknesses. In a 7-game match they'll intentionally use builds which (in order to deceive) appear the same but have very different effects.

AlphaStar has better strategy than other AIs, but that's a low bar. My young son has better strategic appreciation of the game, and easily points out the difference between good strategy and AlphaStar's next-level micro.

They may be marginal but at this level of competition the margins are where the differentiation lies.

There has always been the issue of interface when playing videogames AI vs human. Either give the human a brain-computer interface or give the AI a mouse, keyboard, monitor, robot hands and a camera. Anything else seems inherently unfair.

They are testing the AI's ability to do tactics and strategy, not motor and visual tasks. It has built in delay and needs to move around the map to gather information just like a human, so it doesn't have any significant unfair advantages.

Edit: Note that the version they sent out to the ladder had significantly larger delay, significantly lower APM, and, unlike the first iteration, didn't get any information not visible on the screen.

When Watson won Jeopardy, its lead designer thought a 5 ms reaction time was fair because one time a human anticipating the buzzer light got 2 ms.

Of course to any reasonable person a consistent 5 ms is obviously vastly superior to once in a blue moon lottery-winning luck.

Sounds like it's the same deal here, where the designers made some effort to level the interface playing field, but also left in some advantages like instantly being able to select individual units (or whatever it was the other players felt was not humanly possible to do). Probably because, at some level, they really want their system to win.

Yes, the point of the entire project was for the system to see if it can win.

If we have reached the point where to even risk a loss requires implementing artificial restraints on the AI...well...

They aren't really artificial restraints though, more like artificial advantages without them. The interface delay challenges are pretty relevant to making things fair in almost ANY competition. The physical part of the interface is an essential part of the game here.

You aren't really testing the thing you really want to test otherwise.

You could probably play tennis pretty well against Nadal if you made him wear goggles that gave him a quarter second delay in response time, but that's not really a fair test.

All other things being equal, if you match up two chess players, and one can just say or think the position instead of having to physically move the piece before they hit the clock, that advantage will accumulate over time.

I wonder if implementing tactical pause would eliminate interface advantages. Players can pause game, scroll around map and give orders to units before unpausing and watching them execute.

It will be a different game, but you can see how AI does against humans tactically and strategically.

Strategy is limited by available tactics, and the ability to execute them. Your (human) tactics and strategy would necessarily change if you and your opponent were required to use a mouse that can only move across the screen over the course of a couple seconds, or if you could only click once per second, or if the computer added noise to when/where your mouse click arrives.

Strategy and tactics are a function of constraints. If an unconstrained computer can beat a constrained human, have we really shown that its strategy and tactics are better? To do that, you'd have to have them play under the same constraints. Or at least have the constraints be close enough, which is what people are debating.

However, the computer can macro while in a fight, which is something people can’t do. We would miss the fight and lose the game. It’s not just a tactics game. Attention span and where your eyes are matter.

It’s so bad that below master, people shouldn’t really be doing too much micro during a fight because they’ll lose out on macro. Even in pro matches you see attention span issues and sometimes avoidance of too much micro to win. Really good micro is going to win you any battle.

I’m in diamond 2 and I definitely micro in battles. The difference between platinum 1 and diamond 3 is largely macro but it very soon becomes micro as well.

Elite players do this all the time. Serral would use his standard tactic of attacking in two places at the same time, and AlphaStar couldn’t figure out the best way to deal with it. When AlphaStar did the same thing to him, he would mitigate the smaller attack and even predict AlphaStar’s escape route and crush it.

The competition is not the goal. The point isn't to see "who's better"; the competition with real humans is just a tool to help improve the reasoning, planning and decision-making capabilities in a less artificial environment than usual. The purpose of that project is to use StarCraft as a playground to test, review and improve reinforcement learning methods so that they can be used for other needs.

Any aspect of "fairness" is relevant if and only if it helps that goal. APM limits are important because we want the computer to choose strategies that would actually be good choices in that situation even for a human player, instead of being able to win with strategies that are powerful only because of superhuman clicking speed. On the other hand, requiring the computer to press buttons with a robot hand doesn't really facilitate improvements to the decisionmaking part, so it's irrelevant even if it would make it more fair.

Yeah. Bulk of the other comments are useless for that reason. It's almost like humans have inferiority complex against AI...

Interesting thing about AI is no matter how you spin things, the more advanced it becomes, unfairness grows proportionally.

After all, isn’t that the point of AI? To perform better at certain tasks than humans? We are visitors in the digital realm, just like when we put on a scuba suit and jump in the ocean—even the smallest fish can out-swim us.

What are you solving for here? They already limit the AI to human like speed, reaction times, and accuracy.

Then we are stuck with an inherently unfair system, as robotics is nowhere near ready for keyboard/mouse, and neuroscience is nowhere near ready for BCI.

This current modality is important, IMO, because we could potentially see neural networks performing tasks on other software, not just SC2. Imagine a neural network performing copy-editing in Word, writing code for CRUD applications, etc. Those are some mind-blowing potentials, we'd be losing out if we slowed down to work on robot hands.

Some Korean players were so quick at inputs you'd think they were halfway to brain-computer interfaces already.

Interestingly, the hardest part of AlphaStar looked like it was researching how to stop the computer winning through an overwhelming mechanical advantage.

Depends on what you're trying to test. You could just as easily argue that _not_ allowing the AI to make use of its natural APM and interface advantages is unfair, and that it should be allowed to play without restrictions in order to demonstrate its full potential.

AlphaGo had an "unfair" advantage in its games against Lee Sedol in that it was able to "think" far faster than you could reasonably expect a human to, and could therefore evaluate many, many more moves than a human player possibly could have. If you artificially limited the speed at which AlphaGo was able to "think" to match human limitations, would it have been able to win against Lee? I'd argue probably not.

Similarly, here we have an AI that's easily able to crush top SC players if it plays with an uncapped APM and no camera limitations, but only reaches grand master level when you impose artificial restrictions that force it to behave more like a human. The question of which configuration is more "fair" is kinda arbitrary; it depends on what your goal is.

If you read the paper, this is exactly what they have done, and this is why this achievement is so interesting.

Maybe we can train a better AI at turn-based strategy games.

I hope Civ5 could open up its API.

Or even a turn based game, where click speed does not matter...

But reaction time in a real time game is also an interesting problem with AI. If they factor out the merely mechanical (the UI itself), reacting to a situation unfolding in real time where you don't have a lot of time to think is a nice test of AI vs human strategic and tactical thinking.

If Google could tackle Civilization's god-awful AI next, I'd be over the moon.

Do you mean Google should write a better AI than the one in Civilization?

Interesting. But also consider this: in casual strategy videogames -- actually, strike "strategy" and just consider videogames -- most players don't want a really hard opponent. A computer opponent that is really very hard to beat is not what we want, because that'd be frustrating and many of us play videogames (yes, even strategy games!) to unwind; we want the illusion of challenge, an opponent that is challenging to beat but within the possibilities of every person who buys the game.

Which, by the way, is also the case with Starcraft II. Most people who bought it aren't tournament players. They expect a challenge, but not a really hard challenge. Game difficulty is all about perception ;)

PS: I shamefully confess I reloaded my X-COM (DOS!) game every time I lost one of the soldiers I was emotionally attached to. I don't like losing! :P

In strategy games like Civilization, the computer at the "preferred" difficulty level is only as hard or easy to beat as it is because it gets extra resources. It would be preferable to reach the same difficulty level through opponents that play better/smarter while facing the same game-mechanic consequences as players for the same actions, but we currently can't, so they get artificial production multipliers and such.

Yes, I understand this. I guess I disagree better/smarter would be better, because in videogames what matters is the illusion of challenge, not a real challenge. So spending resources into developing a real AI for Civilization is probably not the best idea; as long as it tricks casual players into believing it's putting up a fight, that's good enough.

This approach creates a mismatch between singleplayer and multiplayer modes - this means that playing against a computer opponent rewards/requires different strategies than a human; a challenging computer opponent has more income and units but poor usage of them, while a similarly challenging human opponent has less income and units but uses them very differently; so playing against challenging computer opponents doesn't help you improve against other players but possibly is even counterproductive as you learn to adopt strategies that are bad in the other environment.

True. I don't know many videogames in which playing against the computer really helps you against human opponents.

I don't know whether a more capable non-cheating AI would help though. Not unless it specifically imitated how a (good) human opponent would play, which I guess is an additional and difficult to implement constraint.

I mostly agree with you, though there is a bit of a fine line. I can learn from my opponent's strategy when playing against someone a little bit better than me. Maybe I lost because I need more or less of X? It's tough to learn when the opponent is creating the illusion of playing well, rather than playing well.

Current civ A.I.s may not be the smartest that the Civ team could do. Sid Meier once said on video that on play tests, when the A.I. would do something brilliant, players would just assume it's cheating.

I would pay a lot of money to be able to play a human level Emperor+ level AI on civ

My understanding is they've capped the APM/EPM against a reasonably old EU/NA player.

Or connect the computer to human neurons directly.

You'd be able to save the time it takes a nerve impulse from head to hand that way. It's not much but apparently it can be significant enough that in some DotA games shorter players are disproportionately successful in the roles requiring the fastest reflexes.

Is that supposed to be a joke? DotA doesn't rely on reflexes nearly enough for that to matter.

Data doesn't lie.

When so many players play a game, even an almost irrelevant advantage can tip the scale.

Or maybe it's a sociological factor, not a physical one.

But the data doesn't lie.

Sure, data doesn't lie, but where is the data? I'm a pretty seasoned DotA player myself and this is the first I'm hearing of any such phenomenon. I can't think of any hero that shorter players have been more successful on than taller players, or vice versa.

22 actions each 5 seconds isn't that generous. Pro players have to manage apm too. Conserve it or spend where needed. I would say it is fair.


Here is a video of someone finding out that he played against alphastar. (He won)

I see a lot of comments downplaying this achievement, saying it's not impressive or it won't be impressive until X condition is met.

I welcome skepticism and criticism for this sort of thing, and think most of it that I've seen here is well founded. But I would like to take a second to explain why I think this, and really all the progress in this area, is actually a really impressive achievement to me.

Let me try and frame this from the computer's perspective. Let's assume a resolution of 1024x780. I'm not sure what size frames they actually feed their agent, but it's not that important to the discussion; the point is it's a big image, and according to the article this agent is learning from pixels. So you, the computer, are given, let's say, 1024*780 = 798720 numbers to look at. You then choose a number between 0 and 798720 (or the crazy 10^26 number the article gives as the possible number of actions at each frame) as your action for that frame, and then you get another 798720 numbers to look at. After the game is over (on average 20 minutes; if you make a decision every frame, that's 20x60x60 = 72000 decisions), you get one number telling you how well you did. You repeat the process and get a new number. It's higher this time! But what is the cause? Was it that click you made on frame 22456? Or maybe that unlikely move you made on frame 4567?

Obviously I'm oversimplifying here, and the numbers are probably wrong, but I still think what I've said gives the right idea of what kind of task we (as a society/community/whatever) have somehow gotten a computer to solve. Computers are DUMB; the fact that it's able to play this game at all, let alone at a high level, is still a minor miracle to me.
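To put rough numbers on the thought experiment above, here is a quick sketch using only the commenter's assumed figures (1024x780 frames, 60 decisions per second, 20-minute games). None of these are AlphaStar's real inputs; the variable names are just for illustration:

```python
# Assumed figures from the thought experiment, not AlphaStar's actual interface.
width, height = 1024, 780
pixels_per_frame = width * height            # numbers seen per decision
decisions_per_game = 20 * 60 * 60            # minutes * seconds * decisions/sec
numbers_seen = pixels_per_frame * decisions_per_game
reward_signals = 1                           # a single score at the end of the game

print(pixels_per_frame)                # 798720
print(decisions_per_game)              # 72000
print(numbers_seen)                    # 57507840000
print(numbers_seen // reward_signals)  # inputs per unit of feedback
```

That is tens of billions of input numbers and 72000 choices for every single score, which is the credit assignment problem the comment is describing.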

I don't think the AI learns from pixels (nor does it move through mouse/kb commands), it has direct API access to the game. Nonetheless it is impressive.

I think that downplaying the achievement is more like a defense mechanism that exists to protect the illusion of human brain superiority.

Part of the problem is the hype in the other direction. I guess there are those who claim that this sort of result constitutes super human intelligence or something close to AGI. When in reality this is a hyper specialized system that has taken many man years of work to realize.

Not at all. We've known machines are better than humans at machine-like tasks for hundreds of years. IMO responses like yours are defense mechanisms of AI researchers/enthusiasts who want to protect the illusion that they are creating things that do anything remotely resembling what the human brain does.

Kasparov comments on chess computers in an interview with Thierry Paunin on pages 4-5 of issue 55 of Jeux & Stratégie (published in 1989):

‘Question: ... Two top grandmasters have gone down to chess computers: Portisch against “Leonardo” and Larsen against “Deep Thought”. It is well known that you have strong views on this subject. Will a computer be world champion, one day ...?

Kasparov: Ridiculous! A machine will always remain a machine, that is to say a tool to help the player work and prepare. Never shall I be beaten by a machine! Never will a program be invented which surpasses human intelligence. And when I say intelligence, I also mean intuition and imagination. Can you see a machine writing a novel or poetry? Better still, can you imagine a machine conducting this interview instead of you? With me replying to its questions?’

The fact that Kasparov was obviously wrong about a computer's ability to solve a concrete optimization problem better than him says nothing of value whatsoever and essentially proves my original point. We already knew machines were better than humans at these kinds of tasks, but people (like Kasparov) who didn't understand what computers were capable of will make wrong statements.

The title seems to contradict the subtitle:

> Google AI beats top human players at strategy game StarCraft II


> DeepMind’s AlphaStar beat all but the very best humans at the fast-paced sci-fi video game.

Yeah, subtitle seems much better. My impression before reading was that it did beat the top players but reading the articles makes it clear it beats everyone BUT the top players...

Also, this part seems a bit weird from the article:

> The AI wasn’t able to beat the best player in the world, as AIs have in chess and Go, but DeepMind considers its benchmark met, and says it has completed the StarCraft II challenge.

So they didn't manage to beat the best players but consider the challenge complete anyway? I thought the goal was to build something that could do things better than humans.

The difference between top 200 players and the very best players is quite significant in Starcraft II. Some argue that the difference in skill between the best players and a top 200 player is the same as between a top 200 player and a median player.

I suspect if they remove artificial limitations (like APM limit and no screen jumping), it will be unbeatable.

They're framing it positively to distract from the fact they can't do it, the reality is they've spent millions of $ on compute and yet their agent is still terrible at strategy (and tactics sometimes too), they probably decided it's best to stop now before sinking even more money into it.

You're getting downvoted for tone probably, but I think you're right overall. They simply can't "win" this game in the same way they "won" Go.

Professional players that play in tournaments know each other and how they play. This might be a further weakness even if Alphastar can beat the best players. Other players may adapt to the play of Alphastar when they know that they play against it.

This comment was originally posted on https://news.ycombinator.com/item?id=21408024, but we've merged it into the earlier submission.

Not being robust to strategies it hasn't seen before is a serious shortcoming in a real time strategy game. That also indicates an interesting flaw in how this model is trained - in the millions of games it plays against itself, how do you ensure that it tries every viable (and some inviable) strategies? Sure, it couldn't best Serral but I wonder how it would fare against Has, a player known for some pretty off the wall builds.

> Not being robust to strategies it hasn't seen before is a serious shortcoming in a real time strategy game.

I'm not sure if you've actually played or followed competitive SC2. This is absolutely normal. Players will pull something completely unexpected out of a hat and win. The losing player will learn from it in future games. That's just how it goes. Unexpected strategies are really hard to counter when you've never seen them before, and they're often employed to directly counter what you're doing right then and there. So you've been countered and dealt a devastating blow, which means figuring out how to come back from that can be hard to impossible. It's a thoroughly human failing in every way. I'd be more concerned if the AI wasn't able to learn to counter it in future matches.

I don't think you understood his point. A player doesn't need to have seen a strategy before to react correctly to it. Sometimes a player will pull something completely unexpected out of a hat and lose, because the other guy reacted correctly thanks to his game experience. If you watch any high GM player stream half the time his reaction to what his (worse) opponent is doing is "WTF is this?" as he then proceeds to crush it. Alphastar cannot do that.

I do understand his point. There are just as many examples of people being so overwhelmed by the new strategy that they don't know how to respond to it and lose. And then learn how to deal with it in later matches. Also, downvoting me for disagreeing is a dick move.

And yet, in Go, the AI adapted to any "weird strategy" and won regardless. The fact that AlphaStar can't is an obvious weakness.

I would say it’s a weakness but I think it’s shared by most players. A ‘weird’ go strategy isn’t going to involve new kinds of pieces on the board or someone developing a win condition at a place on the board you can’t see.

You can lose a Starcraft game easily if someone is doing something novel and you don’t happen to scout the right place on the board soon enough.

I do actually follow competitive SC2. My point is that for DeepMind to master this game in the same way it has with Go and chess, it must be able to anticipate counter plays ad infinitum. Otherwise humans will continue to beat it.

Are new strategies more important in SC2 than go/chess? From the blog it seems like the AI learns from self-play so if it doesn't come up with a strategy, it won't know how to counter it. But that seems like it should apply to chess and go as well.

We need to create an open source organization to build these brains at home because: "Recomputing the AlphaGo Zero weights will take about 1700 years on commodity hardware."[1]

I see these huge AI brains creating a new class divide between people who have access to these new AI brains and those who don't. The mission of open source has always been to break down these barriers to empowerment with technology. Thus, this is a great area for open source innovation.

[1] https://github.com/leela-zero/leela-zero

AlphaGo Zero was extremely computationally inefficient because, in an (IMO misguided) attempt to be "general", DeepMind avoided even slightly specializing training to Go. KataGo optimized training to Go, slightly, and obtained a 100x speedup, and that was low-hanging fruit.

1700 years sped up 100x is still 17 years though. Even so, for the models that actually need it, Google and other people with huge GPU farms will have an enormous advantage over a single individual and only a large community can have any hope of challenging their ability to dominate.

What would be interesting is to limit the AI processing speed to human capacity, which is something like 60 bits per second.

In all these AI v. Human games I see, it is really apples to oranges because the human consumes vastly fewer resources and compute cycles to perform at the same level as the AI. And when I say 'vast' I mean Vast. There is like a quintillion factor difference between the AI and the human.

There is no way the AI is even comparable to the human. To be comparable, we'd have to parallelize the game and lock millions of humans onto the game full time for millions of years.

At the end of the day, the AI is just a more sophisticated lookup table. There is as yet no analogous AI to human play.

It would be interesting to see if there are any meta reports on people doing research on the topic of resource constrained AI. Has anyone explored whether or not an AI can improve on its own algorithm while being heavily resource constrained?

Is the ability to use less resources a limit of our current hardware? Do we see the current hardware performance improvement trajectory being able to reduce the amount of resources an AI consumes to perform a task?

Are our algorithms simply not tuned well for using smaller resources? Could we build better algorithms around resource deficient environments?

Great questions. From what I can tell, AI is just as resource hungry as it ever was, if not more. The only difference is now we have more resources to throw at the traditional algorithms.

I think we should just accept that humans and computers will be good at different things. Otherwise when AlphaStar 5 walks on stage and smiles at the camera we'll be criticizing its use of fuel cells instead of digesting food.

There's a lot of knobs to wiggle here. I'm more interested in finding out what emerges in the system that we can actually call "AI" the more restraints we put on it.

> What would be interesting is to limit the AI processing speed to human capacity, which is something like 60 bits per second.

What part of human processing capacity is this slow?

Our conscious thought. E.g. how many words can you read in a second, vs how many words can a computer 'read' in a second? Conscious thought seems to be the bottleneck in our processing, since we don't play these games unconsciously.

> placing within the top 0.15% of the region's 90,000 players

> 61 wins out of 90 games against high-ranking players

This doesn't seem to be quite as commanding as it was in Go. Do we know what MMR it reached or if it consistently beat players like Serral?

It hasn't beaten Serral once. From the article: "The AI wasn’t able to beat the best player in the world".

To play devil's advocate:

They didn't say anywhere they scored a win against the top 10 either.

To elaborate: to be among the "top 0.15%" of 90k is place ~135 in the worst case, so nowhere near the "commanding" abilities mentioned by GP

You'll struggle to work it out, go look at the top 20 of any of the ladders IIiiIII1iI1iII!II

It seems like it's only a matter of time though.

StarCraft II also has a rock-paper-scissors nature to it though, so you wouldn't expect even a perfect player to win 100% of the time. There are some strategies that are hard-counters to other strategies, and because of the imperfect information nature of the game, by the time you scout your opponent and see what they're doing, it may be too late to shift and deal with it.
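The rock-paper-scissors point can be illustrated with the literal game. This is a toy sketch, not a model of StarCraft II; `win_probability` and the strategy dictionaries are made up for illustration. Against an opponent who picks uniformly at random, no strategy wins more than a third of the time:

```python
from fractions import Fraction

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def win_probability(my_mix, opp_mix):
    """Probability that a move drawn from my_mix beats one drawn from opp_mix."""
    total = Fraction(0)
    for mine, p in my_mix.items():
        for theirs, q in opp_mix.items():
            if BEATS[mine] == theirs:
                total += p * q
    return total


uniform = {move: Fraction(1, 3) for move in MOVES}

# Every pure strategy, however well executed, wins exactly 1/3 of the time.
for move in MOVES:
    print(move, win_probability({move: Fraction(1)}, uniform))
```

In a game with hard counters and hidden information, part of the outcome is decided by the initial choice, which is why a 100% win rate isn't the right benchmark.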

> by the time you scout your opponent and see what they're doing, it may be too late to shift and deal with it.

Unless you're Serral and your observers magically come out of nowhere and cover every single pixel of your base.

This is what happens when you try to surprise Serral: https://www.youtube.com/watch?v=HTMIh9wzDjo

There aren't really 'hard' counters, there's just a high chance you'll win.

Flash had a 70% winrate overall and had time periods of 90%+ winrate.

I won’t agree until they beat South Korean players at Starcraft I.

They didn't even beat South Koreans at Starcraft 2 in the article

Instead of playing with the APM rate, maybe they can slow down the game for the human players? If you play at 0.25x or 0.5x, it will be harder for the AI to out-micro. A couple of micro strategies used by the AI, like staying at 7 range when the enemy can fire at 6 range, are still a big reason why it is so effective.

I hope they don't stop here without doing a world champion challenge. It looks like they're still significantly below professional player strength.

They said they will stop here.

From the paper:

>Humans play StarCraft through a screen that displays only part of the map along with a high-level view of the entire map, to e.g. avoid information overload. The agent interacts with the game through a similar camera-like interface

What exactly does that mean? Does it or does it not play by operating purely on image data human players would see on the screen?

How much of the system's interaction with the game's interface is learned as opposed to hand-crafted and filtered through APIs?

It's amazing that most people here seem to think that the system's ranking in a computer game is more important than its ability to learn from and interact with unstructured data.

Everything about AlphaStar is typed and discrete. It has perfect inputs because it uses an API (and does not read pixel data).

Human limitations that AlphaStar shares:

- Data that requires the camera to see (e.g. enemy location, enemy HP)

- Inability to examine/target cloaked units

Possibly unfair, super-human things AlphaStar has access to:

- Instantaneous awareness of cloaked units

- Knowledge of things humans need to infer/click (e.g. upgrades)

- Global map awareness of unit positions (taking into account fog of war)

Definitely unfair:

- Can select arbitrary collections of units, including outside of camera view

Do you know if it can target a particular unit from a clump of air units? If yes, then there is another "unfair" thing.

Also I wonder how their "camera-like interface" works with tactics like fly a building above units to make them harder to target.

Yeah. I think watching it split a line of Stalkers into two flanks that isn't done with a rectangular selection is crazy unfair.

I think you're asking whether it suffers from occlusion during selection. Based on how APIs typically work, I would say no, it does not. So yeah, that's another thing humans can't do. AlphaStar could hypothetically stack units into a singular mass and it would be discretely untargetable by human players.

It's meant to operate purely on data viewable on the screen to mimic a human player's experience and provide an even playing ground.

Can you cite something that confirms your assertion?

From what I'm reading in the paper, it sounds like there is some custom interface in play:

>AlphaStar can target locations more accurately than humans outside the camera, although less accurately within it because target locations (selected on a 256x256 grid) are treated the same inside and outside the camera.

It's really hard to parse what such statements mean. The fact that someone who is cited as a co-author of the paper approved the interface as "fair" isn't particularly reassuring.
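One possible reading of the quoted sentence, as a sketch only: if every target location is snapped to one cell of a 256x256 grid laid over the whole map, the click precision is the same everywhere, regardless of where the camera is. The map size and the function below are hypothetical and are not taken from the paper or from any real API.

```python
def snap_to_grid(x, y, map_width, map_height, grid=256):
    """Snap a continuous map position to the centre of its grid cell.

    Assumes (hypothetically) that the grid covers the whole map evenly.
    """
    cell_w = map_width / grid
    cell_h = map_height / grid
    # Clamp so positions on the far edge fall into the last cell.
    col = min(int(x / cell_w), grid - 1)
    row = min(int(y / cell_h), grid - 1)
    return ((col + 0.5) * cell_w, (row + 0.5) * cell_h)


def worst_case_error(map_width, map_height, grid=256):
    """Largest per-axis distance between a position and its snapped target."""
    return (map_width / grid / 2, map_height / grid / 2)


# A made-up 128x128 map: every click lands within 0.25 units per axis,
# whether or not the target is inside the camera view.
print(snap_to_grid(10.3, 20.7, 128, 128))
print(worst_case_error(128, 128))
```

Under that reading, a human clicking on the main screen can be finer than one grid cell, while a human clicking on the minimap is coarser, which would match "more accurately outside the camera, less accurately within it".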

> The fact that someone who is cited as a co-author of the paper

They are not just some random person, they were top SCII players. Who else if not them would know this well enough to make assumptions?

There is a custom interface in the sense that the bot does not read pixels from the screen - it reads the information through the API, same information that is usually presented on a screen. But the amount of information is limited to exactly what a human would see at the same time using the standard SCII interface.

Purely on image data.

From the article, they have a special interface, "developed in consultation with professional StarCraft II players and Blizzard employees". They are a bit vague on the details, but, for example, "Agents can also select sets of units anywhere, which humans can do less flexibly using control groups".

I know this is an endless discussion about fairness, but they handicapped it to have the capability, in terms of EAPM, of the world's best player at peak. The result is that it could beat all but the top players. While the mechanics of not using hotkeys were strange, I never saw it do anything the elite players couldn’t do.

It is worth watching the match between AlphaStar and Serral, who is the current best player. Serral beats it like a walk in the park.


Having a hard time parsing what the action space is here.

The paper claims: AlphaStar’s action space is defined as a set of functions with typed arguments

Looking at citation 7, it seems like they are structuring the action space as (First pick high level action)->(Pick argument 1 for action)->...->(Pick argument n for action). If this is the case, it seems like "cheating" to call this AI, as humans have completely picked out the actions. That is, the achievement here is this: given what humans consider useful actions, AlphaStar can play at a grandmaster level.

The achievement here is mostly engineering in my opinion. One that extends far further than the 40ish people listed on the paper. Probably an effort of over 1,000 people. From casually looking over the paper, there is nothing significantly different from AlphaZero or prior art. Again, the achievement here is listed under the infrastructure section of the paper.
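To make the structure described above concrete, here is a toy sketch of "functions with typed arguments" where the function is chosen first and each argument is then chosen in turn. Everything in it is hypothetical: the action names, the argument choices, and the uniform random sampling, which merely stands in for a learned policy. It is not AlphaStar's or any library's actual action set.

```python
import random
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass(frozen=True)
class ArgSpec:
    name: str
    choices: Sequence[Any]  # the finite set of values this argument may take


@dataclass(frozen=True)
class FunctionSpec:
    name: str
    args: Tuple[ArgSpec, ...]


# Hypothetical hand-picked action set; this is the human-designed part.
GRID = [(x, y) for x in range(4) for y in range(4)]
ACTION_SPACE = (
    FunctionSpec("no_op", ()),
    FunctionSpec("move_camera", (ArgSpec("target", GRID),)),
    FunctionSpec("attack", (ArgSpec("units", [(1,), (2,), (1, 2)]),
                            ArgSpec("target", GRID))),
)


def sample_action(rng, action_space=ACTION_SPACE):
    """Pick a function first, then pick each of its arguments in order."""
    fn = rng.choice(action_space)
    chosen = {arg.name: rng.choice(arg.choices) for arg in fn.args}
    return fn.name, chosen


print(sample_action(random.Random(0)))
```

The point of the sketch is that the agent only ever chooses among entries of `ACTION_SPACE`; it never has to discover that "attack" takes units and a target.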

In summary, this is a great step forward but now we need to start developing techniques to learn these action space hierarchies instead of throwing more power at increasingly difficult games.

It is extremely different from AlphaZero... In fact, they rely heavily on human knowledge, which is like the opposite of AlphaZero. To quote the paper, "We found our use of human data to be critical in achieving good performance with reinforcement learning".

Ok, you’re right. I should’ve said AlphaGo. But that in itself shows what I mean that this is almost a step backwards.

AlphaZero was miraculously good, almost to the point of straining credibility. AlphaGo and AlphaStar are more like normal advances. They are mostly engineering, although theoretical contributions are not trivial. (Using reinforcement learning for value network in case of AlphaGo, and multi-agent self-play setup in case of AlphaStar, since straight self-play doesn't work.)
