Hacker News
Beating the World’s Best at Super Smash Bros. with Deep Reinforcement Learning (arxiv.org)
202 points by willwhitney on Feb 22, 2017 | 55 comments

Note: it doesn't learn from pixels but from features read directly from RAM, and it has superhuman reaction time; performance degrades badly when human-like delays are added.

Good discussions on Reddit: https://www.reddit.com/r/MachineLearning/comments/5vh4ae/r_a... https://www.reddit.com/r/smashbros/comments/5vin8x/beating_t...

I could see this technology used for the bootstrapping of highly emergent MMO game worlds. It could be used to populate a world with fake "player" NPCs that are actually part of a simulated online ecosystem. Give the NPCs a large enough population, such that players cannot exert significant selection pressure, but give the NPCs real selection pressure through interaction with artificial life evolved with Genetic Algorithms. The rate of evolution of the a-life and the NPCs could be tuned to provide a comfortable rate of change for the human players, and the NPCs would insulate the players from the frustrations GAs might cause.

League of Legends, for example, has bots appearing in PvP games. Although these bots are not produced by the game's developers, not much was done back then to get rid of them. I guess they were tolerated since they just made queue times shorter for human players.

( http://boards.na.leagueoflegends.com/en/c/gameplay-balance/b... )

Well, Riot solved that by getting rid of Dominion. ;) But seriously, bots were mainly present in that game mode, and now in Co-Op vs AI, because the botters just want to level the account and sell it. It doesn't really affect most of the population until the point of sale, and once an account has been sold, it's easy to tell it was botted.

Yeah, I had a game with a sub-30 friend on Dominion. We were the only non-bots in that match. I called it out in chat... not a single bot responded.

They all pretty much just walked counter-clockwise and used auto-attacks or tried to capture points.

Bots came a long way in Guild Wars. At the start they were basically fancy scripts with some AI; later they became chatbots, and eventually there were bots capable of running elite areas in teams "undetected" (I knew the person in the guild; he slipped up on a guild run by using multiple instances of ghosts in TS). There were also PvP bots (different ones) that can/could do a lot more than follow predetermined patterns.

In PvE they became so common that the game economy came to be based entirely on them (without most players knowing).

Other bots and programs that give a player some kind of unfair advantage have also come an awfully long way in intelligence. One example is the leap aimbots took in Halo PC: from being easy to detect, even when used by top players, to being nearly undetectable (except when priority issues or other bugs/glitches appear).

Runescape's economy was largely based on bots and the economy crashed when Jagex finally purged them all.

Or even some kind of "always online" MMO like Chronicles of Elyria. Being able to just tell your character to play while you are away, without 'scripting' them, would be nice.


This reminds me of Starcraft AI experiments. They can't actually make the computer smart, so they just jam 2000 button presses per second down the tube, giving every single unit its own simultaneous AI, and it out-micromanages anyone.

With Marines usually.

I heard that the DeepMind Starcraft project intends to limit their AI's APM (actions per minute) down to something human-like.

I read that too, but I hope they know the difference between APM and EPM (effective actions per minute). Pros spam actions that accomplish nothing, so their EPM is considerably lower than their APM. If the bots' action budget is based on pros' APM, they will have an insurmountable advantage.
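
To make the APM/EPM point concrete, here is a minimal sketch of the kind of cap being discussed: a sliding-window limiter that refuses any action that would push the count in the last 60 seconds over a budget. `ActionRateLimiter` is a hypothetical name for illustration, not anything from DeepMind's work. The budget is the whole question: set it from pros' spam-inflated APM and every allowed action can be a useful one for the bot; set it nearer pros' EPM and the comparison is much closer to fair.

```python
from collections import deque


class ActionRateLimiter:
    """Caps how many actions an agent may issue in any sliding time window.

    Hypothetical helper for illustration: the cap value and the idea of
    gating every agent action through it are assumptions.
    """

    def __init__(self, max_actions_per_minute: int, window_seconds: float = 60.0):
        self.max_actions = max_actions_per_minute
        self.window = window_seconds
        self.timestamps = deque()  # times of recent accepted actions, oldest first

    def try_act(self, now: float) -> bool:
        """Return True (and record the action) if acting at time `now` stays under the cap."""
        # Forget actions that have fallen out of the sliding window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_actions:
            self.timestamps.append(now)
            return True
        return False  # refused actions are not recorded
```

With a budget of 3, three quick actions are accepted, a fourth is refused, and capacity frees up again once the oldest action is 60 seconds old.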

Pros spam APM to keep warm, during battles or production macros they will frequently have a high EPM as well.

There's also accuracy.

Unlike a human, a bot will always "click" exactly where it intends to.

That's not what the Starcraft AI field is about. It actually started with a mix of people building planner-oriented systems and micro-oriented systems, and hybrids followed. There are many methods at play. Here's a survey:


The competitions that involved humans showed that humans destroyed the bots by spotting their patterns and beating those patterns, and also by bluffing or using distractions, such as having one unit do weird things around the bot's base while the human built up an army. Bots that beat humans will have to learn to spot bluffs and the other weird patterns humans use to screw with them, on top of everything prior AIs already did at a human level. My money is on humans for DeepMind vs Starcraft, although I'm happy to be proven wrong.

I don't think DeepMind will get into a high-profile competition against human pros unless they're fairly certain of winning. So if we see the equivalent of the AlphaGo versus Lee Sedol match being announced for Starcraft, then my money would be on DeepMind.

In Starcraft there's a much bigger advantage, since humans are inherently "single-threaded", so you can get much bigger discrepancies in APM (or EPM). Smash is more like 1-unit-vs-1-unit micro. The precision and timing are still advantages for the AI, but raw parallel compute matters much less.

Brood War bots perform poorly against competent humans, though. Micro advantage or not, the strategic decision-making isn't there yet.

Or individual muta micro. That was the winning "strategy" in the first BWAI cup many years ago.

I remember this. And even the first bot vs bot competition was won by a clever bot scripter who simply denied his opponent's gas early on. That brought all his competition to a standstill.

Honestly, it still isn't even that good. The best Starcraft AI in the world, even when it cheats, still can't beat low-tier pros.

That's an interesting definition of smart that doesn't include being able to manage hundreds of units simultaneously.

I was similarly disappointed when I read this, but upon further reflection I still like this paper. It is very plausible that both of these problems could be fixed, it would just take a lot more time/power to train, and the resulting system would likely not run in real time making it impossible to test against real humans.

Further advancement in this area will require huge leaps in hardware performance. Luckily, over the next few years I expect the pace of improvement in specialized hardware for neural nets to far outpace Moore's Law.

I'm not nearly that pessimistic. Beating SSBM is well within the capability of a well-tuned A3C, and definitely within the capabilities of a group like DeepMind. More neuromorphic hardware is unnecessary: with current RL methods, these systems are more CPU-bound than GPU-bound (take a look at the NN they use, it's trivially small; most of the computation goes towards running many SSB games in parallel just to generate enough data for some small updates to the NN).
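
For readers unfamiliar with A3C: each parallel game instance plays a short rollout, and the learner turns it into n-step returns bootstrapped from the critic's value estimate, then advantages for the policy update. A minimal sketch of that bookkeeping, assuming a single rollout stored as plain lists (the function name and the gamma=0.99 default are illustrative, not the paper's settings). Note how little arithmetic it involves; the commenter's point is that producing the rollouts, not this update, is where the compute goes.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute n-step returns and advantages for one rollout, A3C-style.

    rewards[i], values[i]: reward received and critic estimate V(s_i) at step i.
    bootstrap_value: V(s_n) for the state after the last step (0.0 if the episode ended).
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the next one: R_i = r_i + gamma * R_{i+1}.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage: how much better the observed return was than the critic expected.
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages
```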

I believe they've handicapped themselves, actually, with their shortcuts: the performance of agents is crippled by the inability to see projectiles due to the choice to avoid learning from pixels (which I bet would actually be quite fast, as learning from pixels is not the bottleneck in ALE), and likewise the use of the other RAM features is the path of the Dark Side - allowing immediate quick learning through huge dimensionality reduction, seductively simple, yes, yet poison in the end as the agent is unable to learn all the other things it would've learned (such as projectiles). I suspect that this is why their current implementation is unable to learn to play multiple characters: because it can't see which character it is and what play style it should use.

So I would not be surprised at all to hear in a year or two that human-delay-equivalent agent using raw pixels could beat human champs routinely.

The main blocker on using pixels has been getting them from the emulator. I doubt pixels would give a big advantage over RAM features, especially after projectile and stage info is added (there's a PR pending). Captain Falcon (the main character used) doesn't have projectiles anyways.

In fact, RAM features are likely to be much more useful for model-based approaches, which may be important for solving the action-delay problems.

As for multiple characters, the character ID is available to the network. I doubt pixels will help there either.

"I'm not nearly that pessimistic."

Me either. Bots for fighting games have always been easier to write, or even fake, for the button-mashers. This proves nothing; it's just fun. Let's see them get top 2-4% skill & kills in Battlefield 4 with shots hardwired to miss 30-50% of the time, playing with the weakest carbines w/ suppressors, and on weak teams. If these AIs are so amazing, let's see them use good tactics in open-ended battles to win, as I do despite a brain injury. I'll even give them training data in the form of virtual lead. :)

Handling delays (and the uncertainty they entail) is a huge challenge, and I think it'll be a rich area of research. The simplest part of the problem is that delays in action or perception also slow the propagation of reward signals, and credit assignment is still a really hard problem.
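
One way to see the credit-assignment problem in standard discounted-return notation: assume an action chosen at frame t reaches the game exactly d frames later, so the first d rewards in the return were already set in motion by earlier actions, and the part the action can influence is shrunk by a factor of gamma^d (the fixed delay d is an assumption for illustration):

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}
    = \underbrace{\sum_{k=0}^{d-1} \gamma^{k} r_{t+k}}_{\text{driven by } a_{t-d},\dots,a_{t-1}}
    + \gamma^{d} \underbrace{\sum_{k=0}^{\infty} \gamma^{k} r_{t+d+k}}_{\text{first rewards } a_t \text{ can affect}}
```

From the point of view of a_t, the first sum only adds variance, and it grows with d, which fits the reported degradation with each extra frame of delay.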

Thinking further afield, future models could learn to adapt their expectations to fit the behavior of a particular opponent. This kind of metalearning is pretty much a wide-open problem, though a pair of (roughly equivalent) papers in this direction recently came out from DeepMind (https://arxiv.org/abs/1611.05763) and OpenAI (https://arxiv.org/abs/1611.02779). It's going to be really exciting to see how these techniques scale.
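
The core trick in both papers is small: a recurrent policy receives its previous action and reward (and, in the OpenAI RL² paper, an episode-termination flag) as extra inputs, so adaptation can in principle happen inside the hidden state rather than through gradient updates. A minimal sketch of building that per-step input, with a hypothetical function name and a flat Python list standing in for a tensor:

```python
def meta_rl_input(observation, prev_action, prev_reward, prev_done, num_actions):
    """Build one timestep's input for a recurrent meta-RL policy (illustrative sketch).

    The network sees the current observation plus what it just did and how
    that was rewarded, so its hidden state can accumulate evidence about,
    e.g., the current opponent.
    """
    action_one_hot = [0.0] * num_actions
    if prev_action is not None:  # there is no previous action on the very first step
        action_one_hot[prev_action] = 1.0
    return list(observation) + action_one_hot + [float(prev_reward), float(prev_done)]
```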

Really naive question: can't they just train the net to react instantaneously to a screen delayed by d? I don't see conceptually why this approach would succeed with d=0 but fail for (say) d=25 ms. (I am too busy/lazy to read the papers and understand what breaks down.)

Basically we tried that and it sort of works, but performance degrades pretty fast with each frame of delay. The issue is likely that it makes credit assignment much harder. Instead of seeing an immediate change in the state (which your critic can interpret as good or bad), you have to wait a bunch of frames during which your previous actions are taking effect and interfering with the reward signal.
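
The setup being described can be modeled as a fixed-length queue between the policy and the game. A minimal sketch, not the authors' code; `DelayedActions` and the `noop` filler sent while the queue is still filling are assumptions:

```python
from collections import deque


class DelayedActions:
    """Make every action take effect `delay` frames after the agent chooses it.

    Hypothetical wrapper for illustration: the agent picks an action every
    frame, but the game receives the one chosen `delay` frames earlier.
    """

    def __init__(self, delay: int, noop):
        # Pre-fill with no-ops so the first `delay` frames execute nothing.
        self.queue = deque([noop] * delay)

    def step(self, chosen_action):
        """Queue the newly chosen action and return the one the game should execute now."""
        self.queue.append(chosen_action)
        return self.queue.popleft()
```

Because the queue, not the current choice, decides what executes on each frame, the reward the critic sees was caused by an action picked `delay` frames earlier. A common mitigation (not necessarily one that was tried here) is to append the still-pending actions to the observation so the policy knows what is already in flight.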

> we instead use features read from the game’s memory on each frame, consisting of each player’s position, velocity, and action state, along with several other values

So it's cheating, presumably knowing the opponent's action before the animation even starts to play.

Smash is played on analog displays precisely so that the lag between RAM and the display can be as small as possible, usually 50 ms. In fact there's a 50 ms delay added to the AI for this reason. However, the AI takes no account of the fact that it takes about 230 ms for a signal to travel from a human's retina through the occipital lobe and motor cortex and activate the motor neurons in the hand. The AI can also generate input sequences that are nearly impossible for a human, such as the "dustless [i.e. perfect] dashdance".
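
Since Melee's game logic runs at roughly 60 frames per second (an approximation; NTSC is about 59.94), it helps to express these figures in frames: the 50 ms lag is about 3 frames, while a 230 ms human reaction is closer to 14. A quick check:

```python
FPS = 60.0  # approximate Melee frame rate (NTSC is about 59.94)


def ms_to_frames(ms: float, fps: float = FPS) -> float:
    """Convert a delay in milliseconds to game frames."""
    return ms * fps / 1000.0


ai_delay = ms_to_frames(50)  # display lag emulated for the AI, per the comment above
human_reaction = ms_to_frames(230)  # retina-to-hand figure quoted above
print(f"AI: {ai_delay:.1f} frames, human: {human_reaction:.1f} frames")
```

So even with the added lag, the AI reacts several times faster than the quoted human figure allows.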

But this is what a top player (who regularly beats both of the players tested in the study) looks like playing against a hand-coded bot:


and this is what the humans eventually learned to do:


Even if you add reaction time, a big part of Smash skill for humans comprises accurately manipulating the analog stick. The computer can just declare any angle it wants; you're not having a fair competition until you build a robot thumb that manipulates a joystick the way humans do, IMO. Otherwise a character like Pikachu can recover perfectly every time.

The bots are given a very reduced action set. The Falcon master bot actually had only cardinal directions on the control stick - later bots were also given the 4 diagonals.
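
To make the "reduced action set" concrete: instead of a continuous stick angle, the network picks from a short list of discrete stick positions. A sketch of only the stick part, assuming positions are unit-circle coordinates in [-1, 1] plus a neutral option (the function name and encoding are hypothetical; the paper's exact encoding may differ):

```python
import math


def stick_positions(include_diagonals: bool):
    """Return discrete (x, y) stick positions in [-1, 1], starting with neutral.

    4 cardinal directions (like the Falcon master bot) or 8 directions
    including diagonals (like the later bots), each at full tilt.
    """
    step_deg = 45 if include_diagonals else 90
    positions = [(0.0, 0.0)]  # neutral stick
    for angle in range(0, 360, step_deg):
        rad = math.radians(angle)
        # Round away floating-point noise such as cos(90 degrees) ~ 6e-17.
        positions.append((round(math.cos(rad), 6), round(math.sin(rad), 6)))
    return positions
```

With only 5 or 9 stick choices, the bot cannot pick an arbitrary recovery angle, which is the limitation on the Pikachu scenario described above.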

It does not see the opponent's actions before they take effect on screen, and the actual controller states are not part of the feature representation we used (though they actually are somewhere in the RAM).

Part of the skill in competitive play is to be able to predict what move your opponent is going to do next.

Most mid-level players already have a good grasp of prediction, which is arguably along the same lines as knowing with certainty what action your opponent is taking a few frames before they do it.

Couple that with Smash's pretty obscene frame lag, and it's not really that much of an advantage.

Also, beating competitive play isn't really that impressive, considering how limited your actions are once items and more dynamic stages are banned (see: restricting RNG). In this way, it's nothing more than a simple chess bot. Now, if it could actually take in complex environments and multiple tools, that'd be pretty next level.

You're wildly swinging between advantageous and not.

No, this is just playing games. The ground rules must be clear: you get the screenshots and keyboard input in every frame, as a normal player. If the resulting AI sucks, who cares? Failure is part of doing science.

I disagree with this. I don't think it makes sense to make the problem harder by including the translation of screenshots into keyboard input when you can read the same information from memory. That's an entire problem of its own, and I really don't understand why people keep ignoring that. Or is the issue that the amount of information being supplied is too revealing? In that case, would limiting the input to just the map, the players' positions, their current moves, and the direction of the current move be okay in your book? Or is even that too much information?

> You're wildly swinging between advantageous and not.


> The ground rules must be clear: you get the screenshots and keyboard input in every frame, as a normal player.

Perhaps, if you want to start from flawed assumptions or want to create an AI that can be tweaked to appear human. That would be pretty useful and practical for other applications, but not for competitive play.

We could go on and on about digital vs. analog, but digital is good enough for your argument and doesn't require you to spend enormous resources on a trivial pursuit.

This is going in the direction of nonsensical handicaps. You don't give AlphaGo stamina parameters that artificially slow down processing speed. You give it all the tools it needs to beat a human player.

> You give it all the tools it needs to beat a human player.

Okay, why not allow an NPC to just mess with the human player's actions then (blocking or delaying button clicks, for instance)? Surely, that falls into "all the tools", no?

IMO, the way you went about things isn't particularly compelling—your human opponents don't have white-box access to game internals, and if they did, guess what? They'd play better too.

So I agree with the GP: this is just playing games.

> Okay, why not allow an NPC to just mess with the human player's actions then (blocking or delaying button clicks, for instance)? Surely, that falls into "all the tools", no?

Are you talking about physically delaying their inputs? As in from the controller to the main board? This would fall under the same category as a player hitting the controller out of his opponent's hand -- foul play.

> IMO, the way you went about things isn't particularly compelling—your human opponents don't have white-box access to game internals, and if they did, guess what? They'd play better too.

I'm not sure what exactly you're referring to here, but I'll respond to how I think you're trying to take this.

Source-code-wise: yes. If the players had access to the source code, the learning curve would be significantly shortened. Though, given time, most would have figured out the mechanics fully, or to within a small margin, even with closed source. Part of competitive play is exactly this: players experimenting, sharing, and building up their understanding of the game. If the source were freely available to explore, most players would stick to the "show" part of the process, i.e. working on reflexes and learning combat, which is what most elite players focus on (since they've already mastered the science of the game).

Video of the AI here, playing as the black Captain Falcon: https://www.youtube.com/watch?v=dXJUlqBsZtE

We all know that Mew2King is first reinforcement learning AI capable of beating Super Smash Bros pro players.


and he still can't beat Armada

I am possibly being here the person who accidentally takes the joke literally, but Mew2King has in fact beaten Armada on three occasions: Once at SKTAR 3, once at Smash Summit 2, and most recently at UGC Smash Open.

haha i know he has, but M2K isn't performing the way he used to and the record is like 12-3 in favor of Armada, if I'm not mistaken

While the AI might be cheating by taking salient features from RAM rather than from pixel values, this is still an incredible feat. Just a few years ago we did not have generic algorithms that could take even salient features and self-learn policies to near this level this quickly.

Yup, it's definitely an advantage to get all the correct values from the game state. But not as much as you might think; the vision portion of a DQN or similar trains quite quickly.

Plus, our bot doesn't have any clue about projectiles. We don't know where they live in memory, so the network doesn't get to know about them at all.

Can I ask what the feature set looked like? I always kind of wanted to do this with the Skullgirls AI, but never had the time while we were developing it. As a developer, I obviously had full access to the game state, but I'm still not really sure what the best way to represent that state to a neural network is.

It was just basic stuff like player positions, velocities, and animation states.
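
For the Skullgirls question, a minimal sketch of one way to turn "positions, velocities, and animation states" into network input. The field names, the one-hot choice for the animation state, and the lack of any scaling are assumptions for illustration, not the paper's exact representation:

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    """Per-player values read from game memory each frame (names are illustrative)."""
    x: float
    y: float
    vx: float
    vy: float
    action_state: int  # index of the current animation/action state


def encode(players, num_action_states):
    """Flatten all players into one feature vector for the network.

    Continuous values are used as-is; the action state is one-hot encoded
    because its integer ID has no meaningful ordering.
    """
    features = []
    for p in players:
        one_hot = [0.0] * num_action_states
        one_hot[p.action_state] = 1.0
        features.extend([p.x, p.y, p.vx, p.vy])
        features.extend(one_hot)
    return features
```

In practice you would probably also normalize positions and velocities to a similar range; a learned embedding is a common alternative to the one-hot vector when there are many action states.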

Getting them from RAM instead of the screen doesn't give you an advantage on (for example) DI or ledge teching?

As someone who's played for quite a while I can tell you SSBM is one of the most complex games I've ever come across.

Why do you think the game is complex? Fairly simple game with low barrier to entry which is great when you invite guests over for games. Super Simple Button Mash!

Likely due to the advanced, non-intuitive mechanics that have been discovered over the years. The entry barrier may be low, but the skill cap is high.

I'm impressed it beat the likes of S2J and Zhu. I wonder how it'd fare against the Five Gods?

What's the key insight here compared to previous systems? As far as I can tell, still no one can beat simple non-deterministic games that require some planning.

My favorite example is Ms. Pac-Man because it seems so old and simplistic. It's been tried by a dozen teams, and no one can beat a decent human.

Civ AI has denounced this research

I was expecting a video.
