
Why Can a Machine Beat Mario but Not Pokemon? - myinnerbanjo
https://medium.com/@shayaan.jagtap/why-can-a-machine-beat-mario-but-not-pokemon-ff61313187e1
======
CM30
I think the battling issue may be a hard one for AI to deal with alone, though
not so much in terms of the main game. Remember, the main story is meant to be
beatable by people who don't want to think very much and who don't really care
about training their team beyond 'reach a certain level'. Hence in the main
story, 'grind until you significantly outlevel the next opponent' would
probably be an optimum strategy for an AI.

But in PvP battles? Yeah, that would be interesting to train an AI on. In that
case, everyone is the same level, so grinding doesn't help. Players actually
use strategies beyond 'attack with whatever is roughly super effective against
their opponent's current Pokemon', and the number of choices available is in
the tens/hundreds of thousands rather than merely what's available at any
point in the main game.

Teaching an AI to work in that environment would certainly be a tricky
challenge. I mean, at least chess and go had 'symmetrical' teams, meaning that
any strategy could be done in any game. Pokemon? You have no idea what your
opponent has on their team, and if you're unlucky, that could basically hard
counter your own strategy without much you can do about it. Heck, even certain
Pokemon are hard to predict on their own, with stuff like Charizard, Mewtwo or
Necrozma having at least two possible form change options at any time.

It'd be interesting to see someone try and make an AI to do that, to fight
against players in the Nintendo World Championships or on Smogon or what not.

~~~
nbeleski
Given the results OpenAI obtained in Dota[1] (with asymmetrical teams, no
less), I am pretty confident RL could be used to train a fairly
efficient pokemon pvp agent. In my experience, the nuances and
mindgames/predictions in a pokemon battle are much simpler than those in a
high level chess/go game.

I would say the model isn't as straightforward as the Mario or Sonic AI
players, but is still achievable. Actually, I wish I had more time because
this is definitely a project I would like to tackle.

[1] [https://blog.openai.com/openai-five/](https://blog.openai.com/openai-five/)

~~~
dragontamer
DOTA is a bad example.

Poker is a better example, because Nash-equilibrium estimating algorithms have
begun to perform better than humans in the past year or two.

Pokemon, like Poker, is a game of bluffing and partial information. I expect
Pokemon's optimal strategy to be the same mix of fold (aka: switch your
Pokemon out to a defensive Pokemon... eating an attack but minimizing the
opponent's damage to your team), and bluff (stay in, maybe use a move that
exactly counters your opponent's choice. Ex: An unrevealed Choice Scarf Draco
Meteor, which surprises an opponent who didn't expect your pokemon to be
faster).
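A minimal sketch of the "mix of fold and bluff" idea, assuming a single decision reduced to a 2x2 zero-sum game: we either stay in ("raise") or switch to a defensive Pokemon ("fold"), and the opponent either attacks or switches. The payoff numbers are made up, not damage calcs. This solves the tiny game exactly with the textbook 2x2 formula; it is not one of the counterfactual-regret-style solvers used for poker, which approximate equilibria in far larger games.

```python
from fractions import Fraction


def solve_2x2_zero_sum(m):
    """Solve a 2x2 zero-sum game given the row player's payoffs m.

    Returns (p, q, value): p = probability the row player picks row 0,
    q = probability the column player picks column 0,
    value = expected payoff to the row player.
    """
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in m]
    # A saddle point (smallest in its row, largest in its column) means a pure strategy is optimal.
    for i in range(2):
        for j in range(2):
            cell = m[i][j]
            if cell == min(m[i]) and cell == max(m[0][j], m[1][j]):
                return Fraction(1 - i), Fraction(1 - j), Fraction(cell)
    # Otherwise each player mixes so the opponent is indifferent between their two options.
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Made-up payoffs to "us". Rows: stay in and attack ("raise"), switch to a wall ("fold").
# Columns: opponent attacks, opponent switches.
payoffs = [[3, -1],
           [-2, 1]]
p, q, v = solve_2x2_zero_sum(payoffs)
print(p, q, v)  # 3/7 2/7 1/7
```

With these numbers the sketch says to stay in 3/7 of the time; always staying in (opponent switches, -1) or always switching (opponent attacks, -2) would both be exploitable.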

~~~
Bartweiss
The poker analogy seems like the right one to use, although Pokemon is made
messier by the level of variance. (Meaning both "semi-random effects" and also
"far more than 52 possibilities for mon and moves".) I'd imagine the
completely-hidden playstyles would be incredibly hard for an AI to learn, but
the popular Showdown style that has team preview might be workable. At the very
least, poker is a good model for studying the sorts of things an agent would
need to do.

There's definitely a recognizable 'tempo' to pokemon, where A picks a move
that threatens B, B switches to something that can take it and threaten back,
then A in turn switches to take the hit and threaten back. Which, much like
just accurately betting your hand strength in poker, is enough to beat a lot
of amateurs. The metaphor goes from there - though I might use 'raise' for
leaving a threatened pokemon exposed, which lets us differentiate a strong
hand ("I'll use a coverage move with higher speed") from a bluff ("I can hit
his switch if I call it.") As an example, opening Koko v Landorus. The fold is
switching Koko to Skarmory, the honest raise is HP Ice, and the bluff is
Thunderbolt.

The basic ebb and flow of the game seems like it's that and one more layer -
double switches and attempts to predict them. Above that, there's just not
enough probability mass left to benefit from trying to triple switch, counter-
counter-switch, and so on.

Of course, it's all made vastly more complicated by trying to trap, set
hazards or status, and make space for setup moves. I'm not sure what it would
take to get an unsupervised learner to value e.g. Rocks appropriately. My
experience has been that neural nets struggle badly on assessing that sort of
long term state change, though of course I'm not working at OpenAI or DeepMind
levels.

~~~
dragontamer
> The metaphor goes from there - though I might use 'raise' for leaving a
> threatened pokemon exposed, which lets us differentiate a strong hand ("I'll
> use a coverage move with higher speed") from a bluff ("I can hit his switch
> if I call it.") As an example, opening Koko v Landorus. The fold is
> switching Koko to Skarmory, the honest raise is HP Ice, and the bluff is
> Thunderbolt.

I'd argue that the raise is U-Turn :-). Which instant-wins any switching
contest (ex: U-Turn on the switch, leaving the option to switch into Magnezone
to trap the Skarmory, or if Lando stays in you can switch to your dedicated
Lando counter... not that Lando really has a solid counter, mind you, but you
get the idea).

The U-Turn war between Lando and Koko, however, demonstrates the bluffing game
once again. Koko staying in and doing something weird like Calm Mind, or even
Reflect/Light Screen, would be absurd, but it would definitely beat the Lando
U-Turn in most cases.

~~~
Bartweiss
> _Koko staying in and doing something_

Heh, good example. I keep running into defog Koko, I think precisely for this
reason. In raw number terms it's not a great use of a Koko or a moveslot, but
Koko forces so many U-Turns or outright switches that it's a strong way to
gain momentum. And if Lan-T just switched _out_ to avoid HP Ice, the check
might not be a Ground type, opening the door to Volt Switching away for even
more momentum. Carrying a move that bides time against specific switches is a
pretty great example of this back-and-forth pattern.

(Although, I'm not sure Lan can or does U-Turn on Koko? If it's scarfed it can
lead with Earthquake for a kill; if it isn't, it'll drop to HP Ice before it
gets to move.)

~~~
dragontamer
It really depends on what I'm predicting. U-Turn on Lando wins a surprising
number of options:

* Beats Koko Volt-Switch: Lando is immune, so Koko fails to switch out.

* Beats the Koko U-Turn: Lando is slower, so as the second U-Turner you capture the switching momentum.

* Beats the Koko Thunderbolt: It's prediction-on-top-of-prediction going on here, but this happens sometimes.

* Beats the Koko Hard-Switch: Hey, maybe they thought your Lando was scarf'd so they hard switch out.

--------

* Loses to HP-ice: This is the "obvious move" for Koko to do, and will happen more often than not. But as you go up the ranks, people start going for 2nd tier or 3rd tier mind-games, and you see fewer and fewer "obvious moves", especially in the early game where momentum is such a big deal.

It really depends where you are on the ladder: how stupid or aggressive you
think your opponent is and all that.
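The "it depends where you are on the ladder" point can be sketched as a best response to a predicted opponent: score each Koko option for Lando's U-Turn (+1 if it comes out ahead, -1 if it loses, per the list above), then weight by how likely you think each option is. The probabilities for the two ladder levels are invented for illustration.

```python
# +1 if Lando's U-Turn comes out ahead against that Koko option, -1 if it loses
# (a crude scoring of the list above).
U_TURN_RESULT = {
    "volt_switch": 1,
    "u_turn": 1,
    "thunderbolt": 1,
    "hard_switch": 1,
    "hp_ice": -1,
}


def expected_value(results, prediction):
    """Expected payoff of one move, given a probability for each opponent option."""
    if abs(sum(prediction.values()) - 1.0) > 1e-9:
        raise ValueError("prediction must sum to 1")
    return sum(results[option] * prob for option, prob in prediction.items())


# Invented opponent models: the lower ladder mostly clicks the "obvious" HP Ice.
low_ladder = {"hp_ice": 0.8, "volt_switch": 0.1, "hard_switch": 0.1}
high_ladder = {"hp_ice": 0.4, "volt_switch": 0.2, "u_turn": 0.2,
               "hard_switch": 0.1, "thunderbolt": 0.1}

print(round(expected_value(U_TURN_RESULT, low_ladder), 2))   # -0.6
print(round(expected_value(U_TURN_RESULT, high_ladder), 2))  # 0.2
```

The same move flips from a bad idea to a good one purely because the opponent model changed, which is the "how stupid or aggressive" judgment in numeric form.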

------
phire
The complexity of Pokemon is much lower than the author implies.

It's not really an open world game. There are lots of choke points where your
branching factor drops to two: continue forward to the next trainer or go back
and regroup.

There is no issue with regrouping in Pokemon; you always end up stronger. There
is also very little punishment for pushing forward and failing: you lose half
your money and maybe some time/frustration getting back to the same point,
which is meaningless for an AI.

~~~
chongli
All of the details you're referring to are way above the level of any machine
that plays Mario. For SMB1 the branching factor is uncommonly low. Pokemon has
an extremely high branching factor by comparison. Both are measured at the
level of controller inputs and memory content outputs.

SMB1 can be beaten by applying random controller inputs, measuring the
player's x value, and seeing what works. Yes, you do need to look a few
seconds into the future on some levels but you never need to look ahead
further than what's in the current level.
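A toy version of "random inputs, measure x, keep what works": a made-up one-dimensional level with single-tile pits, two inputs (step right, jump right), and a hill climber that mutates one input at a time and keeps the change if the run gets at least as far right. This is an illustrative random-mutation search under those assumptions, not a reproduction of any particular Mario bot.

```python
import random

LEVEL = "....._...._.._......"  # made-up level: '.' = ground, '_' = pit
ACTIONS = ("R", "J")  # R = step right one tile, J = jump right two tiles


def run(inputs, level=LEVEL):
    """Replay a fixed input sequence; return the furthest safe x reached."""
    goal = len(level) - 1
    x = 0
    for action in inputs:
        nxt = x + (1 if action == "R" else 2)
        if nxt >= goal:
            return goal  # reached the end of the level
        if level[nxt] == "_":
            return x  # fell into a pit; score is the last safe tile
        x = nxt
    return x


def hill_climb(steps=5000, length=20, seed=0):
    """Mutate one random input at a time; keep the mutation if x doesn't go down."""
    rng = random.Random(seed)
    best = [rng.choice(ACTIONS) for _ in range(length)]
    best_x = run(best)
    for _ in range(steps):
        candidate = best[:]
        candidate[rng.randrange(length)] = rng.choice(ACTIONS)
        x = run(candidate)
        if x >= best_x:
            best, best_x = candidate, x
    return best, best_x


inputs, x = hill_climb()
print(x, "".join(inputs))
```

Accepting ties lets the search drift through mutations that don't change the outcome; in this toy level, flipping the input that led into a pit always gains ground, so x keeps climbing to the end.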

Pokemon has no simple, short term, monotonic measure of progress to work
against, akin to Mario's x position. That's it, really.

~~~
chongli
To be fair to the machine, a human wouldn't even be able to tell what game
they're playing, let alone be able to finish it, if all they had to go on was
the raw memory values in response to controller inputs. The fact that Mario
can be beaten at all given such an incredible dearth of information is
impressive.

RPGs and action adventure games like Pokemon or Zelda are too laden with
human-specific cultural details: heroes, villains, a world, exploration,
battle, progress.

Think of Zelda for example. The wooden sword is not even necessary until the
final encounter with Ganon. This isn't even immediately obvious to a human
player: he knows that the hero needs a sword to defeat the villain. These are
all cultural details a human already knows before beginning to play the game.
The machine, on the other hand, knows the sword only as a bit in memory that
causes a different branch to be taken in response to pressing the A button; a
branch that returns to the main loop very quickly without any other obvious
indication of progress. Yes, if used at the right time with the right position
and direction, thrusting Link's sword causes enemies' health values to
decrease. But since there are multiple ways to kill every enemy in the game
(except Ganon), the sword is a luxury until the end.

~~~
Retric
I once watched my 8-year-old sister play an RPG in a language she did not
understand. She had only the vaguest idea what was going on but still made
progress. Just randomly trying things looking for novelty is enough to
progress through such games given sufficient time.

~~~
chongli
The RPG may have been in another language, but it presumably still depicted
human or humanoid characters, objects, enemies, etc.

Your sister knew far more about life (and consequently the game) than what a
computer would know. Just having a basic concept of reality, that the world is
made up of objects and agents, that some things can be interacted with, cause
and effect, etc. is way beyond what a computer has (which is basically
nothing).

To a computer playing a game, "randomly trying things" has only one meaning:
random control inputs. As mentioned earlier, random control inputs will not
get you very far in any open world game. A computer playing Zelda would be
lucky to reach the bottom right corner of the map (by maximizing x and y
positions) using only random inputs and reinforcement learning. Actually
completing the game is so far beyond that it's ridiculous.

~~~
Retric
This was a simple, menu-driven, early-Final-Fantasy-style game on rails.

So, some basic path finding and press X near stuff would be useful. But, it
did not need to react to enemies, or have complex environments like Zelda
games do. On the other hand, the meaning of all those random text blurbs was
rather critical.

~~~
chongli
Without being told, a machine is not even going to know the difference between
a menu and the game world. Pathfinding? You need to have a goal in order to do
that. How does a machine set high level goals without having any high level
concepts of the game itself? It's all just a bunch of memory to the machine.
There's no meaning attached to any of it.

~~~
Retric
Menus were extremely limited. Repeating (Hello) > (Hello) > (Hello) is little
different than just walking into a wall constantly.

Item Shops would IMO be the largest issue, but I did not see anything that
looked like one.

------
lacker
I think this article's key conclusion is incorrect, and that it _is_ likely
possible for modern AI techniques to beat Pokemon.

The main reason a machine has not already beaten Pokemon is that it is a
nontrivial amount of work to connect a standard AI algorithm to a new video
game, and nobody has crossed that hurdle for Pokemon. Mario is one of the most
popular video games of all time, and so it is one of the few games that people
have connected AI to.

If there were an AI connected to a Pokemon game, I am fairly confident
the AI would be able to beat Pokemon. The article discusses the problem of
having an unclear goal metric. That doesn't seem like a very hard problem to
me - you can start with something like, win as many battles as possible. You
might beat the game just randomly after you become powerful enough, or you
might need some more tweaks to the metrics, but it doesn't seem like that
should be a showstopper.
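A sketch of what a "win as many battles as possible" reward could look like, with a couple of the "more tweaks" bolted on. The `Progress` fields and every weight are assumptions for illustration; actually reading those values out of the game's memory is the nontrivial hookup work described above.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    """Hypothetical summary of game state, assumed to be read out of memory each step."""
    battles_won: int
    badges: int
    blackouts: int  # times the whole party fainted
    elite_four_beaten: bool = False


def reward(prev: Progress, cur: Progress) -> float:
    """Mostly 'win battles', plus a few hand-tuned tweaks. All weights are made up."""
    r = 1.0 * (cur.battles_won - prev.battles_won)
    r += 10.0 * (cur.badges - prev.badges)
    r -= 2.0 * (cur.blackouts - prev.blackouts)
    if cur.elite_four_beaten and not prev.elite_four_beaten:
        r += 100.0  # the credits roll
    return r


before = Progress(battles_won=3, badges=0, blackouts=0)
after = Progress(battles_won=4, badges=1, blackouts=0)
print(reward(before, after))  # 11.0
```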

A lot of people are pointing out that Pokemon is fundamentally harder than
games like poker or go. That is true, but the bar in this article is _beating
the game_. For poker or go, AI is now better than any human. That is a much
higher bar, and one that is not even relevant to the single-player Pokemon game.

------
iMart1n
Related talk, "Solving Pokemon Blue With a Single, Huge Regular Expression":
[https://www.youtube.com/watch?v=Q2g9d29UIzk](https://www.youtube.com/watch?v=Q2g9d29UIzk)

------
em0ney
I'm glad that I'm not the only one who doesn't understand pokemon

~~~
Aissen
If you're following video games, Pokémon isn't really anything special. It's
an RPG. It doesn't have any innovative gameplay. What was new at release was
the way it combined things which already existed at the time.

For example, in Final Fantasy V (which came out 3 years before pokémon), a
character could already catch monsters and have them battle against others.
Many games already had a collecting element. The exchange/trading might be new
though, as well as selling the same game twice, with only a few differences
between the two (to encourage exchanges… and boost sales).

~~~
majortennis
It's special to me; the TV show and the trading cards helped with that. I
rekindled my love of the franchise by getting into speedrunning a few years
back. 10-year-old me would spend weeks slowly grinding through it; 22-year-old
me is trying to speedrun it in under 2 hours.

~~~
Aissen
I'm not denying that it's special to many people, but this is true of many
things in pop culture. I was answering the GG-parent with regards to
"understanding" Pokemon. I'm arguing that it's easy to understand as a video
game. As a pop culture phenomenon, the question doesn't really make sense,
calling for a tautological answer.

------
mrob
Branching factor alone doesn't tell you much. The important thing is how many
branches are meaningful. Consider the board game Arimaa, which has an enormous
branching factor[0]. It was designed to be difficult for computers, but in
2015, David Wu's "Sharp" software beat some of the best human players[1].
This didn't need any revolutionary new AI techniques, only some clever human-
written heuristics used in combination with classical computer chess
techniques to prune the game tree. There are many possible moves each turn,
but most can be discarded as obviously bad. The same could be true of Pokemon.
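To illustrate "most can be discarded as obviously bad", here is plain negamax (no alpha-beta, to keep it short) on a toy take-away game (remove 1 to 3 stones, taking the last stone wins), with an optional cap that only searches the top `keep` moves ranked by a hand-written heuristic. The game and heuristic are stand-ins, not Sharp's actual Arimaa heuristics. Here the heuristic happens to encode the known winning rule (leave a multiple of 4), so pruning loses nothing; real heuristics are imperfect, and pruning trades some accuracy for a much smaller tree.

```python
MAX_TAKE = 3  # toy game: remove 1..MAX_TAKE stones per turn; taking the last stone wins


def moves(pile):
    return [t for t in range(1, MAX_TAKE + 1) if t <= pile]


def heuristic(pile, take):
    """Hand-written move score: prefer leaving a multiple of MAX_TAKE + 1."""
    return 1 if (pile - take) % (MAX_TAKE + 1) == 0 else 0


def negamax(pile, keep, counter):
    """+1 if the side to move wins with best play, else -1.

    Only the top `keep` moves by heuristic are searched; keep=None searches all moves.
    counter[0] counts visited nodes.
    """
    counter[0] += 1
    if pile == 0:
        return -1  # the opponent just took the last stone
    ranked = sorted(moves(pile), key=lambda t: heuristic(pile, t), reverse=True)
    if keep is not None:
        ranked = ranked[:keep]
    return max(-negamax(pile - t, keep, counter) for t in ranked)


def solve(pile, keep=None):
    counter = [0]
    value = negamax(pile, keep, counter)
    return value, counter[0]


print(solve(12))          # (-1, 2031): full tree
print(solve(12, keep=1))  # (-1, 7): only the top-ranked move at each node
```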

[0] [http://arimaa.com/arimaa/](http://arimaa.com/arimaa/)

[1] [http://icosahedral.net/downloads/djwu2015arimaa_color.pdf](http://icosahedral.net/downloads/djwu2015arimaa_color.pdf)

------
gepoch
This isn't exactly a case where a machine beats a game, but if you want to get
an idea of how such a machine would need to think, and if you want to see
someone apply a lot of interesting algorithms to brew a sequence of steps to
speedrun a game with much more complexity than Pokemon, check out Artjoms
Iškovs's series where he comes up with a speedy approach to become the head of
all factions in Morrowind:

[https://kimonote.com/@mildbyte/travelling-murderer-problem-p...](https://kimonote.com/@mildbyte/travelling-murderer-problem-planning-a-morrowind-all-faction-speedrun-with-simulated-annealing-part-1-41079/)

Absolutely awesome.

------
somuchtyler
Back to "mari/o": I was wondering, why is it that a human brain can learn to
play Mario without having to die a billion times? Why is it that humans don't
need a large training dataset? Is there a way to design a neural network so
that very little training produces a pretty good model?

~~~
dragontamer
Because artificial neural networks work nothing like human neural networks,
despite the same names.

It's a false equivalence. It's no more valid a comparison than asking why "Clos
networks" can't play Mario by themselves, despite also being networks. (A
Clos network is one kind of network topology for switches.)

The human brain does NOT take the derivative of the error function and glide
down it using gradient descent. There are no matrix multiplication circuits in
the brain. Things work in completely different ways that biologists and
psychologists barely understand today. It has to do with chemicals,
neurotransmitters and other such, which are completely alien to Comp. Sci.
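For contrast, the "take the derivative of the error function and glide down it" step that artificial networks rely on, shrunk to a single weight: fit y ≈ w * x by gradient descent on mean squared error. Real networks do the same thing across millions of weights via backpropagation; the data and learning rate here are arbitrary.

```python
def gradient_descent(xs, ys, lr=0.01, steps=1000):
    """Fit y ~= w * x by repeatedly stepping w against the gradient of mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/dw of (1/n) * sum((w*x - y)^2) is (2/n) * sum((w*x - y) * x)
        grad = (2 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x
print(round(gradient_descent(xs, ys), 4))  # 2.0
```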

------
ziaddotcom
A* over a Bayesian Network.
[https://en.wikipedia.org/wiki/A*_search_algorithm](https://en.wikipedia.org/wiki/A*_search_algorithm)

[https://en.wikipedia.org/wiki/Bayesian_network](https://en.wikipedia.org/wiki/Bayesian_network)

~~~
krapht
What would your admissible heuristic be?

~~~
pas
Discover as much of the map as possible? Talk to as many NPCs as possible? Try
to get as many different dialogues from NPCs as possible? (I assume that reaching
the "you won" dialogue means encountering more dialogues than if you
cleverly lose.)

~~~
ziaddotcom
With the bayesian mesh you could arbitrarily apply heuristics over what
otherwise looks pretty similar to a simple coordinate map.

If you happen to die near a certain node, you could add an adjacent node
labeled "scary" or whatever. Do the reverse for nodes where you get free food
and pokemon.
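A minimal sketch of the "scary node" idea using plain A* on a grid (a simplification: a grid of tiles rather than the Bayesian network suggested above). An admissible heuristic is one that never overestimates the remaining cost to the goal; Manhattan distance qualifies here because every step costs at least 1 and the "scary" penalty only adds cost. The map symbols and penalty value are assumptions.

```python
import heapq


def a_star(grid, start, goal, scary_penalty=5):
    """Cheapest path cost on a grid, or None if unreachable.

    '#' = wall, 'S' = "scary" tile (e.g. where the agent died before), anything else = floor.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: never overestimates, since every step costs >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best = {start: 0}
    frontier = [(h(start), 0, start)]
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                step = 1 + (scary_penalty if grid[nr][nc] == "S" else 0)
                new_cost = cost + step
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(frontier, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None


MAP = ["....",
       ".SS.",
       "...."]
print(a_star(MAP, (1, 0), (1, 3)))                   # 5: detours around the scary tiles
print(a_star(MAP, (1, 0), (1, 3), scary_penalty=0))  # 3: straight through
```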

~~~
MereInterest
Do you have a link to any papers about bayesian meshes? This sounds like a
really neat technique, but my google-fu is failing me.

------
skocznymroczny
Buuut I thought we already have bots that beat Civilization by reading the
instruction manual?

------
deathanatos
While Pokémon might have a higher branching factor than Mario, I don't think
that means a "machine can't beat it". First, I'm going to use Gen I/II
(Red/Blue/Yellow // Gold/Silver/Crystal) here as "Pokémon", the game, as I am
most familiar with them; I am not particularly aware whether the formula has
changed in more recent generations.

> _Pokemon is an open world game_

… not really. While you can walk around, sure, the actual game is mostly
linear; the order in which you explore and visit towns is mostly
predetermined. (It has to be, as the Pokémon and trainers you encounter become
more and more powerful, so have to progress with that.) Most of the "branches"
that occur while walking around any given part of the map will all coalesce on
one of the entrances/exits to that part of the map. (I.e., either you leave
the town, or you visit one of the buildings in the town, or you talk to
someone in the town. Mostly, that's it; my point here is that one can greatly
simplify all the "standing at coordinate X, coordinate X+1, ..." states;
those positions are essentially equivalent.)
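The coordinate-collapsing idea can be sketched as a flood fill: group walkable tiles into regions separated by doors, so a planner only reasons about a handful of region and exit nodes. The map encoding ('#' wall, 'D' door) is hypothetical; extracting real Gen I/II maps from the game is not shown.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def regions(grid):
    """Label each walkable tile with a region id via BFS flood fill.

    '#' = wall, 'D' = door/exit (kept as its own node), anything else = floor.
    Returns (label dict mapping (row, col) -> region id, number of regions).
    """
    label = {}
    next_id = 0
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch in "#D" or (r, c) in label:
                continue
            label[(r, c)] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in NEIGHBORS:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
                            and grid[nr][nc] not in "#D" and (nr, nc) not in label):
                        label[(nr, nc)] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return label, next_id


def door_graph(grid, label):
    """Map each door tile to the set of regions it touches."""
    doors = {}
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == "D":
                doors[(r, c)] = {label[(r + dr, c + dc)] for dr, dc in NEIGHBORS
                                 if (r + dr, c + dc) in label}
    return doors


TOWN = ["#######",
        "#..#..#",
        "#..D..#",
        "#..#..#",
        "##D####"]
label, count = regions(TOWN)
print(count, door_graph(TOWN, label))
```

In this toy town, 12 floor tiles collapse into 2 regions joined by one interior door, plus one exit door leading out of the left room.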

As for a goal, I would just say "AND them together, then". Or just do the
Elite Four; the credits scroll when you beat them, which I think is a pretty
clear indication of "win". Catching them all is more akin to completing all
the achievements in the game. (And requires running multiple coordinated
games, as, for example, the starting 3 are only available once, at the
beginning of the game. In order to "catch them all", you need two games where
the player 1. doesn't evolve their starter and 2. trades it to you. Since
usually the starter is the core of someone's team in a normal human game, I
think one would normally run a small, separate game just for that. Or find someone you trust and only
briefly trade the Pokémon, then trade it back, which counts as far as the
Pokédex cares. Eevee, a Pokémon that can evolve 3 different ways in Gen I,
represents a similar problem: you only get one, and have to choose. (I think
Gen II's breeding system might work around some or most of this issue.))

While Super Mario is perhaps comparatively easy, I would offer up NetHack. It
is "open-world" in much the same sense as Pokémon: you have an explorable
area, but one that is still mostly linear. There are several "sub-modes" to
solve, like the article notes about Pokémon: in Pokémon, you need to explore,
train, capture new Pokémon, battle; in NetHack, you also need to explore,
battle, manage items, solve Sokoban puzzles, etc.

Pokémon is fairly hard to "lose"; losing a battle just returns you to the
nearest Pokémon center w/ half your money gone. (That might be more of an
issue in Gen I, where, IIRC, money is finite until you beat the Elite Four; in
Gen II, as soon as you have a trainer's phone number, I think money is
technically infinite. Regardless, simply training a few Pokémon to Lvl 100
should be sufficient.) NetHack, however, is _very_ easy to lose; death is
permanent, and requires restarting from scratch.

And NetHack has been won by a machine:
[https://www.reddit.com/r/nethack/comments/2tluxv/yaap_fullau...](https://www.reddit.com/r/nethack/comments/2tluxv/yaap_fullauto_bot_ascension_bothack/)

