
A way to deal with enormous branching factors in strategy games - magoghm
http://togelius.blogspot.com/2016/03/a-way-to-deal-with-enormous-branching.html
======
dripton
Link to the actual paper:
[http://julian.togelius.com/Justesen2016Online.pdf](http://julian.togelius.com/Justesen2016Online.pdf)

I've written an AI for a similar game, where you can't always even iterate
over all your moves for this turn, let alone do a multi-ply tree search.

Using their terminology of Action and Turn, my method was to first evaluate
individual Actions using a simpler evaluation function, applying a cutoff so
that only a reasonable number of Actions had to be evaluated, then combine
that reduced group of Actions into a manageable subset of full Turn moves,
and finally use the full evaluation function to find the best Turn.
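In rough sketch form (all names here are hypothetical, not from the actual AI), that two-stage idea looks something like:

```python
import itertools

def best_turn(state, actions, action_eval, turn_eval, cutoff=8, turn_size=3):
    """Two-stage search: cheaply score single Actions, keep the best few,
    then evaluate full Turns built only from that reduced set."""
    # Stage 1: rank individual Actions with the cheap evaluator and cut off.
    ranked = sorted(actions, key=lambda a: action_eval(state, a), reverse=True)
    shortlist = ranked[:cutoff]

    # Stage 2: combine shortlisted Actions into candidate Turns and score
    # each whole Turn with the expensive full evaluator.
    best, best_score = None, float("-inf")
    for combo in itertools.permutations(shortlist, min(turn_size, len(shortlist))):
        score = turn_eval(state, combo)
        if score > best_score:
            best, best_score = combo, score
    return best
```

The point is that the expensive evaluator only ever sees Turns assembled from the shortlist, so the combinatorial explosion is bounded by `cutoff`, not by the full action count.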

Writing a good evaluation function for a complex game is hard, so I picked a
whole bunch of inputs and then used a genetic algorithm to find weights for
them. (Let the various AIs play entire games, and let the winners breed.)
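A minimal sketch of that "let the winners breed" loop (the match function, population size, and mutation scale are all made-up placeholders):

```python
import random

def evolve_weights(n_weights, play_match, pop_size=20, generations=50):
    """Tune evaluation-function weights by self-play.
    play_match(w1, w2) -> True if the AI using w1 beats the AI using w2."""
    pop = [[random.uniform(-1, 1) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        random.shuffle(pop)
        # Pair off the population; winners of each match survive.
        winners = [a if play_match(a, b) else b
                   for a, b in zip(pop[::2], pop[1::2])]
        # Breed: uniform crossover between two winners, plus small mutation.
        children = []
        while len(winners) + len(children) < pop_size:
            p1, p2 = random.sample(winners, 2)
            child = [random.choice(pair) + random.gauss(0, 0.05)
                     for pair in zip(p1, p2)]
            children.append(child)
        pop = winners + children
    return pop[0]  # an arbitrary survivor of the final generation
```

Since fitness is defined only through matches, there is no scalar score to sort by at the end; in practice you might run a small final tournament instead of returning an arbitrary survivor.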

Works okay, but still can't beat a decent human player very often. (There's
enough randomness in the game that the better player doesn't always win.)

In this paper, their Online Evolution beat the other four computer strategies,
but they don't mention whether it can beat good human players. If it can't,
it's not clear to me whether Online Evolution is a good algorithm. Beating
their other four algorithms doesn't seem to be a very high bar.

------
scrollaway
Small nitpick regarding Hearthstone: Its branching factor is really not that
high. Higher than Chess, but not in the same category as Civilization, which
gives you thousands to tens of thousands of potential inputs every turn.

In Hearthstone, you have a specific number of options you can play (anywhere
between 0 and 10-ish), and each option may have one or more targets (again
usually between 0 and 10-ish). This sounds high, but you're able to discard a
lot of those options _much_ more easily than you would in Go/Chess (e.g.
damaging yourself is seldom a good idea).

There are often multiple options you can play per turn, but each option has to
be evaluated under the current state of the board, since playing a single
option can _radically_ alter the game state. So a much fairer approach is to
look at a branching factor of about 15, with multiple successive "turns" (i.e.
each option is a single "turn", and you can play multiple turns before ceding
control to the opponent).
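That per-action view amounts to a greedy loop: pick the best single action against the current board, apply it, and re-evaluate. A sketch (all function names are hypothetical):

```python
def plan_turn(state, legal_actions, apply_action, evaluate):
    """Treat each Action as its own 'turn': pick the best single action
    under the *current* board, apply it, and repeat until passing is best."""
    taken = []
    while True:
        options = legal_actions(state)
        if not options:
            break
        # Re-score every option against the current state, since one action
        # can radically alter the board.
        best = max(options, key=lambda a: evaluate(apply_action(state, a)))
        if evaluate(apply_action(state, best)) <= evaluate(state):
            break  # no action improves the position; end the turn
        state = apply_action(state, best)
        taken.append(best)
    return taken, state
```

Each iteration only ever considers ~15 options, at the cost of being greedy: it can miss action sequences whose first step looks bad in isolation.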

~~~
scriby
I think the branching factor is somewhat higher, given that the order of
actions makes a big difference. Also remember that when playing a card you
have to choose where to place it, which can multiply the branching factor by
up to 7.

If both of you have 3 minions on the board, each of your minions has a choice
of 4 targets, which is already up to 4^3 = 64 choices. Then consider each
order of attacks, plus the fact that you can play any card from your hand
before or after attacking, and the branching factor has the potential to
explode.
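The arithmetic for just the attack phase (ignoring cards and placement) works out like this:

```python
from math import factorial

my_minions, enemy_minions = 3, 3
targets_per_minion = enemy_minions + 1            # 3 enemy minions + the enemy hero
target_choices = targets_per_minion ** my_minions  # 4^3 = 64 target assignments
attack_orders = factorial(my_minions)              # 3! = 6 attack orderings
combinations = target_choices * attack_orders      # 384 attack sequences
```

So even before any cards are played, 3-vs-3 minions alone gives a few hundred distinct attack sequences, and each card played multiplies that further.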

~~~
scrollaway
For nearly all cases (except some you can specify manually and/or discover
through deep learning), you can throw away positioning, which is why I didn't
mention it.

As for the order of actions, I did mention that. The basis of my argument is
that you shouldn't consider an entire player's turn as a "turn"; instead,
consider each action as a turn.

------
tschellenbach
I wonder how long it will take until we find a way to accelerate the rate of
learning. If you give a user experienced with games a tablet with the game
FTL, chances are they will beat it in 10-30 attempts. AI, on the other hand,
still takes thousands of playthroughs to get good at a game.

~~~
ivanhoe
Well, to be fair, even the youngest gamer has spent years and years as a
baby/toddler focused almost solely on learning how to use his/her brain to
reason and make quick strategic decisions. As any parent knows, babies know
almost nothing and have to learn it all from scratch, and watching them figure
out the world and improve their skills is very insightful to anyone interested
in AI. And we often forget how much time and hard work we all invested as kids
to perfect our (basic) abilities...

~~~
Houshalter
A good example is when you throw humans into a game with physics they aren't
familiar with, e.g. a 4-dimensional game, or QWOP. Humans suddenly don't seem
so amazing when they, too, need thousands of attempts.

------
Kjeldahl
So you've now moved the computational complexity to the fitness function,
which needs to consider future moves as well?

~~~
oh_sigh
Not necessarily. If there are factors of a board position that make it strong
regardless of future moves, then why can't that information be encoded in the
fitness function without having to play future moves forward?
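As a toy illustration of that idea, a static evaluation can score a position purely from features of the current board, with no look-ahead at all (the features and weights below are made up):

```python
def static_eval(board, weights):
    """Score a position from static features alone; no future moves played.
    Feature names and weights here are illustrative, not from any real engine."""
    features = [
        board["material"],     # e.g. summed piece values
        board["mobility"],     # number of legal moves available
        board["king_safety"],  # e.g. pawns shielding the king
    ]
    return sum(w * f for w, f in zip(weights, features))
```

Whether such features capture "strong regardless of future moves" is exactly the open question in the comment above.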

~~~
Kjeldahl
Sure, but how would you know? Sacrificing pieces in chess is a typical
example: negative expected value in the short term, but typically positive in
the long term when done intentionally. Generally, if you can come up with a
robust fitness function, a genetic algorithm is one method, although there are
usually better alternatives that give faster results. In my experience,
genetic algorithms can be a great way to get something working, but they are
typically slow, which matters if you need to simulate a few billion games...
Interesting article anyway, thanks.

------
markbnj
This seems to me like the basic difference between a search-based and a
situational approach to evaluating game positions. Backgammon is another great
example, and one that I think has a higher branching factor than even Go.
Minimax only works when you can construct a tree of possible outcomes that is
reasonably searchable. Many "contests," like backgammon and Go, cannot be
represented as a tree of outcomes so neatly, so you have no choice but to fall
back on algorithms that evaluate the current position without full knowledge
of possible outcomes, i.e. heuristics, neural nets, etc.

------
magoghm
In a game like Advance Wars a player turn consists of moving all your units,
and very often the order in which you move your units is important, so the
branching factor would be even greater than what they estimate.

------
shoyer
This seems like a less sophisticated variant* of the AlphaGo approach. In both
cases, Monte Carlo Tree Search (MCTS) is constrained by a model of likely
future moves. Here, they use a genetic algorithm; in AlphaGo, this is called
the "policy network".

* This is not to say that these authors copied AlphaGo -- my uninformed guess is that this general approach of constraining MCTS has been known for a very long time.

~~~
jebediah
If I understood correctly, they are not using any MCTS; I believe they are not
doing any look-ahead, only greedily selecting the best turn.

~~~
togelius
Correct - we are not using MCTS and the approach has nothing to do with
AlphaGo.

------
TTPrograms
Are there theoretical reasons people use genetic algorithms for problems like
this? It seems like you lose any local knowledge that something like annealing
or stochastic-gradient-descent-type algorithms use, but it's not clear to me
that there are many advantages over some sort of random sampling. I guess it
makes sense in the case where your merit function is decomposable as, for
example, the product or sum of a number of lower-dimensional functions; but if
that's true, then I would think other approaches would work even better. That
is, if genetic algorithms work better than random search, then your problem
has some structure that would make it amenable to techniques that explicitly
take advantage of that structure, performing better still.

~~~
sago
You don't lose local knowledge, no. You're basically doing a bunch of parallel
stochastic hill-climbings that can share information. In fact, with suitable
parameters, a GA can reduce to a stochastic hill climber.

You're right that the problem structure is key. There's a key theorem, the No
Free Lunch Theorem for Search, which says this will always be the case. But on
search problems with many related local maxima, evolution will often
outperform a series of stochastic gradient ascents, because information from
one hill can be applied to another.

Of course, search and knowledge are always two sides of the same coin. The
less knowledge you encode, the more search you'll have to do. But GAs allow
you to encode knowledge too, in their operators, particularly.

There's been quite a bit of work done on the kinds of fitness landscapes that
evolutionary algorithms are better suited for.

[NB: I used 'ascent' terminology to avoid confusion; it's a convention only,
of course: energy minimization vs. fitness maximization.]
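A toy way to see "parallel hill climbers that share information" (illustrative only; bit-string genomes on a OneMax-style fitness):

```python
import random

def ga_step(pop, fitness, mut_rate=0.1, crossover=True):
    """One generation of a simple GA over bit-strings. With crossover=False
    and a single parent, this reduces to stochastic hill climbing."""
    scored = sorted(pop, key=fitness, reverse=True)
    parents = scored[:max(1, len(scored) // 2)]   # truncation selection
    next_pop = [list(scored[0])]                  # elitism: keep current best
    while len(next_pop) < len(pop):
        if crossover and len(parents) > 1:
            # Uniform crossover: information flows between different "hills".
            p1, p2 = random.sample(parents, 2)
            child = [random.choice(pair) for pair in zip(p1, p2)]
        else:
            child = list(random.choice(parents))
        # Mutation: the stochastic local step of each climber.
        child = [g ^ 1 if random.random() < mut_rate else g for g in child]
        next_pop.append(child)
    return next_pop
```

Turning crossover off leaves independent mutation-driven climbers; turning it on lets partial solutions found on one peak seed searches elsewhere, which is the point about related local maxima above.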

~~~
TTPrograms
Thank you, very good points! If you have the time to suggest some references
I'd greatly appreciate it :)

------
ggggtez
Go wasn't "solved" with MCTS. They ignore that good moves can be cached (a
good move 5 turns from now is probably good now, too). In StarCraft, moving 36
degrees to the left or not makes little difference, so many move combinations
will produce nearly identical outcomes. Additionally, AlphaGo did use an
evaluation metric to measure the board value. "Evolution of strategies" just
sounds like genetic algorithms all over again.

------
danbolt
During downtime, I've been playing the new Fire Emblem, which has made me
really wonder how the AI works on the simple 3DS hardware. This is a nice
complementary read.

~~~
cableshaft
At least with the older 3DS Fire Emblem (and to be fair, I didn't play it too
much), it didn't seem particularly clever: mostly just choosing which
character is more vulnerable, moving into range (probably using a standard A*
pathfinding algorithm), and attacking with their best option (plus some
healing and a few stage-specific things).

Pretty standard A.I. from what I could tell. But my memory might not be that
great; it was a while ago that I played it.

Is the new one significantly different from that? What stuck out for you as
being particularly clever?

~~~
danbolt
Hmm, I guess the way the enemies would line up for attack stances (being
adjacent to one another to deal extra damage) caught my attention. That's
probably the most notable feature I've seen in the game.

At the same time, you've laid out how the game works quite plainly, and I've
read enough about Pac-Man to know that video games can seem more clever than
they actually are!

------
murbard2
You need to reason abstractly about your moves rather than enumerate every
possible combination. If you pick a general course of action and then
implement the details, the branching factor becomes less of an issue. One way
to do that might be to learn a representation of the action space.

------
xxbondsxx
Isn't this similar to A* or alpha-beta search, but with a fitness function and
the randomness of evolutionary algorithms?

I feel like it might apply well to a strategy game, but for something like
chess it would not fare as well. Mixing two great move sequences in chess
would probably result in disaster.

------
skittlebrau
Very cool to see this (I worked on Hero Academy). When you say the randomness
was removed, does that mean players' hands were a specific fixed shuffle, or
something else?

How does it fare against good human players? Would it do much worse with
randomness added back?

------
mooneater
I'm not a hater; I like reading about different approaches, it's cool. But I
don't think anything is being learned here? I wish the system would get better
at the game the more games it plays, but from skimming the paper I do not
think that is the case.

------
mgalka
Interesting way of approaching the problem.

I didn't realize the Go solution "just" used Monte Carlo branching. Still
impressive, but it does not seem as revolutionary as it has been portrayed.

~~~
hyperbovine
The insight was to use CNNs to prune the search space. In a sense, the AlphaGo
algorithm is to MCTS as importance sampling is to regular Monte Carlo.
However, even without the search part, AlphaGo crushes MCTS-based Go
algorithms (see [https://www.tastehit.com/blog/google-deepmind-alphago-how-
it...](https://www.tastehit.com/blog/google-deepmind-alphago-how-it-works/)).
That in and of itself is quite extraordinary.

------
jwatte
Another way to deal with large branching spaces is MCMC.

------
onetimePete
If you can't find an optimal approach in the forest of possibilities, could
you compile a list of all actions and start to search from the defeat
scenarios?

