
The Mathematics of 2048: Optimal Play with Markov Decision Processes - jonbaer
http://jdlm.info/articles/2018/03/18/markov-decision-process-2048.html?href=
======
hervature
As someone who studies reinforcement learning (or at least attempts to), I
found this article expertly written and illustrated. The graphics were
phenomenal at explaining how MDPs function. The text gives enough intuitive
explanation to guide the reader, but also provides the correct notation for
the reader to investigate further. If I had any complaint, it would be that
the author didn't mention backward induction or dynamic programming when
describing the solution method in Appendix B. Still, I think this was the
best intro/case study of standalone MDPs I have ever read. Great work!
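
For anyone curious, backward induction on a finite-horizon MDP looks roughly
like this. A minimal sketch in Python; the `states`, `actions`, `P`, and `R`
arguments are hypothetical placeholders, not the article's actual model:

```python
# Minimal sketch of backward induction (finite-horizon dynamic
# programming) for an MDP. All names are hypothetical placeholders:
# actions(s) lists legal actions, P(s, a) yields (next_state, prob)
# pairs, and R(s, a) is the immediate reward.
def backward_induction(states, actions, P, R, horizon):
    V = {s: 0.0 for s in states}  # values with 0 steps left (terminal)
    policy = {}
    for t in range(1, horizon + 1):
        V_next = {}
        for s in states:
            acts = actions(s)
            if not acts:           # terminal state: value stays put
                V_next[s] = V[s]
                continue
            # Q-value of each action: immediate reward plus the
            # probability-weighted value of the successor states.
            q = {a: R(s, a) + sum(p * V[s2] for s2, p in P(s, a))
                 for a in acts}
            best = max(q, key=q.get)
            policy[(t, s)] = best  # best action with t steps to go
            V_next[s] = q[best]
        V = V_next
    return policy, V
```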

~~~
Cyphase
I haven't read the article yet, but I'm definitely going to; what a great
review! This part is the kicker:

> The text gives enough intuitive explanation to guide the reader, but also
> provides the correct notation for the reader to investigate further.

So many explanations about complex or in-depth topics have difficulty finding
the sweet spot between being too basic and hand-wavy on one end of the
spectrum, and too obtuse and difficult to follow for a layperson on the other
end. I've particularly noticed this with articles on quantum mechanics and
relativity in the past. It's either complex mathematical equations or trains
and flashlights (I'm exaggerating a bit).

When I read an article on such a topic that is accurate, yet understandable,
while giving the reader the information and keywords they need in order to
learn more, and providing a good on-ramp into the topic, it's a noticeable
experience.

~~~
quadyeast
abstruse not obtuse /nit

~~~
Cyphase
Ugh, I always forget; I don't have a lot of occasions to use it, so it's hard
to break the habit. I guess I'm just obtuse.

Thanks for pointing it out.

------
OskarS
I feel compelled to write every time 2048 comes up: 2048 is a shameless rip-
off of another game called "Threes", which came out just before it. "Threes"
is more or less a game-design masterpiece, far superior in every way
(graphics, sound, design, gameplay, you name it) to 2048. The reason 2048 is
more popular is that it was a free clone of Threes!, and people refuse to pay
even a little money for games on mobile platforms.

This is nothing against this article, of course, which is an excellent write-
up. But: if you like 2048, you'll like Threes better, and the creators of
Threes deserve to be rewarded for coming up with the design.

~~~
k3d
For folks who are interested in what went into designing it, the devs
published their emails over the course of development. It's a pretty neat
glimpse into how a game's design evolves, as well as the effort it takes to
come up with a clean, "simple" mechanic:
[http://asherv.com/threes/threemails/](http://asherv.com/threes/threemails/)

~~~
thomasahle
It's interesting that he says only 6 people in the world have ever seen a
6144 tile. Automatic controllers can reach it pretty often;
[https://arxiv.org/pdf/1606.07374.pdf](https://arxiv.org/pdf/1606.07374.pdf)
says 7.83% of tries.

------
alexbeloi
As the author points out in the conclusion, the state space blows up very
quickly as the grid becomes larger.

There is a large class of algorithms for finding approximately optimal
solutions to MDPs[1] that are model-free or stateless, meaning you don't need
to enumerate all of the state-to-state transitions to get a good policy.

If you google 2048 reinforcement learning[0], you'll find lots of
implementations of such algorithms.

[0]
[https://www.google.com/search?q=2048+reinforcement+learning](https://www.google.com/search?q=2048+reinforcement+learning)

[1]
[https://en.wikipedia.org/wiki/Markov_decision_process#Algori...](https://en.wikipedia.org/wiki/Markov_decision_process#Algorithms)
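
To make "model-free" concrete, here's a minimal sketch of tabular Q-learning
in Python. The `env` object (with Gym-style `reset`/`step` plus an `actions`
helper) is a hypothetical placeholder, not any particular implementation:

```python
# Sketch of tabular Q-learning, one of the model-free methods mentioned
# above. States must be hashable (e.g. the board as a tuple of tuples).
import random
from collections import defaultdict

def q_learning(env, episodes=10_000, alpha=0.1, gamma=0.99, eps=0.1):
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if random.random() < eps:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda a: Q[(s, a)])
            s2, r, done = env.step(a)
            # Update toward the one-step bootstrapped target; no
            # transition model is ever enumerated, hence "model-free".
            target = r if done else r + gamma * max(
                Q[(s2, a2)] for a2 in env.actions(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```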

~~~
indigochill
What a coincidence. I was just reading a blog about concrete vs abstract
interpretations of chess (and how to deal with its massive state space through
abstract representation) this morning:
[http://www.msreverseengineering.com/blog/2018/2/26/concrete-...](http://www.msreverseengineering.com/blog/2018/2/26/concrete-and-abstract-interpretation-explained-through-chess)

------
svalorzen
Shameless self plug: if you're interested in Markov Decision Processes and
related algorithms, I am the author and maintainer of a C++ and Python
library that implements lots of planning and reinforcement learning methods
for them.

I hope to officially publish it later this year, but it has already been used
by a lot of people, in academia and out.

Feel free to play with it if you like!

[https://github.com/Svalorzen/AI-Toolbox](https://github.com/Svalorzen/AI-Toolbox)

~~~
jdleesmiller
(Author here.) That looks like a great library --- I will have to try it out!

In case it helps anyone, I also wrote a much smaller MDP library in Ruby to
support this 2048 project (and several others):
[https://github.com/jdleesmiller/finite_mdp](https://github.com/jdleesmiller/finite_mdp).
It basically just does value iteration and policy iteration.
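
For reference, policy iteration looks roughly like this; a sketch in Python
rather than Ruby, and not the finite_mdp API (the `states`/`actions`/`P`/`R`
arguments are hypothetical placeholders):

```python
# Rough sketch of policy iteration. P(s, a) yields (next_state, prob)
# pairs; R(s, a) is the immediate reward. Assumes every state has at
# least one legal action.
def policy_iteration(states, actions, P, R, gamma=0.99, tol=1e-8):
    policy = {s: actions(s)[0] for s in states}   # arbitrary start
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: iterate V until it settles under policy.
        while True:
            delta = 0.0
            for s in states:
                v = R(s, policy[s]) + gamma * sum(
                    p * V[s2] for s2, p in P(s, policy[s]))
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily w.r.t. the evaluated V.
        stable = True
        for s in states:
            best = max(actions(s), key=lambda a: R(s, a) + gamma * sum(
                p * V[s2] for s2, p in P(s, a)))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V
```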

~~~
svalorzen
Thank you too for putting the code for your article out there. There is always
a shortage of public code to study this field, and every bit helps.

I've actually wanted to write a similar article on 2048 for a while, but
unfortunately I've never found the time. But it seems now I won't have to;
your article is great and much better than anything I could ever have done!

------
thomasahle
If anyone is interested in the "state of the art" of solving 2048, I believe
[https://arxiv.org/pdf/1604.05085.pdf](https://arxiv.org/pdf/1604.05085.pdf)
is currently the best, reaching the 32,768 tile in 70% of games.

The code on github:
[https://github.com/aszczepanski/2048](https://github.com/aszczepanski/2048)

------
ballenf
Sorry guys, but I won the 3x3 to 1024 on the first attempt, so the next 99 of
you will only see losses. ;)

> We can also see it being ‘lazy’ — even when it has two high value tiles
> lined up to merge, it will continue merging lower value tiles. Particularly
> within the tight constraints of the 3x3 board, it makes sense that it will
> take the opportunity to increase the sum of its tiles at no risk of
> (immediately) losing — if it gets stuck merging smaller tiles, it can always
> merge the larger ones, which opens up the board.

One flaw I noticed in the strategy was a failure to prioritize getting higher
values together when board space is scarce. That is, it seems to employ the
same strategy regardless of free board space. (Maybe they addressed this;
admittedly I didn't read it all.) Or maybe that is better than my strategy.

~~~
tpolzer
It's better than your strategy.

~~~
htgb
Care to elaborate?

~~~
empath75
That’s what optimal means.

~~~
jdleesmiller
Strictly speaking, all we can say is that the optimal strategy used in my post
is _no worse_ than ballenf's. It's possible that ballenf's strategy is also
optimal --- i.e. that they are equally good. :)

~~~
tialaramex
Yes! Optimal strategies for a game that is too complicated to reason out
completely in your head, or that has some random element, often aren't
unique.

Cepheus [http://poker.srv.ualberta.ca/](http://poker.srv.ualberta.ca/) is a
Heads Up Limit Texas Hold 'Em strategy (two players, no betting strategy
needed beyond whether to fold / call / raise) that is proved to be
approximately optimal. There absolutely must be other strategies that are
similar, and in particular circumstances one would dominate another; it's
just that, since they're all optimal, if they played random hands over time
they'd all break even.

------
air7
Back in the heyday of 2048, I wrote a solver based on Monte Carlo search[0]:
given a board state, continue playing in memory using random moves. Do this
many times. Choose the initial move that fares best.

It does surprisingly well considering it has no domain-specific knowledge of
the game (reaching the 8192 tile).
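
The idea in rough Python (a sketch only; the `game` object with
`legal_moves`/`apply`/`score` is a hypothetical stand-in, not the
bookmarklet's actual code):

```python
# Pure Monte Carlo move selection: for each legal first move, run many
# random playouts to the end of the game and pick the move with the
# best average outcome. `game` is an immutable state where apply(move)
# returns the successor state.
import random

def best_move(game, playouts_per_move=100):
    def rollout(state):
        # Play randomly until no moves remain, return the final score.
        while state.legal_moves():
            state = state.apply(random.choice(state.legal_moves()))
        return state.score()

    def avg_score(move):
        return sum(rollout(game.apply(move))
                   for _ in range(playouts_per_move)) / playouts_per_move

    return max(game.legal_moves(), key=avg_score)
```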

Also, this might be of interest: many variants of 2048 came out at the time,
such as a hexagonal board or Fibonacci tiles. The algorithm is domain-
independent, so it works just as well for these variants. Also, luckily, the
original version of the game had its game state and controls exposed in JS,
and therefore so did most of the variants. This allowed me to write a
bookmarklet version of the solver that could play many of the variants
without modification.[1]

[0] [https://stackoverflow.com/questions/22342854/what-is-the-opt...](https://stackoverflow.com/questions/22342854/what-is-the-optimal-algorithm-for-the-game-2048/23853848#23853848)

[1] [https://ronzil.github.io/2048AI-AllClones/](https://ronzil.github.io/2048AI-AllClones/)

------
memco
This was a nice article. I've enjoyed running this 2048-ai[0] every now and
then. It's fun to turn on the browser extension and watch it go. It's come
quite a long way from the earlier versions.

[0][https://github.com/nneonneo/2048-ai](https://github.com/nneonneo/2048-ai)

------
vitorafsr
The game adds a tile of value 2 or 4 at each play, with probabilities 0.9 and
0.1 respectively, so the expected value added per move is 0.9 * 2 + 0.1 * 4 =
2.2. The optimal expected number of moves to win is then 2048 / 2.2 ~= 930.9.
This optimal number of moves occurs when the 2048 tile is reached and is
alone on the board. In the real game, the larger the board, the more
scattered the tiles become, and the number of moves increases accordingly.
This lower-bound argument is valid for every board size.
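
A quick way to sanity-check that arithmetic (a throwaway simulation, nothing
from the article):

```python
# Each move spawns a 2 (p = 0.9) or a 4 (p = 0.1), adding 2.2 to the
# board sum on average, so reaching a total of 2048 should take about
# 2048 / 2.2 ~= 930.9 moves. Simulate to confirm.
import random

def moves_to_reach(target=2048):
    total = moves = 0
    while total < target:
        total += 2 if random.random() < 0.9 else 4
        moves += 1
    return moves

trials = 10_000
print(sum(moves_to_reach() for _ in range(trials)) / trials)  # ~931
```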

------
nmeofthestate
"it takes at least 938.8 moves on average to win"

60% of the time it works every time.

------
c3534l
This task seems particularly well suited for Markov models because it
actually is a Markov process underneath. Also, I don't know why, but I was
particularly tickled by the sentence:

> At the risk of anthropomorphizing a large table of states and actions,
> which is what a policy is, I see here elements of strategies that I use
> when I play 2048 on the 4x4 board

------
mizzao
For all the in-depth analysis of MDPs, a very good heuristic for playing 2048
is to just alternate swipes toward a corner (e.g. down and to the right), with
an occasional movement in the other direction if it's stuck. How close does
this get to the optimal policy?
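
For concreteness, the heuristic might look like this (a sketch; the `game`
object with `can_move`/`apply`/`legal_moves`/`score` is hypothetical, not a
real API):

```python
# Corner heuristic: alternate swipes toward the bottom-right corner,
# falling back to the other directions only when the preferred move is
# unavailable. `game` is an immutable state; apply(d) returns the
# successor state.
from itertools import cycle

def corner_heuristic(game):
    preferred = cycle(["down", "right"])
    while game.legal_moves():
        move = next(preferred)
        if not game.can_move(move):
            # Stuck: try the other directions, worst option last.
            move = next(d for d in ("right", "down", "left", "up")
                        if game.can_move(d))
        game = game.apply(move)
    return game.score()
```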

------
xaedes
It took me a while to realize that playing "optimally" here means optimally
according to their chosen reward scheme, not optimal in the sense of an
optimal strategy for playing 2048 overall, which the title can suggest.

Anyway, nice solution and presentation.

~~~
xaedes
"Conclusion

We’ve seen how to represent the game of 2048 as a Markov Decision Process and
obtained provably optimal policies for the smaller games on the 2x2 and 3x3
boards and a partial game on the 4x4 board.

The methods used here require us to enumerate all of the states in the model
in order to solve it. Using efficient strategies for enumerating the states
and efficient strategies for solving the model makes this feasible for models
with up to 40 billion states, which was the number for the 4x4 game to 64. The
calculations for that model took roughly one week on an OVH HG-120 instance
with 32 cores at 3.1GHz and 120GB RAM. The next-largest 4x4 game, played up to
the 128 tile, is likely to contain many times that number of states and would
require many times the computing power. Calculating a provably optimal policy
for the full game to the 2048 tile will likely require different methods.

It is common to find that MDPs are too large to solve in practice, so there
are a range of proven techniques for finding approximate solutions. These
typically involve storing the value function and/or policy approximately, for
example by training a (possibly deep) neural network. They can also be trained
on simulation data, rather than requiring enumeration of the full state space,
using reinforcement learning methods. The availability of provably optimal
policies for smaller games may make 2048 a useful test bed for such methods —
that would be an interesting future research topic."

