> The text was written with enough intuitive explanations to guide the reader but provides the correct notation to the reader to further investigate.
So many explanations about complex or in-depth topics have difficulty finding the sweet spot between being too basic and hand-wavy on one end of the spectrum, and too obtuse and difficult to follow for a layperson on the other end. I've particularly noticed this with articles on quantum mechanics and relativity in the past. It's either complex mathematical equations or trains and flashlights (I'm exaggerating a bit).
When I read an article on such a topic that is accurate yet understandable, gives the reader the information and keywords they need to learn more, and provides a good on-ramp into the topic, it's a notable experience.
Thanks for pointing it out.
That is a good point about mentioning backwards induction in Appendix B, because that's exactly what I used! I'll add a note about it.
This is nothing against this article, of course, which is an excellent write-up. But: if you like 2048, you'll like Threes better, and the creators of Threes deserve to be rewarded for coming up with the design.
It's fair to say 2048 is a shameless rip-off, but that doesn't mean it's not better in ways that matter.
In 2048 you can keep playing far longer and are much more in control. Sometimes you get a number in a bad spot, but some strategy and additional luck can get you out of many of those situations. The sliding mechanism also behaves differently, which significantly changes the gameplay.
Mario "stole" from Snokie which "stole" from Quest for Tires which "stole" from Jump Bug which "stole" from Wanted: Monty Mole which "stole" from Donkey Kong which "stole" from Space Panic.
The entire history of video games is taking an existing concept and adding small tweaks to create something new. Threes/2048 is no different, but for some reason an incredible amount of emotion and vitriol is tied up in this one.
It's like comparing Chess and Checkers. Similar, sure: an 8x8 board, move one piece at a time, promote on the other side of the board, etc. But they play as very different games.
I had played 2048 first so when I tried Threes, it didn't feel right to me at all. If I had tried Threes first, I probably would have had the opposite opinion.
Some of the rules are slightly different (I don't remember how), but the two games are pretty similar. Rule differences aside, the sounds in Threes were very grating to me; I ended up muting all of the sound effects and music. One huge plus for 2048 (IMHO) is that it was released as open source.
There is a large class of algorithms for finding approximately optimal solutions to MDPs that are model-free or stateless, meaning you don't need to enumerate all of the state-to-state transitions to get a good policy.
If you google 2048 reinforcement learning, you'll find lots of implementations of such algorithms.
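To make the "model-free" point concrete, here's a minimal tabular Q-learning sketch in Ruby (all names here are hypothetical, not from any particular implementation). It only needs a step function that simulates one move and returns the next state and reward; it never enumerates the transition matrix:

```ruby
# Hypothetical sketch of tabular Q-learning. `step` simulates one move:
# step.call(state, action) -> [next_state, reward, done]. No enumeration
# of the full transition model is required.
def q_learning(actions, step, start, episodes: 1000, alpha: 0.1, gamma: 0.99, epsilon: 0.1)
  q = Hash.new { |h, s| h[s] = Hash.new(0.0) } # Q(s, a), created lazily
  episodes.times do
    state = start.call
    loop do
      # Epsilon-greedy: explore randomly, otherwise take the best known action.
      action = if rand < epsilon || q[state].empty?
                 actions.sample
               else
                 q[state].max_by { |_, value| value }.first
               end
      next_state, reward, done = step.call(state, action)
      # TD target: immediate reward plus discounted best value of the next state.
      best_next = q[next_state].values.max || 0.0
      q[state][action] += alpha * (reward + (done ? 0.0 : gamma * best_next) - q[state][action])
      state = next_state
      break if done
    end
  end
  q
end
```

For 2048 proper, the tabular hash would have to be replaced by some form of function approximation, since the 4x4 state space is far too large to store explicitly.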
I hope to officially publish it later this year, but it has already been used by a lot of people, in academia and out.
Feel free to play with it if you like!
In case it helps anyone, I also wrote a much smaller MDP library in ruby to support this 2048 project (and several others): https://github.com/jdleesmiller/finite_mdp . It basically just does value iteration and policy iteration.
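I haven't verified finite_mdp's exact API, so rather than guess at it, here is a generic value iteration sketch in the same spirit, assuming a model stored as nested hashes of `[next_state, probability, reward]` triples:

```ruby
# Generic value iteration sketch (not finite_mdp's actual API).
# model[state][action] is an array of [next_state, probability, reward] triples.
def value_iteration(model, gamma: 0.95, tolerance: 1e-6)
  value = Hash.new(0.0)
  loop do
    delta = 0.0
    model.each do |state, actions|
      # Bellman backup: best expected one-step return over all actions.
      best = actions.values.map do |transitions|
        transitions.sum { |next_state, prob, reward| prob * (reward + gamma * value[next_state]) }
      end.max
      delta = [delta, (best - value[state]).abs].max
      value[state] = best
    end
    break if delta < tolerance
  end
  # Extract the greedy policy: best action by one-step lookahead on the values.
  model.transform_values do |actions|
    actions.max_by do |_, transitions|
      transitions.sum { |next_state, prob, reward| prob * (reward + gamma * value[next_state]) }
    end.first
  end
end
```

Policy iteration alternates a similar evaluation step with a greedy policy improvement step, which often converges in fewer sweeps.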
I've actually wanted to write a similar article on 2048 for a while, but unfortunately I've never found the time. But it seems now I won't have to, your article is great and much better than anything I could have ever done!
The code is on GitHub: https://github.com/aszczepanski/2048
> We can also see it being ‘lazy’ — even when it has two high value tiles lined up to merge, it will continue merging lower value tiles. Particularly within the tight constraints of the 3x3 board, it makes sense that it will take the opportunity to increase the sum of its tiles at no risk of (immediately) losing — if it gets stuck merging smaller tiles, it can always merge the larger ones, which opens up the board.
One flaw I noticed in the strategy was a failure to prioritize getting higher-value tiles together when board space is scarce. That is, it seems to employ the same strategy regardless of how much free space the board has. (Maybe they addressed this; admittedly, I didn't read it all.) Or maybe that is better than my strategy.
Cepheus http://poker.srv.ualberta.ca/ is a Heads Up Limit Texas Hold 'Em strategy (two players, no betting strategy needed beyond whether to fold / call / raise) that is proven to be approximately optimal. There absolutely must be other strategies that are similarly optimal, and in particular circumstances one would dominate another; it's just that, since they're all optimal, they'd all break even against each other over random hands in the long run.
It does surprisingly well, reaching the 8192 tile, considering it has no domain-specific knowledge of the game.
Also this might be of interest:
Many variants of 2048 came out at the time such as a hexagonal board or Fibonacci tiles. The algorithm is domain independent so it works just as well for these variants. Also, luckily, the original version of the game had its game state and controls exposed in JS and therefore so did most of the variants.
This allowed me to write a bookmarklet version of the solver that could play many of the variants without modifications.
60% of the time it works every time.
> At the risk of anthropomorphizing a large table of states and actions, which is what a policy is, I see here elements of strategies that I use when I play 2048 on the 4x4 board
Anyway, nice solution and presentation.
We’ve seen how to represent the game of 2048 as a Markov Decision Process and obtained provably optimal policies for the smaller games on the 2x2 and 3x3 boards, and for a partial game on the 4x4 board.
The methods used here require us to enumerate all of the states in the model in order to solve it. Efficient strategies for enumerating the states and for solving the model make this feasible for models with up to 40 billion states, which was the size of the 4x4 game played up to the 64 tile. The calculations for that model took roughly one week on an OVH HG-120 instance with 32 cores at 3.1GHz and 120GB RAM. The next-largest 4x4 game, played up to the 128 tile, is likely to contain many times that number of states and would require many times the computing power. Calculating a provably optimal policy for the full game to the 2048 tile will likely require different methods.
It is common to find that MDPs are too large to solve exactly in practice, so there is a range of proven techniques for finding approximate solutions. These typically involve storing the value function and/or policy approximately, for example by training a (possibly deep) neural network. They can also be trained on simulation data, rather than requiring enumeration of the full state space, using reinforcement learning methods. The availability of provably optimal policies for smaller games may make 2048 a useful test bed for such methods; that would be an interesting topic for future research.
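To sketch what "storing the value function approximately" might look like in the simplest case, here is a hypothetical TD(0) update with a linear approximation V(s) ≈ w · φ(s), trained from simulated transitions rather than an enumerated state space. A neural network would replace the dot products; the feature vectors here are assumed, not taken from any code above.

```ruby
# Hypothetical TD(0) update for a linear value function V(s) = w . phi(s).
# `features` and `next_features` are the feature vectors phi(s) and phi(s')
# observed along a simulated trajectory; `reward` is the transition reward.
def td0_update!(weights, features, reward, next_features, alpha: 0.01, gamma: 0.99)
  v      = weights.zip(features).sum { |w, x| w * x }      # V(s)
  v_next = weights.zip(next_features).sum { |w, x| w * x } # V(s')
  td_error = reward + gamma * v_next - v
  # Gradient step: for a linear approximation, the gradient of V(s) is phi(s).
  features.each_with_index { |x, i| weights[i] += alpha * td_error * x }
  weights
end
```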