The Mathematics of 2048: Optimal Play with Markov Decision Processes (2018) (jdlm.info)
209 points by duck on Aug 26, 2021 | 50 comments



I’m the original creator of 2048. I’m always so humbled when I see that it’s still something people talk about, especially (as is the case with this article) in such an advanced way as to completely go over my head!


No exaggeration, your game has saved many a sanity, wasted hours, and has been enjoyed by millions.

You may die but your game will forever be burned into the corneas of the common.

Good shit!


I'm probably responsible for a cumulatively significant loss in worldwide productivity!


And a comparable reduction in stress as well


GDP isn't the metric of happiness and wellbeing :)


Good. Too much Productivity causes cancer. :)


A mainstay of commuters worldwide.


At this exact moment you have 2048 HN Karma.


Quick we need 3 downvotes to restore balance


2052 karma right now. I don't have the heart to downvote them back into 2048.


4096, here we come!


Let me upvote


Hey! I logged back into HN just to say thanks.

2048 was a great distraction as I rode the Boston subway to a job I truly hated.

Distractions like that helped preserve my sanity until I found a better job.


I really love hearing that it's had a positive impact in your life, that's heartwarming :)


I'm the author of this article. :) Many thanks for making this game, which I still play more than I should, even after I spent all this time trying to get computers to solve it.

If you are ever in London (possibly when travel is more normal again), I would happily buy you a beverage of your choice!


Then let me say thanks! There was a programming competition at my university where we had a few months to build bots that played 2048, and the bots then played against each other in a live tournament (mostly judged on points scored). That was the most fun competition I ever participated in!!

The bot played far better than I do personally and regularly reached the 4096 tile. It was a lot of fun to just watch it play, and really awesome to make it play better :)


I recently discovered that my car has 2048 on it courtesy of Android Auto and https://gamesnacks.com/embed/games/2048_v4 . I was pleased to waste a few minutes while waiting for someone in a parking lot.


My girlfriend still plays 2048 every day. The game is still a gem. Thanks for making it :)


2048 truly is the epitome of mobile games for me. Simple, translates well to a touch screen, and most importantly has no ads or microtransactions. My one gripe is I can't compete with my wife's high score.


Would that then make you the same GC who frequented FP WAYWO threads when those forums were still around? If so, nice to bump into you again :)


Yup! Good old times. That's actually where 2048 started.


hi gab, hope you're well


Your game has wasted so much of my time...

Kidding. This is the only mobile game I ever play. I had to remove it from my phone and keep it iPad only.


Are you working on anything interesting now?


Here are some references for Markov decision theory, stochastic optimal control, and stochastic dynamic programming:

Dimitri P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models.

George L. Nemhauser, Dynamic Programming.

Stuart E. Dreyfus and Averill M. Law, The Art and Theory of Dynamic Programming.

E. B. Dynkin and A. A. Yushkevich, Controlled Markov Processes.

Wendell H. Fleming and Raymond W. Rishel, Deterministic and Stochastic Optimal Control.

There is more from R. T. Rockafellar and R. J.-B. Wets.

And of course we should mention R. Bellman.

IIRC, Dreyfus was a Bellman student, and Dynkin's dissertation advisors were Gelfand and Kolmogorov.


(Author of the article here.)

Thanks, I can vouch for the Bertsekas book. I'd also recommend:

Russell, Stuart, and Peter Norvig. "Artificial intelligence: a modern approach." (2002).

Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018. (As another commenter just pointed out.) They have a great example with a recycling robot.


If you buy "Artificial intelligence: a modern approach," I suggest coughing up the extra money for the latest edition. It's several times more expensive than older copies, but they really updated the content.


I'd add, R. T. Rockafellar has an interesting technique called scenario aggregation.


I might add that for your example, we can have good confidence in the Markov assumption -- we just say we need to trust the random number generator.

In real world practice, i.e., where we don't get to drive the stochastic part of the problem with our own random number generator, the Markov assumption (past and future conditionally independent given the present) can be, commonly is, difficult to justify.

But then, in just the theory, every stochastic process is Markov -- just let the state be the history. I intend this as a joke, but actually, technically in the math, it is true.
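As a toy sketch of that trick (my own, hypothetical example, in Python): a process whose next step depends on the last two values is not Markov in the current value alone, but it is Markov in an augmented state that carries the relevant history:

    import random

    def step(x_prev, x_curr):
        # Next value depends on two lags, so x_curr alone is not a Markov state.
        drift = 1 if x_curr >= x_prev else -1
        return x_curr + drift + random.choice([-1, 0, 1])

    state = (0, 0)                  # augmented state: (x_{t-1}, x_t)
    for _ in range(5):
        x_next = step(*state)
        state = (state[1], x_next)  # the pair is a genuine Markov state
        print(state)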

If we DO make the state the history and have a problem in continuous time, then, at least in theory, we are conditioning on uncountably infinitely many random variables, and just defining how to do that is a bit interesting.

Then for our decisions, i.e., the policy, we can encounter the problem of measurable selection, also a bit interesting and surprising.

I.e., dynamic programming in continuous time gets us into measure theory, the Radon-Nikodym theorem, etc.

Stochastic dynamic programming, when we find a good application and can build a good solution, i.e., keep the computational demands within supercomputer range, can seem really smart; of course it is not prescient, but at first cut, intuitively, it can seem to be. Your solution likely also seems prescient.

More applications would be nice.

Here is a potential, potentially large, collection of new applications: Develop a spreadsheet, as an example, for, say, budgeting for the next 5 years. To have a simple description, have an initial column and then one more column for the end of each of the 60 months, and have one row for each variable. Some of the spreadsheet cells might hold random quantities (have them all independent, to keep the Markov assumption easy to justify), and some of the cells might be empty and available for decisions.

Now, presto, bingo, whether we knew or intended it or not, we have formulated a problem in discrete time stochastic dynamic programming, and since in practice problem formulation is a common obstacle, we have maybe made some progress.

In principle at this point the software for a solution is well defined. And since long ago people wrote general purpose stochastic dynamic programming software, the software we would need is in a sense actually simple (if we don't care about execution time).

Now just feed this problem into a suitably large computer and get the resulting decisions and the expected earnings, which we wanted to maximize, at the end of the 5 years.

If we make the software more complicated, then we can put in some ideas to make the computations faster by some factors of 10.
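A minimal sketch of that formulation in Python, with backward induction over the 60 monthly stages; the states, actions, transitions, and rewards here are all hypothetical placeholders, just to show the shape of the computation:

    HORIZON = 60                    # months
    STATES = range(101)             # e.g., cash on hand, discretized to 0..100
    ACTIONS = [0, 1, 2]             # e.g., spend nothing / a little / a lot

    def transition(s, a):
        # (probability, next_state) pairs; independent randomness each month
        # keeps the Markov assumption easy to justify.
        return [(0.5, min(s + a + 1, 100)), (0.5, max(s - a, 0))]

    def reward(s, a):
        return 0.1 * a              # immediate payoff of the decision

    # V[t][s] = best expected total earnings from month t onward in state s.
    V = [[0.0] * len(STATES) for _ in range(HORIZON + 1)]
    policy = [[0] * len(STATES) for _ in range(HORIZON)]

    for t in reversed(range(HORIZON)):
        for s in STATES:
            q = {a: reward(s, a) + sum(p * V[t + 1][s2]
                                       for p, s2 in transition(s, a))
                 for a in ACTIONS}
            policy[t][s] = max(q, key=q.get)
            V[t][s] = q[policy[t][s]]

    print("expected earnings, month 0, state 50:", V[0][50])

Each decision cell in the spreadsheet becomes an entry in the policy table, and each random cell becomes part of the transition function.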


Thanks. I noticed Bellman's book in a local bookstore; what do you think of it as an intro?


An intro?

Nemhauser and Dreyfus and Law are intended as intros and are quite well written. Bellman, who likely should be named the father of dynamic programming, wrote the oldest book on my list, and it is not intended as an intro. But if you can get Bellman for less than $10, maybe for the price of a Big Mac, then eat a lighter lunch and get the book.

Bellman's book likely covers the Hamilton–Jacobi–Bellman equation, not so easy to understand via the other intros.

Not really a biggie but worth knowing: a big part of this subject in computation is the value function, and Bertsekas, at least, used neural nets as a means to get an efficient approximation. So more recent treatments may include this idea.

Dynamic programming is one of the leading examples of "the curse of dimensionality".

The field has not been still; e.g., multivariate splines have been an idea for handling the value function in computation.

Of course we can have software go through the motions of stochastic dynamic programming when the Markov assumption is not satisfied; the software won't object. But when the assumption does hold, and the rest of the effort has good quality, we can argue that the work provides an optimal solution, the best possible, of course, in expected value, for any means of handling the data. In short, we have what is called "the principle of optimality".

The continuous time case, with all the math details filled in, is more difficult; so for an intro, stay with discrete time.

There has been a lot of attention to this subject in the ORFE (Operations Research and Financial Engineering) department at Princeton. Long the chair was E. Çinlar. The best course I took in school was from a Çinlar student.

Of course, if we expect artificial intelligence to be really smart, then as a special case it should be able to be as good as stochastic optimal control. And apparently now some of reinforcement learning is using the core idea of stochastic dynamic programming. But without the Markov assumption, we might count that application as a heuristic with no more than weak claims about optimality.

I spent a lot of time in the field. Eventually I concluded that generally in the US economy a lot of knowledge of stochastic optimal control and a dime wouldn't cover a 10 cent cup of coffee. One would have a lot better chance buying a house and supporting a family getting paid to develop Web sites, e.g., to sell, say, used math books, including Bellman's. Maybe there have been and are some niche applications (maybe somewhere in US national security), but that niche is likely much sharper than any razor.

There is something of an organizational problem for a math guy getting hired to apply stochastic optimal control: The person hiring you, your hiring manager, will likely know much less about the math than the math guy, likely know nothing about the math. So this manager will be very reluctant to allocate much of his (her) budget and risk his job to support work he doesn't understand and that, indeed, might justify the C-level suits promoting the math guy over his manager.

Due to such issues, the math guy with an application potentially valuable in the economy could be better off doing a corresponding startup. Else, to avoid scaring hiring managers, he might be better off omitting such math from his resume. So, for getting hired as an employee, a lot of math background on a resume can range from useless down to, say, as damaging as a felony conviction.

A recipe for rabbit stew starts out "First catch a rabbit." A recipe for applied math might start, "First find an application." One can't argue that some topic in math will be useless forever outside of math and in the economy, but forever is a long time. As it is, it can appear that some math papers and books, including what looks like applied math, are written before seeing any rabbits.

Yet, the best of pure/applied math is in some senses super terrific stuff, way up there with the best of Bach and Beethoven, etc., and at times there are good applications. E.g., there is some pure and applied math at the core of my startup; the pure math provides some special support, a version of optimality, for the whole effort. Just what that pure math says is terrific, gives a solid guarantee where intuitively we have only confusion, is so good it's tough to believe, but still it's true. The applied math is original with me, powerful for my startup but no biggie as research.

Good luck.


Thank you, I picked it up today. There's a chapter on the calculus of variations saying it has a new viewpoint to offer (with the equation you mentioned, iirc). Going to be a challenge with my background.

It does sound like a startup suits you best. Best of luck with it!


Calculus of variations is, yes, a part of optimization, now regarded as an old part. But subsequent to Bellman's book was the Pontryagin maximum principle. There is an entry at Wikipedia:

https://en.wikipedia.org/wiki/Pontryagin%27s_maximum_princip...

There is some treatment in

David G. Luenberger, Optimization by Vector Space Methods.

which I call fun and profit from the Hahn-Banach theorem.

There may be more in

L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mischenko, The Mathematical Theory of Optimal Processes.

and in

E. B. Lee and L. Markus, Foundations of Optimal Control Theory.

and in

Michael Athans and Peter L. Falb, Optimal Control: An Introduction to the Theory and Its Applications.

At one time I was interested in such math as a way to say how best to climb, cruise, and descend an airplane. So, I chatted with Athans in his MIT office and got a copy of his class notes. And I got accepted to grad school at the Brown University Division of Applied Mathematics where Falb was. Then I applied a little skeptical judgment and didn't go back to Athans and went elsewhere for my Ph.D.

I also visited Brown and Cornell. At Cornell I met with Nemhauser, and he gave me three words of advice that set the real direction for my Ph.D. dissertation. Flying back on the airplane, I figured out what Nemhauser had told me.

When I got back, I called a meeting to outline what I'd discovered from Nemhauser, Athans, etc. My manager ordered me not to hold the meeting, but I did anyway. All the C-level people and the founder, COB, CEO showed up. I got a promotion and big office next to the founder. What I presented was the start of two Ph.D. dissertations.

The Fleming reference may be enough. Uh, while I visited Brown, I had lunch with Fleming -- a very bright mathematician.

A good source for prerequisites is the first (real) half of W. Rudin, Real and Complex Analysis.

Broadly, that optimization -- deterministic optimal control -- works with curves, where each curve is regarded as one point, a vector, in some vector space. The space is short on good assumptions, so it may be only a Banach space. Actually we can still do a lot in Banach space, e.g., Luenberger's book -- one of my favorites. Gee, there you can also get a really short treatment of Kalman filtering!

The fraction of mathematicians who know all that math well is tiny. The fraction of people who make good money from applications has to be smaller -- I would believe so small there are none. From all I can see, good applications are so rare there almost aren't any.

With the rapidly growing power of computing and the rapidly growing complexity of computer applications, discrete time stochastic optimal control (that can attack problems that first cut intuitively seem hopelessly difficult yet be comparatively simple mathematically) has a chance of valuable applications, so far not a very good chance, but a chance, probability small but still greater than zero.

That spreadsheet connection I outlined may be a seed for some applications.

Microbiology research gets some motivation from applications, saving the lives of people dying of cancer, heart disease, new viruses, etc. Maybe research in mechanical engineering works on how to put up buildings and bridges that won't fall down. Some years ago I concluded that math is being throttled by too much isolation from important, motivating applications. So, if my startup works, that will be an n = 1 case of evidence that math can do well, at least make money, with some new applications outside of math.


Where is Sutton and Barto?


I’d be much more interested in an analysis of Threes!


Threes is an evergreen game. It requires more strategy and planning than 2048. I still play it, over five years since I bought it. Easily worth the $6 purchase price.


If either of you (or others who like Threes) have an iOS device, check out SPL-T. It’s similarly engaging, and completely deterministic. Which means you have no one to blame but yourself for losing. Someone who’s really into math might be able to solve it and play forever.

https://simogo.com/work/spl-t/


It really is. I’ve played for years. I came so close to beating it last year, I needed a break after that.


I've wanted to do this for Threes for years! Alas, I'm not aware of any API that makes it easy to do this programmatically.


There is a web version at http://play.threesgame.com/ -- I wonder if you could hook into this? It seems a little buggy, though.


It's not unlike a Rubik's Cube!


Figuring out a truly globally optimal strategy for each potential game state is a nice way to teach about MDPs, but somehow less interesting to me than figuring out a strategy that's optimal within some constraint - like a small model size or being describable to humans.


I'm always happy to see dynamic programming's cooler nephew MDP show up at parties. After a few drinks, he pulls out the POMDP dance and goes home with everyone's spouse / funding.


So it's not clear from the paper -- what is the optimal strategy for 2x2, 3x3, 4x4@64?


This is an understudied area, for me. Yes we can come up with mathematically optimal ways of solving a problem, but it's much harder to come up with the mathematically optimal way of teaching a _human_ to solve that problem. Obviously there's a lot of work in 'explainability' in machine learning models but that doesn't really get there.

I often find myself going back to Eurisko as a fascinating example of the road not taken (or perhaps, the road taken to its logical end and abandoned) in machine learning. You had this system that, instead of learning opaque weights in a network, learned heuristics that were fairly explainable. In some ways that's a much more useful companion than a model that's strictly superior in its decisions.

I find this regularly in chess - you can prep with Stockfish and you know you're probably memorising the best line, and you can follow the sidelines to find very concretely _why_ that's the case, but it still feels like an inefficient way of learning. It always feels like a machine that yielded up fewer, more abstract principles would be helping me more.

I recently thought about this for solving Rubik's cubes. I don't really want the fastest way to solve it; ideally I'd have a way that simultaneously optimises the number of heuristics (if you have such and such a colour here and here, do this; if this face is all white, do that, etc.) and the number of total actions. That seems like a tractable problem for a computer to brute force for you even without smarts.

I face this in my work constantly: you can come up with the most amazing model based on the most cutting edge architecture, and you can trust its predictions above any others. But then you've got to explain to a football coach why _they_ should trust it, and once that trust is built, distil it into few enough rules that they can train a team to reflect its expertise.

Sorry for the ramble!


(Author of the article here.) I agree that it's somewhat unsatisfying that the output of all this is an enormous table of numbers rather than some nice simple rules.

I think it would be interesting to try some of the methods for approximately solving MDPs on this game to see how complicated the machine learning model that approximates the value function (or state-action function) has to be to get reasonable accuracy. Maybe some simple rules would emerge...
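For example (a rough sketch of the idea, not something from the article), one could fit a tiny linear model on a few hand-crafted, human-readable features against the exact values from the solver and see how much accuracy survives:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def features(state):            # state: 16 tile exponents, row-major
        tiles = np.asarray(state).reshape(4, 4)
        empty = np.sum(tiles == 0)
        max_in_corner = float(tiles.max() in
                              (tiles[0, 0], tiles[0, 3], tiles[3, 0], tiles[3, 3]))
        monotone_rows = sum(all(r[i] >= r[i + 1] for i in range(3)) for r in tiles)
        return [empty, max_in_corner, monotone_rows]

    def fit_value_model(states, values):   # values: e.g. exact win probabilities
        X = np.array([features(s) for s in states])
        return LinearRegression().fit(X, values)

If a model that small achieved decent accuracy, its coefficients would effectively be the "simple rules".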


What has consistently worked for me is to mostly use 3 directions, forcing all of the large tiles to the bottom row, and crowding the space above the largest number(s) so going in the fourth direction keeps those large numbers along the bottom.

All the way to 1024 is autopilot by just keeping small numbers out of the bottom row. Then, the thinking begins---plan each move to keep the 32s, 64s, 128s on the same edge of the board until you make an inevitable mistake or have bad luck working in the remaining 2x4 or 2x3 section. This last section is where your analysis algorithm would really add value. Plus, using just three directions until it's time to condense large blocks should reduce your computation a lot.
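A minimal sketch of that autopilot in Python (hypothetical helper names; legal_moves(board) stands for whatever your 2048 implementation provides):

    PREFERRED = ["left", "down", "right"]   # keep the big tiles on the bottom row

    def choose_move(board, legal_moves):
        moves = legal_moves(board)
        for m in PREFERRED:
            if m in moves:          # could break ties with lookahead or score
                return m
        return "up"                 # forced: accept the risk of a bad spawn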

Do you think regular use of all four directions is necessary to get more consistently to higher values like 16k or 32k? (Actually, is 32k possible?) I feel like once you have most of the blocks on board to get to 8k except for perhaps one 256, you can't really have a random 2 or 4 in the corner or it's Game Over in 10 moves or fewer. Moving in the fourth direction makes the 'corner small number' a guarantee when the board is filled like this.

Playing all large numbers on the bottom also means you can calculate an optimal play (using the smaller numbers) in small (2x3, 2x4, 3x3) regions as long as the result comes out at the same grid location consistently---near the smallest large number.

Then, optimization of the large numbers becomes a separate problem. Do you put everything sequential in the bottom row?

4096-2048-512-128

Or do you put high values at the ends?

4096-128-512-2048

Or low values at the ends? (This is not the answer. Getting a medium value from one end to the other---consistently---without messing everything else up is not probable and probably not possible.)

Or should the high values make a wedge along two sides?

512

4096-2048-128

In summary, I don't think trying to solve the full state space makes sense. This game feels like two separate isolatable optimization problems---small and large numbers---plus a little bit of glue when the isolation breaks down: starting a game, fixing a stuck situation, or collapsing a series of large numbers.

Have you thought about using your worst cases to find rules to make Evil 2048 even more evil? Or see how an MDP works against Evil 2048 to get the best outcome in worst case scenarios?

Oh-- Great article and analysis, by the way.


It would be encoded in a lookup-table that is populated by an iterating algorithm.

The size of this lookup-table corresponds to the state space, hence it can't be too big.
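Something like this sketch of value iteration, where states, actions, transitions, and reward stand for an enumerated 2048-style MDP (hypothetical names, not the article's actual solver):

    def value_iteration(states, actions, transitions, reward,
                        gamma=1.0, tol=1e-9):
        V = {s: 0.0 for s in states}        # the lookup table
        while True:
            delta = 0.0
            for s in states:
                acts = actions(s)
                if not acts:                # terminal (game over): value stays 0
                    continue
                best = max(reward(s, a) + gamma *
                           sum(p * V[s2] for p, s2 in transitions(s, a))
                           for a in acts)
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:
                return V                    # optimal expected value per state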


As the article itself notes, previous thread: https://news.ycombinator.com/item?id=16790338


Thanks! Macroexpanded:

The Mathematics of 2048: Optimal Play with Markov Decision Processes - https://news.ycombinator.com/item?id=16790338 - April 2018 (43 comments)



