
That is not accurate -- the Andromeda Galaxy is over 3 degrees wide. About 6 full moons.

And getting bigger, as it is moving toward us on a collision course.

Thanks for the correction!

This seems like a cool concept, but it's broken for me.

Firefox 113.0.2 shows the orbits and background, but no particles as far as I can tell.

Chromium 120.0.6099.129 shows just a black screen plus the widgets, nothing else.

Both on amd64 Debian Linux.

Update: bumping FF to 121.0 did not seem to help.


There is a fixed probability of matching all numbers, but the value of the jackpot varies, so sometimes the EV of a ticket can be positive.
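
A minimal sketch of the arithmetic, with made-up odds and prices (not any real lottery's numbers):

    # Hypothetical figures, for illustration only.
    P_JACKPOT = 1 / 292_000_000   # fixed probability of matching all numbers
    TICKET_PRICE = 2.00

    def expected_value(jackpot):
        # Ignores taxes, lesser prizes, and the chance of splitting the pot.
        return P_JACKPOT * jackpot - TICKET_PRICE

    print(expected_value(100_000_000))   # about -1.66: negative EV
    print(expected_value(900_000_000))   # about +1.08: positive EV

The crossover is where the jackpot times the (fixed) probability exceeds the ticket price; rollovers can push the jackpot past that point while the odds stay the same.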


This should be the page, but it requires a Flash plugin (so it doesn't work for me): https://rifters.com/blindsight/vampires.htm



Wow! Thank you for finding that.



Yes!!! Sweet. Thank you.



It looks like revealjs.


Excellent. I did want to eventually write something showing how to modify this algorithm to handle games with randomness, and ISMCTS at a glance looks like it does just that. Thanks!


Actually, you don't need information sets to handle games with chance nodes. Standard MCTS works just fine for those. (Don't use a bandit algorithm at the chance nodes, just pick randomly.) Information sets are only needed if you have uncertainty, either because players make simultaneous moves or a player doesn't know (or forgets) the complete state of the game.
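
To sketch what I mean by the selection step (hypothetical node fields here, not any particular library's API):

    import math
    import random

    def select_child(node):
        if node.is_chance_node:
            # No bandit at chance nodes: just sample an outcome
            # according to the game's own probabilities.
            return random.choices(node.children, weights=node.outcome_probs)[0]
        # Decision node: ordinary UCB1 over the children.
        def ucb1(child):
            if child.visits == 0:
                return float("inf")
            return (child.wins / child.visits
                    + math.sqrt(2 * math.log(node.visits) / child.visits))
        return max(node.children, key=ucb1)

Everything else in the algorithm stays the same.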


I've updated the diagrams with a set of node values that (hopefully) make sense.


Actually, I think it should be 3/8, since it's supposed to be the opposing player's win count.

These diagrams were supposed to just be an update of the diagrams from the MCTS Wikipedia article, adding the correct backpropagation values, but I see there are more problems than that.


This is exactly the problem I've written about - AFAIK the basic UCT algorithm does not model your opponent in the selection phase (only in the simulation phase, when something heavier than random play is applied).


Since the win counts are only incremented for nodes "moved into" by the winning player, UCB does correctly model the opponent's behavior here.
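
Roughly this (a sketch; player_just_moved is the player whose move led into the node):

    def backpropagate(node, winner):
        # Walk back up the path taken during selection/expansion.
        while node is not None:
            node.visits += 1
            # Credit a win only to nodes reached by the winner's moves.
            if node.player_just_moved == winner:
                node.wins += 1
            node = node.parent

When a parent then picks the child maximizing wins/visits, it is maximizing them for the player whose turn it is at the parent, which is what makes the selection phase adversarial.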


In your example script, yes, but basic UCT does not do that, simply because UCT was not originally designed with multi-player games in mind. And this is an assumption we make about our opponents (namely, that they want to maximize their own payout, not, e.g., to win outright or to minimize our score). Of course, this is a very straightforward "application" of UCT or MCTS to multi-player games, and it was done in the works of Sturtevant and Cazenave in 2008.

But it is not so easy to learn about this; e.g., this is a very recent change to the Wikipedia page on the topic https://en.wikipedia.org/w/index.php?title=Monte_Carlo_tree_... and very often people writing on the topic don't point this out at all, which I find very strange and misleading.

Also, in classic MCTS you should select the move with the most visits, not the one with the highest percentage of wins.
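
That is, something like this (a sketch over hypothetical child fields, assuming every child has been visited at least once):

    # "Robust child" (the classic recommendation): most visits.
    chosen = max(root.children, key=lambda c: c.visits)

    # Not the "max child" with the best observed win rate, which is
    # noisy for rarely-visited children:
    # chosen = max(root.children, key=lambda c: c.wins / c.visits)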


In what appears to be the originating paper for UCT ("Bandit based Monte-Carlo Planning", L. Kocsis and C. Szepesvári, 2006):

* The choice of action to be returned is the one "with the highest average observed long-term reward"

* For simplicity, the payout value used in the paper is 1 = win and 0 = loss, which will result in the agents maximizing their wins. Presumably one could choose other payout values (e.g., points scored by that player, in games that have that concept) to adjust the priorities of the agents. The mathematics does not seem to forbid it.

* This paper uses a multi-player game as its first experiment. They state, "...for P-games UCT is modified to a negamax-style: In MIN nodes the negative of estimated action-values is used in the action selection procedures." It is straightforward, and more generalizable to games with N > 2 players, to simply record values from each player's standpoint in the first place, instead of manipulating them after the fact in this manner; a rough sketch follows this list.
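
For example (a rough sketch with hypothetical field names, not the paper's code), recording values from each player's standpoint could look like:

    import math

    def backpropagate(node, payouts):
        # payouts: dict mapping each player to their reward for this playout.
        while node is not None:
            node.visits += 1
            for player, value in payouts.items():
                node.total_reward[player] += value
            node = node.parent

    def ucb1(child, parent, c=1.414):
        # Evaluate the child from the standpoint of the player whose
        # turn it is at the parent; no negamax-style sign flip needed.
        mean = child.total_reward[parent.player_to_move] / child.visits
        return mean + c * math.sqrt(math.log(parent.visits) / child.visits)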

I hope this clarifies some things.


From "An analysis of UCT in multi-player games" (Nathan Sturtevant, 2008): "Multi-player UCT is nearly identical to regular UCT. At the highest level of the algorithm, the tree is repeatedly sampled until it is time to make an action. The sampling process is illustrated in Figure 2. The only difference between this code and a two-player implementation is that in line 5 the average score for player p is used instead of a single average payoff for the state."

I think this is a fairly clear statement of something the original paper (and, following it, a lot of the writing on the topic) is lacking. Of course people used this simple generalization before, and it is pretty straightforward, but it is not that obvious at first glance. And I've seen quite a lot of code examples, diagrams explaining UCT for games, and articles that just don't say a word about this. Or even worse, that just do it wrong for multi-player games.

The choice of action is a different topic; if I remember correctly, there was also a paper showing that the highest win rate and the most robust (most visited) branch end up performing the same ;)

I hope you will continue this series, because it is really good and the code examples are really nice!

