1. Each player has a set of tiles that each have a particular color and a particular number.
2. The tiles a player has are known only to the other players (tiles are held facing away from their owner).
3. On your turn, you may either place one of your tiles, or give a hint to another player about their tiles.
4. Hints can only take the form of "you have exactly K tiles of C color <indicate the K tiles>" or "you have exactly K tiles of N number <indicate the K tiles>". There are some rules for how many hints can be given and ways to "earn back" more hints to give.
4b. You can make more deductions than the hints directly provide. There is a fixed number of each tile; e.g. there are two RED-4 tiles. So if you can see that two other players each have a RED-4 tile, you know that you cannot have a RED-4 tile. Then if someone gives you the hint that you have a *-4 tile, you know it is not RED. But the two players holding the RED-4 tiles cannot see their own, so they cannot make this deduction. However, if you act in a way that reveals you have made the deduction, and they are really smart, they can observe your act and deduce that you would only have acted that way if you knew you didn't have a RED-4 tile. And since each of them can see only one other RED-4 tile, each can deduce that they must hold the other.
5. No other communication is allowed.
6. In the center of the table there are stacks for each color that start empty. The goal of the game is to populate the stacks by playing the color tiles of that stack in increasing number. (e.g. RED-1, RED-2, RED-3 ...). If a player tries to play a tile that is invalid (e.g. the RED stack is only up to RED-2 and the player selects their RED-4 tile), a counter is incremented.
7. The game ends when the counter reaches a threshold or all tiles are played. The score is equal to the number of tiles played. Higher is better.
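The card-counting deduction in 4b can be sketched in code. This is a hypothetical helper, assuming the standard Hanabi distribution of three 1s, two each of 2/3/4, and one 5 per color:

```python
# Copies of each tile number per color (standard Hanabi distribution,
# assumed here) -- so there are exactly two RED-4 tiles in the deck.
TILE_COUNTS = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}

def could_hold(tile, visible_tiles):
    """Return True if *you* could still hold `tile`, given every copy you
    can see in other hands, the discard pile, and the played stacks."""
    _color, number = tile
    seen = sum(1 for t in visible_tiles if t == tile)
    return seen < TILE_COUNTS[number]

# You see both RED-4 copies in other players' hands:
visible = [("RED", 4), ("RED", 4), ("BLUE", 2)]
assert not could_hold(("RED", 4), visible)  # all copies accounted for
assert could_hold(("BLUE", 4), visible)     # still possible
```

The "they can deduce they hold the other copy" step is the same check run from each other player's point of view, minus the tile they cannot see in their own hand.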
I have long thought that it should be possible to write an AI that does superhuman hint-giving and deduction, provided that all of the AIs can collude beforehand. That is, if all of the AIs have a giant lookup table that essentially says "If the board is in this particular state, and I give hint H, that actually means FOO" where FOO contains more bits of information than hint H would by itself.
For the 5 player variant specifically, the concept is as follows:
Out of the 8 moves each other player can make (play any of their 4 cards, or discard any of their 4 cards), you figure out the best move for each of the 4 other players, add those up mod 8, and give one of 8 possible hints (a color or a number hint is available for each player). Then each of the 4 other players deduces which move you wanted them to make by subtracting out everyone else's best move.
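The scheme above is a classic "hat-guessing" code. A minimal sketch, assuming a hypothetical `best_move` recommendation (0-7) that any player can compute for every hand they can see:

```python
NUM_MOVES = 8  # play one of 4 cards, or discard one of 4 cards

def encode_hint(best_moves):
    """Hinter: sum the 4 recommended moves mod 8 and give the
    corresponding one of the 8 available hints."""
    return sum(best_moves) % NUM_MOVES

def decode_move(hint_value, others_best_moves):
    """Receiver: subtract out the moves recommended to the other three
    players (all visible to you) to recover your own recommendation."""
    return (hint_value - sum(others_best_moves)) % NUM_MOVES

# Example: the hinter wants players A, B, C, D to make moves 3, 7, 1, 5.
hint = encode_hint([3, 7, 1, 5])          # (3 + 7 + 1 + 5) % 8 == 0
# Player B can see A, C, and D, so B subtracts 3 + 1 + 5:
assert decode_move(hint, [3, 1, 5]) == 7  # B recovers their own move
```

Each hint thus conveys log2(8) = 3 bits to every one of the 4 receivers simultaneously, far more than the literal content of a single color/number hint.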
There's some room for improvement on figuring out how to avoid one player mistaking another's best play (because it might be predicated on them playing a card they don't know they have first), or how to hint in ways which avoid that.
A friend has put an implementation up on GitHub: https://github.com/chikinn/hanabi/blob/master/players/hat_pl...
(Side note: in practice, it's almost impossible to avoid the illegal side-channel communication of watching the other players get nervous as you start to do something stupid, so really the only rules-faithful way is to play over a computer.)
(Another side note: there's a game called The Mind which is the board-game version of SleepSport -- the goal is to, without most kinds of communication, play cards from each player's hand in increasing order. One way to win is to synchronize everyone's internal clock and play each card when that many seconds have passed.)
The WTFWThat bot achieves the rather remarkable average score of 24.89 and 91.5% perfect games in 5-player self-play (that is, all 5 seats are WTFWThat bots).
In the game Bridge, which involves communication between competing pairs of players, you can communicate a suit and a number, but a bid is meant to carry some kind of meaning within the game itself, so I think strategies like this would be illegal.
And it can go a level deeper, where the card you played blind is a 5 (or some other card incompatible with actually making the hinted card playable). Since they see that you thought the card wasn't playable, and that it didn't become playable after your play either, they now know that the card isn't playable and can be safely discarded. This isn't actually much better than just hinting directly, though, so the straightforward play is higher value. Both still have advantages over straight hinting in some situations.
When playing with friends / casuals I've found that this is the one that trips people up (if you want to play strictly by the rules) - no, you can't ask if there are any reds left. You can't ask what hint Alice gave to Bob on her previous turn. You can't just say "Only one yellow 4 left, let's be careful" etc.
This sounds like reading the rules too stringently and not by their intent if you ask me, since it is asking only about information that the person is supposed to know.
It would be like saying if someone asks what the rules of the game are you aren't allowed to answer because none of the rules explicitly state that you can convey information about the rules of the game.
That leaks the player's information state. Proper Hanabi is played with AIs, who can't feel how boring the game is when played correctly.
The best way to teach casuals, I have found, is to play a round where you actually invert it: have the experienced player announce all of their reasoning aloud (to demonstrate to the casuals the kinds of deductions that can be made).
For those who read my previous comment, an example of pre-game communication collusion that the AI in the paper invented was to decide that any hint involving red or yellow _also_ means that your most recently acquired tile is immediately playable.
"Roughly 40% of the information is obtained through conventions rather than through the grounded information and card counting"
(what I have been describing as "collusion" the paper describes as "conventions" but it amounts to the same thing -- you can pass a lot more information than the hints imply if you can plan ahead)
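The red/yellow convention from the paper could be decoded roughly like this. This is an illustrative sketch only; `interpret_hint` and the exact convention encoding are hypothetical, not taken from the paper's implementation:

```python
# Colors that carry the extra conventional meaning (per the example above).
CONVENTION_COLORS = {"RED", "YELLOW"}

def interpret_hint(hint_color, indicated_slots):
    """Decode a color hint: the literal (grounded) information, plus any
    extra meaning agreed on before the game via the convention."""
    info = {"color": hint_color, "slots_with_color": indicated_slots}
    if hint_color in CONVENTION_COLORS:
        # Extra bits beyond what the hint literally says:
        info["newest_tile_playable"] = True
    return info

assert interpret_hint("YELLOW", [0, 2])["newest_tile_playable"] is True
assert "newest_tile_playable" not in interpret_hint("BLUE", [1])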
That is super fascinating.
I guess we should just accept that in the society of the future, robots will play board/computer games and paint while humans all slave away integrating endless badly written web services.
It would be incredibly difficult to attempt to solve the general problem of having an AI making decisions in the world, including taking into account the goals and mind-states of other humans (and general agents).
Much more practical to restrict the problem space very narrowly - can you build something which can learn to strategize and predict future outcomes in a difficult, full-information competitive game? (Go) Okay, can it do something similar without complete information, and with less discrete-looking steps and game states (1v1 Dota) - can it do so whilst cooperating with other similar agents with asymmetric abilities but common knowledge to achieve a common goal? (5v5 Dota)
Okay, now can it cooperate with other agents with symmetric abilities but asymmetric knowledge, and both convey necessary knowledge to other agents via its actions, as well as deduce knowledge from the actions of other agents? (Hanabi)
These are all necessary steps on the path to creating a general AI - if your AI is not at least able to do the above, it definitely won't be able to navigate a general environment. In the absence of an obvious alternative strategy, it seems to me the most productive approach is to tackle the small, important-but-doable seeming problems, one at a time, until hopefully you are able to unify them or else have a breakthrough of some sort.
Furthermore, their investment into TPUs is going to bring the price down in the long run, which is incredibly beneficial to everyone. And yes, we should assume that if AGI is possible, it will require a lot of compute.
>They are using games to develop algorithms as they are easily defined problems
I believe they use games because it makes good PR. I believed that before they chose StarCraft, but now I am even more convinced of it.
Also, because their algorithms are extremely data-hungry and it's easy to generate tons of training data for a game simply by running it. And because it is possible to evaluate performance of an algorithm over and over again.
>they have in turn used those algorithms on real world problems, notably AlphaFold
AlphaFold does not use their board game AI. Naming it similarly is just a marketing tactic.
AlphaFold might be a genuinely useful application of their resources, in which case they deserve praise and credit for it. I would have to do more research on it, which is made very hard by all the hyperbole and hype around their work.
>Furthermore, their investment into TPU's is going to bring the price down in the long run
Overall, they do the exact opposite of democratizing AI. They invest heavily into insanely data-hungry and hardware-hungry approaches that a normal company will not be able to use (directly, not through Google's cloud services) for decades if ever.
> And yes we should assume if AGI is possible it will require a lot of compute.
What you're saying here (overall) does not make sense. There is absolutely no reason to assume that breaking records in games without substantial improvements in AI theory makes us any closer to "AGI".
Ignoring the completely unsubstantiated statements about AGI, we are left with the question of practical applications. I am not convinced that the societal benefits of investing in complex and computationally intensive machine learning are higher than those of investing in improvements to simpler algorithms. Most practical applications of AI outside of a few ubercorps involve simpler, older algorithms. Even someone as big as SAP relies mostly on decision trees and SVMs.
If anything, the hype around deep learning likely means it's harder to get funding for old-school AI research that is far more likely to benefit smaller companies.
http://predictioncenter.org/casp13/doc/CASP13_Abstracts.pdf Page 11, A7D.
Damn, I wish general AI never acquires personal taste!
I'd be pretty impressed if they beat my 5p strategy without Hanabi-specific stuff, but I assume that's not their goal.
Good luck to DeepMind/Brain either way!
Authors: would love to chat sometime if you're reading this! If I get back to Hanabi botting, I might try to use a learned policy to replace some of the random (very non-optimized) heuristics my bot had.
> The main difference in Hanabi is that there is no “cheap-talk” communication channel: any signalling must be done through the actions played. It is therefore more similar in spirit to learning how to relay information to partners in the bidding round of bridge.
Jakob, why did you choose Hanabi instead of bridge itself? Bridge is obviously much more widely known and seems to present similar challenges.
Does it much matter that the topic being studied is one you have experience with and a passion for? Perhaps more than we might think.
(I'm aware that uncontested bridge bidding is a thing, but it's still harder than 2p Hanabi)
One part of the game's description that immediately jumped out at me was the need to pass information efficiently to other players, without directly giving them the world state. This appears to be a generally similar problem to one described in a recent openai paper: https://blog.openai.com/interpretable-machine-learning-throu...
Any suggestions for implementing complex Hanabi game rules and logic are appreciated! My gut feel is that an omniscient agent would converge on an optimal policy in all current game states.
But that's just one major hole in what the authors describe as "superhuman" AI demos in games like chess and go. The other is that those games don't look at all like most conventional digital games people enjoy playing.
Go looks like matrix math. A neural program has a chance of learning an internal simulation of the game. There's no chance of that for DOTA 2 or StarCraft II, which is why those wins by reinforcement learning AI bots were so impressive.
However, those games (DOTA 2 and SC2) suffer from big rewards for superhuman dexterity. Just a few seconds of excess dexterity (not strictly superhuman dexterity) can tip the game to an absolute win for an AI bot. So most conventional performance diagnostics, like winrates or Elo scores, greatly understate the role of dexterity in those games.
Hanabi seems like a great reaction to where the games AI research is going. Hanabi will be easier to simulate than a game like Hearthstone, so there will be accelerated implementations or neural programs of it. However, the authors criticize something that might be vital to Hanabi research:
> "However, it is impractical for human Hanabi players to expect others to strictly abide by a preordained policy."
That seems unfair. If you're going to research an idiosyncratically cooperative game, the most competitive strategies are going to involve preordained (meaning coordinated prior to a match) policies. Why turn your back on the path towards strategy development that obviously leads to the best play?
The biggest problem with Hanabi is that, as a cooperative game, we have no real idea what great strategies for it are. I know the authors claim that they do. But for DOTA 2 and StarCraft 2, PvP games with immense popularity, we have a credible natural laboratory for strategy and skill development. Hanabi is really obscure by comparison.
The biggest problems with Hearthstone, which in my opinion is one of the best games to research, are that (1) the rules are way too complex for a neural program to learn them, so you need novel approaches for learning, (2) the rules are too complex to execute on a GPU, so you need novel approaches to parallelized computing and optimization, and (3) the deckbuilding meta and some pay-to-win design have stunted strategy development, so outperforming a human opponent may be way too easy.
There are so many problems to solve, and it's not that credible that neural networks are the path forward!
I didn't realize Hanabi is already that close to being solved.
My gut reaction is that the game is a lot simpler than it appears. I guess your simpler "matrix" game points to that--you already had an intuition for reducing Hanabi. Indeed, looking at the code you share for the "matrix game," it would seem that Hanabi's problem is that, like Chess and Go, it doesn't really resemble more sophisticated games as much as it resembles something that can be literally expressed in Tensorflow.
That doesn't diminish work on playing games. If they had released a chess or go challenge where the board is just bigger or more pieces, that would be dumb. But this challenge is a game that is a tiny bit closer to real "problems that people care about." Solving this won't get us to problems that people care about either, but it'll get us closer. It's only an incremental step, but that's okay.
Sometimes I think that the progress in games seems kind of orthogonal to progress in using machine learning to solve real world problems, because anytime you have a game it automatically gets you essentially infinite labeled training data set (each game has a score/outcome, and there are essentially infinite possible games). So as long as the compute scales up enough, any game humans can play will be solvable.
I totally agree about the ability to just skirt sample complexity. It's a tough one, made tougher by how early stage this work really is. We want bots to be able to match human ability and match human learning. Though they're put together, they have very separate concerns.
For matching human ability, we're just beginning to learn techniques to get bots able to master hard tasks (e.g. incomplete information games, atari games, picking objects up). Those bots mostly learn waaaaaay slower than people. But never mastering is worse than slowly mastering, so it's early days.
On the other hand, you have people working on efficient learning. This is the question you're getting at with compute scaling arbitrarily-ish. It's more impressive if it can master a game after only playing it a small number of times. People are definitely working on this too, but for even simpler tasks. There's a lot of work right now in contextual bandits on learning fast, and that's a kind of baby-RL task. Even there, simulation tasks are super important because you really need a counterfactual to say whether you're doing well compared to alternatives.
(Which isn't quite practical yet but might be in the mid-term future).
I would say that Poker with 3-9 players and heterogeneous player quality also contains similar aspects. You can improve the play when you recognize the skill difference, how people play against other players, changes in impulse control, magical thinking etc.
You get information of how people play just by observing their play completely legally.
DeepMind and OpenAI are indirectly in competition with each other.
The Gym interface is okay but the implementation is poor and does not readily support multiple agents.