
The Hanabi Challenge: A New Frontier for AI Research - iron0013
https://arxiv.org/abs/1902.00506
======
gamegoblin
Context for those that don't know the game:

1\. Each player has a set of tiles that each have a particular color and a
particular number.

2\. The tiles a player has are known only to _other_ players (they are placed
to face away from the player)

3\. On your turn, you may either place one of your tiles, or give a hint to
another player about their tiles.

4\. Hints can only take the form of "you have exactly K tiles of C color
<indicate the K tiles>" or "you have exactly K tiles of N number <indicate the
K tiles>". There are some rules for how many hints can be given and ways to
"earn back" more hints to give.

4b. You can make more deductions than what the hints literally provide. There
is a fixed number of each tile, e.g. there are 2 RED-4 tiles. So if you can
see that two other players each have a RED-4 tile, you cannot have a RED-4
tile yourself, and if someone hints that you have a *-4 tile, you know it is
not RED. The two players holding the RED-4 tiles cannot see their own, so they
cannot make this deduction. But if you act in a way that reveals you have made
it, and they are really smart, they can observe your act and deduce that you
would only have acted that way if you knew you didn't have a RED-4 tile. And
since each of them can see only 1 other RED-4 tile, they can deduce that they
must hold the other. (A toy sketch of this counting deduction appears after
the rules list.)

5\. No other communication is allowed.

6\. In the center of the table there are stacks for each color that start
empty. The goal of the game is to build up each stack by playing tiles of that
color in increasing number (e.g. RED-1, RED-2, RED-3 ...). If a player tries
to play an invalid tile (e.g. the RED stack is only up to RED-2 and the player
plays their RED-4 tile), a counter is incremented.

7\. The game ends when the counter reaches a threshold or all tiles are
played. The score is equal to the number of tiles played. Higher is better.
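
A toy Python sketch of the counting deduction from 4b (my own illustration,
assuming the standard distribution of three 1s, two each of 2s/3s/4s, and one
5 per color; possible_colors is an invented helper, not from any bot):

    # Which colors a hinted "*-4" tile could still be, given everything visible.
    TILE_COUNTS = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}   # copies of each number per color
    COLORS = ["RED", "GREEN", "BLUE", "WHITE", "YELLOW"]

    def possible_colors(hinted_number, visible_tiles):
        """visible_tiles: (color, number) pairs seen in other hands, the discard
        pile, and the played stacks (everything except your own hand)."""
        candidates = []
        for color in COLORS:
            seen = sum(1 for c, n in visible_tiles if c == color and n == hinted_number)
            # If every copy of (color, hinted_number) is visible elsewhere,
            # your hinted tile cannot be that color.
            if seen < TILE_COUNTS[hinted_number]:
                candidates.append(color)
        return candidates

    # Both RED-4s are visible in other hands, so a "*-4" hint rules out RED:
    print(possible_colors(4, [("RED", 4), ("RED", 4)]))   # no RED in the result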

I have long thought that it should be possible to write an AI that does
superhuman hint-giving and deduction, provided that all of the AIs can collude
beforehand. That is, if all of the AIs have a giant lookup table that
essentially says "If the board is in this particular state, and I give hint H,
that actually means FOO" where FOO contains more bits of information than hint
H would carry by itself.
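
A purely hypothetical sketch of that lookup-table idea (the state abstraction
and the meanings below are invented for illustration, not any published
convention):

    # Shared convention table: given the board state, the *choice* of hint
    # encodes an agreed instruction worth more than the hint's literal content.
    # Every colluding agent must use the identical table.
    CONVENTIONS = {
        # (abstracted board state, hint given)  ->  agreed meaning ("FOO")
        ("early_game", "color hint"):  "play your newest tile of that color",
        ("early_game", "number hint"): "discard your oldest unhinted tile",
        ("late_game",  "color hint"):  "the indicated tile is the next play",
        ("late_game",  "number hint"): "hold everything; I will discard",
    }

    def decode(board_state, hint):
        return CONVENTIONS.get((board_state, hint), "take the hint literally")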

~~~
thebzax
There's a paper from just a couple of years ago about turning Hanabi (with >2
players) into a variant of the 100-prisoner hat-color problem:
[http://helios.mi.parisdescartes.fr/~bouzy/publications/bouzy...](http://helios.mi.parisdescartes.fr/~bouzy/publications/bouzy-
hanabi-2017.pdf)

For the 5-player variant specifically, the concept is as follows: out of the 8
moves each other player could make (play any of their 4 cards, or discard any
of their 4 cards), you figure out the best move for each of them, add those up
mod 8, and give one of 8 possible hints (you have a color hint or a number
hint available for each of the 4 other players). Then each of the 4 other
players deduces which move you wanted them to make by subtracting out everyone
else's best move. There's some room for improvement in figuring out how to
avoid one player mistaking another's best play (because it might be predicated
on someone else playing a card they don't know they have first), or in how to
hint in ways that avoid that.

A friend has put up an implementation on GitHub:
[https://github.com/chikinn/hanabi/blob/master/players/hat_pl...](https://github.com/chikinn/hanabi/blob/master/players/hat_player.py)
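
A minimal sketch of that encoding for the 5-player case (the recommended_move
stub below is just a stand-in for whatever deterministic heuristic the players
agree on; this is not the linked bot's code):

    # Moves are numbered 0-7: play card 0-3 -> 0-3, discard card 0-3 -> 4-7.
    NUM_ACTIONS = 8

    def recommended_move(hand):
        """Stand-in for a deterministic heuristic that every player agrees to
        apply to any hand they can fully see (hand = list of (color, number))."""
        return sum(number for _color, number in hand) % NUM_ACTIONS   # arbitrary

    def choose_hint(other_hands):
        """Hinter: sum the recommended moves of the 4 visible hands mod 8 and
        give the hint the group has agreed maps to that index."""
        return sum(recommended_move(h) for h in other_hands) % NUM_ACTIONS

    def decode_my_move(hint_index, other_listeners_hands):
        """Each listener subtracts the moves they can compute for the other
        three listeners; the remainder mod 8 is their own recommended move."""
        seen = sum(recommended_move(h) for h in other_listeners_hands)
        return (hint_index - seen) % NUM_ACTIONS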

~~~
SilasX
Nice! Any intuition on whether this is practical for humans to implement in a
game?

(Side note: in practice, it's almost impossible to avoid the illegal side-
channel communication of watching the other players get nervous as you start
to do something stupid, so really the only rules-faithful way is to play over
a computer.)

(Another side note: there's a game called The Mind[1] which is the board-game
version of sleep sort[2]: the goal is to, without most kinds of communication,
play cards from each player's hand in increasing order. One way to win is to
synchronize everyone's internal clock and play each card when that many
seconds have passed; a toy sketch of that trick follows below.)

[1]
[https://boardgamegeek.com/boardgame/244992/mind](https://boardgamegeek.com/boardgame/244992/mind)

[2] [https://stackoverflow.com/questions/6474318/what-is-the-
time...](https://stackoverflow.com/questions/6474318/what-is-the-time-
complexity-of-the-sleep-sort)
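
For the curious, a toy Python version of that sleep-sort trick (each "player"
is just a thread here):

    import threading, time

    played = []

    def play_after_delay(card):
        time.sleep(card)        # wait a number of seconds equal to the card's value
        played.append(card)

    hands = [3, 1, 2]           # one card per "player"
    threads = [threading.Thread(target=play_after_delay, args=(c,)) for c in hands]
    for t in threads: t.start()
    for t in threads: t.join()
    print(played)               # [1, 2, 3]: the cards come out in increasing order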

~~~
whatusername
(I think you meant to refer to "the Mind". "The Game" looks similar in some
ways -- but your description is for "the mind".)

~~~
goodmachine
For a game that - in principle - should be no fun at all, The Mind is
terrific.

------
gambler
Deep Mind continues the venerable tradition of defining their own frontiers of
AI research based on what will sound good in a PR release when they inevitably
achieve superhuman performance by the modest means of a single datacenter
stuffed with TPUs and $10K video cards.

I guess we should just accept that in the society of the future, robots will
play board/computer games and draw paintings while humans all slave away on
the integration of endless badly written web services.

~~~
leesec
This is so cynical it is laughable. They are using games to develop algorithms
because they are easily defined problems, but they have in turn used those
algorithms on real world problems, notably AlphaFold and their data center
improvements [1]. I'm sorry breaking world records every year isn't quite
impressive enough for you.

Furthermore, their investment in TPUs is going to bring the price down in the
long run, which is incredibly beneficial to everyone. And yes, we should
assume that if AGI is possible it will require a lot of compute.

[1] [https://deepmind.com/blog/deepmind-ai-reduces-google-data-
ce...](https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-
cooling-bill-40/)

~~~
gambler
I am cynical for a reason.

 _> They are using games to develop algorithms as they are easily defined
problems_

I believe they use games because it makes good PR. I believed that before they
chose StarCraft, and now I am even more convinced of it.

Also, because their algorithms are extremely data-hungry and it's easy to
generate tons of training data for a game simply by running it. And because it
is possible to evaluate performance of an algorithm over and over again.

 _> they have in turn used those algorithms on real world problems, notably
AlphaFold_

AlphaFold does not use their board game AI[1]. Naming it similarly is just a
marketing tactic.

AlphaFold might be a genuinely useful application of their resources, in which
case they deserve praise and credit for it. I have to do more research on it,
which is made very hard by all the hyperbole and hype around their work.

 _> Furthermore, their investment into TPU's is going to bring the price down
in the long run_

Overall, they do the exact opposite of democratizing AI. They invest heavily
into insanely data-hungry and hardware-hungry approaches that a normal company
will not be able to use (directly, not through Google's cloud services) for
decades if ever.

 _> And yes we should assume if AGI is possible it will require a lot of
compute._

What you're saying here (overall) does not make sense. There is absolutely no
reason to assume that breaking records in games without substantial
improvements in AI _theory_ makes us any closer to "AGI".

Ignoring the completely unsubstantiated statements about AGI, we are left with
the question of practical applications. I am not convinced that the societal
benefits of investing in complex and computationally intensive machine
learning are higher than those of investing in improvements to simpler
algorithms. Most practical applications of AI outside of a few ubercorps
involve simpler, older algorithms. Even someone as big as SAP relies mostly on
decision trees and SVMs.

If anything, the hype around deep learning likely means it's _harder_ to get
funding for old-school AI research that is far more likely to benefit smaller
companies.

\---

[1]
[http://predictioncenter.org/casp13/doc/CASP13_Abstracts.pdf](http://predictioncenter.org/casp13/doc/CASP13_Abstracts.pdf)
Page 11, A7D.

------
wuthefwasthat
Never thought I'd see my github repo cited in a DeepMind/Brain paper. I still
have SOTA for >2 player by a large margin :)

I'd be pretty impressed if they beat my 5p strategy without Hanabi-specific
stuff, but I assume that's not their goal. Good luck to DeepMind/Brain either
way!

Authors: would love to chat sometime if you're reading this! If I get back to
Hanabi botting, I might try to use a learned policy to replace some of the
random (very non-optimized) heuristics my bot had.

~~~
jakobnicolaus
Absolutely, please shoot me an email. Did I mention that we link to random
games our bot played in the BAD paper? Sorry for the late reply!

------
te
From the paper:

> The main difference in Hanabi is that there is no “cheap-talk” communication
> channel: any signalling must be done through the actions played. It is
> therefore more similar in spirit to learning how to relay information to
> partners in the bidding round of bridge.

Jakob, why did you choose Hanabi instead of bridge itself? Bridge is obviously
much more widely known and seems to present similar challenges.

~~~
savanaly
To add on to the reasons mentioned by others (Hanabi is simpler, purely
cooperative, etc), I'll point out that Hanabi is fairly popular with young CS
types, especially the ones interested in ML and AI, much more so than Bridge
which is understandably seen as old fashioned. My source for this statement
being that I'm friends with a lot of such people and was roommates with
someone who now works for Deep Mind, and have learned first hand and shared in
the generalized Hanabi obsession. Perhaps it's just my localized experience
and I'm exaggerating the popularity of the game. But I think it's telling that
I wasn't the least bit surprised when I saw this game mentioned as the next
candidate Deep Mind would tackle.

Does it much matter that the topic being studied is one you have experience
with and a passion for? Perhaps more than we might think.

------
marviel
I didn't know what Hanabi was, so I looked it up:
[https://en.wikipedia.org/wiki/Hanabi_(card_game)?oldformat=t...](https://en.wikipedia.org/wiki/Hanabi_\(card_game\)?oldformat=true)

One part of the game's description that immediately jumped out at me was the
need to pass information efficiently to other players, without directly giving
them the world state. This appears to be a generally similar problem to one
described in a recent openai paper: [https://blog.openai.com/interpretable-
machine-learning-throu...](https://blog.openai.com/interpretable-machine-
learning-through-teaching/)

------
angel_j
[https://github.com/deepmind/hanabi-learning-
environment](https://github.com/deepmind/hanabi-learning-environment)

------
ArtWomb
Am also working on a Hanabi game for the web. WebSocket Multiplayer, HTML5
Canvas2D, highly performant. Should play well in all modern browsers including
mobile. Goal is to have a full AI platform for agents as well as competitive
human play ;)

Any suggestions for implementing complex Hanabi game rules and logic are
appreciated! My gut feel is that an omniscient agent would converge on an
optimal policy in all current game states.

------
doctorpangloss
Based on my experience maintaining a Hearthstone simulator and helping
contributors write AI agents for it, the authors' interest in "imperfect
information" is spot on.

But that's just one major hole in what the authors describe as "superhuman" AI
demos in games like chess and go. The other is that those games don't look at
all like most conventional digital games people enjoy playing.

Go _looks_ like matrix math. A neural program has a chance of learning an
internal simulation of the game. There's no chance of that for DOTA 2 or
StarCraft II, which is why those wins by reinforcement learning AI bots were
so impressive.

However, those games (DOTA 2 and SC2) suffer from big rewards for superhuman
dexterity. Just a few seconds of excess dexterity (not strictly superhuman
dexterity) can tip the game to an absolute win for an AI bot. So most
conventional performance diagnostics, like winrates or Elo scores, greatly
understate the role of dexterity in those games.

Hanabi seems like a great reaction to where games AI research is going.
Hanabi will be easier to simulate than a game like Hearthstone, so there will
be accelerated implementations or neural programs of it. However, the authors
criticize something that might be vital to Hanabi research:

> "However, it is impractical for human Hanabi players to expect others to
> strictly abide by a preordained policy."

That seems unfair. If you're going to research an idiosyncratically
cooperative game, the most _competitive_ strategies are going to involve
preordained (meaning coordinated prior to a match) policies. Why turn your
back on the path towards strategy development that obviously leads to the best
play?

The biggest problem with Hanabi is that, as a cooperative game, we have no
real idea what great strategies for it are. I know the authors claim that they
do. But for DOTA 2 and StarCraft 2, PvP games with immense popularity, we have
a credible natural laboratory for strategy and skill development. Hanabi is
really obscure by comparison.

The biggest problems with Hearthstone, which in my opinion is one of the best
games to research, are that (1) the rules are way too complex for a neural
program to learn them, so you need novel approaches for learning, (2) the
rules are too complex to execute on a GPU, so you need novel approaches to
parallelized computing and optimization, and (3) the deckbuilding meta and
some pay-to-win design have stunted strategy development, so outperforming a
human opponent may be way too easy.

There are so many problems to solve, and it's not that credible that neural
networks are the path forward!

~~~
jakobnicolaus
I think neural networks will be part of the solution, but they are probably
not the entire answer. For an example of a method that combines Deep RL with
Bayesian reasoning, you can take a look at our recent paper
([https://arxiv.org/abs/1811.01458](https://arxiv.org/abs/1811.01458)). BAD
achieves the best known scores for 2-player Hanabi in self-play.

~~~
doctorpangloss
> where BAD achieves an average score of 24.174 points in the two-player
> setting, surpassing the best previously published results for learning
> agents by around 9 points and approaching the best known performance of 24.9
> points for (cheating) open-hand gameplay

I didn't realize Hanabi is already that close to being solved.

My gut reaction is that the game is a lot simpler than it appears. I guess
your simpler "matrix" game points to that--you already had an intuition for
reducing Hanabi. Indeed, looking at the code you share for the "matrix game,"
it would seem that Hanabi's problem is that, like Chess and Go, it doesn't
really resemble more sophisticated games as much as it resembles something
that can be literally expressed in Tensorflow.

~~~
jakobnicolaus
The good news is that we have open-sourced the environment, so if you think
it's easy I would love to see a simple method that solves it.

------
dontreact
So far it seems to me like research on game-playing AI has not carried over
well to solving problems that people care about. Deepmind's success in applied
settings doesn't really seem to have benefitted in any meaningful way from
the game-playing research, as far as I can tell. What are the best examples of
this so far that people are aware of?

~~~
imh
The problem is that the expectation of being able to transfer this stuff over
is more hype driven than anything else. People hear "AI beat the best person
in the world at something intellectually hard" and think that it should be
smarter than people in other ways too.

That doesn't diminish work on playing games. If they had released a chess or
go challenge where the board is just bigger or there are more pieces, that
would be dumb. But this challenge is a game that is a tiny bit closer to real
"problems
that people care about." Solving this won't get us to problems that people
care about either, but it'll get us closer. It's only an incremental step, but
that's okay.

~~~
dontreact
Yeah I agree. I just wanted to highlight that to me the idea that doing better
at games is advancing AI in a meaningful way is definitely overhyped.

Sometimes I think that the progress in games seems kind of orthogonal to
progress in using machine learning to solve real-world problems, because
anytime you have a game, it automatically gives you an essentially infinite
labeled training data set (each game has a score/outcome, and there are
essentially infinite possible games). So as long as the compute scales up
enough, any game humans can play will be solvable.
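
To make that concrete, a toy sketch (the env interface here is hypothetical,
not any real library) of how a simulator mints labeled data for free:

    import random

    def self_play_episode(env):
        """Roll out one game with random moves; the final score labels the
        whole trajectory, so every rollout is a free training example."""
        env.reset()
        trajectory, done = [], False
        while not done:
            action = random.choice(env.legal_actions())
            observation, done = env.step(action)
            trajectory.append((observation, action))
        return trajectory, env.score()

    # dataset = [self_play_episode(make_env()) for _ in range(1_000_000)]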

~~~
imh
I wouldn't say that makes games orthogonal to real world problems. That's what
makes them good stepping stones. Risk free "cheap" testing makes for fast
research.

I totally agree about the ability to just skirt sample complexity. It's a
tough one, made tougher by how early stage this work really is. We want bots
to be able to match human ability and match human learning. Though they're put
together, they have very separate concerns.

For matching human ability, we're just beginning to learn techniques to get
bots able to master hard tasks (e.g. incomplete information games, atari
games, picking objects up). Those bots mostly learn waaaaaay slower than
people. But never mastering is worse than slowly mastering, so it's early
days.

On the other hand, you have people working on efficient learning. This is the
question you're getting at with compute scaling arbitrarily-ish. It's more
impressive if it can master a game after only playing it a small number of
times. People are definitely working on this too, but for even simpler tasks.
There's a lot of work right now in contextual bandits on learning fast, and
that's a kind of baby-RL task. Even there, simulation tasks are super
important because you really need a counterfactual to say whether you're doing
well compared to alternatives.

------
nabla9
Hanabi seems interesting.

I would say that Poker with 3-9 players and heterogeneous player quality also
contains similar aspects. You can improve your play when you recognize the
skill difference, how people play against other players, changes in impulse
control, magical thinking, etc.

~~~
jakobnicolaus
Yes, but actively communicating with some of the other players through agreed
conventions would probably count as collusion and be illegal in N-player
poker.

~~~
nabla9
I was not talking about collusion.

You get information about how people play just by observing their play,
completely legally.

~~~
jakobnicolaus
Sure, but in Hanabi the point is to be as informative as possible, while in
poker it should be the opposite (unless you collude).

------
ilaksh
I think that they should focus on educational games that teach basic language
and math. That may be very challenging but if anyone can pull that off it
could lead to some really general purpose abilities.

------
munificent
I tried playing Hanabi once with someone who has ADHD, a small working memory,
and is borderline on the autism spectrum so doesn't have a super great theory
of mind. It was excruciating.

~~~
zeckalpha
I’ve done nearly the same and it was spectacular. Different people are
different people.

------
nestorD
Just discovering the game, but the imperfect information and convergent nature
of the game make it look like a good fit for a Monte Carlo based algorithm.

------
knicholes
I wonder why they created their own OpenAI-like interface instead of
inheriting from OpenAI's environment?

~~~
jakobnicolaus
Hanabi is a multi-agent problem. Unfortunately gym doesn't natively support
multi-agent action and state spaces.
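
Roughly, a turn-based multi-agent loop looks like this (a simplified
illustration, not the exact API): observations are per player and only the
current player acts each step, which doesn't map cleanly onto gym's
single-agent reset/step.

    def run_episode(env, agents):
        """Simplified turn-based multi-agent loop (illustration only)."""
        observations = env.reset()            # one observation per player
        done, reward = False, 0
        while not done:
            current = env.current_player()    # index of the player to act
            action = agents[current].act(observations[current])
            observations, reward, done = env.step(action)
        return reward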

