

Show HN: Prisoner's dilemma tournament with bots that can simulate each other - spimta
https://github.com/pdtournament/pdtournament

======
jere
Is there any reason the number of rounds is fixed? I didn't think anything
interesting could come of that:

>If the game is played exactly N times and both players know this, then it is
always game theoretically optimal to defect in all rounds. The only possible
Nash equilibrium is to always defect. The proof is inductive: one might as
well defect on the last turn, since the opponent will not have a chance to
punish the player. Therefore, both will defect on the last turn. Thus, the
player might as well defect on the second-to-last turn, since the opponent
will defect on the last no matter what is done, and so on. The same applies if
the game length is unknown but has a known upper limit.
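The induction in the quote is easy to check mechanically. A minimal sketch (Python; the 3/0/5/1 payoffs are the standard PD values, and `best_response`/`backward_induction` are my own illustrative names, not tournament code):

```python
# Backward induction over a finitely repeated Prisoner's Dilemma.
# In each stage game, D strictly dominates C, so inducting from the
# last round yields defection in every round.

# Payoff to the row player for (row_move, col_move), standard PD values.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def best_response(opponent_move):
    """Stage-game best response; the future is already fixed by induction."""
    return max('CD', key=lambda m: PAYOFF[(m, opponent_move)])

def backward_induction(n_rounds):
    plan = []
    for _ in range(n_rounds):  # reasoning from round N down to round 1
        # Future play is pinned down by the induction hypothesis, so only
        # the current stage matters, and D is a best response to anything.
        assert best_response('C') == 'D' and best_response('D') == 'D'
        plan.append('D')
    return plan

print(backward_induction(5))  # ['D', 'D', 'D', 'D', 'D']
```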

Also, I'm wondering if the rules prohibit collusion. The rules say one bot per
person, but don't mention teams.

~~~
spimta
I actually fixed the number of rounds on purpose--I figure the Nash
equilibrium of the classical iterated PD isn't very relevant in this wildly
different version of the game, and I want to see what kinds of strategies
emerge with a fixed number of rounds.

I didn't explicitly forbid collusion, since it's tough to prevent it in
practice; I would appreciate it if teams simply submitted one bot or several
bots with different strategies.

~~~
Retric
A perfect simulation is recursive, so you need to add a resource limit. In
that case a bot can either lie by detecting that it's being simulated, or
just waste time and give out zero information.

------
jamesbritt
Folks interested in reading more about the Prisoner's Dilemma might want to
read Robert Axelrod's The Evolution of Cooperation.

[http://www.amazon.com/The-Evolution-Cooperation-Revised-Edit...](http://www.amazon.com/The-Evolution-Cooperation-Revised-Edition/dp/0465005640)

He also wrote a follow-up book, The Complexity of Cooperation: Agent-Based
Models of Competition and Collaboration.

[http://www.amazon.com/The-Complexity-Cooperation-Agent-Based...](http://www.amazon.com/The-Complexity-Cooperation-Agent-Based-Collaboration/dp/0691015678)

------
j2kun
There is a fun anecdote about how to beat tit-for-tat by using a collusion
strategy. What you do is enter a team of bots, and their strategy is as
follows:

1\. For the first 50 rounds, behave according to a predefined sequence.

2\. If your opponent also plays this sequence, cooperate in every subsequent
round. Otherwise defect in every subsequent round.

I don't have a reference on hand, but from what I remember this defeated all
other algorithms by a large margin in one of the main yearly Prisoner's
Dilemma tournaments.
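The handshake described above is short to sketch. This is only an illustration (Python; `HANDSHAKE_LEN`, the shared seed, and `handshake_bot_move` are my own assumptions, not from any tournament entry):

```python
import random

# Handshake collusion bot: play a fixed secret opening, then cooperate
# forever with anyone who matched it and defect against everyone else.
HANDSHAKE_LEN = 50
random.seed(42)  # the team shares the seed, so all members know the sequence
HANDSHAKE = [random.choice('CD') for _ in range(HANDSHAKE_LEN)]

def handshake_bot_move(round_no, opponent_history):
    """Return 'C' or 'D' given the 0-indexed round and the opponent's moves."""
    if round_no < HANDSHAKE_LEN:
        return HANDSHAKE[round_no]
    # After the opening, check whether the opponent played the same sequence.
    if opponent_history[:HANDSHAKE_LEN] == HANDSHAKE:
        return 'C'
    return 'D'
```

Two copies of this bot recognise each other after round 50 and cooperate from then on, while any bot that failed the handshake gets defected against for the rest of the game.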

~~~
LanceH
The bots should all cooperate with each other, except for one which will
defect against peers and take all the points for the win.

~~~
j2kun
That's right, now I remember.

------
philh
> The 1/100th of a second time limit used in this example is more than enough
> time for a bot that does not loop infinitely to terminate on any modern
> processor; simple bots like cooperateBot or titForTat will terminate in less
> than 1/1000th of a second. For the purposes of this tournament, you can
> consider 1/100th of a second a safe upper bound on simulation time when
> writing your own bots. In addition, the time function itself costs slightly
> less than 1/100th of a second in overhead

This doesn't seem safe. Suppose I'm playing against JusticeBot, and simulate
ver playing against me. Then ve simulates me playing against CooperateBot, and
presumably sim-me simulates CooperateBot playing against me. The recursion
ends there. But JusticeBot called time once directly, and vis simulation of me
also called time, which means ve took more than 1/100 of a second to run in
total.

~~~
spimta
When you call the time function, it will cut off the simulation once that
1/100th of a second limit is reached, no matter what happens inside the
simulation. So even if your opponent simulates you or performs some expensive
computation or even goes into an infinite loop, time will terminate in the
allotted number of seconds plus the overhead. If that happens, though, it will
return Nothing, so you won't have gained any information about what your
opponent will do.
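The semantics described here can be modelled without wall clocks by treating the budget as a step counter. This is only a sketch of the idea, not the tournament's actual Haskell `time` function: `simulate` returns `None` (standing in for `Nothing`) whenever the budget runs out, even against an infinite loop.

```python
# Sketch: a simulation budget modelled as "fuel" (steps) instead of
# wall-clock seconds. Running out of fuel aborts the simulation and
# yields None, the analogue of the tournament's Nothing result.

class OutOfFuel(Exception):
    pass

def simulate(bot, opponent, fuel):
    """Run bot(opponent, fuel) under a step budget; None on exhaustion."""
    try:
        return bot(opponent, fuel)
    except OutOfFuel:
        return None

def loop_bot(opponent, fuel):
    # Stands in for an infinite loop: burns fuel one step at a time.
    while True:
        fuel -= 1
        if fuel <= 0:
            raise OutOfFuel

def cooperate_bot(opponent, fuel):
    return 'C'

print(simulate(cooperate_bot, None, fuel=100))  # prints: C
print(simulate(loop_bot, None, fuel=100))       # prints: None
```

The point of the model: the simulator learns nothing from a timed-out run except that it timed out, which matches the `Nothing` behaviour described above.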

~~~
philh
I mean that it doesn't seem a safe upper bound - `time` will return, but
there are situations where, to get a useful result, you need to increase the
limit; and it's easy to end up in those situations even when neither bot is
running lots of repeated simulations.

This class of difficulty can't be completely eliminated, but it seems that
being able to run bots for a long time relative to the overhead of `time`
would reduce it.

~~~
spimta
Ah, I see what you mean now. This was deliberate--I want simulation time to be
a precious resource that bots have to spend wisely. You can burn 4 seconds on
one simulation that is more likely to give you valuable information, but then
it's the only one you get. Or you can run lots of small simulations that are
less likely to return an answer. I agree that it's not very safe, which is
somewhat unfortunate, but it also serves to discourage bots that use precise
timing as part of a strategy. And since timing out results in a defection
rather than in some additional penalty, the risk isn't too large.

------
philh
> The tit-for-tat is very effective in the classical iterated prisoner's
> dilemma, but it is easily exploitable in the program equilibrium version of
> the game. This is because the bot's decision is based entirely on the
> history of previous moves, so an opponent can always predict when the tit-
> for-tat bot will cooperate and then defect against it.

Unless I'm missing something, doing this naively doesn't let you exploit
TitForTat, it just decreases your reward from (3 every round) to (5 every
other round).

You _can_ still exploit TitForTat, e.g. by defecting in the final round. But
that's not a massive win for you. CooperateBot is much more exploitable.
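The payoff arithmetic is quick to verify. A small sketch (Python, standard PD payoffs; the strategy names are mine): against TitForTat, alternating defection earns 5 every other round, while steady cooperation earns 3 every round, so the naive "exploit" scores strictly worse over many rounds.

```python
# Verify the claim: alternating defection against TitForTat earns
# 5 every other round, versus 3 every round for full cooperation.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play(strategy_a, strategy_b, rounds):
    """Return strategy_a's total score over a fixed number of rounds."""
    hist_a, hist_b, score_a = [], [], 0
    for r in range(rounds):
        a = strategy_a(r, hist_b)  # each bot sees the opponent's history
        b = strategy_b(r, hist_a)
        score_a += PAYOFF[(a, b)][0]
        hist_a.append(a)
        hist_b.append(b)
    return score_a

tit_for_tat = lambda r, opp: 'C' if r == 0 else opp[-1]
always_coop = lambda r, opp: 'C'
alternator  = lambda r, opp: 'D' if r % 2 == 0 else 'C'

print(play(always_coop, tit_for_tat, 100))  # 300: 3 points per round
print(play(alternator, tit_for_tat, 100))   # 250: 5 points every other round
```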

~~~
spimta
Yep, you're exactly right--I didn't actually add up the payoffs when I wrote
that. You could potentially do better against tit-for-tat variants that have a
certain chance of forgiving after a defection, but not against the naive,
deterministic TitForTat.

------
diziet
A plot device based on a similar concept (including simulating other players)
is used in Hannu Rajaniemi's _The Quantum Thief_. There is a Prisoner's
Dilemma Prison and the winning entity cheats by emulating others.

[https://en.wikipedia.org/wiki/The_Quantum_Thief](https://en.wikipedia.org/wiki/The_Quantum_Thief)

------
pokpokpok
This looks like it's bound to result in a messy spiral of infinite second
guessing. Curious what the winning strategy will be!

~~~
jreimers
I am curious as to how this will work.

How does bot A simulate bot B that must simulate bot A to determine its
answer? Seems like infinite trampoline recursion.

EDIT: it seems you run the simulation against a different bot implementation
than the running one and attempt to determine its characteristics given the
responses and the known behaviour of the other implementation.

------
malloreon
Hannu Rajaniemi's All-Defector would win.

