
The Prisoner's Dilemma: a summary, and new advances - ColinWright
http://bosker.wordpress.com/2012/07/23/the-prisoners-dilemma/
======
jere
The most interesting story on IPD (especially from a hacker perspective)
wasn't mentioned:

>Although tit for tat is considered to be the most robust basic strategy, a
team from Southampton University in England (led by Professor Nicholas
Jennings [1] and consisting of Rajdeep Dash, Sarvapali Ramchurn, Alex Rogers,
Perukrishnen Vytelingum) introduced a new strategy at the 20th-anniversary
iterated prisoners' dilemma competition, which proved to be more successful
than tit for tat. This strategy relied on cooperation between programs to
achieve the highest number of points for a single program. The University
submitted 60 programs to the competition, which were designed to recognize
each other through a series of five to ten moves at the start. Once this
recognition was made, one program would always cooperate and the other would
always defect, assuring the maximum number of points for the defector. If the
program realized that it was playing a non-Southampton player, it would
continuously defect in an attempt to minimize the score of the competing
program. As a result, this strategy ended up taking the top three positions
in the competition, as well as a number of positions towards the bottom.

>This strategy takes advantage of the fact that multiple entries were allowed
in this particular competition and that the performance of a team was measured
by that of the highest-scoring player (meaning that the use of self-
sacrificing players was a form of minmaxing). In a competition where one has
control of only a single player, tit for tat is certainly a better strategy.
Because of this new rule, this competition also has little theoretical
significance when analysing single agent strategies as compared to Axelrod's
seminal tournament. However, it provided the framework for analysing how to
achieve cooperative strategies in multi-agent frameworks, especially in the
presence of noise. In fact, long before this new-rules tournament was played,
Richard Dawkins in his book The Selfish Gene pointed out the possibility of
such strategies winning if multiple entries were allowed, but he remarked that
most probably Axelrod would not have allowed them if they had been submitted.
It also relies on circumventing rules about the prisoners' dilemma in that
there is no communication allowed between the two players. When the
Southampton programs engage in an opening "ten move dance" to recognize one
another, this only reinforces just how valuable communication can be in
shifting the balance of the game.

[http://en.wikipedia.org/wiki/Prisoners_dilemma#Strategy_for_...](http://en.wikipedia.org/wiki/Prisoners_dilemma#Strategy_for_the_iterated_prisoners.27_dilemma)
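The handshake-and-collude scheme is easy to sketch. Here's a toy version in Python — the actual Southampton entries are unpublished, so the five-move signature, payoff values, and match length below are invented for illustration:

```python
# Toy sketch of the Southampton collusion scheme. The real entries'
# handshake is not public; HANDSHAKE below is an invented signature.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
HANDSHAKE = [0, 1, 1, 0, 1]  # 0 = cooperate, 1 = defect

def colluder(role):
    """Return a strategy: dance, then collude with teammates or punish outsiders."""
    def move(turn, opp_history):
        n = len(HANDSHAKE)
        if turn < n:
            return HANDSHAKE[turn]           # perform the recognition dance
        if opp_history[:n] != HANDSHAKE:
            return 1                         # outsider: defect forever
        return 1 if role == "master" else 0  # master defects, slave feeds it points
    return move

def play(strat_a, strat_b, rounds=200):
    """Run an iterated match and return both scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for turn in range(rounds):
        a = strat_a(turn, hist_b)
        b = strat_b(turn, hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b
```

With these numbers, `play(colluder("master"), colluder("slave"))` gives the master 5 points per round after the dance while the slave takes 0, and both colluders grind any outsider down with perpetual defection — exactly the top-and-bottom split the tournament saw.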

By the way, if you haven't read The Selfish Gene, do yourself a favor and read
it now. It discusses IPD _and_ coins the word "meme"... what more could you
possibly want?

~~~
Kronopath
It's fantastic how these people were able to play the metagame like that.

I suspect that if you caught on to this, the best strategy would in fact be to
_impersonate_ the Southampton players, mimicking the ten-move dance, and then
perpetually defect, assuring the best strategy against either the "always
defect" or "always cooperate" players.

~~~
jere
Absolutely.

<http://en.wikipedia.org/wiki/Cuckoo#Brood_parasitism>

------
mistercow
I feel bad for game theorists. Any time an article talks about their field,
the first paragraph has to be a frantic explanation of what the field is and
why it's interesting, in terms a five-year-old could understand, and then the
article always ends up just being about the prisoner's dilemma.

It would be as if every article about physics started out "How stuff moves is
a really neat thing to study..." and then invariably ended up talking about
the parabolic trajectories of projectiles.

------
nopassrecover
This is actually a really good article, including easily missed gems like the
demos that the author put together (e.g.
<http://s3.boskent.com/prisoners-dilemma/fixed.html> and
<http://s3.boskent.com/prisoners-dilemma/titfer.html>)

------
StavrosK
Did anyone else find that article meandering? The description of the game, in
particular, struck me as especially pompous writing.

The metaphor with the elephant and the goldfish had absolutely nothing to
offer either, it was just a rehash of the previous sentence.

~~~
drostie
The metaphor might have been a little over-the-top but I support this sort of
creativity in blog posts, if only because it can ripen into something much
more aesthetically pleasing. There are places where it goes way too far --
_Gödel, Escher, Bach_ comes to mind -- but this is only a couple paragraphs.

In this case it does have something to "offer": it offers the insight that the
only value that looking at past states can offer is a sort of "strategy
tomography" to tell you what the other person's conditional probabilities
actually are, but that once you know this, your strategy is first-order
Markov, just because their strategy is first-order Markov. (This term, nth-
order Markov, just means "depends only on the last n game states.")

It's true, however, that this leads to a very interesting consequence which is
_not_ discussed in this context, something quite convoluted: "I will commit to
a first-order Markov strategy in order to prevent my opponent from using a
higher-order Markov strategy, so that my own analysis of their strategy
simplifies." It is a curious statement that your own ignorance forces someone
else to be ignorant, which you can then exploit.

This is proven in Appendix A of Press and Dyson's paper. To state the conclusion in their
words, "In iterated play of a fixed game, one might have thought that a player
Y with longer memory of past outcomes has the advantage over a more forgetful
player X. For example, one might have thought that player Y could devise an
intricate strategy that uses X’s last 1,000 plays as input data in a decision
algorithm, and that can then beat X’s strategy, conditioned on only the last
one iteration. However, that is not the case when the same game (same allowed
moves and same payoff matrices) is indefinitely repeated. In fact, for any
strategy of the longer-memory player Y, X’s score is exactly the same as if Y
had played a certain shorter-memory strategy (roughly, the marginalization of
Y’s long-memory strategy), disregarding any history in excess of that shared
with X."
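The first-order Markov analysis can be made concrete: when both players use memory-one strategies, the match is a Markov chain over the four outcomes (CC, CD, DC, DD), and long-run scores fall straight out of its stationary distribution. A minimal sketch — the example strategies are arbitrary, and the power iteration assumes the chain is ergodic (all probabilities strictly between 0 and 1):

```python
# Long-run payoffs for two memory-one (first-order Markov) strategies.
# p[i], q[i] = probability of cooperating after outcome i, where outcomes
# are ordered (CC, CD, DC, DD) from that player's own point of view.
def long_run_payoffs(p, q, payoff_x=(3, 0, 5, 1), payoff_y=(3, 5, 0, 1)):
    mirror = [0, 2, 1, 3]  # Y sees X's CD as DC and vice versa
    T = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        px, py = p[i], q[mirror[i]]
        T[i][0] = px * py                # next outcome CC
        T[i][1] = px * (1 - py)          # next outcome CD
        T[i][2] = (1 - px) * py          # next outcome DC
        T[i][3] = (1 - px) * (1 - py)    # next outcome DD
    # Power-iterate to the stationary distribution (assumes ergodicity).
    v = [0.25] * 4
    for _ in range(2000):
        v = [sum(v[i] * T[i][j] for i in range(4)) for j in range(4)]
    sx = sum(vi * a for vi, a in zip(v, payoff_x))
    sy = sum(vi * b for vi, b in zip(v, payoff_y))
    return sx, sy

# Example: two copies of "noisy tit for tat" (cooperate after the opponent
# cooperated, with 10% noise). Pure TFT would be (1, 0, 1, 0).
sx, sy = long_run_payoffs((0.9, 0.1, 0.9, 0.1), (0.9, 0.1, 0.9, 0.1))
```

The point of the quoted Appendix A result is that this four-state analysis is all you ever need against a memory-one player: whatever longer-memory strategy the opponent runs, its effect on your score is identical to that of some memory-one strategy you could feed into the function above.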

~~~
sisu
> It's true, however, that this leads to a very interesting consequence which
> is not discussed in this context, something quite convoluted: "I will commit
> to a first-order Markov strategy in order to prevent my opponent from using
> a higher-order Markov strategy, so that my own analysis of their strategy
> simplifies." It is a curious statement that your own ignorance forces
> someone else to be ignorant, which you can then exploit.

This is only true if you know the enemy's strategy beforehand, though. For
instance, if you play rock-paper-scissors and decide your move based only on
the previous move, your enemy can easily exploit that after playing for a
while. It is true that the enemy doesn't have to remember more than one move
after he learns your strategy, but he needs to remember many moves to learn it.

~~~
drostie
1\. No, that statement is still true even if you don't know the higher-order
strategy of your opponent: no matter what it is, it has the same payoffs as
some lower-order strategy.

2\. You would have to define "exploit that," especially with the understanding
that this is game theory and probabilistic strategies are certainly
encouraged. So, for example, you might imagine a genius who can consistently
outthink you: he knows your entire history and how you like to play
Rock-Paper-Scissors, guesses that you are about to throw Rock, and throws
Paper.

You can beat this guy. Or, more precisely, you can equal him. It's very
simple: before the day begins, roll a six-sided die over and over to generate
a random sequence of moves, and memorize it. As long as they are not
exploiting certain "tells" (as a Japanese
robot did in the news a week or two ago) -- as long as they are just making a
deduction based upon the sort of person you are, they cannot produce a net win
against you and you're safe. Indeed, the Nash equilibrium for RPS is not
terribly interesting, it's to choose each of the options with probability
1/3rd -- I don't really have much reason to believe that this changes
dramatically in iterated RPS.
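That equilibrium claim is easy to verify: each column of the RPS payoff matrix sums to zero, so the uniform 1/3 mix has expected payoff exactly zero against any opponent, pure or mixed — no amount of history-reading helps against it. A quick check:

```python
from fractions import Fraction
from itertools import product

# Rows = my move, columns = opponent's move; +1 win, 0 tie, -1 loss.
# Move order: rock, paper, scissors.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

third = Fraction(1, 3)
uniform = [third, third, third]

def expected_payoff(mine, theirs):
    """Expected payoff of mixed strategy `mine` against mixed strategy `theirs`."""
    return sum(mine[i] * theirs[j] * A[i][j]
               for i, j in product(range(3), range(3)))

# Against every pure strategy (hence every mixture), the uniform mix nets zero.
for j in range(3):
    pure = [Fraction(int(k == j)) for k in range(3)]
    assert expected_payoff(uniform, pure) == 0
```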

~~~
sisu
1\. I agree with what you say. There _exists_ a low-order strategy that has
the same payoff against the low-order strategy your opponent uses, and you can
use that if you know your opponent's strategy. However, in many games there
aren't low-order strategies that would work well against _any_ low-order
strategy so you need to know the opponent's strategy to choose a proper low-
order strategy. Alternatively you could use a higher order strategy that
learns the opponent's strategy and adapts to it.

2\. Well, I was thinking that a strategy would be a mapping from game history
to a probability distribution over possible moves. Should this be considered
in some other way?

Sure, you can just play according to the Nash equilibrium and you can't be
exploited, but probably a more interesting case is when both players try to
outsmart the other player and not just aim for a draw.

------
scotty79
I played against the extortionist strategy, and I guess I got lucky, because
I'm leading: (my score - 1) : (his score - 1) = 2:1, and it won't change into
1:3 as advertised, because from this point on I will always defect.

    
    
        25. You defected (1 points) and I defected (1 points)
        24. You defected (1 points) and I defected (1 points)
        23. You defected (1 points) and I defected (1 points)
        22. You defected (1 points) and I defected (1 points)
        21. You defected (1 points) and I defected (1 points)
        20. You defected (1 points) and I defected (1 points)
        19. You defected (1 points) and I defected (1 points)
        18. You defected (1 points) and I defected (1 points)
        17. You defected (1 points) and I defected (1 points)
        16. You defected (1 points) and I defected (1 points)
        15. You defected (1 points) and I defected (1 points)
        14. You defected (1 points) and I defected (1 points)
        13. You defected (1 points) and I defected (1 points)
        12. You defected (1 points) and I defected (1 points)
        11. You cooperated (0 points) and I defected (5 points)
        10. You defected (1 points) and I defected (1 points)
        9. You defected (5 points) and I cooperated (0 points)
        8. You cooperated (3 points) and I cooperated (3 points)
        7. You defected (5 points) and I cooperated (0 points)
        6. You cooperated (0 points) and I defected (5 points)
        5. You defected (1 points) and I defected (1 points)
        4. You defected (5 points) and I cooperated (0 points)
        3. You cooperated (3 points) and I cooperated (3 points)
        2. You defected (5 points) and I cooperated (0 points)
        1. You cooperated (3 points) and I cooperated (3 points)

~~~
gjm11
> won't change into 1:3 as advertised because from this step I will always
> defect.

From the description of the extortionist strategy in the demo: " _The only way
you can avoid being taken advantage of is to resign yourself to the meagre
rewards of mutual defection, and to defect on every turn._ "

So: yes, indeed you can do that, but then your long-run score will be rotten
because it will look like a vast succession of (D,D) plays, 1 point each. If
you let the Extortionist extort you, you will get substantially more points.

~~~
scotty79
_"The only way you can avoid being taken advantage of is to resign yourself to
the meagre rewards of mutual defection, and to defect on every turn. If you do
anything else then I will take advantage of your cooperation and I will do
three times better than you, in the sense that – on average over the long run
– (my score minus 1) will be thrice (your score minus 1)."_

I was referring to that. I did something other than defecting all the time
and still got a better ratio than 1:3, but that's OK, since the 1:3 ratio
only holds on average over the long run.
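The "thrice" relation can be checked exactly for the canonical extortionate strategy from Press and Dyson's paper: cooperation probabilities (11/13, 1/2, 7/26, 0) after outcomes (CC, CD, DC, DD), which enforces extortion factor 3 for the standard payoffs. Whether the demo uses these exact numbers is an assumption, but any extortion-factor-3 strategy enforces the same line. Against an always-cooperating opponent the long-run scores work out exactly:

```python
from fractions import Fraction as F

# Press-Dyson extortionate strategy with extortion factor chi = 3 for the
# standard payoffs (R, S, T, P) = (3, 0, 5, 1). Entries are the probability
# of cooperating after outcomes (CC, CD, DC, DD), from the extortioner's view.
p = [F(11, 13), F(1, 2), F(7, 26), F(0)]

# Against an always-cooperating opponent, the opponent's move is always C,
# so the extortioner's own move is a two-state Markov chain: it cooperates
# with probability p[CC] after cooperating and p[DC] after defecting.
stay_c, return_c = p[0], p[2]
pi_c = return_c / (1 - stay_c + return_c)  # stationary probability of C

# Long-run per-round scores: the outcome is CC (3, 3) or DC (5, 0).
s_x = pi_c * 3 + (1 - pi_c) * 5   # extortioner
s_y = pi_c * 3                    # cooperator

assert s_x - 1 == 3 * (s_y - 1)   # (my score - 1) is thrice (your score - 1)
```

Exact arithmetic gives s_x = 41/11 and s_y = 21/11, so (41/11 − 1) = 3 × (21/11 − 1) on the nose; mutual defection instead pins both scores at 1 per round, which is why refusing to be extorted costs you points.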

------
Xcelerate
For the split or steal game (with the golden balls), here is what I would do.

I would not open either ball, and I would tell the other contestant to pick
one at random. Then, he MUST choose split for his own ball if he wants any
chance of winning money. If my ball happened to say "steal" after opening,
then I would split what I won with him.

I think this would guarantee both of us winning unless he was willing to take
the chance of winning nothing.

~~~
cgag
You could just tell him you're going to choose steal:
<http://www.youtube.com/watch?v=S0qjK3TWZE8>

~~~
Xcelerate
I saw that video. That's what made me think of my idea. But mine removes the
possibility that I might be holding out on "split" and I think has a better
chance of guaranteeing we both split the money.

------
andyjohnson0
It's good to see Freeman Dyson still producing new insights in his late
eighties. A great mind.

------
biot
If I were playing Split or Steal, my strategy would be exactly this: tell the
other player that I'm going to choose "steal" and if they choose "split", I'll
give them half of the winnings. We would shake on it on national television
which forms a nice contract: an offer, consideration for both parties, and
acceptance. The beauty of this is that if they're not an idiot they'll choose
"split" and we each walk away with half. However, if they are an idiot and
they also choose "steal", then nobody gets anything and I can sue them for
breach of contract in the amount of the 50% I would have won.

Sadly, I suspect everyone would then copy that strategy and much like everyone
choosing the letters R, S, T, L, N, and E in Wheel of Fortune, it would
quickly become a formula and would ruin the suspense the show hopes to create.

~~~
redslazer
Here is the video of this happening on the TV show:
<http://www.youtube.com/watch?v=S0qjK3TWZE8>. It's worth a watch; I recommend
it.

~~~
mmcnickle
There's a twist in this example though. He says he's going to steal (but split
after the show) and convinces the other guy to split. At the last minute he
changes to split so that the split is governed by the rules of the game.

~~~
alexfoo
I deeply wanted "the other guy" to steal leaving that guy with nothing.

~~~
mmcnickle
I think that's why the guy proposing the deal ended up splitting. If the other
guy did steal to spite him, the fact he split would show that he was acting in
good faith and would increase the chance of a split after the show.

------
gwern
The Hofstadter essays: <http://www.gwern.net/docs/1985-hofstadter>

------
CaptainDecisive
The parable about the elephant and the goldfish was wonderful.

I imagined, when the time is right, telling it to my little boys as I tucked
them into bed at night. And then imagined one of them coming into my bedroom
later that night and beginning with "Papa, why did the elephant ...".

------
zmj
Results and strategy summary of an iterated Prisoner's Dilemma tournament from
last year:
[http://lesswrong.com/lw/7f2/prisoners_dilemma_tournament_res...](http://lesswrong.com/lw/7f2/prisoners_dilemma_tournament_results/)

------
macavity23
Watch the embedded video, it's a great illustration of the article. :-)

------
Toenex
Thanks, very enjoyable read.

