The solution is not necessarily that both players will play all three options randomly. It is only necessary for ONE player to play randomly in order to reach an equilibrium. The other player at that point is free to pick any strategy they choose, including playing a single option 100% of the time.
EDIT: It’s not nice to vote people down just because they politely point out you’re demonstrably wrong.
HN doesn't let you downvote the replies to your own comments or top-level comments for your own article submissions.
Set up your own experiment and let me know what you find.
I’m going to sound very arrogant, but I don’t need to set up the scenario you describe, because optimal strategies in identical games have been proven since the 1940s, when Oskar Morgenstern and John von Neumann published their seminal Theory of Games and Economic Behavior (1944).
If you are getting results that conflict with what is mathematically proven (provided certain circumstantial criteria are met), then you can be sure that you have not exactly matched the criteria assumed as a prerequisite for the proof. So basically you’re in another situation where the theorem makes no claim to hold and, unsurprisingly, doesn’t.
If the first player is locked into playing randomly forever, irrespective of what the second player does, then for the remaining player any strategy is an "equilibrium" and can't be improved (or worsened). We're just talking about roulette or something.
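To make that concrete, here is a minimal sketch of the expected-payoff calculation. The option order (rock, scissors, paper) and the names `PAYOFF`, `UNIFORM` and `expected_payoff` are my own assumptions for illustration, not anything from the thread.

```python
from fractions import Fraction

# Payoff to the row player. Assumed option order: (rock, scissors, paper),
# so each option beats the next one cyclically.
PAYOFF = [
    [0, 1, -1],   # rock:     ties rock, beats scissors, loses to paper
    [-1, 0, 1],   # scissors: loses to rock, ties scissors, beats paper
    [1, -1, 0],   # paper:    beats rock, loses to scissors, ties paper
]

UNIFORM = [Fraction(1, 3)] * 3


def expected_payoff(p, q):
    """Expected payoff to the row player when row plays mix p and column plays mix q."""
    return sum(p[i] * q[j] * PAYOFF[i][j] for i in range(3) for j in range(3))


# Against a player locked into the uniform mix, every reply is worth exactly 0.
replies = [
    [1, 0, 0],                                      # always the first option
    [0, 0, 1],                                      # always the third option
    [Fraction(1, 2), Fraction(1, 2), 0],            # an arbitrary mix
    UNIFORM,
]
values = [expected_payoff(UNIFORM, q) for q in replies]
```

Every entry of `values` comes out as 0, because each column of the payoff matrix sums to zero. That is the "roulette" situation: nothing the second player does moves the expected result.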
But I'm a little disappointed that someone would downvote the comment above. The "something is wrong with the theory" claim is very premature (ok, it's wrong), but the important thing is now we are fighting with the text instead of just citing it. That's a step forward, not a step back.
But note the relationship between computation and exploitation here: if the deviating player makes its deviation difficult to detect (e.g. using a PRNG stream cipher, which is completely deterministic but may look random), then the cost of making good predictions to take advantage of the deviation may exceed the payoff.
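A rough sketch of the "deterministic but looks random" point. Python's `random.Random` (a Mersenne Twister) stands in for the stream cipher here; it is not cryptographically secure, and the seed value and function names are made up for the example.

```python
import random

MOVES = ["rock", "paper", "scissors"]
# BEATS[x] is the move that defeats x.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def deterministic_player(seed, rounds):
    """Moves that look random but are fully determined by the seed."""
    rng = random.Random(seed)
    return [rng.choice(MOVES) for _ in range(rounds)]


SECRET_SEED = 2024  # hypothetical; the whole game hinges on whether this leaks
ROUNDS = 1000

moves = deterministic_player(SECRET_SEED, ROUNDS)

# An opponent who has recovered the seed can replay the stream and counter it.
predicted = deterministic_player(SECRET_SEED, ROUNDS)
counters = [BEATS[m] for m in predicted]
wins_with_seed = sum(BEATS[m] == c for m, c in zip(moves, counters))

# An opponent who has not recovered it and just plays "paper" wins only when
# the stream happens to produce "rock", i.e. roughly a third of the time.
wins_without_seed = sum(BEATS[m] == "paper" for m in moves)
```

With the seed, the opponent wins all 1000 rounds. Without it, the stream is no more exploitable in practice than a truly random one unless you spend the effort to reconstruct its state, and that effort is the cost being weighed against the payoff.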
If all players write down the strategy they are going to play (which can include randomisation) on a sheet of paper and show it to all other players, then no player would have a strict incentive to alter their strategy.
With rock paper scissors, if me and my opponent both write down 1/3,1/3,1/3 randomisation, then neither of us would strictly benefit from doing something else.
Your [1/3,1/3,1/3], [1,0,0] is not a Nash Equilibrium. Whilst the second player does not have a strict incentive to change strategies, the first player can switch to [0,0,1] and win every time.
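Both claims can be checked mechanically. This is only a sketch: I'm assuming the vectors are ordered (rock, scissors, paper), which is the ordering under which [0,0,1] beats [1,0,0], and `is_nash` is a helper name I made up. It relies on the fact that checking pure-strategy deviations is enough, since a mixed deviation can never do better than the best pure one.

```python
from fractions import Fraction

# Payoff to player 1; player 2 gets the negative (zero-sum).
# Assumed option order: (rock, scissors, paper).
PAYOFF = [
    [0, 1, -1],
    [-1, 0, 1],
    [1, -1, 0],
]

UNIFORM = [Fraction(1, 3)] * 3


def expected_payoff(p, q):
    """Expected payoff to player 1 when player 1 plays p and player 2 plays q."""
    return sum(p[i] * q[j] * PAYOFF[i][j] for i in range(3) for j in range(3))


def pure(k):
    """The strategy that plays option k 100% of the time."""
    return [1 if i == k else 0 for i in range(3)]


def is_nash(p, q):
    """True if neither player has a strictly profitable pure deviation."""
    current = expected_payoff(p, q)
    best_for_p1 = max(expected_payoff(pure(i), q) for i in range(3))
    best_for_p2 = max(-expected_payoff(p, pure(j)) for j in range(3))
    return best_for_p1 <= current and best_for_p2 <= -current


both_uniform = is_nash(UNIFORM, UNIFORM)        # True
uniform_vs_pure = is_nash(UNIFORM, [1, 0, 0])   # False
gain_from_switch = expected_payoff([0, 0, 1], [1, 0, 0])  # 1: wins every round
```

The second profile fails only because of player 1: player 2's payoff against the uniform mix is 0 whatever they do, but player 1's best reply to [1,0,0] is worth 1 instead of 0.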
Otherwise you have a little bit of a problem related to when and how you know the strategy executed by your opponent.
On paper you can get around this issue by ignoring the time element, but in actual execution with humans or intelligent agents, the problem exists.
Edit: quote from the introduction:
“The technical material includes logic, probability theory, game theory, and optimization.[...]the goal has been to gather the most important elements from each discipline and weave them together into a balanced and accurate introduction to this broad field. The intended reader is a graduate student or an advanced undergraduate, prototypically, but not necessarily, in computer science”
Arxiv Vanity renders academic papers from Arxiv as responsive web pages so you don’t have to squint at a PDF.
Doesn't appear to work with this paper, though.