I hate to say it, but I’m not sure it’s playing fair. I used random.org to play truly randomly, and the “neural network” beat me to 10 pts 6 times in a row.
Looking at the source code, the move you are about to make is included in the training dataset. This can be confirmed by playing “scissors scissors scissors rock” and then looking at the variables x and y in the console, which will include the surprise rock.
The code updates x and y, then trains the model, and then makes a prediction. “Fair” code would make a prediction first, then update x and y and train the model.
This explains the behavior mentioned in the comments: the computer gets an impressive early lead, because the player’s next move is one of only a few data points it learns from, then falls back to a more plausible advantage as the leaked data is diluted by past data.
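The ordering bug described above can be sketched in a few lines. The real project trains a TensorFlow.js model in the browser; the `FrequencyModel` below is a toy stand-in (it just predicts the player's most common past move), there only to make the train/predict ordering concrete:

```python
from collections import Counter

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class FrequencyModel:
    """Stand-in for the neural network: predicts the player's most
    frequent past move. (The real project uses TensorFlow.js; this
    toy model only illustrates the ordering of operations.)"""
    def __init__(self):
        self.history = []
    def fit(self, moves):
        self.history = list(moves)
    def predict(self):
        if not self.history:
            return "rock"
        return Counter(self.history).most_common(1)[0][0]

def leaky_round(model, y, player_move):
    # The bug described above: the current move is appended to the
    # training data *before* the model trains and predicts.
    y.append(player_move)
    model.fit(y)
    return BEATS[model.predict()]

def fair_round(model, y, player_move):
    # Fair ordering: predict first, then learn from the observed move.
    counter = BEATS[model.predict()]
    y.append(player_move)
    model.fit(y)
    return counter

# Reproduce the "scissors scissors scissors rock" check:
y = ["scissors", "scissors", "scissors"]
m = FrequencyModel()
m.fit(y)
leaky_round(m, y, "rock")
print(y)          # the surprise rock is already in the training data
print(m.history)  # ...and the model has already trained on it
```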
In other words, the description is precisely inaccurate. Not only is it not untrained, it's trained against your exact plays through mysterious future-knowledge!
After the player makes a move, the NN makes one and adds it to its training data. The move the player makes after that is added as a counter-move to the NN’s move. This way I treat the data as a time series.
The problem is that the player’s move is actually being fed into the training data before the computer makes its prediction. Take a look at the variable 'y' after making your move and you’ll see that it includes the player’s last move. Because 'y' is updated before the computer makes its prediction, the computer is using the player’s current move as part of the training set it uses to decide its own move.
It’s not taking a peek at the player’s move. Read the code: even after the pull request that brought this up, the NN performed the same. And even if the NN were peeking at the player’s move, which it is not, the data is shuffled before training.
Well, it’s much easier with your code. I immediately opened up a 3x lead on the comp. I stopped at 17-6-17 (W-L-T). What I did was to play the hand that would have beaten the computer’s previous guess, and if the computer looks like it’s figuring it out, I play the hand that beats my strategy once or twice, and then resume.
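The exploit above reads naturally as code. This is just a sketch of the described strategy; `next_move` is a hypothetical helper, and `break_pattern` stands for the occasional feint played when the model seems to be catching on:

```python
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def next_move(computer_prev, break_pattern=False):
    """Play the hand that would have beaten the computer's previous
    guess; when the model seems to be figuring that out, feint by
    playing the counter to our own strategy instead, then resume."""
    base = BEATS[computer_prev]
    return BEATS[base] if break_pattern else base

print(next_move("rock"))                      # normally: paper
print(next_move("rock", break_pattern=True))  # feint: scissors
```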
Two economists – one young and one old – are walking down the street together:
The young economist looks down and sees a $20 bill on the street and says, “Hey, look a twenty-dollar bill!”
Without even looking, his older and wiser colleague replies, “Nonsense. If there had been a twenty-dollar lying on the street, someone would have already picked it up by now.”
(99% joke and snark. Of course this isn't going to work on the stock market, but that kind of efficiency argument isn't perfect.)
I was not making the efficiency argument in the abstract. I'm making it in the concrete of a neural network that can be trained in real time in a browser window.
There's a great many people who seem to mistake "the market isn't actually 100% efficient" for "oh, I guess we can just ignore the question of efficiency, because if it's not 100% it must be 0%". Not that they say the last part out loud, of course.
But that's not what you're coding.
If you trained it with every possible next move and response, then it would learn that kind of relationship (though it would also overfit on the existing moves), but this way you just give it a peek into the probability distribution of the player's moves, one that's so accurate it comes from the future...
The thing is, if you are playing randomly, that makes it easier to predict what you’re (not) going to do next. A good random distribution would give fewer runs, which means your choice is less likely to be the last choice you picked and somewhat less likely to be other recent values. It can at least tie by choosing scissors if it thinks you will not pick rock.
This is kind of a shot in the dark because I am not familiar with RPS strategy, but I think it makes sense.
That sounds like the gambler’s fallacy. Fewer runs than what? Most truly random input has far more runs than what people “think” is random, and in fact that’s one of the statistical tests for whether a data set is random.
You’re essentially saying that a good neural network can predict the next value of a good random number generator. Good luck with that one!
Maybe while you’re at it, have neural networks invert cryptographically secure hash functions :)
Here’s what I’m thinking: the neural network doesn’t have to be correct about which option you pick; it has to be correct about which one you don’t pick. Only one of its options is an outright loss, so if the other party can be somewhat certain you won’t pick a specific option, it can at least tie. So if a randomizer has pretty even distribution, I think it can win more than half the time, because it can gather roughly how likely the same choice is to be played in a row.
I’m basically suggesting a predictable distribution can be exploited in RPS.
Honestly, the bot could always be winning against the RNG by dumb luck. More experimentation would be needed to be sure. I am just making guesses.
> I think it can win more than half the time, because it can gather roughly how likely the same choice is to be played in a row.
With a random choice, the chance of playing the same choice in a row is 1/3. This does not give you any advantage over having no information (where each choice has a 1/3 chance.)
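A quick simulation bears this out: with uniform random moves, the rate of immediate repeats sits right at 1/3 (seed and sample size here are arbitrary choices):

```python
import random

random.seed(0)
moves = [random.choice("RPS") for _ in range(100_000)]

# Count how often a move repeats the previous one.
repeats = sum(a == b for a, b in zip(moves, moves[1:]))
rate = repeats / (len(moves) - 1)
print(rate)  # close to 1/3: the previous move carries no information
```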
I think the misunderstanding is in
> a randomizer has pretty even distribution
Here's a thought experiment that might help: imagine what you say is truly the case. That would mean you could "charge up" a die by rolling it until you got a long run of a given number. Let's pick something arbitrary: say you roll until you get five twos in a row. According to what you've said, the chance of the next roll being a two is now lower than it was when you started "charging up" the die.
How is this possible? Nothing physically changes about the die between rolls.
> So if a randomizer has pretty even distribution, I think it can win more than half the time, because it can gather roughly how likely the same choice is to be played in a row.
Random specifically doesn't mean it's less likely to be the last choice you picked. At least if we're talking uniform random which we should be here. It means the last choice you picked is exactly as likely as any other choice.
If you made random moves in a game of chess or checkers you'd get beat by a four year old with the barest grasp of the rules. In fact I would love to see a chess-playing robot that actually does that, it would thrill the crap out of my nephews to beat the big bad robot in a game of wits.
I'm unaware of the name of the property that RPS exhibits that makes it ungameable by a random opponent (zero-sum?) but ordinary RPS played by humans certainly doesn't exhibit it, only in the magic world of computers where things like simultaneity of play can be guaranteed does RPS exhibit that property.
If played in the real world like real humans traditionally played it, ongoing RPS matches between a human and an AI would soon see the AI dominating. I'd be very curious to see what real-world RPS would look like between two AIs that can 'explain their model'.
Imagine making a rule that gives a Xms window for making plays. Then strategy can start to emerge.
2 person adversarial games with hidden information (eg poker or RPS) exhibit a "Nash equilibrium in mixed strategies". That is, the best strategy involves a random selection between choices with some probability weights. Since RPS is a very simple game, the weights are equal, so the unbeatable strategy is to choose between rock, paper and scissors with uniform probability on each round taking no account of what you or the other player have played before. Any other strategy can be exploited to the extent it deviates from pure uniform random choice (eg trivially the classic "play the thing that would have beat the thing which beat the winner of the previous round" strategy in RPS which you can use to reliably destroy children at the game).
Games with complete information (eg noughts and crosses, chess, go etc) have "pure strategies" (ie randomness is not required and there should always be an absolute best move in any given situation which you should pick 100% of the time).
Edit: In case it's not clear, the hidden information in RPS is the opponent's move. Whereas in chess you are either starting the game or you know what the opponent has done, in RPS you move simultaneously with the opponent, so don't know their move.
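The equilibrium claim above is easy to verify numerically: against the uniform mixture, every pure strategy has expected payoff exactly zero, so no mixture of them can do better.

```python
# Row player's payoff: +1 win, -1 loss, 0 tie.
# Rows and columns are rock, paper, scissors in that order.
PAYOFF = [
    [0, -1,  1],   # rock     vs rock, paper, scissors
    [1,  0, -1],   # paper
    [-1, 1,  0],   # scissors
]

uniform = [1 / 3, 1 / 3, 1 / 3]

# Expected payoff of each pure strategy against the uniform mixture:
expected = [sum(p * PAYOFF[row][col] for col, p in enumerate(uniform))
            for row in range(3)]
print(expected)  # all zero: nothing exploits uniform random play
```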
The Nash-Equilibrium solution to RPS is playing each option a third of the time. Simplified, this means that if you're playing this strategy, your opponent's best strategy is to do the same and there is nothing either of you can do to increase your own expectation.
This thread is talking about playing randomly, by aid of a computer providing random numbers. The result is just that: each option is played 1/3 of the time. The issue here is that this AI is able to beat random, and that is not something that should happen.
This is not chess, but a game of chance. You can't form a strategy against random play without predicting said randomness.
I think it was already pointed out, but I have not had a chance to verify: this game uses your played move in the training set, training on it before making its predictions.
RPS is not a game of chance. Outcomes of games are determined not by a random process, but by the choices of the players.
If the players rolled dice and biggest roll wins, that's a game of chance. The dice can't pick how they're thrown, nor can the throwers influence the outcome. If either of those two things change, the game is no longer purely one of chance.
The game can be set up in many ways that invite strategy. If they set up a camera and watched the human playing out the moves physically, that lets the algorithm read the person's patterns the way a human might read someone's poker tell.
The entire point of this thread is that some of us pitted random against this AI. And the AI is winning more often than it should. Zero humans were involved.
I have "beaten" it. Play initially like you would - typically a human would change his choice often. The network would learn from you and gain a lead of say 10 points. After this, if you pick something that gives you a win start sticking to it while it gives you a win. You will quickly regain the lost points and gain a small lead. This seems to work every time I try it. After 60 moves I always lead a bit. Btw, the game gets slow then.
I'm not sure a bigram or trigram model wouldn't do better than the neural network.
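For comparison, a first-order Markov (bigram) predictor is only a few lines: count transitions from the player's previous move to their next one, predict the likeliest next move, and play its counter. This is a sketch of the suggestion, not code from the project:

```python
from collections import defaultdict, Counter

BEATS = {"R": "P", "P": "S", "S": "R"}

class BigramPredictor:
    """First-order Markov model of the opponent's moves."""
    def __init__(self):
        self.transitions = defaultdict(Counter)
        self.prev = None

    def counter_move(self):
        # Predict the opponent's next move from their last one, then
        # play whatever beats the prediction; default to "R" when we
        # have no data yet.
        if self.prev is None or not self.transitions[self.prev]:
            return "R"
        predicted = self.transitions[self.prev].most_common(1)[0][0]
        return BEATS[predicted]

    def observe(self, move):
        if self.prev is not None:
            self.transitions[self.prev][move] += 1
        self.prev = move

# A player who always follows rock with paper is exploited quickly:
bot = BigramPredictor()
for move in ["R", "P", "R", "P", "R"]:
    bot.observe(move)
print(bot.counter_move())  # last move was R; R -> P seen twice, so play S
```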
Nice. Seems to beat me. I performed significantly better when I used random.org to generate numbers than when I played with my own attempt at randomness.
Screw up the NN by selecting the same thing over and over at first. Then start alternating between two values choosing a new set of values when it starts guessing correctly again. Rinse. Repeat.
I wonder about an arms race between two advanced AIs which play RPS.
I can’t fully put my head around this, but what would it be like if each AI could read the architecture of the other’s brain before each move? The AIs would be permitted to reconfigure themselves as they play. An “obvious” strategy might be to simulate your opponent and ask what they are likely to play. Though simulating their behavior is likely to involve you simulating someone else simulating your behavior, ad infinitum, until your computing resources bottom out. It is like fighting the man in the mirror, who can choose to mirror you or not.
Yep, RPS is only interesting to the extent that the players are imperfect. It’s most interesting as an illustration of how bad people are at behaving randomly.
This is my awstats after being featured here. Can someone tell me how efficiently my code is being served? I have no other numbers (or default values) to compare against.