Show HN: Play rock paper and scissors against an untrained neural network (github.com)
101 points by atum47 31 days ago | 63 comments

I hate to say it, but I’m not sure it’s playing fair. I used random.org to play truly randomly, and the “neural network” beat me to 10 pts 6 times in a row.

Looking at the source code, the move you are about to make is included in the training dataset. This can be confirmed by playing “scissors scissors scissors rock” and then looking at the variables x and y in the console, which will include the surprise rock.

The code updates x and y, then trains the model, and then makes a prediction. "Fair" code would make a prediction first, then update x and y and train the model.
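For concreteness, here's a minimal sketch of the two orderings. The toy "model" below is purely illustrative (it just counters the most common move it has been trained on, nothing like the repo's actual network), but the order of operations is the point:

```javascript
// 0 = rock, 1 = paper, 2 = scissors; counter(m) is the move that beats m.
const counter = m => (m + 1) % 3;

// Toy stand-in for the model: predict the most common training label.
function mostCommon(y) {
  const counts = [0, 0, 0];
  y.forEach(m => counts[m]++);
  return counts.indexOf(Math.max(...counts));
}

// Unfair ordering: the current move joins the training data (y)
// before the prediction is made.
function unfairRound(y, playerMove) {
  y.push(playerMove);            // leak: current move is already in y
  return counter(mostCommon(y)); // prediction "sees" playerMove
}

// Fair ordering: commit to a prediction first, then learn.
function fairRound(y, playerMove) {
  const prediction = counter(mostCommon(y));
  y.push(playerMove);
  return prediction;
}

// With an empty history, the unfair version still counters the very
// first move it was never supposed to have seen:
console.log(unfairRound([], 0)); // 1 (paper beats the leaked rock)
```

The leak is most dramatic with little history, which matches the "impressive early lead" behavior described below.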

This explains the behavior mentioned in the comments where the computer gets an impressive early lead, because the player's next move is one of only a few data points it learns from, then backs off to a more plausible advantage as the leaked data is diluted by past data.

In other words, the description is precisely inaccurate. Not only is it not untrained, it's trained against your exact plays through mysterious future-knowledge!

After the player makes a move, the NN makes one and adds it to its training data. The move the player makes after that is added as a counter move to the NN's move. This way I treat the data as a time series.

The problem is that the player's move is actually being fed into the training data before the computer makes its prediction. Take a look at the variable 'y' after making your move and you'll see that it includes the player's last move. Because 'y' is updated before the computer makes its prediction, the computer uses the player's current move as part of the training set when deciding its own move.

If true, this seems like it would be a huge scandal. Or at least an issue on GitHub.

It's not taking a peek at the player's move. Read the code: even after the pull request that brought this up, the NN performed the same. Even if the NN were taking a peek at the player's move, which it's not, the data is shuffled before training.

Isn't this just plain cheating?

I took a stab at fixing this: https://github.com/victorqribeiro/jokenpo/pull/3

Well, it’s much easier with your code. I immediately opened up a 3x lead on the comp. I stopped at 17-6-17 (W-L-T). What I did was to play the hand that would have beaten the computer’s previous guess, and if the computer looks like it’s figuring it out, I play the hand that beats my strategy once or twice, and then resume.

He should hook it up to the stock market.

If it were that easy to beat the stock market, it wouldn't be that easy to beat the stock market. (Not a joke or snark.)

Two economists – one young and one old – are walking down the street together:

The young economist looks down and sees a $20 bill on the street and says, “Hey, look a twenty-dollar bill!”

Without even looking, his older and wiser colleague replies, “Nonsense. If there had been a twenty-dollar bill lying on the street, someone would have already picked it up by now.”

(99% joke and snark. Of course this isn't going to work on the stock market, but that kind of efficiency argument isn't perfect.)

I was not making the efficiency argument in the abstract. I'm making it in the concrete of a neural network that can be trained in real time in a browser window.

There are a great many people who seem to mistake "the market isn't actually 100% efficient" for "oh, I guess we can just ignore the question of efficiency, because if it's not 100% it must be 0%". Not that they say the last part out loud, of course.

I think it's fair, because it's like the neural network is thinking "after I play rock, the player plays paper" and training with that data.

But that's not what you're coding. If you trained it with every possible next move and response, then it would learn that kind of relationship (though it would also overfit on the existing moves), but this way you just give it a peek into the probability distribution of the player's moves, one that's so accurate it comes from the future...

Yeah, I am not sure what is going on. I just played it 1000 times from rand and the results were:

500 plays - Player: 144, Computer: 179, Tie: 177


1000 plays - Player: 325, Computer: 359, Tie: 316


I just did the same thing,

500 plays - Player: 163, Computer: 179, Tie: 158

1000 plays - Player: 306, Computer: 365, Tie: 329

I've run it several times and the computer has won every time against random; it must have an unfair advantage.

How are you emulating the clicking?

I normally write a C program, or Go, linked to Xlib to do this sort of work, but I just used xdotool in a bash script for this.


thanks! didn't know about xdotool

This runs it randomly 1000 times (well not reaaally randomly, but it gets close), just paste it into the browser console:

const times = 1000;

function sleep(ms) { return new Promise(resolve => setTimeout(resolve, ms)); }

async function run() { for (let i = 0; i < times; i++) { await sleep(10); play(Math.floor(Math.random()*3)); } }

run();


You can also do:

for(i=0; i<100; i++) { play(Math.floor(Math.random()*3)) }

Right in the JavaScript console.

Nice! Thanks, I've edited my post to use this shorter version. The sleep is needed to avoid freezing the tab.

The thing is, if you are playing randomly, that makes it easier to predict what you’re (not) going to do next. A good random distribution would give fewer runs, which means your choice is less likely to be the last choice you picked and somewhat less likely to be other recent values. It can at least tie by choosing scissors if it thinks you will not pick rock.

This is kind of a shot in the dark because I am not familiar with RPS strategy, but I think it makes sense.

That sounds like the gambler’s fallacy. Fewer runs than what? Most truly random input has far more runs than what people “think” is random, and in fact that’s one of the statistical tests for whether a data set was random.

You’re essentially saying that a good neural network can predict the next value of a good random number generator. Good luck with that one!

Maybe while you’re at it, have neural networks invert cryptographically secure hash functions :)

Here’s what I’m thinking: the neural network doesn’t have to be correct about which one you pick; it has to be correct about which one you won’t pick. Only one of your options makes it lose, so if it can be somewhat certain you won’t pick a specific option, it can at least tie. So if a randomizer has a pretty even distribution, I think it can win more than half the time, because it can gather roughly how likely the same choice is to be played in a row.

I’m basically suggesting a predictable distribution can be exploited in RPS.

Honestly, the bot could always be winning against the RNG by dumb luck. More experimentation would be needed to be sure. I am just making guesses.

> I think it can win more than half the time, because it can gather roughly how likely the same choice is to be played in a row.

With a random choice, the chance of playing the same choice in a row is 1/3. This does not give you any advantage over having no information (where each choice has a 1/3 chance).

I think the misunderstanding is in

> a randomizer has pretty even distribution

Having an even distribution over a long time does not make any specific choice less random. https://en.wikipedia.org/wiki/Gambler%27s_fallacy

Here's a thought experiment that might help: imagine what you say is truly the case. That would mean you could "charge up" a die by rolling it until you got a long run of a given number. Let's pick something arbitrary: say you roll until you get 5 twos in a row. According to what you've said, the chance of the next number rolled being a two is now lower than it was when you started "charging up" your die.

How is this possible? Nothing physically changes about the die between rolls.
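The "charged up" claim is also easy to test empirically. This quick Monte Carlo sketch (illustrative code, not from the repo) conditions on a run of 5 identical uniform random picks and checks how often the 6th pick repeats; independence says it stays at 1/3:

```javascript
// Estimate P(6th pick repeats) given the first 5 uniform random picks
// from {0, 1, 2} were all identical. The gambler's fallacy predicts
// less than 1/3; independence predicts exactly 1/3.
function repeatAfterRun(trials) {
  let conditioned = 0, repeated = 0;
  for (let t = 0; t < trials; t++) {
    const picks = Array.from({ length: 6 }, () => Math.floor(Math.random() * 3));
    if (picks.slice(0, 5).every(p => p === picks[0])) {
      conditioned++;                         // found a run of 5
      if (picks[5] === picks[0]) repeated++; // ...followed by a repeat
    }
  }
  return repeated / conditioned;
}

console.log(repeatAfterRun(1000000)); // ≈ 0.333, not lower
```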

> So if a randomizer has pretty even distribution, I think it can win more than half the time, because it can gather roughly how likely the same choice is to be played in a row.


Random specifically doesn't mean it's less likely to be the last choice you picked. At least if we're talking uniform random which we should be here. It means the last choice you picked is exactly as likely as any other choice.

But if it predicts you are playing random, is it not then random vs random?

I am under the impression that you can't game random -- unless you know that the random generator being used is broken.

If you made random moves in a game of chess or checkers you'd get beat by a four year old with the barest grasp of the rules. In fact I would love to see a chess-playing robot that actually does that, it would thrill the crap out of my nephews to beat the big bad robot in a game of wits.

I'm unaware of the name of the property that RPS exhibits that makes it ungameable by a random opponent (zero-sum?) but ordinary RPS played by humans certainly doesn't exhibit it, only in the magic world of computers where things like simultaneity of play can be guaranteed does RPS exhibit that property.

If played in the real world like real humans traditionally played it, ongoing RPS matches between a human and an AI would soon see the AI dominating. I'd be very curious to see what real-world RPS would look like between two AIs that can 'explain their model'.

Imagine making a rule that gives an X ms window for making plays. Then strategy can start to emerge.

2-person adversarial games with hidden information (e.g. poker or RPS) exhibit a "Nash equilibrium in mixed strategies". That is, the best strategy involves a random selection between choices with some probability weights. Since RPS is a very simple game, the weights are equal, so the unbeatable strategy is to choose between rock, paper and scissors with uniform probability on each round, taking no account of what you or the other player have played before. Any other strategy can be exploited to the extent it deviates from pure uniform random choice (e.g. trivially, the classic "play the thing that would have beaten the thing which beat the winner of the previous round" strategy, which you can use to reliably destroy children at the game).

Games with complete information (e.g. noughts and crosses, chess, go, etc.) have "pure strategies" (i.e. randomness is not required and there should always be an absolute best move in any given situation, which you should pick 100% of the time).

Here's an intro to the concept of Nash equilibrium in mixed strategies http://www.econport.org/content/handbook/gametheory/useful/e...

Edit: In case it's not clear, the hidden information in RPS is the opponent's move. Whereas in chess you are either starting the game or you know what the opponent has done, in RPS you move simultaneously with the opponent, so don't know their move.
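A small illustration of why uniform random is unexploitable: against any opponent mix whatsoever, its expected payoff is exactly zero. Sketch (illustrative code using the standard win/tie/loss payoff matrix):

```javascript
// Row player's payoff: 1 = win, 0 = tie, -1 = loss.
// Index 0 = rock, 1 = paper, 2 = scissors.
const payoff = [
  [ 0, -1,  1], // rock vs rock / paper / scissors
  [ 1,  0, -1], // paper
  [-1,  1,  0], // scissors
];

// Expected payoff of mixed strategy p against mixed strategy q.
function expectedPayoff(p, q) {
  let v = 0;
  for (let i = 0; i < 3; i++)
    for (let j = 0; j < 3; j++)
      v += p[i] * q[j] * payoff[i][j];
  return v;
}

const uniform = [1 / 3, 1 / 3, 1 / 3];
// Every column of the matrix sums to zero, so uniform play nets zero
// expectation against any q, even a heavily biased one:
console.log(expectedPayoff(uniform, [0.7, 0.2, 0.1])); // ~0 (float error)
```

Any non-uniform q, by contrast, has a pure best response: e.g. always-paper scores positively against a rock-heavy mix.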

The Internet truly is a horn of plenty. Thanks for granting my nerd wish!

The Nash-Equilibrium solution to RPS is playing each option a third of the time. Simplified, this means that if you're playing this strategy, your opponent's best strategy is to do the same and there is nothing either of you can do to increase your own expectation.

This thread is talking about playing random -- by aid of a computer providing random numbers. The result is just that -- 1/3 of the time each option is played. The issue here is this AI is able to best random, and that is not something that should happen.

This is not chess. But a game of chance. You can't form a strategy based off random without predicting said random.

I think it was already pointed out, but I have not had a chance to verify: this game uses your played move in the training set to train before making its predictions.

RPS is not a game of chance. Outcomes of games are determined not by a random process, but by the choices of the players.

If the players rolled dice and biggest roll wins, that's a game of chance. The dice can't pick how they're thrown, nor can the throwers influence the outcome. If either of those two things change, the game is no longer purely one of chance.

The game can be set up in many ways that invite strategy. If they set up a camera and watched the human playing out the moves physically, that lets the algorithm read the person's patterns the way a human might read someone's poker tell.

It becomes a game of chance once one of the players is replaced with a computer picking at random.

Which is the point of this thread. Random vs AI.

That's gotta be the most boring discussion ever. I guess I'll leave you to it.

Are you implying that you can (or any other human can) in the long run beat someone who is playing RPS randomly?

Can I beat a computer playing randomly? No. A human? Show me where the random number generator in their brain is.

I see the confusion now.

The entire point of this thread is that some of us pitted random vs. this AI. And that AI is winning more often than it should. Zero humans were involved.

Which brings us back to you can't game random.

> A good random distribution would give fewer runs

A good random distribution actually gives more and longer runs than a human trying to appear random.

I have "beaten" it. Play initially like you normally would; a human typically changes his choice often. The network will learn from you and gain a lead of, say, 10 points. After this, if you pick something that gives you a win, stick to it while it keeps winning. You will quickly regain the lost points and gain a small lead. This seems to work every time I try it. After 60 moves I always lead a bit. Btw, the game gets slow by then.

I'm not sure a bigram or trigram model wouldn't do better than the neural network.
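For reference, a bigram predictor of the kind meant here fits in a few lines (hypothetical sketch, not code from the repo): count transitions from the player's previous move to their next one, predict the most frequent successor, and play its counter.

```javascript
// 0 = rock, 1 = paper, 2 = scissors.
class BigramBot {
  constructor() {
    // counts[prev][next] = times the player followed `prev` with `next`
    this.counts = [[0, 0, 0], [0, 0, 0], [0, 0, 0]];
    this.prev = null;
  }
  observe(playerMove) {
    if (this.prev !== null) this.counts[this.prev][playerMove]++;
    this.prev = playerMove;
  }
  play() {
    if (this.prev === null) return Math.floor(Math.random() * 3);
    const row = this.counts[this.prev];
    const predicted = row.indexOf(Math.max(...row)); // likeliest next move
    return (predicted + 1) % 3;                      // counter it
  }
}

// A player who alternates rock/paper is countered immediately:
const bot = new BigramBot();
[0, 1, 0, 1].forEach(m => bot.observe(m));
console.log(bot.play()); // 1 - paper, expecting the player's next rock
```

Such a model would exploit "stick with a winner" and alternation patterns just as readily, without any training-loop overhead.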

I have merged the first pull request, which changes the order to prediction first, training after.

Nice. Seems to beat me. I performed significantly better when I used random.org to generate numbers than when I played with my own attempt at randomness.

Screw up the NN by selecting the same thing over and over at first. Then start alternating between two values choosing a new set of values when it starts guessing correctly again. Rinse. Repeat.

I wonder about an arms race between two advanced AIs which play RPS.

I can’t fully put my head around this, but what would it be like if each AI could read the architecture of the other’s brain before each move? The AIs would be permitted to reconfigure themselves as they play. An “obvious” strategy may be to simulate your opponent and ask what they are likely to play. Though, simulating their behavior is likely to involve you simulating someone else simulating your behavior, ad infinitum, until your computing resources bottom out. It is like fighting the man in the mirror, who can choose to mirror you or not.

don't you just end up with a (discrete) uniform distribution on both sides? anything else can be exploited by the opponent

Yep, RPS is only interesting to the extent that the players are imperfect. It’s most interesting as an illustration of how bad people are at behaving randomly.

There's a site where AIs play RPS on an ongoing basis http://www.rpscontest.com/. Source code of the entrants is available. My best bot (called paper6) http://www.rpscontest.com/entry/5640686850277376 wins on average 67% of the time, but the top bots win around 80% of matches.

This is my awstats after being featured here. Can someone tell me how efficiently my code is being served? I have no other numbers (or default values) to compare against.

day | visits | pages | hits | bytes

26 Jul 2019 | 4,688 | 16,507 | 60,489 | 163.89 MB

Hm, in my limited testing of not looking at the screen and selecting as randomly as I could, I was surprised to find it didn't end 50/50.

Not enough trials or is there something else going on here?

2 things to consider...

1. Humans aren't good at random; try using an RNG or dice.

2. If you toss 10 coins, what is the probability of exactly 5 heads and 5 tails? It isn't very likely, even though it's the single most likely outcome.
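For the curious, the exact figure: the probability of exactly 5 heads in 10 fair tosses is C(10, 5) / 2^10 = 252/1024 ≈ 24.6%. A quick check:

```javascript
// Binomial coefficient C(n, k), computed iteratively.
function choose(n, k) {
  let c = 1;
  for (let i = 0; i < k; i++) c = (c * (n - i)) / (i + 1);
  return c;
}

// P(exactly 5 heads in 10 fair tosses) = C(10, 5) / 2^10
const p = choose(10, 5) / 2 ** 10;
console.log(p); // 0.24609375
```

So roughly three out of four such experiments end with an "uneven" split, even with a perfectly fair coin.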

I let rand play it 500 times; the results are as follows.

Player: 144, Computer: 179, Tie: 177

I think I got more even results when I did 1000 iterations, but failed to record that run.


Wow it's surprisingly good

Check the other comments, looks like it's cheating by reading the human's input before guessing.

no, it's not.

Or we’re surprisingly bad…

Now with a new interface, thanks to https://github.com/ocjojo

I expected to lose against a computer.

I was right.
