
How I Used Professional Poker to Become a Data Scientist - tolukia
https://medium.springboard.com/how-i-used-professional-poker-to-become-a-data-scientist-e49b75dfe8e3
======
_asummers
Cool little trick, if you're strictly curious about knowing which hands rank
where in relation to which other hands (say in an AI or whatever).

While there are C(52, 5) = 2,598,960 different hands you can have, suits only
matter when all five cards share one: 2h 3d Ac Ks Qh ranks exactly the same as
2c 3s Ad Kh Qc. So it collapses down to 7,462 distinct hands (6,175 rank
combinations plus 1,287 flushes).

Now the next observation is that for most hands suit doesn't matter, but when
it does we still need a way to distinguish it. Again, a flush in hearts is the
same as a flush in spades in terms of hand ranking. The trick we now pull out
is the Fundamental Theorem of Arithmetic, which says that every natural number
N > 1 can be factored into primes in exactly one way, up to the order of the
factors.

We take advantage of this and assign each card rank a prime number. 2=2 3=3
4=5 ... A=41

Now, by multiplying the prime numbers together, we get things like a 2 3 A K Q
hand is 2 * 3 * 41 * 37 * 31, and if they are all the same suit, you multiply
by 43. You can then take these hashes, rank them against each other in a
lookup table, and know with a constant-time lookup how hands rank against each
other: multiply the cards' values together, times 43 if they are all the same
suit, and see where the product falls in the list.
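
A minimal Python sketch of the hashing step described above. The card format
("Ah", "Td") and the names RANK_PRIMES, FLUSH_PRIME and hand_key are
assumptions made for illustration; building the ranked lookup table itself is
not shown.

```python
from math import prod

# Ranks mapped to the first 13 primes: 2=2, 3=3, 4=5, ..., A=41.
RANK_PRIMES = dict(zip("23456789TJQKA",
                       [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]))
FLUSH_PRIME = 43  # extra factor when all five cards share a suit


def hand_key(cards):
    """Return an order-independent key for five cards such as 'Ah' or 'Td'."""
    product = prod(RANK_PRIMES[card[0]] for card in cards)
    same_suit = len({card[1] for card in cards}) == 1
    return product * (FLUSH_PRIME if same_suit else 1)


offsuit = hand_key(["2h", "3d", "Ac", "Ks", "Qh"])  # 2 * 3 * 41 * 37 * 31
suited = hand_key(["2h", "3h", "Ah", "Kh", "Qh"])   # same product, times 43
```

Because multiplication is commutative, the key does not depend on the order in
which the cards are dealt.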

I realize expected value is more widely used as the basis for these types of
systems, but I always thought that was a fun trick.

~~~
hervature
I think it should be noted here that this only ranks garbage hands and
flushes. Quad 2's is certainly worth more than 3 Ace's yet 2^4 < 41^3.

Edit: I missed

>>> rank them against each other in a lookup table

I'm just confused about the prime factors thing now. Why not just hash the
hand itself? "3459To" where o stands for off suit?

~~~
T-hawk
Because 3459T is equivalent to 345T9 and 34T59 and all the other permutations.

You need to either hash all the permutations (making your lookup table 5!
times bigger) or sort the hand by card value. It is faster to multiply the 5
prime factors in linear time rather than the N log N of sorting.
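
A small Python sketch of the point about permutations, under the assumption
that hands are written as rank strings like "3459T". The helper names
prime_key and sorted_key are hypothetical. Both approaches collapse all 5!
orderings of a hand to a single key:

```python
from itertools import permutations
from math import prod

RANKS = "23456789TJQKA"
PRIME_OF = dict(zip(RANKS, [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]))


def prime_key(ranks):
    """Order-independent key: product of the primes assigned to each rank."""
    return prod(PRIME_OF[r] for r in ranks)


def sorted_key(ranks):
    """Order-independent key: the ranks sorted by card value."""
    return "".join(sorted(ranks, key=RANKS.index))


orderings = list(permutations("3459T"))           # all 120 orderings
prime_keys = {prime_key(o) for o in orderings}    # a single product
sorted_keys = {sorted_key(o) for o in orderings}  # a single string
```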

~~~
hervature
That makes sense. I was making the assumption that hands are given in order.
For example, when generating all hands, your inner loops only start at the
current index of the parent loop.
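
The nested-loop generation described here is what itertools.combinations does
in Python. A rough sketch, assuming a deck ordered by rank and then suit (the
names DECK and hands_in_deck_order are made up for illustration):

```python
from itertools import combinations
from math import comb

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]  # rank, then suit


def hands_in_deck_order(deck, size=5):
    """Yield each hand once, with its cards in the same order as `deck`."""
    return combinations(deck, size)


TOTAL_HANDS = comb(len(DECK), 5)  # C(52, 5)
first_hand = next(iter(hands_in_deck_order(DECK)))
```

Every hand produced this way is already sorted, which is why no sort is needed
while building the table.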

~~~
T-hawk
The purpose isn't to generate the ranking list. The purpose is to evaluate
hands. The list is a one-time cost however you make it, but evaluating the
randomly dealt cards would have the ongoing cost of sorting every time if not
for the prime-factors trick.

~~~
hervature
Yea, I guess I'm not convinced that the prime-factors trick is faster than
sorting. Sure, the prime multiplication is linear time, but you still have to
look up each card's prime number in a table. The question comes down to what
the coefficient is in front of N for the prime-factor method and in front of
N log N for sorting. With N=5, log2 N is only about 2.3, so the coefficient
ratio only has to be pretty small for sorting to still be warranted. Also,
using radix sort makes things even fuzzier.
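
One way to settle this is to measure it. Below is a rough Python timing
harness, not a definitive benchmark: the function names are hypothetical,
results vary by machine, and interpreter overhead in Python says little about
how a C implementation would behave. Both variants pay for a per-card table
lookup, as noted above.

```python
import timeit
from math import prod

RANKS = "23456789TJQKA"
PRIME_OF = dict(zip(RANKS, [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]))
INDEX_OF = {rank: i for i, rank in enumerate(RANKS)}


def key_by_primes(ranks):
    """Look up each rank's prime, then multiply."""
    return prod(PRIME_OF[r] for r in ranks)


def key_by_sorting(ranks):
    """Look up each rank's index, then sort."""
    return tuple(sorted(INDEX_OF[r] for r in ranks))


def compare(ranks="3T594", number=100_000):
    """Return (seconds for prime keys, seconds for sorted keys)."""
    t_primes = timeit.timeit(lambda: key_by_primes(ranks), number=number)
    t_sorting = timeit.timeit(lambda: key_by_sorting(ranks), number=number)
    return t_primes, t_sorting
```

Calling compare() returns the two timings so they can be compared on the
machine in question.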

------
mad_tortoise
As someone who has played poker my entire life and is also a programmer, I
have drifted passively into playing online poker recently over the real game.
I found this fascinating. Since poker is more of a hobby for me, I hadn't
really considered taking the data science route to playing until I read this
article. I always have a set amount I'm willing to walk away losing should I
do badly; however, this approach has changed my view.

Does anyone else on HN have more resources like this, applying data science to
poker? I will google, but on forums like this one, I find personally
recommended resources to be very helpful.

~~~
thret
Holdem Manager is still the best program for it I believe, but it only works
for holdem. You're out of luck if that game bores you.

It is most useful for analysing your own game - you have access to your entire
hand history so it is extremely valuable for finding out your weaknesses.

~~~
mad_tortoise
Thanks, well holdem is my preferred game and doesn't bore me at all. I'll
definitely be checking out Holdem Manager.

------
rurounijones
While reading this I started thinking about how interesting it would be to see
an AI vs AI poker tournament.

Does such a thing exist? Some preliminary searches didn't bring much up apart
from one-off experiments. [EDIT: found a recent article about this,
https://www.scientificamerican.com/article/time-to-fold-humans-poker-playing-ai-beats-pros-at-texas-hold-rsquo-em/
which talks about http://www.computerpokercompetition.org/]

Another interesting one would be a table of 8 players. Half AI half
professionals but no one knows who is who.

~~~
Wingman4l7
The poker competition you linked to is really more academics-oriented, and
last time I checked, was consistently won by the University of Alberta.

For a more "real-world" example -- AIs that were actually used in live online
poker environments, competing against each other -- there was only one that I
know of: the 2007 Poker Bot World Championship (PBWC). It was hosted by the
creator of the (apparently now defunct) WinHoldEm software, Ray E. Bornert II.
Only a small handful of the more experienced bot authors attended, turnout was
poor, the highest max buy-in was $100, and it never ran again
(http://robopoker.blogspot.com/2007/12/pokerbot-world-championship-cruise.html).

------
femto113

        600 hands per hour ... won $1,913.13 over 387,373 hands
    

That's less than $3/hour, switching to _any_ other career would be an
improvement.
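
The arithmetic behind that figure, as a quick Python check using only the
numbers quoted above (the variable names are mine):

```python
HANDS_PLAYED = 387_373
HANDS_PER_HOUR = 600
WINNINGS_USD = 1913.13

hours_played = HANDS_PLAYED / HANDS_PER_HOUR
hourly_rate = WINNINGS_USD / hours_played
per_100_hands = WINNINGS_USD / (HANDS_PLAYED / 100)

print(f"{hours_played:.0f} hours, ${hourly_rate:.2f}/hour, "
      f"${per_100_hands:.2f} per 100 hands")
# prints "646 hours, $2.96/hour, $0.49 per 100 hands"
```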

~~~
thret
Nobody is playing for a living at those stakes; I guess he was still learning
at that time. It would make his basic analysis easier to bear out though. At
those stakes the bad players are loose and passive, and simply playing tight
and aggressive is a winning strategy.

There are no sharks at those tables.

~~~
wrrretch
Lol. It's 2017. At $25 every table has 8 Eastern European "sharks".

~~~
erikb
Really? I had hoped that so many years after the boom died it would become
more relaxing again. Maybe the good old times really never come back.

------
SeanDav
> _" I used the data at this level from 2013 where I won $1,913.13 over
> 387,373 hands,"_

This does not seem like a lot of money to earn for playing 387k hands!

Not a criticism just an observation - but surely this amount of effort could
be better spent elsewhere? Purely from a money-making perspective, obviously
there is an element of enjoyment here which might change the picture
significantly.

~~~
splonk
For professional low level poker players, a good deal of their income comes
from rakeback. That is, for every hand, the site makes $N (the "rake"), and
the player gets back X% of N/players, where X scales according to the amount
of volume put in. I'm a bit out of the loop but you can assume that N is
around $3 and X is probably in the 35% range.

Some people semi-derisively refer to these people as "rakeback pros", in that
you can break even (or even lose a little bit) in your poker results, and
still make a living wage from the rakeback alone.

Also, 387k hands probably takes less time to play than you think. Many online
grinders play upwards of 1000 hands/hr - 20+ tables concurrently X 50-150
hands/hr.
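
A back-of-the-envelope Python sketch of the two formulas in this comment,
using the rough figures given above (N = $3, X = 35%). The number of players
per table is not stated, so the 9 used here is purely an assumption, and the
function names are hypothetical.

```python
def rakeback_per_hand(rake_per_hand, rakeback_share, players_at_table):
    """X% of N / players: one player's rakeback on a single hand."""
    return rakeback_share * rake_per_hand / players_at_table


def hands_per_hour(tables, hands_per_table_hour):
    """Total volume when playing several tables at once."""
    return tables * hands_per_table_hour


# players_at_table=9 is an assumption, not a figure from the comment.
per_hand = rakeback_per_hand(3.00, 0.35, players_at_table=9)
low_volume = hands_per_hour(20, 50)    # 20 tables at 50 hands/hr each
high_volume = hands_per_hour(20, 150)  # 20 tables at 150 hands/hr each
```

With 20 tables the volume works out to between 1,000 and 3,000 hands per hour,
which matches the "upwards of 1000 hands/hr" estimate.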

------
DennisP
> ‘Voluntarily Put Money in Pot’ (VP$IP) percentage...is the frequency with
> which a player plays a hand when first given an opportunity to bet or fold.

Does a call count, or does it have to be an initial bet or raise?

~~~
hanasu
Any money that enters the pot that is not a blind is considered VPIP.
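
A minimal Python sketch of that definition. The hand format (one list of the
player's own first-round actions per hand) and the action names are made up
for illustration; real tracking software may count the denominator
differently.

```python
# Actions that put money in voluntarily. Blinds and checks do not count.
VOLUNTARY_ACTIONS = {"call", "bet", "raise"}


def vpip(hands):
    """Fraction of hands in which the player voluntarily put money in."""
    if not hands:
        return 0.0
    voluntary = sum(1 for actions in hands if VOLUNTARY_ACTIONS & set(actions))
    return voluntary / len(hands)


sample_hands = [
    ["fold"],
    ["call"],                 # a call counts
    ["post_blind", "check"],  # blind only: does not count
    ["post_blind", "fold"],
    ["raise"],
    ["post_blind", "call"],   # completing or calling from the blind counts
]
sample_vpip = vpip(sample_hands)  # 3 of 6 hands
```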

------
madcaptenor
Nate Silver was for a while a professional poker player.

------
kapauldo
Everyone is a data scientist these days

~~~
mdekkers
Indeed. Call me a snob, but I am sitting in 2 to 3 interviews weekly at the
moment (looking for a big data specialist) and everyone is a "Data Scientist":

 _me_ : You state on your CV that you are a data scientist

 _Candidate_ : Yes

 _me_ : What is your PhD in?

 _Candidate_ : I don't actually have a PhD

 _me_ : <sigh> right, no scientist.

I hate job interviews

~~~
sushid
You are being a snob. There's no degree requirement to be called a data
scientist. Even if there were, why would you exclude a B.S. or an M.S.? It's
not like they're calling themselves a Doctor.

You'd be stupid to market yourself as a data specialist when every startup is
looking for a data scientist.

~~~
mdekkers
_There's no degree requirement to be called a data scientist._

There is for one of the positions I am recruiting for.

 _Even if there were, why would you exclude a B.S. or an M.S.?_

Because I am looking for someone with a PhD in Data Science. How stupid of me,
for excluding people that don't have the skills I need for my job.

 _You'd be stupid to market yourself as a data specialist when every startup
is looking for a data scientist._

I'd be stupid for hiring a data specialist, when what I am looking for is a
data scientist. I'd be even stupider for hiring someone that is _trying_ to
pass themselves off as a data scientist with a PhD, when they don't actually
have one. I recently interviewed someone with "12 years production hadoop
experience" (seriously), claiming to have a "data science PhD" (doesn't exist
as such), and when asked "can you please explain the relationship between
hbase, hadoop, and hdfs" (yes, there is a reason for asking this question in
this way) I had to listen to what was essentially a sales pitch for
Hortonworks, and to be told "hdfs has nothing to do with hadoop" <-- this
shit, many times a week.

~~~
sushid
But this isn't a problem with the industry itself, it's how you've set
yourself up for poor matches. Why don't you title the position "Data
Scientist, Senior/Chief/etc. (PhD req)"? It'll cut through at least half of
the cruft right away.

But really, it looks like you need to hire a new recruiter. If you're wasting
your time interviewing a Hortonworks entry dev when you want a guy with a PhD,
the recruiter is not doing their job.

