
My journey applying AI to horse racing - surfsvammel
https://towardsdatascience.com/applying-ai-to-horse-racing-e3632a7e7c92
======
jedberg
This is another case where being rich lets you get richer. :)

I had a friend who had a private box at a horse track. One of the benefits of
this box was that you had a personal betting machine. This let you bet on the
best odds just seconds before the race every time. I made a lot of money doing
that.

You can kind of do it in Vegas too because the sports books aren't usually
that busy.

In fact, the hardest part of this strategy is actually figuring out who the
favorite is! The odds are usually listed as fractions, but they have different
denominators, so you quickly have to convert them all to the LCD and then
order them.
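
As a rough sketch of that ordering step: exact fractions can be compared directly, without ever finding a common denominator by hand. The horse names and prices on the `board` below are made up for illustration, and `parse_odds` and `sort_by_odds` are hypothetical helper names.

```python
from fractions import Fraction

def parse_odds(text):
    """Turn a fractional price such as '7/4' into an exact Fraction."""
    num, den = text.split("/")
    return Fraction(int(num), int(den))

def sort_by_odds(board):
    """Return (horse, price) pairs from shortest to longest odds."""
    return sorted(board.items(), key=lambda item: parse_odds(item[1]))

# Hypothetical tote board: prices with mixed denominators.
board = {"A": "5/2", "B": "9/5", "C": "7/4", "D": "2/1"}

ranked = sort_by_odds(board)
favorite = ranked[0][0]  # shortest price is the favorite
print(ranked)
print("favorite:", favorite)
```

Here 7/4 is shorter than 9/5, which is easy to get wrong at a glance a few seconds before post time.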

But yes, the closer you bet to post time, the better your outcome with this
strategy will be.

~~~
LiamPa
The closer to the off the more accurate the price is (wisdom of the crowd)
thus less value, you need to know something others don’t to win long term from
taking the starting price!

I assume you are American as everyone in the UK has a personal betting machine
and it’s in their pocket.

~~~
CamTin
Not sure how it works elsewhere but horse betting in the US is almost always
parimutuel, meaning that everybody gets the post odds, regardless of when they
bought their ticket. In other words, betting early doesn't give any advantage
and in fact the later you can wait, the more accurate an idea you have of the
actual payout odds of your bet. Thus, having access to a private betting
machine, and not having to elbow other gamblers to buy a ticket at the last
minute, is a real advantage.

That said, the state's take-out is so high in most states that you need a
massive edge to actually make money. The slight edge a private betting machine
would bring probably isn't enough.
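
A minimal sketch of how parimutuel payouts behave, assuming a single win pool, a flat take-out, and no breakage or minimum payout rules. The 20% take-out and the pool sizes are illustrative numbers, not figures for any real track.

```python
def parimutuel_payouts(pools, takeout=0.20):
    """Payout per 1 unit staked (stake included) for each horse.

    pools maps horse -> total money bet on that horse to win.
    """
    total = sum(pools.values())
    net_pool = total * (1.0 - takeout)  # what is left after the take-out
    return {horse: net_pool / amount for horse, amount in pools.items()}

# Hypothetical pool shortly before the off.
pools = {"A": 5000.0, "B": 3000.0, "C": 2000.0}
before = parimutuel_payouts(pools)

# A late 1000 on C changes the payout for everyone, early bettors included.
late = dict(pools)
late["C"] += 1000.0
after = parimutuel_payouts(late)

print(before)
print(after)
```

In this toy pool the late bet cuts the payout on C from 4.0 to about 2.93 for every ticket holder, which is why seeing the pool as late as possible matters.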

~~~
sago
If you bet at 10-1 here, your bet is 10-1 on your receipt, and your odds are
10-1. Your bet at that price may change the odds for everyone who comes later,
but that doesn't affect yours. It is quite possible to arbitrage odds between
bookmakers. Or it would be, if they didn't do such a good job of conferring
and mutually underwriting their risk themselves.

It must be crazy having the odds change on you after your bet.

~~~
surfsvammel
For Swedish harness racing there is only one bookie. The state has had a
monopoly on betting in general for a long time, and betting is still very
strictly regulated today. So there won't be any arbitrage across bookies for
these races at least.

~~~
Scoundreller
What if... bear with me... somebody outside of Sweden took bets on Swedish
harness racing?

------
shaunray
Actually the most successful parimutuel syndicates and HNWIs bet
programmatically through APIs. They consistently win and they do not make
their money from picking a single winner. They bet on exotic bets that have
much higher odds and are easier to box. For instance picking two winners or
order of the first three horses.

Source: I worked on those APIs to integrate with 20+ exchanges over three
companies and 10 years. The syndicates and HNWIs were always the same people
over that decade.
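
To give a feel for the size of these exotic bets, here is a small sketch that enumerates a boxed trifecta. It assumes "box" has its usual meaning of covering every finishing order of a chosen set of horses; the function names and the unit stake are hypothetical.

```python
from itertools import permutations

def trifecta_box(horses):
    """Every ordered (1st, 2nd, 3rd) combination from the chosen horses."""
    return list(permutations(horses, 3))

def box_cost(horses, unit_stake=1.0):
    """Total outlay when each combination is covered with one unit stake."""
    return len(trifecta_box(horses)) * unit_stake

four = trifecta_box(["1", "4", "7", "9"])
print(len(four))                              # combinations for 4 horses
print(box_cost(["1", "4", "7", "9", "11"]))   # outlay for a 5-horse box
```

The number of combinations grows as n * (n - 1) * (n - 2), so a box gets expensive quickly.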

~~~
gricardo99
That sounds fascinating. I don’t suppose you have a blog, or any other write
up about that world? Even perhaps a link to one of these betting APIs, if
they’re generally available to the public?

~~~
shaunray
Betfair has a public API, but they are not parimutuel:
[https://docs.developer.betfair.com/display/1smk3cen4v3lu3yom...](https://docs.developer.betfair.com/display/1smk3cen4v3lu3yomq5qye0ni/API+Overview)

Tabcorp is an Australian Wagering Company that offers real-time horse racing
APIs: [https://studio.tab.com.au/](https://studio.tab.com.au/)

Others are a little less public and you need to sign up with an account and
prove you are a syndicate. Hong Kong Jockey Club, UniBet, Amtote.

I don't normally talk about this history as I was active in this industry
until four years ago. There are however people who have come out in public:

Bill Benter: [https://www.bloomberg.com/news/features/2018-05-03/the-gambl...](https://www.bloomberg.com/news/features/2018-05-03/the-gambler-who-cracked-the-horse-racing-code)

Jelko: [https://www.smh.com.au/national/meet-the-joker-the-australia...](https://www.smh.com.au/national/meet-the-joker-the-australian-who-is-the-biggest-gambler-in-the-world-20180515-p4zfhi.html)

Most of the other families and players are extremely private or are Eastern
European, and I won't talk about them publicly.

This is a list of common bet types that are targets for these types of
players:

- Trifecta
- Big Six
- Quadrella
- First 4

Here is a blog on working the combinations of the Big Six:
[https://practicalpunting.com.au/pp-online/a-z-of-betting/exo...](https://practicalpunting.com.au/pp-online/a-z-of-betting/exotic-betting/big-six/my-formula-to-beat-the-big-6-20090404)

~~~
malbs
hah, how funny I almost copied your post word for word ;)

What are you up to now Shaun? Are you still in Hobart?

------
Animats
He has this completely wrong. The idea in pari-mutuel betting is not to pick
the winner. The favorite usually wins, but the house cut makes betting the
favorite a net loss.

The goal is to find underpriced bets, where the collective estimate of the
odds is far from the prediction system's estimate.
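
A minimal sketch of that idea, assuming you already have a model probability for each horse and the current pool payout per unit staked (stake included). All numbers are invented, and `expected_values` and `overlays` are hypothetical names.

```python
def expected_values(model_probs, payouts):
    """Expected profit per 1 unit staked: p * payout - 1."""
    return {h: model_probs[h] * payouts[h] - 1.0 for h in model_probs}

def overlays(model_probs, payouts, min_edge=0.0):
    """Horses whose expected profit exceeds min_edge."""
    evs = expected_values(model_probs, payouts)
    return {h: ev for h, ev in evs.items() if ev > min_edge}

# Hypothetical pool payouts (after the house cut) and model estimates.
payouts = {"A": 1.6, "B": 8.0 / 3.0, "C": 4.0}
model_probs = {"A": 0.50, "B": 0.20, "C": 0.30}

evs = expected_values(model_probs, payouts)
value_bets = overlays(model_probs, payouts)
print(evs)
print(value_bets)
```

In this toy race A is the most likely winner yet has negative expected value, while C is the only bet the model considers underpriced.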

~~~
sorokod
In this case there is no house. For this to work a player needs to be able to
back or lay an outcome.

"In harness racing, you are playing against all the other players. You are not
playing the house. The odds of any given horse winning is directly related to
the amount of money on it."

~~~
scottlocklin
I'm pretty sure these races are pari-mutuel where the house takes a vigorish.
The sentence you give is basically the definition of pari-mutuel without the
vig.

FWIW the author is wrong that this problem is unstudied. There are actual
hedge funds which do nothing but this and a quick google search provides
numerous academic examples; even a github repo. "Deep learning" is probably
the worst approach I can think of to the problem.

I'm also guessing he misunderstood the data if betting on the favorite
compounds like that. If you make the race pay off at something other than the
actual odds, that would probably do it.

Examples from the literature:

[https://github.com/dominicplouffe/HorseRacingPrediction](https://github.com/dominicplouffe/HorseRacingPrediction)

[https://www.sciencedirect.com/science/article/pii/S016792361...](https://www.sciencedirect.com/science/article/pii/S016792361200379X)

~~~
surfsvammel
It doesn’t say that horse racing is understudied. It is saying that harness
racing is understudied. Normal horse racing is very different to harness
racing.

Also. The simulated betting on the favourite in those 26000 races was with
one dollar per race. If the favourite wins, the balance increases by $1 * the
odds. If the favourite does not win, the balance decreases by $1. Nothing
strange about it and, as I tried to explain, no compounding involved. Flat
betting. $1 per race. Anyway. It was simulated on historic data and was also a
side note, as it didn’t really have anything to do with the project itself.
Appreciate the comment.
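
For readers who want to reproduce this kind of check, here is a sketch of a flat-betting simulation. It is not the code used for the article. The `odds_include_stake` flag is an assumption added for illustration: with decimal odds the quoted number includes the returned stake, so the profit on a win is stake * (odds - 1), whereas with net odds it is stake * odds. The sample races are invented.

```python
def flat_bet_balance(races, stake=1.0, odds_include_stake=True):
    """Final balance after betting a fixed stake on every race.

    races is an iterable of (odds, favourite_won) pairs.
    """
    balance = 0.0
    for odds, won in races:
        if won:
            # Profit on a win depends on what the quoted odds mean.
            if odds_include_stake:
                balance += stake * (odds - 1.0)
            else:
                balance += stake * odds
        else:
            balance -= stake  # a losing bet costs the stake, nothing more
    return balance

# Hypothetical results: (odds on the favourite, did it win?)
sample = [(2.5, True), (1.8, False), (3.0, True), (2.2, False)]

decimal_reading = flat_bet_balance(sample, odds_include_stake=True)
net_reading = flat_bet_balance(sample, odds_include_stake=False)
print(decimal_reading, net_reading)
```

The two readings differ by exactly one stake per winning race, so over thousands of races the choice of convention changes the curve a lot.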

~~~
soVeryTired
I don't mean to sound rude, but I don't believe the results in your $1 * 26000
races graph.

I spent a few years in a hedge fund building algorithmic trading strategies,
and whenever I saw a graph (we called them 'account curves') like that, it
inevitably meant that there was a bug somewhere. Usually the strategy is
somehow non-tradable, either because the market is closed or the system is
borrowing data from the future.

The graph shows you making about $55,000 over 26,000 races (i.e. about a 100%
return) with no big swings. If the effect was that big, wouldn't people know
about it already? Despite the fact that you can't apply the strategy
_everywhere_ , couldn't you settle for making a little money at your local
betting shop?

Your theory is that there are more informed bettors betting at the last
minute. That suggests that the market odds spike towards the 'correct' odds as
betting is about to close. If someone proposed your 'bet on the favourite'
strategy to me, I would have asked to see evidence of that spike. Are you able
to get access to real-time odds to test the theory?

~~~
beezle
Betting $1 on 26,000 races:

Stated win rate betting on the post time favorite is 37%

This equates to a loss of 16,380 of original 26,000. Earnings must then come
from the 9,620 bet on favorites that won.

Assuming the 55,000 is winnings (and not a total including original 9,620),
this would equate to an average odds of better than 11/2. If reduced by the
9,620 then odds come down to a bit over 9/2.

My experience in horse racing (regular and harness) is that odds on the
favorite of 11/2 (or even 9/2) are extremely rare and certainly not the
average. A realistic figure is 3/2 over the various field sizes (smaller
fields tend to go close to even money, larger to 2/1 or so).

So yeah, the data as presented appear flawed.
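
The arithmetic above can be checked with a few lines. This sketch follows the comment's own reading of the $55,000 figure (as winnings on the winning bets) and uses only numbers quoted in the thread; the variable names are made up.

```python
from fractions import Fraction

races = 26_000
win_rate = Fraction(37, 100)      # stated strike rate of the favourite
wins = int(races * win_rate)      # 9,620 winning $1 bets
losses = races - wins             # 16,380 losing $1 bets

reported = 55_000                 # figure read off the article's graph

# Average odds needed if the 55,000 is pure profit on the winners...
odds_if_profit_on_winners = Fraction(reported, wins)
# ...or if it also includes the 9,620 of returned stakes.
odds_if_includes_stakes = Fraction(reported - wins, wins)

def flat_result(avg_odds):
    """Net result of $1 flat bets at a given average winning price."""
    return wins * avg_odds - losses

break_even_odds = Fraction(losses, wins)
result_at_3_2 = flat_result(Fraction(3, 2))

print(float(odds_if_profit_on_winners), float(odds_if_includes_stakes))
print(float(break_even_odds), result_at_3_2)
```

At a 37% strike rate the break-even price is 63/37, roughly 1.7 to 1, so an average of 3/2 on the favourite would lose $1,950 over the 26,000 races.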

------
gorkish
Using computers to beat masses of the general public at horseracing in this
fashion has been going on for well over 30 years. I first read articles about
the gambling teams and data scientists behind them in Hong Kong in the mid
90's. The teams had data capture people entering realtime odds from the tracks
at Sha Tin and Happy Valley and hired private tellers to enter thousands of
computed bets moments before a race began from their offtrack locations in
downtown HK highrises.

The thing that makes it easy to simulate but impractical to execute is that
getting enough bets in fast enough and late enough to tip the odds to your
favor to give you an edge while limiting losses is difficult to do. It's even
harder to do without this wagering itself affecting the odds. For one there
often just isn't enough money wagered on a race. In the heyday of the HK races
you'd often have 100 million+ HKD wagered on every single race, much of that
coming from the general population -- individuals going to the track. There
are only a few big event races in the US or Europe that will attract that kind
of betting pool. Even HK is no longer like that.

~~~
brisance
Bloomberg had an article on this, last year.

[https://www.bloomberg.com/news/features/2018-05-03/the-gambl...](https://www.bloomberg.com/news/features/2018-05-03/the-gambler-who-cracked-the-horse-racing-code)

------
LoSboccacc
> The last results, after remodeling the problem, came to be on par with the
> odds themselves. Around 37%. This time my AI was as good as consensus at
> betting harness racing.

using the same dataset as bettors with a NN is not really better than running
a regression on it, so this doesn't really surprise me.

nn are able to find patterns in things other than historical series. taking
this forward would need to combine historical race sets with external sources
that could have correlated information available that's not easy to extract by
simple regression, like news and articles whenever available or meteorological
data or whatever, just feed it everything and let the nn sort out the
useless data itself

~~~
surfsvammel
That’s in fact what was done. Facts about the races (everything except for the
actual odds for the horses) combined with Google Maps API. It basically
predicted the odds of the horses (more or less).

------
EGreg
I found this nugget to be the most valuable:

 _Simulated flat betting 1$ on 26 000 races over 15 years shows a hefty return
on investment_

If this is true, why don’t we all just go and hire people in Sweden to put in
bets for us at the last minute, based on the favorite horse, and pay them out
of the profits? We pay them to do it and we collect the winnings.

If this strategy works it may work elsewhere too. Why not just do that all day
long with an army of people? And you aren’t hurting who is the favorite
either.

~~~
stefs
well, for one, just because it works doesn't mean it's worth it. a couple
of years ago i read a story about a man who cracked a scratch-off lottery
ticket code (the serial numbers were biased). he could have earned more than
with his day job if he devoted his time fully to gaming the system, but ... he
decided it wasn't good enough - a mindless, robotic task, ending in misery
soon.

the take away of the story was that the method was probably known by crime
syndicates and used for money laundering, as they have cheap pseudo-slave
labour available. but if you're a well educated person with a fulfilling job,
you'd go insane soon (and of course, job security - the second the company
changes their system you're out).

so with betting, you'd have to invest a lot of time and energy for it to
become profitable. as he wrote, you'd have to be at the race tracks and wait
until the last possible moment, then drive to the next stadium. and if it
was _that_ easy to create a real winning system that depended on local help,
the swedes would do it themselves - so your hired goons might stay as long as
they needed to understand your system and then continue on their own (lowering
your margins in the process).

if the process could be completely automated, i.e. with online betting, the
story might be different. but i'm sure there are already a lot of syndicates
looking into that (even the analog version). for them it doesn't even have to
be profitable. 80% return is good enough for money laundering and they can
out-finance the legal competition.

~~~
chucksmash
I remember a very similar story, it stuck with me from a couple of years ago:

> His next thought was utterly predictable: "I remember thinking, I'm gonna be
> rich! I'm gonna plunder the lottery!" he says. However, these grandiose
> dreams soon gave way to more practical concerns. "Once I worked out how much
> money I could make if this was my full-time job, I got a lot less excited,"
> Srivastava says. "I'd have to travel from store to store and spend 45
> seconds cracking each card. I estimated that I could expect to make about
> $600 a day. That's not bad. But to be honest, I make more as a consultant,
> and I find consulting to be a lot more interesting than scratch lottery
> tickets."[1]

[1]: [https://www.wired.com/2011/01/ff-lottery/](https://www.wired.com/2011/01/ff-lottery/)

------
yeahwhatever10
Interesting comparison to the old fashioned way:
[https://www.bloomberg.com/news/features/2018-05-03/the-gambl...](https://www.bloomberg.com/news/features/2018-05-03/the-gambler-who-cracked-the-horse-racing-code)

------
mimixco
The OP writes repeatedly that the "odds of winning are related to the amount
bet on each horse." This is not correct. The _payout_ is related to the number
of people betting. The odds of winning are based on the horse's DNA, training,
competitors, etc. The horse does not know who bet on it while it's running.
It's easy to confuse the word "odds" in gambling because that often refers to
payout, but these are not the odds of winning.

------
malbs
Here's my 2 cents.

Anyone who is telling you they have cracked the parimutuel pools, and "here's
how to do it", is trying to sell you something else. Anyone who has actually
cracked the parimutuel pools knows how they work, and that if you share your
"tips", or selections, you only serve to diminish your own returns, so you are
not going to do it.

~~~
mimixco
Indeed. This is the origin of the expression "break a leg." At the track, you
could never wish someone else good luck because their good luck will cost you
money. Instead, you wish their horse breaks a leg so he'll be eliminated,
moving the money in his pool to yours if you win.

------
slm_HN
He was unlucky in picking harness racing. Standard horse racing is already
corrupt, but harness racing takes it to the next level. The rules and the
addition of the harness make cheating much more prevalent.

I'd rather bet on Professional Wrastling than harness racing.

------
brootstrap
appreciate the write up but is this even AI? It seems like he just did some
analysis & stats on observed data, and came up with a simple model.

~~~
surfsvammel
The article doesn’t really say, as this was a write-up of a 20-minute talk at
a conference for economists. The main models were vanilla ANNs. SVMs and LSTMs
were also tested, but unsuccessfully. As I had a lot of prior experience
writing Java, most models were implemented in DeepLearning4J in the beginning
but were moved over to the frameworks from Fast.ai in the later stages. I’d be
happy to do a write-up of the details at some point. The majority of the work
was in fact the data pre-processing.

~~~
brootstrap
Ah, no need to give me all the details good sir. Appreciate the post. Also
yeah, sounds about right with the pre-processing. That is always the case!

