My journey applying AI to horse racing (towardsdatascience.com)
100 points by surfsvammel 17 days ago | 59 comments

This is another case where being rich lets you get richer. :)

I had a friend who had a private box at a horse track. One of the benefits of this box was that you had a personal betting machine. This let you bet on the best odds just seconds before the race every time. I made a lot of money doing that.

You can kind of do it in Vegas too because the sports books aren't usually that busy.

In fact, the hardest part of this strategy is actually figuring out who the favorite is! The odds are usually listed as fractions, but they have different denominators, so you quickly have to convert them all to the LCD and then order them.
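The LCD step can be sidestepped by comparing the prices as exact rationals. A minimal sketch in Python, assuming the board quotes odds as "numerator-denominator" strings; the horse names and prices are made up for illustration:

```python
from fractions import Fraction

def parse_odds(text):
    """Parse fractional odds such as '5-2' or '7/5' into an exact Fraction."""
    num, den = text.replace("/", "-").split("-")
    return Fraction(int(num), int(den))

# Hypothetical tote board: horse name -> quoted fractional odds.
board = {"A": "5-2", "B": "9-5", "C": "7-2", "D": "8-5"}

# Lowest odds first; the first entry is the favorite.
ranked = sorted(board, key=lambda horse: parse_odds(board[horse]))
favorite = ranked[0]
print(ranked, favorite)
```

Fraction compares exactly, so there is no need to find a common denominator by hand or worry about floating-point ties.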

But yes, the closer you bet to post time, the better your outcome with this strategy will be.

The closer to the off, the more accurate the price is (wisdom of the crowd) and thus the less value there is. You need to know something others don’t to win long term from taking the starting price!

I assume you are American as everyone in the UK has a personal betting machine and it’s in their pocket.

Not sure how it works elsewhere but horse betting in the US is almost always parimutuel, meaning that everybody gets the post odds, regardless of when they bought their ticket. In other words, betting early doesn't give any advantage and in fact the later you can wait, the more accurate an idea you have of the actual payout odds of your bet. Thus, having access to a private betting machine, and not having to elbow other gamblers to buy a ticket at the last minute, is a real advantage.

That said, the state's take-out is so high in most states that you need a massive edge to actually make money. The slight edge a private betting machine would bring probably isn't enough.

If you bet at 10-1 here, your bet is 10-1 on your receipt, and your odds are 10-1. Your bet at that price may change the odds for everyone who comes later, but that doesn't affect yours. It is quite possible to arbitrage odds between bookmakers. Or it would be, if they didn't do such a good job of conferring and mutually underwriting their risk themselves.

It must be crazy having the odds change on you after your bet.

Yeah the whole idea of parimutuel betting is that there is no "bookie" and that you are betting against the other gamblers. No central entity is taking a side in the bets. This makes sense when you realize that the central entity is, for example, the State of Kentucky. Do you want a bureaucrat making millions of dollars in bets on behalf of taxpayers? Probably not. Instead, let's just collect all the bets from race fans, take out say 5% for the state, and then pay out to the fans holding tickets for the winning horse (forget about exotics like exactas and trifectas to keep things simple, but the idea is the same). That way, we don't need to have handicapping experts set odds in order to ensure the state makes money. Instead, it's all simple math that can easily be audited and still guarantees a profit. The side-effect of this is obviously that you can't know the final odds until the last ticket has been sold at post time.
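The "simple math" can be sketched roughly as follows. This is a simplified illustration, not any tote's actual rules: it uses the 5% takeout from the example above, a made-up three-horse win pool, and ignores breakage (rounding of payouts).

```python
def parimutuel_odds(pool, takeout=0.05):
    """Return the 'to-1' win odds per horse for a parimutuel win pool.

    pool maps horse -> total amount bet on it; takeout is the operator's cut.
    """
    total = sum(pool.values())
    net = total * (1 - takeout)  # money left to pay the winning tickets
    # Each winning $1 gets net / pool[horse] back, stake included,
    # so the profit odds are that figure minus the $1 stake.
    return {horse: net / amount - 1 for horse, amount in pool.items()}

# Hypothetical pool: amounts wagered on each horse at post time.
pool = {"A": 5000, "B": 3000, "C": 2000}
odds = parimutuel_odds(pool)
for horse, price in odds.items():
    print(horse, round(price, 2))
```

Whichever horse wins, the winners share the same net pool, so the operator keeps its cut regardless. Any ticket sold later changes `pool`, which is why the final odds are unknown until betting closes.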

For Swedish harness racing there is only one bookie. The state has had a monopoly on betting in general for a long time, and betting is still very strictly regulated today. So, there won't be any arbitrage across bookies for these races at least.

What if... bear with me... somebody outside of Sweden took bets on Swedish harness racing?

Everyone in the US can bet using his or her smartphone, with the current odds shown up to the second the gate opens. There's no advantage in a private betting machine. Some online betting tools even let you set parameters that place your bet automatically if and only if certain odds are reached, so you don't have to stand there and watch it. Of course, for casual bettors, that's half the fun.

Unless you’re talking about last minute info like the horse breaking its leg walking to the racing stall, this sure sounds like people who swear they can read the roulette wheel better when they wait til it’s spinning to bet.

You can bet directly from your phone while standing at the rail if you want to.

Actually the most successful parimutuel syndicates and HNWIs bet programmatically through APIs. They consistently win and they do not make their money from picking a single winner. They bet on exotic bets that have much higher odds and are easier to box. For instance, picking two winners or the order of the first three horses.

Source: I worked on those APIs to integrate with 20+ exchanges over three companies and 10 years. The syndicates and HNWIs were always the same people over that decade.

The most successful such syndicate in the world is run by a purported billionaire by the name of Zeljko Ranogajec. He made most of his money beating horse racing. Here’s a fascinating article on him and his exploits:


That sounds fascinating. I don’t suppose you have a blog, or any other write up about that world? Even perhaps a link to one of these betting APIs, if they’re generally available to the public?

Betfair has a public API, but they are not parimutuel: https://docs.developer.betfair.com/display/1smk3cen4v3lu3yom...

Tabcorp is an Australian Wagering Company that offers real-time horse racing APIs: https://studio.tab.com.au/

Others are a little less public and you need to sign up with an account and prove you are a syndicate. Hong Kong Jockey Club, UniBet, Amtote.

I don't normally talk about this history as I was active in this industry until four years ago. There are however people who have come out in public:

Bill Benter: https://www.bloomberg.com/news/features/2018-05-03/the-gambl...

Jelko: https://www.smh.com.au/national/meet-the-joker-the-australia...

Most of the other families and players are extremely private or are Eastern European, and I won't talk about them publicly.

This is a list of common bet types that are targets for these types of players:
- Trifecta
- Big Six
- Quadrella
- First 4

Here is a blog on working the combinations of the Big Six: https://practicalpunting.com.au/pp-online/a-z-of-betting/exo...

hah, how funny I almost copied your post word for word ;)

What are you up to now Shaun? Are you still in Hobart?

Betfair API is 100% free for customers with a lot of code examples in various languages.

Tabcorp's TABStudio - http://studio.tab.com.au/ (You need to register as a Tabcorp customer but it's "free"). It's basically a JSON REST API.

NZ TAB have their batch mode, at filebet.tab.co.nz, which is nothing more than a file upload system.

The ATG, and the other Scandinavian totes provide coupon upload systems (where you generate a coupon file with your selections in it, using third party software), but if you win they will end up banning you.

Most providers keep their APIs hidden or require signing NDAs or other types of contracts. I'm referring to companies like Amtote, Churchill Downs, PMU, BetFred, PGI, LeTurf, etc.

Usually they're quite surprised when you contact them about doing "integration work". A lot of the APIs are designed for retail shops to integrate with. For example, I've done some integration work with BetFred, where my client never wants to cancel a bet (placed at the last possible moment). However, in order to pass their integration tests, I have to fully support cancelling a wager, just like any retail shop would.

On a side note, I have used (at least one of) the APIs shaunray was referring to that he had worked on.

not horse racing, but there's also a book by Steven Skiena titled "Calculated Bets: Computers, Gambling, and Mathematical Modeling to Win" -- it's about the author's interest in betting on jai-alai matches. there's a bit of stuff about parimutuel betting & understanding how the structure of jai-alai round robin tournaments can produce pretty unusual odds in some scenarios, which can be exploited.

technologically the book is a couple of decades out of date, but it's probably worth a read.

He has this completely wrong. The idea in pari-mutuel betting is not to pick the winner. The favorite usually wins, but the house cut makes betting the favorite a net loss.

The goal is to find underpriced bets, where the collective estimate of the odds is far from the prediction system's estimate.

In this case there is no house. For this to work a player needs to be able to back or lay an outcome.

"In harness racing, you are playing against all the other players. You are not playing the house. The odds of any given horse winning is directly related to the amount of money on it."

I'm pretty sure these races are pari-mutuel where the house takes a vigorish. The sentence you give is basically the definition of pari-mutuel without the vig.

FWIW the author is wrong that this problem is unstudied. There are actual hedge funds which do nothing but this, and a quick Google search provides numerous academic examples; even a GitHub repo. "Deep learning" is probably the worst approach I can think of to the problem.

I'm also guessing he misunderstood the data if betting on the favorite compounds like that. If you make the race pay off at something other than the actual odds, that would probably do it.

Examples from the literature:



It doesn’t say that horse racing is understudied. It is saying that harness racing is understudied. Normal horse racing is very different to harness racing.

Also, the simulated betting on the favourite in those 26,000 races was with one dollar per race. If the favourite wins, the balance increases by $1 * the odds. If the favourite does not win, the balance decreases by $1. Nothing strange about it and, as I tried to explain, no compounding involved. Flat betting, $1 per race. Anyway, it was simulated on historic data and was also a side note, as it didn’t really have anything to do with the project itself. Appreciate the comment.
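For concreteness, the flat-betting rule as described could be sketched like this. It is not the article's actual code: the function name and the race data are invented, and it assumes the odds are quoted as profit per unit staked.

```python
def flat_bet_balance(races, stake=1.0):
    """Track the running balance of a flat stake on every race.

    races is an iterable of (odds, won) pairs, where odds is the profit
    per unit staked (an assumption about how the odds are quoted).
    """
    balance = 0.0
    history = []
    for odds, won in races:
        # Win: add stake * odds. Loss: lose the stake. No compounding.
        balance += stake * odds if won else -stake
        history.append(balance)
    return history

# Made-up results: (odds on the favourite, did it win?)
races = [(1.5, True), (2.0, False), (0.8, True), (1.2, False)]
history = flat_bet_balance(races)
print(history[-1])
```

One thing worth checking with this rule: if the odds in the data are decimal odds that already include the stake, the profit on a win would be `stake * (odds - 1)` rather than `stake * odds`, and the curve would look very different. Whether that applies here is not something the thread confirms.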

I don't mean to sound rude, but I don't believe the results in your $1 * 26000 races graph.

I spent a few years in a hedge fund building algorithmic trading strategies, and whenever I saw a graph (we called them 'account curves') like that, it inevitably meant that there was a bug somewhere. Usually the strategy is somehow non-tradable, either because the market is closed or the system is borrowing data from the future.

The graph shows you making about $55,000 over 26,000 races (i.e. about a 100% return) with no big swings. If the effect was that big, wouldn't people know about it already? Despite the fact that you can't apply the strategy everywhere, couldn't you settle for making a little money at your local betting shop?

Your theory is that there are more informed bettors betting at the last minute. That suggests that the market odds spike towards the 'correct' odds as betting is about to close. If someone proposed your 'bet on the favourite' strategy to me, I would have asked to see evidence of that spike. Are you able to get access to real-time odds to test the theory?

Betting $1 on 26,000 races:

Stated win rate betting on the post time favorite is 37%

This equates to a loss of 16,380 of the original 26,000. Earnings must then come from the 9,620 bets on favorites that won.

Assuming the 55,000 is winnings (and not a total including the original 9,620), this would equate to average odds of better than 11/2. If reduced by the 9,620, the odds come down to a bit over 9/2.

My experience in horse racing (regular and harness) is that odds on the favorite of 11/2 (or even 9/2) are extremely rare and certainly not the average. A realistic figure is 3/2 over the various field sizes (smaller fields tend to go close to even money, larger to 2/1 or so).

So yeah, the data as presented appear flawed.
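The arithmetic above can be reproduced in a few lines of Python. The 55,000 is the approximate figure read off the graph, and the variable names are just for illustration:

```python
races = 26_000
win_rate = 0.37      # stated win rate of the post-time favorite
reported = 55_000    # approximate figure read off the article's graph

wins = round(races * win_rate)   # winning $1 bets
losses = races - wins            # losing $1 bets

# Average odds on the winners if the 55,000 is pure winnings...
odds_if_winnings = reported / wins
# ...and if it also includes the returned $1 stakes on the winners.
odds_if_total = (reported - wins) / wins

print(wins, losses, round(odds_if_winnings, 2), round(odds_if_total, 2))
```

That gives roughly 5.72 and 4.72, i.e. a little better than 11/2 and a little better than 9/2, both far above the 3/2 or so suggested as realistic for a favorite.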

Not rude at all! Appreciate the comment. I don't rule out that there is a bug in there. I did not spend much time on that part of it. It was definitely a side track of that project and had nothing to do with the main model. I do have access to real-time data, but not historic data, so I cannot back-test it, though that would be interesting. Real-time data can be scraped from ATG.se, the only bookie for harness racing in Sweden.

Also, I don't suggest anyone go out and place real money on anything like this.

The main point of the article is that it's not that hard to get the basics of machine learning and actually do things with it.

Both those links are on harness racing specifically, not horse racing in general; and I found them in 2 seconds of obvious Google searching.

Like the guy beneath me: I've seen those curves before; they're always a mistake.

In the Anglosphere at least (US, AU and others?) harness and jockey racing are both bet in the same way. Harness is shown alongside other racing in Off Track Betting and sportsbooks.

scottlocklin didn't say horse racing is understudied. His sources are specific to harness racing.

Parimutuel betting works the same way regardless of what you're betting on. It could be thoroughbreds, harness racing, or jai alai. Technically, you can offer parimutuel wagering on anything, even football or car races.

Sure, but the various features which make a harness race predictable are unlikely to make other kinds of horse race predictable. Sorta like Steven Skiena's jai alai system is unlikely to work on horse races.

Who offers parimutuel betting without vig? Sign me up! The cumulative losses from vigorish are the reason that most people cannot make money with horse racing over time, regardless of their skill in picking winners or spotting edge cases. The OP is right in saying you're not playing against the house since the house/track/government takes a piece of every bet. They don't care who wins.

Betfair / any exchange allows you to back/lay.

He seems to be ignoring the overround, which is effectively the house, and then the commission when you are banned and have to resort to the exchanges.
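For readers who haven't met the term: the overround is the amount by which the implied probabilities of a bookmaker's prices add up to more than 100%. A rough sketch in Python, assuming decimal odds (stake included) and a made-up three-runner book:

```python
def overround(decimal_odds):
    """Excess of the summed implied probabilities over 1.0.

    decimal_odds are total returns per unit staked (stake included),
    which is an assumption about the quoting convention.
    """
    return sum(1.0 / price for price in decimal_odds) - 1.0

# Hypothetical book on a three-horse race.
book = [2.0, 3.0, 4.0]
margin = overround(book)
print(f"{margin:.1%}")
```

Here the implied probabilities sum to 13/12, so the margin is about 8.3%. A fair book would sum to exactly 1.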

That’s all good. If an algorithm can predict the odds then you can compare the odds of the algorithm to those given by the bookies and voila, there you’ll find your underpriced bets.

Picking the favourite to win is a measure of how good the algorithm is at predicting the odds.

I agree. The degenerate gambler in me doesn't get much pleasure out of betting on the favorite and winning when they go off at 2/5. IMO it's much more satisfying to box an exacta or trifecta including the favorite and some less favored horses and hit that.
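For anyone unfamiliar with boxing: a box covers every finishing order of the horses you pick, so the ticket count grows quickly. A small Python sketch, with made-up saddle numbers:

```python
from itertools import permutations
from math import perm

def box_tickets(horses, places):
    """All ordered finishes covered by boxing the given horses."""
    return list(permutations(horses, places))

# Hypothetical selections by saddle number.
exacta_box = box_tickets(["1", "4", "7"], 2)         # first two, any order
trifecta_box = box_tickets(["1", "4", "7", "9"], 3)  # first three, any order

print(len(exacta_box), len(trifecta_box))
# The counts match the closed form n! / (n - k)!.
print(perm(3, 2), perm(4, 3))
```

A three-horse exacta box is 6 tickets and a four-horse trifecta box is 24, so the stake per box multiplies accordingly.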

> The idea in pari-mutuel betting is not to pick the winner. The favorite usually wins, but the house cut makes betting the favorite a net loss.

Isn't this a misstatement in the other direction? Favourites can be underrated too.

They're probably not, or at least not enough for you to beat the house cut, but that's true of every bet -- they're all likely to be net losers unless you have special information, or a special way of using it.

Exactly this. You also need to ensure that the underpriced bet actually wins or, if your strategy allows for it, places in the top 3. The odds on a place are generally too low to make this worth it though.

Value is value, doesn’t matter if it wins, it’s all a numbers game.

There's no payout if the horse doesn't win. It doesn't operate like the stock market.

He/she just means that if a bet is genuinely underpriced, the expected value of taking it will be positive. (So if you can reliably find these bets, you'll profit in the long run.)

The odds reflect the horse's supposed chance of winning or placing (or whatever else it takes to get a payout), not its expected position -- so this isn't a case where you can be aware that the horse is underrated, but have no way to make a +EV bet on that knowledge.
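The expected-value point can be made concrete with a toy calculation. This is only a sketch: the function names are invented, the odds are treated as fixed profit-per-unit figures (which parimutuel odds are not until the pool closes), and the 25% estimate is a made-up model output.

```python
def expected_value(p_win, odds, stake=1.0):
    """Expected profit of a win bet.

    p_win is your own estimate of the win probability; odds is the profit
    per unit staked if the bet wins (fractional odds as a number).
    """
    return p_win * odds * stake - (1 - p_win) * stake

def implied_probability(odds):
    """Break-even win probability for the given fractional odds."""
    return 1 / (odds + 1)

# Hypothetical example: the market offers 4-1, the model says 25%.
ev = expected_value(0.25, 4.0)
breakeven = implied_probability(4.0)
print(ev, breakeven)
```

At 4-1 the break-even probability is 20%, so a horse that really wins 25% of the time is worth +0.25 per unit staked on average, even though it still loses three times out of four.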

Yeah over the long term, but I thought this was already implied. All I'm saying is that "beating" other bettors is not enough to ensure success because of the means by which payouts are made. I suspect you feel this is also implied though, so I think we're probably on the same page.

Exactly, for example yesterday I placed over 5000 bets, not all of them won but as far as my strategies were concerned they were value and should therefore be in profit over the long term.

In fairness, the stock market doesn't operate "like the stock market," if by that you mean reliably reward a value pick, either.

The first order of business in these things is certainly to spot the overlay/value pick/+EV situation. The second order of business is to manage bankroll/vol to be able to realize the EV over time -- but that is generally speaking much easier and more mechanical than the core insight of (repeatably) finding the EV...

This is not correct, either. Horse racing allows bets on the second and third finishers, as well, not just the winner.

Using computers to beat masses of the general public at horseracing in this fashion has been going on for well over 30 years. I first read articles about the gambling teams and data scientists behind them in Hong Kong in the mid-90s. The teams had data capture people entering realtime odds from the tracks at Sha Tin and Happy Valley and hired private tellers to enter thousands of computed bets moments before a race began from their offtrack locations in downtown HK highrises.

The thing that makes it easy to simulate but impractical to execute is that getting enough bets in fast enough and late enough to tip the odds in your favor and give you an edge while limiting losses is difficult to do. It's even harder to do without this wagering itself affecting the odds. For one, there often just isn't enough money wagered on a race. In the heyday of the HK races you'd often have 100 million+ HKD wagered on every single race, much of that coming from the general population -- individuals going to the track. There are only a few big event races in the US or Europe that will attract that kind of betting pool. Even HK is no longer like that.

Bloomberg had an article on this, last year.


> The last results, after remodeling the problem, came to be on par with the odds themselves. Around 37%. This time my AI was as good as consensus at betting harness racing.

using the same dataset as bettors with an NN is not really better than running a regression on it, so this doesn't really surprise me.

NNs are able to find patterns in things other than historical series. taking this forward would mean combining historical race sets with external sources that might hold correlated information that's not easy to extract by simple regression, like news and articles whenever available, or meteorological data, or whatever. just feed it everything and let the NN sort out the useless data itself.

That’s in fact what was done. Facts about the races (everything except for the actual odds for the horses) combined with Google Maps API. It basically predicted the odds of the horses (more or less).

I found this nugget to be the most valuable:

Simulated flat betting 1$ on 26 000 races over 15 years shows a hefty return on investment

If this is true, why don’t we all just go and hire people in Sweden to put in bets for us at the last minute, based on the favorite horse, and pay them out of the profits? We pay them to do it and we collect the winnings.

If this strategy works it may work elsewhere too. Why not just do that all day long with an army of people? And you aren’t hurting who is the favorite either.

well, for one, just because it works doesn't mean it's worth it. a couple of years ago i read a story about a man who cracked a scratch-off lottery ticket code (the serial numbers were biased). he could have earned more than with his day job if he devoted his time fully to gaming the system, but ... he decided it wasn't good enough - a mindless, robotic task, ending in misery soon.

the takeaway of the story was that the method was probably known by crime syndicates and used for money laundering, as they have cheap pseudo-slave labour available. but if you're a well educated person with a fulfilling job, you'd go insane soon (and of course, job security - the second the company changes their system you're out).

so with betting, you'd have to invest a lot of time and energy for it to become profitable. as he wrote, you'd have to be at the race tracks and wait until the last possible moment, then drive to the next stadium. and if it was _that_ easy to create a real winning system that depended on local help, the swedes would do it themselves - so your hired goons might stay as long as they needed to understand your system and then continue on their own (lowering your margins in the process).

if the process could be completely automated, i.e. with online betting, the story might be different. but i'm sure there are already a lot of syndicates looking into that (even the analog version). for them it doesn't even have to be profitable. 80% return is good enough for money laundering and they can out-finance the legal competition.

I remember a very similar story, it stuck with me from a couple of years ago:

> His next thought was utterly predictable: "I remember thinking, I'm gonna be rich! I'm gonna plunder the lottery!" he says. However, these grandiose dreams soon gave way to more practical concerns. "Once I worked out how much money I could make if this was my full-time job, I got a lot less excited," Srivastava says. "I'd have to travel from store to store and spend 45 seconds cracking each card. I estimated that I could expect to make about $600 a day. That's not bad. But to be honest, I make more as a consultant, and I find consulting to be a lot more interesting than scratch lottery tickets."[1]

[1]: https://www.wired.com/2011/01/ff-lottery/

But why drive? Just hire some low-income 19 year old Swedes to do it. They don’t have the capital to deploy. The same could be said of any software you write... your developers don’t have the capital to get that userbase.

I also think there is a misunderstanding here. In this kind of bet, the track takes a cut. In other words, your 10-1 odds are calculated on what's left after the track took a cut.

I simulated this myself on the HK tracks: if you bet randomly _and_ don't ignore track take, your expected return in the long run is precisely $BET - $TRACK_TAKE, which was about 80% in HK. I.e., you lose 20% of your money on average on each bet.
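That result can be checked without a full simulation. The sketch below is not the simulation described above, just an illustration with a made-up pool: it assumes the crowd is well calibrated (each horse's share of the pool equals its true win probability) and uses the roughly 20% take mentioned for HK.

```python
def expected_return_random_bet(pool, win_prob, take):
    """Expected amount returned per $1 when betting a uniformly random horse.

    pool maps horse -> money bet on it, win_prob maps horse -> true win
    probability, take is the track's cut of the pool.
    """
    net = sum(pool.values()) * (1 - take)
    n = len(pool)
    # A winning $1 ticket on a horse is paid net / pool[horse].
    return sum(win_prob[h] * net / pool[h] for h in pool) / n

# Assumption: pool shares equal the true win probabilities.
pool = {"A": 500.0, "B": 300.0, "C": 200.0}
total = sum(pool.values())
win_prob = {h: amount / total for h, amount in pool.items()}

returned = expected_return_random_bet(pool, win_prob, take=0.20)
print(returned)
```

Under that calibration assumption every horse has the same expected return of 1 minus the take, so it does not matter which one you pick at random. If the crowd is miscalibrated, some horses return more and some less.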

Interesting comparison to the old fashioned way: https://www.bloomberg.com/news/features/2018-05-03/the-gambl...

The OP writes repeatedly that the "odds of winning are related to the amount bet on each horse." This is not correct. The payout is related to the number of people betting. The odds of winning are based on the horse's DNA, training, competitors, etc. The horse does not know who bet on it while it's running. It's easy to confuse the word "odds" in gambling because that often refers to payout, but these are not the odds of winning.

Here's my 2 cents.

Anyone who is telling you they have cracked the parimutuel pools, and "here's how to do it", is trying to sell you something else. Anyone who has actually cracked the parimutuel pools knows how they work, and that if you share your "tips", or selections, you only serve to diminish your own returns, so you are not going to do it.

Indeed. This is the origin of the expression "break a leg." At the track, you could never wish someone else good luck because their good luck will cost you money. Instead, you wish their horse breaks a leg so he'll be eliminated, moving the money in his pool to yours if you win.

He was unlucky in picking harness racing. Standard horse racing is already corrupt, but harness racing takes it to the next level. The rules and the addition of the harness make cheating much more prevalent.

I'd rather bet on Professional Wrastling than harness racing.

appreciate the write up but is this even AI? It seems like he just did some analysis & stats on observed data, and came up with a simple model.

The article doesn’t really say, as this was a write-up of a 20-minute talk at a conference for economists. The main models were vanilla ANNs. SVMs and LSTMs were also tested, but unsuccessfully. As I had a lot of prior experience writing Java, most models were initially implemented in DeepLearning4J but were moved over to the frameworks from Fast.ai in the later stages. I’d be happy to do a write-up of the details at some point. The majority of the work was in fact the data pre-processing.

Ah, no need to give me all the details good sir. Appreciate the post. Also yeah, sounds about right with the pre-processing. That is always the case!
