

Ask HN: I have 15 years of horse racing data, what should I do with it? - mcgin

The data covers 203,916 races. I have info on trainers, jockeys, individual horses, betting odds, and a multitude of other things. I have a few ideas for what I can do with it, but thought I'd get some input before starting work on it.
======
imp
Do you have a background in econometrics or regression analysis? You might be
able to develop a model to predict winning horses.
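
Before any regression, a natural baseline is the probability the market
already implies. The sketch below assumes the odds are stored as decimal odds
(an assumption; the thread does not say what format the data uses), and the
function name is made up for illustration. It converts one race's odds to
probabilities and scales them to sum to 1, which removes the bookmaker's
margin. A predictive model would need to beat these numbers to be useful.

```python
def implied_probabilities(decimal_odds):
    """Turn one race's decimal odds into probabilities that sum to 1.

    Assumes decimal odds (e.g. 4.0 means a 1-unit stake returns 4 units).
    """
    raw = [1.0 / o for o in decimal_odds]   # raw implied probabilities
    overround = sum(raw)                    # exceeds 1 by the bookmaker's margin
    return [p / overround for p in raw]     # rescale so the field sums to 1


# Hypothetical four-runner race
odds = [2.0, 4.0, 4.0, 10.0]
probs = implied_probabilities(odds)
print([round(p, 3) for p in probs])
```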

~~~
mcgin
That is one of the plans I have in mind.

------
chrisclark1729
Do you have info on the way the lines change from initially being set to post
time? That shit is totally fixed. If you could figure out what certain line
changes mean in terms of predicting a winner, you could perhaps make some
money.

~~~
mcgin
Unfortunately no, it's purely the prices at the off that I have available.
However, I am planning on monitoring future races to see how their prices
change over time.

------
noomerikal
post it on <http://infochimps.com/>

------
znmeb
Just off the top of my head, I'd say it's pretty much useless without a
betting strategy. I'd recommend starting with "Dr. Z's Beat the Racetrack".
It's out of print, but you should be able to find a copy.
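
To make "betting strategy" concrete, here is a minimal sketch of two standard
quantities any strategy has to deal with: the expected value of a win bet and
the Kelly stake fraction. This is not Dr. Z's system itself; it assumes
decimal odds and that you already have your own estimate of the win
probability. The function names are made up for illustration.

```python
def expected_value(p_win, decimal_odds, stake=1.0):
    """Expected profit of a win bet, given your own win probability estimate."""
    return stake * (p_win * (decimal_odds - 1.0) - (1.0 - p_win))


def kelly_fraction(p_win, decimal_odds):
    """Fraction of bankroll the Kelly criterion suggests; 0 if the bet has no edge."""
    b = decimal_odds - 1.0                  # net winnings per unit staked
    f = (b * p_win - (1.0 - p_win)) / b
    return max(0.0, f)                      # never bet when the edge is negative


# Hypothetical example: you think a 4.0 shot wins 30% of the time
print(expected_value(0.30, 4.0))   # positive means the bet has an edge
print(kelly_fraction(0.30, 4.0))
```

A model is only worth betting on when its probabilities produce a positive
expected value against the available odds often enough to cover the track's
take.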

------
Jun8
How about creating a simple site where you simulate a horse race? That is,
give users part of the historical data for the horses, ask them to select the
winners, and then reveal the true outcome. People would be betting for boast
points.

A discussion board where people can discuss their heuristics for selecting
winners would be great. For the famous horses, you can link to their Wikipedia
pages, Flickr photos, etc., if they exist, to create a more immersive
experience.

I, for one, would play with such a site for a while.

~~~
Ataraxy
Just to add to this, I would center it around the 'process' of trying to
select the winners if this is all being done off historical data; otherwise
people will just game the system via Google/Wikipedia.

That being said, I definitely like the idea of having it be a social sort of
tool where people compete on the winners.

Perhaps using the data you can somehow figure out how to generate random horse
races based on real historical data and real horses, and then people can really
try to pick the winners instead of gaming the system for points through a
simple Google search.
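
One way this could work, as a rough sketch: give each horse a strength (for
example, a win rate estimated from its history) and draw the finishing order
one place at a time, weighted by the strengths of the horses still in the
field. The horse names and strengths below are invented, and how to estimate
strengths from the real data is left open.

```python
import random


def simulate_finish_order(strengths, rng):
    """Draw a random finishing order, favouring horses with higher strength.

    strengths: dict mapping horse name -> positive weight (hypothetical values).
    rng: a random.Random instance, passed in so results are reproducible.
    """
    remaining = dict(strengths)
    order = []
    while remaining:
        names = list(remaining)
        weights = [remaining[n] for n in names]
        pick = rng.choices(names, weights=weights, k=1)[0]  # next finisher
        order.append(pick)
        del remaining[pick]                                 # cannot finish twice
    return order


# Invented field; real strengths would come from the historical data
strengths = {"Horse A": 0.45, "Horse B": 0.30, "Horse C": 0.15, "Horse D": 0.10}
rng = random.Random(42)
order = simulate_finish_order(strengths, rng)
print(order)
```

Because every simulated race is new, there is no result to look up, so players
have to reason from the form data they are shown.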

------
patternexon
Share it!

------
notahacker
Are there any free/commercial sites with comparable datasets? I can see the
data itself having a not-insignificant market value if it's of sufficient
quality.

~~~
mcgin
Unfortunately, it is scraped from various sources on the web, so it cannot be
used commercially.

------
Adrock
Use it as a training data set for a bunch of machine learning algorithms and
go pro, or discover that it's a really hard problem.

------
korch
_Share it on BitTorrent!_ Large, accurate real-world datasets are difficult
to find for the purposes of testing and experimenting with various machine
learning algos.

~~~
mcgin
Good idea, I'll create a torrent once I get the chance.

