
Show HN: Applying Machine Learning to March Madness - adeshpande3
https://github.com/adeshpande3/March-Madness-ML
======
solidasparagus
This is a fun project and there's plenty to learn even if you don't end up
with a great model.

But if you really want to try to solve this problem, you're going to need more
granular data - play-by-play, per-play lineup, injuries, player tracking if
you can get your hands on it. The stats you are using are very lossy summaries
of the season, so they aren't very strong predictive features.

From a technical POV, consider bucketing some of those per-game stats (e.g. 4
binary features representing which quartile the team's stat falls in compared
to every other team that season). This can help to adjust for year-to-year
differences. Work with pace-adjusted stats if you have access to them. Find a
baseline accuracy by picking the simplest possible non-ML strategy and
measuring how accurate that is (e.g. what is the accuracy of a model that
always picks the better seed or W/L record?).
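
A minimal sketch of both ideas in plain Python. The function names, the sample PPG values, and the `(winner_seed, loser_seed)` input shape are assumptions made for illustration, not anything taken from the repo:

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_one_hot(stat_by_team):
    """Map each team to 4 binary features marking its quartile.

    Call this once per season so a team is only compared with
    other teams from that same season.
    """
    cuts = quantiles(stat_by_team.values(), n=4)  # three cut points
    features = {}
    for team, value in stat_by_team.items():
        bucket = bisect_left(cuts, value)  # 0 = bottom quartile, 3 = top
        features[team] = [int(bucket == q) for q in range(4)]
    return features


def better_seed_accuracy(games):
    """Accuracy of always picking the lower-numbered (better) seed.

    `games` is a list of (winner_seed, loser_seed) pairs. Games between
    equal seeds give this rule nothing to pick from, so they are skipped.
    """
    decided = [(w, l) for w, l in games if w != l]
    if not decided:
        raise ValueError("no games with differing seeds")
    correct = sum(1 for w, l in decided if w < l)
    return correct / len(decided)


# Made-up points-per-game values for one hypothetical season.
ppg = {"A": 60.0, "B": 65.0, "C": 70.0, "D": 75.0,
       "E": 80.0, "F": 85.0, "G": 90.0, "H": 95.0}
features = quartile_one_hot(ppg)

# Made-up results: three favorites win, one 12-over-5 upset, one 1-vs-1 game.
sample_games = [(1, 16), (2, 15), (12, 5), (1, 1), (3, 14)]
baseline = better_seed_accuracy(sample_games)  # 0.75
```

Any model worth keeping should beat `baseline` on the same held-out games.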

You need to adjust your training data to only use data that would have been
available at the time of the game - if W/L or PPG includes data from future
games, this is a form of data snooping and will probably give you results on
your test set that won't generalize to the real world. Time-series snooping is
a very easy mistake to make, but it's crucial to avoid it in order to build a
good model.
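
One way to avoid this, sketched under the assumption that each game is a dict with hypothetical `date`, `winner`, and `loser` keys: walk the season in chronological order and record each team's win percentage before folding in that game's result.

```python
from collections import defaultdict


def add_pregame_win_pct(games):
    """Attach each team's win percentage *entering* the game.

    Dates must sort chronologically (ISO "YYYY-MM-DD" strings work).
    A team with no earlier games gets None rather than a made-up value.
    Assumes a team plays at most one game per date.
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    rows = []
    for game in sorted(games, key=lambda g: g["date"]):
        w, l = game["winner"], game["loser"]
        # Read the records BEFORE updating them with this game's result.
        rows.append({
            **game,
            "winner_pre_pct": wins[w] / played[w] if played[w] else None,
            "loser_pre_pct": wins[l] / played[l] if played[l] else None,
        })
        wins[w] += 1
        played[w] += 1
        played[l] += 1
    return rows


# Made-up schedule, deliberately out of order.
season = [
    {"date": "2019-01-05", "winner": "A", "loser": "B"},
    {"date": "2019-01-02", "winner": "A", "loser": "C"},
    {"date": "2019-01-09", "winner": "B", "loser": "A"},
]
rows = add_pregame_win_pct(season)
```

The same pattern applies to PPG or any other running stat: compute the feature from the accumulator first, then update the accumulator.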

Interesting work, thanks for sharing!

~~~
ham_sandwich
You’re right. Like a lot of other engineers, I once thought “ML+High level
team info=$$$$” but quickly learned that you really can’t get an edge unless
you’re digging into that more granular data and even then it’s really tough.
It can be very hard to improve on simple linear models.

The odds coming out of Vegas are usually priced correctly. Sports markets are
very efficient, although perhaps not as ruthlessly efficient as public equity
markets. I would imagine there are still syndicates out there that are the
“RenTec of sports betting” and just printing alpha.

~~~
nrjames
This is a great article about how the actions of some players are very
predictive of the outcome of the game even if their stats don't reflect that:
[https://www.nytimes.com/2009/02/15/magazine/15Battier-t.html](https://www.nytimes.com/2009/02/15/magazine/15Battier-t.html)

------
QuackingJimbo
> One in 9.2 quintillion. Those are the odds that you will correctly pick the
> winners of all 63 games played over the course of the tournament.
> Mathematically speaking, there are 2^63 (~9.2 quintillion) number of ways
> that you can fill the bracket

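This is wrong. 2^63 is the number of possible brackets, but the odds are only
1 in 2^63 if every game is a coin flip, and many of the games are not even
close to 50-50.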
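
A rough illustration of how much this matters. The 70% figure is a purely illustrative assumption, and treating every pick as independent with the same hit rate is a simplification:

```python
TOTAL_BRACKETS = 2 ** 63  # 9,223,372,036,854,775,808 possible brackets


def perfect_bracket_odds(p_correct, games=63):
    """Return N such that a perfect bracket is a 1-in-N event.

    Assumes each of the `games` picks is independently correct with
    the same probability `p_correct`.
    """
    if not 0 < p_correct <= 1:
        raise ValueError("p_correct must be in (0, 1]")
    return 1 / p_correct ** games


coin_flip = perfect_bracket_odds(0.5)  # equals 2**63, the quoted figure
informed = perfect_bracket_odds(0.7)   # illustrative 70% hit rate
```

Under these assumptions, moving from 50% to 70% per game shrinks the odds from about 9.2 quintillion to 1 down to the order of billions to 1.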

~~~
dagw
According to: [https://math.duke.edu/news/duke-math-professor-says-odds-
per...](https://math.duke.edu/news/duke-math-professor-says-odds-perfect-
bracket-are-one-24-trillion) by using all available knowledge from seeding and
betting odds you can get your odds down to a mere 1 in 2.4 trillion. Or
perhaps even as good as 1 in 128 billion if this guy:
[https://www.youtube.com/watch?v=O6Smkv11Mj4](https://www.youtube.com/watch?v=O6Smkv11Mj4)
is to be believed.

~~~
jsjohnst
So even in the best case you list (which I share your skepticism of), you are
1,000x more likely to win the MegaMillions or Powerball jackpot than pick the
perfect winning grid.

To put the odds of picking a perfect grid by hand in rough perspective, it
would be like winning the lottery, boarding a plane that crashes into the
ocean, being the only survivor floating in the ocean, getting struck by
lightning 3 times and living, only to then be eaten by a shark, all in a
single day.

~~~
dagw
_So even in the best case you list (which I share your skepticism of), you are
1,000x more likely to win the MegaMillions or Powerball jackpot than pick the
perfect winning grid._

That makes me wonder what odds you'd get at the bookmakers for a perfect
winning grid.

~~~
jsjohnst
I think I remember seeing a prize somewhere of an obscene amount of money
(memory is hazy, but was something like over $100M, maybe even a billion) for
a perfect winning grid.

Edit: it was a billion.

[https://genius.com/Warren-buffett-billion-dollar-march-
madne...](https://genius.com/Warren-buffett-billion-dollar-march-madness-
bracket-annotated)

~~~
dagw
Well since submission is free that bet has a positive expected value! Much
better than the lottery.
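
A back-of-the-envelope version using the numbers quoted in this thread. The `expected_value` helper is hypothetical, and the sketch ignores entry limits, prize splitting, and taxes:

```python
def expected_value(prize, odds_against, entry_cost=0.0):
    """Expected payoff of one entry with a 1-in-`odds_against` chance."""
    return prize / odds_against - entry_cost


PRIZE = 1_000_000_000  # the billion-dollar prize mentioned above

ev_optimistic = expected_value(PRIZE, 128e9)  # 1 in 128 billion
ev_duke = expected_value(PRIZE, 2.4e12)       # 1 in 2.4 trillion
```

Both come out positive but well under a cent per entry, so the value is only positive because entry costs nothing.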

------
matt4077
Here's the Kaggle competition for this year, with many more potential starting
points and resources: [https://www.kaggle.com/c/womens-machine-learning-
competition...](https://www.kaggle.com/c/womens-machine-learning-
competition-2019)

------
dandigangi
Our non-data scientists are eager to steal this so they have a chance to beat
our data scientists. Someone asked me how to install Python to run this.

XD

~~~
loblollyboy
Interesting blog post, but it looks like they'd be better off just guessing.

------
rococode
Cool project! It sounds like you've run this in previous years, so I'm curious
- how well has the model done in the past?

~~~
aaaaaaaaaaab
Probably about as well as trying to predict the result of a coin toss.

~~~
throwawaymath
If that were actually the case, this model would be far and away the best ever
developed!

------
bitxbit
I’d be interested in seeing something like this used to create winning game
strategies that coaches can utilize.

~~~
navigatesol
This exists, kinda:

[http://www.sloansportsconference.com/wp-
content/uploads/2018...](http://www.sloansportsconference.com/wp-
content/uploads/2018/02/1006.pdf)

There was a mainstream article written about the tech a while back, which had
GIFs demonstrating simulations, but I failed to find it. Essentially, it would
model what the players did versus what they _should_ have done, optimally.

~~~
solidasparagus
This one? [http://grantland.com/features/the-toronto-raptors-sportvu-
ca...](http://grantland.com/features/the-toronto-raptors-sportvu-cameras-nba-
analytical-revolution/)

