

Selection Sunday: Is your March Madness prediction algorithm ready? - danger
http://blog.smellthedata.com/2011/03/official-2011-march-madness-predictive.html

======
aothman
If you're interested in this kind of stuff, Ken Pomeroy (kenpom.com) runs a
fantastic basketball analytics site. He's a big proponent of what are called
"tempo-free stats", which aim to filter out issues of playing speed from
scoring (a team that plays quickly will score a lot of points, nearly
independently of whether or not they are winning). Tempo-free stats instead
count possessions; one interesting statistic is that the average team this
year in college basketball produced 1.01 points per possession - such a tidy
figure to emerge from the chaos.
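
As a rough sketch of the idea (not Pomeroy's actual code), possessions can be estimated from box-score totals and then used as the denominator. The 0.475 weight on free throw attempts is a commonly cited college value and is an assumption here, as are the made-up box scores and the names `BoxScore`, `estimate_possessions` and `points_per_possession`.

```python
from dataclasses import dataclass


@dataclass
class BoxScore:
    points: int
    fga: int  # field goal attempts
    orb: int  # offensive rebounds
    tov: int  # turnovers
    fta: int  # free throw attempts


def estimate_possessions(box, ft_weight=0.475):
    """Estimate possessions from box-score totals.

    ft_weight is an assumed value: roughly the share of free throw
    attempts that end a possession.
    """
    return box.fga - box.orb + box.tov + ft_weight * box.fta


def points_per_possession(box):
    """Tempo-free scoring efficiency."""
    return box.points / estimate_possessions(box)


# Made-up box scores: one up-tempo team, one slow team.
fast = BoxScore(points=85, fga=68, orb=12, tov=14, fta=24)
slow = BoxScore(points=62, fga=50, orb=9, tov=10, fta=18)

for name, box in (("fast", fast), ("slow", slow)):
    print(name, round(estimate_possessions(box), 1),
          round(points_per_possession(box), 3))
```

The two invented teams differ by 23 raw points but come out almost identical per possession, which is the distortion tempo-free stats are meant to remove.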

In terms of predictions, one of the most interesting teams this year is Kansas
(<http://kenpom.com/team.php?team=Kansas>). They've only lost twice but have a
large number of narrow home wins. Depending on how your algorithm treats those
wins, they look like either a team that will struggle to reach the Sweet 16 or
a potential national champion.

~~~
danger
One other issue that comes up is "garbage time". When a game isn't close (say,
in the last quarter of a blowout), the stats are basically meaningless. Does
Pomeroy have good ideas about how to deal with that?

------
kenjackson
Why can't people use any dataset they want? I think the rule should be, "Use
any data you like, but you must share whatever data you use with the rest of
the field."

~~~
danger
This was the intention. Around a month ago we started asking what data people
would like to use. We incorporated some of that feedback to decide what data
to use for this year.

If you have other suggestions, please let us know, and we'll add them for next
year (if possible). The only thing we're trying to avoid is somebody coming in
with a lot of data at the last minute, beyond the point when anybody else can
realistically get it incorporated into their model.

~~~
kenjackson
Actually that makes sense. I didn't realize that you did the request for data
earlier. I may have to do this next year, if you're still doing it.

~~~
danger
Yeah, unfortunately the "getting the word out" could have been done better.

------
listrophy
See my unrefactored-ruby-from-two-years-ago solution.
<https://github.com/listrophy/lazy_ncaa>

It's pitifully simple... it just uses the historical probability of seed A
beating seed B in round C. Going into the Elite 8 last year, it produced a
bracket in the top 800 on ESPN.com. Of course, the other three brackets it
produced (and I posted) failed miserably.
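
A minimal Python sketch of that idea follows. It is not the Ruby code in the repo: the lookup table holds placeholder numbers rather than real tournament history, and the names `HISTORICAL`, `win_probability`, `pick_winner` and `play_pod` are made up for illustration.

```python
import random

# HYPOTHETICAL win rates for the better seed, keyed by
# (round, better_seed, worse_seed). Placeholder values, not real history.
HISTORICAL = {
    (1, 1, 16): 0.99,
    (1, 8, 9): 0.48,
    (2, 1, 8): 0.80,
    (2, 1, 9): 0.85,
}


def win_probability(round_no, seed_a, seed_b):
    """Probability that seed_a beats seed_b in the given round."""
    if seed_a == seed_b:
        return 0.5
    lo, hi = min(seed_a, seed_b), max(seed_a, seed_b)
    # Matchups missing from the table fall back to a coin flip.
    p_lo = HISTORICAL.get((round_no, lo, hi), 0.5)
    return p_lo if seed_a == lo else 1.0 - p_lo


def pick_winner(round_no, seed_a, seed_b, rng):
    """Sample a winner according to the table."""
    p = win_probability(round_no, seed_a, seed_b)
    return seed_a if rng.random() < p else seed_b


def play_pod(seeds, rng):
    """Play a 4-team pod: two round-1 games, then one round-2 game."""
    a = pick_winner(1, seeds[0], seeds[1], rng)
    b = pick_winner(1, seeds[2], seeds[3], rng)
    return pick_winner(2, a, b, rng)


rng = random.Random(2011)  # fixed seed so a run is repeatable
winner = play_pod([1, 16, 8, 9], rng)
print("pod winner: seed", winner)
```

Because winners are sampled rather than always taking the favorite, each run with a different random seed gives a different bracket, which is one way to end up with several brackets of very different quality.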

------
bumticks
No

------
ycnewsname
What does last year's algorithm look like on this year's tournament?

~~~
danger
I haven't run it yet. It should be done in time to enter the contest, though
(but it will just be a baseline--i.e., not eligible to win prizes).

------
zdw
Oh boy, basketball. The only thing less interesting than Gruber's
baseball-related posts, other than golf on TV.

I just can't get into watching sports - I love playing them (soccer and
ultimate frisbee on the weekends) but watching them seems pointless for some
reason. My wife watches more sports than I do, and she watches figure skating
and track & field, which come on like 4 times a year...

Anyone else feel this way?

~~~
vyrotek
I had to reply because my wife watches more sports than I do as well. When the
in-laws get together for things like the Super Bowl they let me hide in the
basement to work. I'm definitely no good at actually playing sports, but I'd
much rather do that than watch someone else play them.

