
Ask HN: Football statistical analysis - socrates1998
Hi HN, I have asked around my local network, but no one seems to be able to point me in the right direction.

I was a high school football coach for about 10 years, but now I am self-employed and am doing some analytical work for some local football teams.

Anyway, in football, when breaking down an opposing team's offense, you have a bunch of data. What I think is happening is that the opposing coach is calling plays to the right and left in an almost 50/50 split; you can see it in the data.

I want to be able to compare the other team's decisions with a truly random data set.

So, if the coach calls three Right plays in a row, how does this compare to his decisions in the past, and how likely is he to call either right or left on the next play?

I was told it had to do with some Bayesian statistics, but I am not sure.

Here is some example data from a recent breakdown I did last season:

https://www.dropbox.com/s/3m9mr24rahwwzoz/St%20Thomas%20Aq%20Break%20down%20data%202016%20copy.xlsx?dl=0
======
usgroup
Looks to me like you're trying to figure out the conditional probability of
L/R given a bunch of other factors, and you're looking to tune the model using
historical data.

Your factors look to be both categorical and numeric, e.g. off-play and yard
line. Given this, I'd start by training a conditional inference tree with 60%
of the data. Amongst other things, it'll give you a good idea of how your
factors are related to each other and which ones matter (ctree is not difficult
to interpret). You can then use the other 40% to test how well your tree
predicts. Then you might try other models in a more informed manner.
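
A minimal sketch of the 60/40 train/test workflow described above. It does not
fit a conditional inference tree (ctree comes from R); as a plainly simpler
stand-in, the "model" here is a lookup table of the most common call per
situation. The plays are synthetic, and the factor names (`zone`, `prev`) are
made up for illustration, not taken from the spreadsheet.

```python
import random
from collections import Counter, defaultdict

rng = random.Random(0)  # fixed seed so the split is repeatable

# Synthetic stand-in for ~100 charted plays: (zone, previous direction, call).
zones = ["own", "midfield", "red_zone"]
plays = [(rng.choice(zones), rng.choice("LR"), rng.choice("LR")) for _ in range(100)]

# Shuffle, then keep 60% for fitting and hold out 40% for testing.
rng.shuffle(plays)
cut = int(0.6 * len(plays))
train, test = plays[:cut], plays[cut:]

# Stand-in model: the most common call seen in training for each situation.
counts = defaultdict(Counter)
for zone, prev, call in train:
    counts[(zone, prev)][call] += 1
overall = Counter(call for _, _, call in train).most_common(1)[0][0]

def predict(zone, prev):
    """Return the majority call for this situation, or the overall majority."""
    seen = counts.get((zone, prev))
    return seen.most_common(1)[0][0] if seen else overall

# Score only on plays the model never saw.
hits = sum(predict(zone, prev) == call for zone, prev, call in test)
accuracy = hits / len(test)
print(f"holdout accuracy on {len(test)} plays: {accuracy:.2f}")
```

Because the synthetic calls are coin flips, the holdout accuracy should hover
around 0.5; on real data, anything clearly above the coach's base rate would
suggest the factors carry signal.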

If the above paragraph made no sense, it may be advisable to outsource the
task. There's quite a lot to the modelling effort.

~~~
socrates1998
Thanks for the feedback. Ideally, I would like to do it the way I think you
are describing it. Get a bunch of factors and see how they all interact with
each other. And try to get the probabilities of all the different scenarios.

For example, if the coach runs the ball to the right in the red zone, what is
the probability that he runs it again to the right on the next play.

The problem with trying to input a bunch of factors and trying to get the
conditional probabilities of all those situations is that it can get
exceedingly complex and I don't think it will be particularly useful given my
small data sets.

I normally only have about 100 offensive plays to analyze and use data from.

My theory is that if I compare the R/L direction of the plays the opposing
coach has called over the last two or three games with a truly random sequence
of R/L calls, it will show that the coach regresses to the mean far more
strongly than chance would, because he doesn't like the idea of going to the
right 5 times in a row.

However, I need some type of percentage comparison or data analysis to show
this instead of a subjective picture.
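
One way to put a number on this is the Wald-Wolfowitz runs test, which compares
the number of streaks ("runs") in the observed sequence with what a random
ordering of the same R and L calls would produce. The sketch below uses the
normal approximation; the sequence is made up, and `runs_test` is a
hypothetical helper name, not something from the thread.

```python
import math

def runs_test(calls):
    """Wald-Wolfowitz runs test on a sequence of 'R'/'L' calls.

    Returns (observed runs, expected runs under randomness, z, two-sided p).
    A positive z means more switching (shorter streaks) than chance.
    """
    n1 = calls.count("R")
    n2 = calls.count("L")
    n = n1 + n2
    # A new run starts every time the direction changes.
    runs = 1 + sum(a != b for a, b in zip(calls, calls[1:]))
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return runs, expected, z, p_value

# Made-up sequence of calls, in the order they were run.
calls = "RLRLRRLRLLRLRLRLRRLRLRLLRLRL"
runs, expected, z, p_value = runs_test(calls)
print(f"runs={runs}, expected={expected:.1f}, z={z:.2f}, p={p_value:.4f}")
```

If the coach really avoids long streaks, the observed number of runs should sit
well above the expected number, giving a large positive z and a small p-value.
With only about 100 plays the approximation is rough, so treat the result as a
hint rather than proof.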

~~~
almostkorean
What usgroup said is correct. To give a little more background, this is a
classification problem (predicting L or R) and it is a supervised learning
problem (since you are training it on historical data).

There are many supervised learning models that can be used for classification.
The simplest ones that first come to mind are logistic regression and decision
trees. If you want to get into more complex models, look into boosting models
and random forest.

I'm not sure what your programming background is, but creating classification
models is very easy in Python using the scikit-learn library. Creating the
features that are used to train your model would be more difficult, but you
can always start with something simple and iterate.

It's also important to know how to measure the performance of your model. In
this situation, plain accuracy will probably be OK, assuming the L and R
classes are roughly balanced (about 50/50). But AUC is typically the standard
used to measure model performance.
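
To make the two metrics concrete, here is a small standard-library sketch. It
assumes the model outputs a probability that the next play goes right; the
labels and probabilities are invented, and the function names are
illustrative. In practice scikit-learn's `accuracy_score` and `roc_auc_score`
would do the same job.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of plays where the thresholded prediction matches the call."""
    preds = ["R" if p >= threshold else "L" for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def auc(labels, probs, positive="R"):
    """Probability that a random 'R' play is scored above a random 'L' play.

    Ties count as half a win. This pairwise form is O(n^2), which is fine
    for roughly 100 plays.
    """
    pos = [p for p, y in zip(probs, labels) if y == positive]
    neg = [p for p, y in zip(probs, labels) if y != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented example: actual calls and the model's P(right) for each play.
labels = ["R", "L", "R", "L", "R", "L"]
probs = [0.8, 0.3, 0.6, 0.55, 0.45, 0.2]
print(f"accuracy={accuracy(labels, probs):.2f}, AUC={auc(labels, probs):.2f}")
```

Accuracy depends on the 0.5 cut-off, while AUC only looks at how well the
scores rank R plays above L plays, which is why it is the more common yardstick.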

------
jetti
While I'm a novice in this area, you may want to look into Markov chains. A
Markov chain only looks at the last state before making a decision, but you
can determine how big your state is. For example, you could treat the last
three plays as a single state and look at what tends to come after it.
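
A minimal sketch of that idea: treat the last `order` calls as the state and
estimate the probability of the next call from counts. The add-one `prior` is
an assumption added here (it amounts to a uniform Beta(1, 1) prior, which keeps
small samples from producing 0% or 100%); the sequence and the function name
are made up.

```python
from collections import Counter, defaultdict

def transition_probs(calls, order=3, prior=1.0):
    """Estimate P(next call | previous `order` calls) from an 'R'/'L' sequence."""
    counts = defaultdict(Counter)
    for i in range(order, len(calls)):
        state = tuple(calls[i - order:i])  # the last `order` calls
        counts[state][calls[i]] += 1       # what was called next
    probs = {}
    for state, c in counts.items():
        total = c["L"] + c["R"] + 2 * prior
        # Smoothed estimate: (observed count + prior) / (total + 2 * prior).
        probs[state] = {d: (c[d] + prior) / total for d in "LR"}
    return probs

# Made-up sequence of calls, in the order they were run.
calls = "RRLRRRLLRLRRRLRLLRRRLRLRRLLR"
probs = transition_probs(calls, order=3)
after_three_rights = probs.get(("R", "R", "R"))
print("After R, R, R:", after_three_rights)
```

With about 100 plays, an order of 3 gives eight possible states and only a
dozen or so observations each, so a smaller `order` may be more reliable.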

~~~
socrates1998
Great, thanks for the idea. I will look into Markov Chains.

