They also include live updates while games are playing.
I'd be curious to see if you get any substantially different results using structured data (nflgame gets it from NFL.com's JSON feed) as opposed to parsing the text descriptions.
 - https://github.com/BurntSushi/nflgame
 - https://github.com/BurntSushi/nfldb
The data is from play-by-play text parsing going back to 2000.
I like finding projects in the wild that do creative things with nfl.com data. This guy is building an Arduino Fantasy Football trophy (https://github.com/sambrenner/future-trophy), and this person built an OS X app that uses the score strip XML for current scores (https://github.com/kchau/NFL-Menu).
Wow, nice! Tell me, how many different unique identifiers do you have for each player/game? :P (Elias id, GSIS id, profile id, ...)
> I see a lot of 404s on Game Center data feeds.
Hmm, I'm not sure what you mean? It seems like the URL stays the same: http://www.nfl.com/liveupdate/game-center/2012080953/2012080...
> Not sure how to reduce the 404s, suppose we could document the feeds and make them openly available.
Yeah, that'd be great! I had figured that you guys kept quiet about them purposefully.
One of the things that has bitten me is that the JSON feeds at the URL above only exist back to 2009. I haven't been able to discover a similar feed for older games. Any ideas? :-)
Those projects are pretty neat, btw. The trophy one is really cool.
I'd definitely recommend people use this over the download at the bottom of the blog post. It's really painful to parse the free text (there are a lot of weird edge cases). I'll add a link to this stuff in the post.
update: link added to post
I can only imagine. I've tried motivating myself to do it a few times so I could increase the amount of data in nfldb (I believe the text descriptions are available in one form or another all the way back to 1999), but it's a rather daunting task when there are so many statistical categories.
 - https://github.com/BurntSushi/nfldb/wiki/Statistical-categor...
Thank you so much for helping to spread the word. I really appreciate it!
(But there haven't been many changes, if any, to the JSON feed's structure. At most, they add some statistical categories.)
About 45% of all games finish with a spread of 7 or less, according to a quick search. A play with close to a 50% chance of leaving you down 3 points costs you a lot of your margin if you think you are only a narrow favorite.
A bigger win doesn't count for more in football, but you can throw away a sure win. So if you believe you are, say, a 3-4 point favorite, then the right play is to take the field goal every time. Giving up the safe points means you take half of those games and turn them into a crapshoot.
As Bill Barnwell of Grantland is fond of saying, in the first half, your only goal is to maximize your expected points. You don't know exactly how many points you're going to need in the game until the end.
His phrase doesn't really have any impact on the argument being made here unless you really dissect the relative importance of offense versus defense in the game, which requires a lot more evidence to make a case one way or the other.
To illustrate: in an infinitely long game, you always pick whatever gives you the highest expected value. In a game with one play left, you always pick the play with the highest probability of putting you over the number of points the opponent has, even if it has a lower expected value than other plays.
The quote is saying that the first half is more similar to playing an infinitely long game than a game with one play left.
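A rough sketch of that distinction in Python. The two plays and their probabilities are made-up numbers for illustration only, not real NFL rates, and the function names are hypothetical:

```python
# Hypothetical plays: each maps points scored -> probability.
# These numbers are invented for illustration, not measured from real games.
PLAYS = {
    "field_goal": {3: 0.90, 0: 0.10},
    "go_for_touchdown": {7: 0.45, 0: 0.55},
}

def expected_points(outcomes):
    """Average points the play yields over many repetitions."""
    return sum(points * prob for points, prob in outcomes.items())

def prob_of_overtaking(outcomes, deficit):
    """Probability that this single play scores more than the deficit."""
    return sum(prob for points, prob in outcomes.items() if points > deficit)

def best_by_expected_points(plays):
    """The 'infinitely long game' rule: maximize expected points."""
    return max(plays, key=lambda name: expected_points(plays[name]))

def best_on_final_play(plays, deficit):
    """The 'one play left' rule: maximize the chance of taking the lead."""
    return max(plays, key=lambda name: prob_of_overtaking(plays[name], deficit))

print(best_by_expected_points(PLAYS))  # go_for_touchdown (3.15 vs 2.7 expected points)
print(best_on_final_play(PLAYS, 2))    # field_goal (0.90 vs 0.45 chance to take the lead)
print(best_on_final_play(PLAYS, 4))    # go_for_touchdown (a field goal can't cover 4)
```

With these invented numbers, the same pair of plays gets a different answer depending on which rule applies, which is the point of the quote.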
I believe the NYTimes "4th Down Bot" actually performs a likelihood-of-winning analysis, as well.
For example, let's say you are the coach of the Patriots and you have been running the ball very successfully the past few games, even winning games because of your running game.
And it's 4th down and 2 yards to go against the Broncos.
The data says, run the ball. Especially since you have done it well in the past. However, you forgot to take into account the stud defensive tackle that has just started playing really well for the Broncos. So, you try to run the ball and you lose the game.
This is just one example of the inability of data to deal with match-ups and schemes.
As someone who both likes data and has coached football, I would love the integration of the two, but football has too many variables.
If you have all this data, you are actually going to make the wrong decision because the matchup is bad for your team.
Matchups and schemes trump data.
Coaches do what they think will be successful. If they can't run the ball well, then they don't run the ball, skewing the data.
Also, tendencies change from week to week. You really need a deep understanding of football to get the right data for that week.
This is why data hasn't dominated football: you need to have tons of knowledge before anything else.
It's difficult, but data can in fact control for matchups and schemes.
In this case, the positive outcomes for the offense are a first down (or score, which would also be a first down). If the data says you will pick up a first down 51% of the time and you know the Broncos tackle has recently begun playing well, you adjust down for it, and likely decide not to run (and no analytics guy will fault you for it). If the data says you'll pick up a first down 85% of the time, you need a better reason to justify not going for it (is this new tackle the greatest of all time? Or just an upgrade on what they had?). Data is not best used as a replacement for a coach. It is best used to give a coach the likelihood of a given event occurring, and then let the coach's knowledge of matchups, injuries, and personnel issues adjust that likelihood up or down.
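To make the "adjust up or down" idea concrete, here is a minimal Python sketch. The 51% and 85% baselines come from the example above; the break-even threshold and the size of the tackle adjustment are assumptions picked purely for illustration, and the function names are hypothetical:

```python
def adjusted_probability(baseline, coach_adjustment):
    """Shift the data-driven baseline by the coach's judgment, clamped to [0, 1]."""
    return min(1.0, max(0.0, baseline + coach_adjustment))

def should_go_for_it(baseline, coach_adjustment, break_even):
    """Go for it only if the adjusted conversion chance beats the break-even point."""
    return adjusted_probability(baseline, coach_adjustment) > break_even

BREAK_EVEN = 0.50          # assumption: conversion rate needed to justify going for it
TACKLE_ADJUSTMENT = -0.05  # assumption: how much the new tackle hurts the run

print(should_go_for_it(0.51, TACKLE_ADJUSTMENT, BREAK_EVEN))  # False
print(should_go_for_it(0.85, TACKLE_ADJUSTMENT, BREAK_EVEN))  # True
```

The same adjustment flips the marginal 51% call but leaves the 85% call alone, so the coach would need a much bigger reason to overrule the data in the second case.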
Sassy. Jesus Christ.
Made the message a bit friendlier for future folks, thanks for the feedback :)
It's most likely that the optimal solution is a mixed strategy; you're going to need to mix it up to keep defenses honest.
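For what it's worth, here is a small Python sketch of what a mixed strategy looks like if you model a single down as a 2x2 zero-sum game (offense picks run or pass, defense picks which one to defend). The success rates in the matrix are invented for illustration, and the function name is hypothetical:

```python
def offense_run_frequency(payoff):
    """Equilibrium run frequency for the offense in a 2x2 zero-sum game.

    payoff[i][j] is the offense's success probability when the offense
    plays i (0 = run, 1 = pass) and the defense defends j (0 = run, 1 = pass).
    Returns (run_frequency, success_rate_at_equilibrium).
    """
    (a, b), (c, d) = payoff
    # If the best worst-case row equals the best worst-case column, a pure
    # strategy is already optimal and there is nothing to mix.
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("a pure strategy is optimal; no mixing needed")
    # Choose p so the defense is indifferent between its two options:
    # p*a + (1-p)*c == p*b + (1-p)*d
    p = (d - c) / (a - b - c + d)
    value = p * a + (1 - p) * c
    return p, value

# Assumed success rates: running works well only if the defense expects a pass,
# and passing works well only if the defense expects a run.
PAYOFF = ((0.3, 0.7),
          (0.6, 0.4))

run_freq, success = offense_run_frequency(PAYOFF)
print(round(run_freq, 3), round(success, 3))  # 0.333 0.5
```

With these made-up numbers the offense should run about a third of the time. Any fixed tendency away from that mix is something the defense can exploit, which is the "keep defenses honest" part.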
You might want to make it clear you want the decisions most likely to succeed, not the decision most common among professional coaches (who are presumably optimizing some other form of career-stability-against-criticism).