> Football data in Europe is so expensive Depending on which data you need, ther...

eeeuo · on July 4, 2018

I would argue that almost all of the information in your post is stats, not data.

The type of data that people in this thread are talking about would be more in line with detailed positional information about each of the players on a football pitch over 90 minutes. In a cricket context, it would be more along the lines of the exact release angle and speed for each of the bowlers.

This type of information is clearly available, as Michael Caley is able to quickly generate xG maps for an entire game[1], but I do not believe it's public.

Your [9] link points out that much more information is available to baseball bettors, but even baseball has a significant walled garden in terms of data. For example, the raw data used to generate the stats in [2] is not open to the public.

[1] https://twitter.com/caley_graphics

[2] https://www.youtube.com/watch?v=tzPKlQXo6hk

phillc73 · on July 4, 2018

You make a good point and my post needs clarification.

My links were all to post-event data, not live in-play data sources. I still wouldn't call, for example in a cricket match, the number of wickets taken by a bowler a stat. It's just data. A stat is derived from the data, for example bowling strike rate or economy. Or take the fact that a trainer had a winner at a certain race track: that's just the post-event data. If you want to derive further statistics, you have to calculate them yourself.[1]

The links above just have, for the most part, raw event data.

[1] https://blog.betwise.net/2018/06/19/loops-with-r-creating-a-...

eeeuo · on July 4, 2018

The number of wickets taken is a stat. The raw data that informs it is the collective set of all balls bowled by a bowler.
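To make the distinction concrete, here is a minimal Python sketch of how wickets, strike rate and economy all fall out of a ball-by-ball record. The `Ball` fields and the sample over are made up for illustration and don't correspond to any real feed.

```python
from dataclasses import dataclass

@dataclass
class Ball:
    # Hypothetical ball-by-ball record; field names are assumptions.
    bowler: str
    runs_conceded: int   # runs charged to the bowler on this delivery
    wicket: bool         # True if the bowler is credited with a wicket
    legal: bool = True   # False for wides and no-balls

def bowling_stats(balls, bowler):
    """Derive summary stats for one bowler from raw deliveries."""
    mine = [b for b in balls if b.bowler == bowler]
    legal_balls = sum(b.legal for b in mine)
    runs = sum(b.runs_conceded for b in mine)
    wickets = sum(b.wicket for b in mine)
    return {
        "wickets": wickets,
        # strike rate: legal balls bowled per wicket taken
        "strike_rate": legal_balls / wickets if wickets else None,
        # economy: runs conceded per six-ball over
        "economy": runs / (legal_balls / 6) if legal_balls else None,
    }

# Made-up data: two overs from bowler "A", plus one wide.
runs = [0, 1, 4, 0, 0, 1, 0, 0, 2, 0, 1, 0]
sample = [Ball("A", r, wicket=(i in (3, 9))) for i, r in enumerate(runs)]
sample.append(Ball("A", 1, wicket=False, legal=False))

stats = bowling_stats(sample, "A")
```

Every number in `stats` is an aggregate. You can go from the list of balls to the stats, but not back again.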

I'm not being needlessly pedantic; it's an important distinction when considering the level of analysis that one is able to perform. If you are doing major cricket analytics, you need ball-by-ball information, including as much information as possible about the bowler's position, movement and arm motion, the batter's position, movement and stroke, how the field is set up, conditions of the pitch, situation in the match, etc.

For example, consider a situation where we're attempting to compare two bowlers. Bowler A may have got a wicket off a shot that 95% of batters would not play, whereas Bowler B did not get a wicket despite bowling a ball that achieves a wicket 10% of the time. The stats suggest that bowler A is in better form, but a data-driven view of the game suggests that bowler B is actually in better form.
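A rough Python sketch of that comparison follows. It assumes each delivery has already been assigned a wicket probability by some model built on detailed data. That model is hypothetical and not shown. I'm also reading "a shot that 95% of batters would not play" as a delivery that takes a wicket about 5% of the time, which is my own simplification.

```python
def actual_wickets(deliveries):
    """Count wickets that actually fell: this is the stats view."""
    return sum(1 for _, took_wicket in deliveries if took_wicket)

def expected_wickets(deliveries):
    """Sum per-delivery wicket probabilities: this is the data view."""
    return sum(p_wicket for p_wicket, _ in deliveries)

# Each delivery is (assumed wicket probability, whether a wicket fell).
bowler_a = [(0.05, True)]    # poor-percentage ball, but the batter got out
bowler_b = [(0.10, False)]   # better ball, no reward this time

for name, balls in (("A", bowler_a), ("B", bowler_b)):
    print(name, actual_wickets(balls), expected_wickets(balls))
```

On actual wickets A is ahead; on expected wickets B is ahead. Over one ball that means little, but summed over a season it is the kind of gap someone with the detailed data could bet on.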

As it stands, stats are available in abundance for every major sport, but detailed data is not. If a bettor had access to the latter, and they were able to parse it with an in-depth understanding of the sport, they'd be at a huge advantage versus bettors that did not, and they would reap the benefits.

PaulRobinson · on July 6, 2018

This is a good list but, as others have said, it is not the level of detail I'm talking about.

Take the NBA for example. Let's look at this: http://toddwschneider.com/posts/ballr-interactive-nba-shot-c... - this is able to give you super detailed analysis thanks to the NBA's stats API.

The equivalent from Opta costs thousands a year per competition. I was fortunate enough to get to play with detailed Opta data and ChyronHego data as part of a Man City hack day a couple of years ago. The latter data simply isn't commercially available.

For cricket, you can do something interesting with ball-by-ball data, but ideally you want ball-tracking data. You want to know speed of release, length, speed and movement after the ball has pitched, and speed after interaction with the batsman along with angles, etc. - and that's just to get started. Ideally you want positional data on fielders, etc. too.
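As a sketch of what one row of that tracking data might look like, here is a hypothetical Python record. The field names and units are my own guesses and don't match any vendor's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrackedDelivery:
    # All fields are illustrative assumptions, not a real provider format.
    release_speed_kph: float            # speed out of the hand
    length_m: float                     # pitching distance from the stumps
    post_pitch_speed_kph: float         # speed after the ball has pitched
    deviation_deg: float                # lateral movement off the pitch
    post_bat_speed_kph: Optional[float] = None   # None if no bat contact
    post_bat_angle_deg: Optional[float] = None   # direction off the bat

def speed_lost_off_pitch(d: TrackedDelivery) -> float:
    """Speed shed between release and after pitching, in km/h."""
    return d.release_speed_kph - d.post_pitch_speed_kph

ball = TrackedDelivery(
    release_speed_kph=140.0,
    length_m=6.5,
    post_pitch_speed_kph=125.0,
    deviation_deg=1.2,
)
print(speed_lost_off_pitch(ball))
```

Fielder positions would need a separate structure again, with coordinates per player per delivery.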

Don't get me wrong, this is a great starting set to get people interested, but there's a way to go for high-quality data being accessible to the hobbyist or academic researcher (although I believe Opta gives academics discounts to help make them "the" standard for clubs, etc.)