Show HN: Euro 2016 predictions using Bayesian inference (kickoff.ai)
79 points by lum | 57 comments



For those who are interested in the details: we model the strength of teams using the well-known Elo model (used, e.g., for official chess ratings).

We innovated on two aspects of the traditional Elo model, in which every team has an independent parameter:

1) We actually model the strength of players instead of teams. This makes it possible to learn from games played in club championships and to transfer this knowledge to games between countries (a toy sketch of this idea follows below).

2) There is a not-so-widely-known connection between Elo-type comparison models and Gaussian process classification. We leverage this, and get a full posterior distribution for each team's strength. (Information on the uncertainty of our estimates helps a lot in coming up with sensible predictions)
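To make points 1 and 2 a bit more concrete, here is a rough sketch (made-up numbers, not our actual code): a team's strength is the sum of its players' strengths, the probability of a home win is a logistic function of the strength difference, and uncertainty in the player strengths is propagated by simple Monte Carlo sampling from their (assumed Gaussian) posteriors.

    import numpy as np
    from scipy.special import expit  # logistic function

    rng = np.random.default_rng(0)

    # Hypothetical posterior means / standard deviations, three players per team.
    home = {"mean": np.array([1.2, 0.8, 0.5]), "std": np.array([0.3, 0.4, 0.5])}
    away = {"mean": np.array([0.9, 0.7, 0.6]), "std": np.array([0.2, 0.3, 0.6])}

    def win_probability(home, away, n_samples=10_000):
        # Sample player strengths, sum them into team strengths,
        # and average the logistic of the strength difference.
        s_home = rng.normal(home["mean"], home["std"], (n_samples, len(home["mean"]))).sum(axis=1)
        s_away = rng.normal(away["mean"], away["std"], (n_samples, len(away["mean"]))).sum(axis=1)
        return expit(s_home - s_away).mean()

    print(win_probability(home, away))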

If anyone wants to know more (explanations on the web page are very superficial at the moment), please drop me a line!


But how does that make sense in a game where you can have a draw? By definition, 100% cannot cover only home and away wins - a draw must be factored in. So doesn't that make all the Elo assumptions false? Just wondering how you can take draws into consideration...


Excellent point - this one is definitely on our todo list. There are several simple extensions of the Elo model that take draws into consideration (i.e., give a non-zero probability to draws), for example the Rao-Kupper model. Only minimal changes are needed w.r.t. the original model, but we didn't manage to make them in time for this version of the site.

In short: at its core, the "Elo assumption" postulates that every team can be represented by a real number (interpreted as the strength of the team), and that the probability of the outcome depends on the difference in strength. In the vanilla Elo model, the outcome is binary, but it's easy to make it ternary.
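To illustrate, here is the Rao-Kupper variant in a few lines (made-up strengths, not what's on the site; theta >= 1 controls how likely draws are, and theta = 1 recovers the binary model):

    import math

    def rao_kupper(s_home, s_away, theta=1.5):
        # Strengths live on a log scale; exponentiate to get "worth" parameters.
        w_home, w_away = math.exp(s_home), math.exp(s_away)
        p_win = w_home / (w_home + theta * w_away)
        p_loss = w_away / (w_away + theta * w_home)
        p_draw = 1.0 - p_win - p_loss
        return p_win, p_draw, p_loss

    print(rao_kupper(0.4, 0.1))  # hypothetical strengths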


The thing is that football is a time-based sport, so a draw is an outcome with a pretty good chance. Weak teams will usually try to delay as much as they can to get the draw; given enough time, the strong team would have a much bigger chance to win. Also, depending on the context (points needed by each team), a team might have a stronger motive to go for a draw than a win.

Predicting outcome probabilities in football is a very complicated story; I doubt it can be solved in a simple way like Elo-ranking the players or teams.

That said, the knock-out phase might be more suitable for that model.

Kudos for the effort anyway, and nice UI.


Have you considered using a TrueSkill-alike with extensions for scores/teams, such as PoissonOD?

http://research.microsoft.com/pubs/193839/sbsl_ecml2012.pdf

A few years ago I had a similar idea for building player-based team models, so you can make a better guess at the performance of national teams (since they don't play very often) or of league teams after transfers at the start of or during the season, but I hadn't got as far as you :)


We did some preliminary experiments in this direction. Basically, we tried to do a regression on the score difference instead of using only binary outcomes. In our experiments it didn't improve the predictive accuracy - but there are many more things to try. It does feel a bit wasteful not to take score data into account.
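Roughly what that looks like (a toy sketch with made-up data, not our actual pipeline): encode each match as a +1/-1 player-indicator vector and regress the goal difference on it with least squares.

    import numpy as np

    # Toy data: rows are matches, columns are players (+1 home, -1 away),
    # y is the goal difference from the home team's point of view.
    X = np.array([[ 1,  1, -1, -1],
                  [ 1, -1,  1, -1],
                  [-1,  1, -1,  1]], dtype=float)
    y = np.array([2.0, 0.0, -1.0])

    # Least-squares estimate of per-player strengths from score differences.
    strengths, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(strengths)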

Nice that you had some similar ideas :-)


In chess you can also have a draw.

Either by the players agreeing to a draw, or by capturing every piece except the kings.


Right. But then again, you cannot say it's 100% that player A or player B will win, right? There's a chance of a draw that is not negligible.


One common way to represent the probability of outcomes in chess as a single number (e.g. in comparing opening lines) is to say "white gets 55% of the points", which aggregates wins and draws. (So for instance if white wins 50%, draws 25% and loses 25% of the games, it gets 0.5 * 1 + 0.25 * 0.5 + 0.25 * 0 = 62.5% of the points.)


In fact, at top levels of play, draw rates are 40% or higher.


Is there a reason you opted to use Elo rather than other rating systems commonly used in baseball analysis like log5 or Pythagorean (e.g. https://summerofjeff.wordpress.com/2010/12/05/serving-agains...)?
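(Rough definitions for anyone unfamiliar - these are the standard textbook forms, and the exponent in the Pythagorean formula is sport-dependent:)

    def log5(p_a, p_b):
        # Probability that A beats B, given each side's overall winning percentage.
        return (p_a * (1 - p_b)) / (p_a * (1 - p_b) + p_b * (1 - p_a))

    def pythagorean(goals_for, goals_against, exponent=2.0):
        # Expected winning percentage from goals scored and conceded
        # (exponent ~2 is the classic baseball value).
        return goals_for**exponent / (goals_for**exponent + goals_against**exponent)

    print(log5(0.6, 0.5), pythagorean(45, 30))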

I'd also be interested in knowing a bit more about point 2 as well.


One specific criticism regarding Portugal-Iceland: the main squad will not be exactly as considered there. E.g., Cristiano Ronaldo is listed as a substitute. Does that make a difference to your predictions?


You are correct about Portugal - that is not exactly the squad that will likely play against Iceland. We used the starting lineup of the last official game (I believe in this case it was Portugal vs England on June 2nd), which did not include e.g. Cristiano Ronaldo.

The actual lineup does impact the predictions - we will update the lineups before the start of each game, when the official lineup is announced.

In a future version, we would like to make it possible for visitors to change the players in the team, and automatically update the prediction.


I think the same is also true for Ireland. The last friendly match was used mainly to look at fringe players; I guess 7 or 8 of the likely starters are listed here as substitutes. Perhaps you should look at the last competitive game in the qualifiers.


Have you had any chance to compare your player strength values against other models to check how well they are aligned? The Premier League's web site publishes a "Player Performance Index"; e.g., for last season the top three players according to PL PPI were Harry Kane, Riyad Mahrez and Jamie Vardy.

Another obvious question: have you checked your model against the odds on betting sites that offer "draw no bet" markets, since you are not yet taking draws into account?


I'm interested in how you model the strength of players - is that for the whole squad or the expected starting eleven? The prediction that stood out for me was Wales vs Slovakia, which FIFA rankings and betting odds both suggest will be closer; I would love to hear more about the factors behind that particular prediction.


The Wales starting lineup is missing Gareth Bale (arguably one of the top 5 players in the world). His 'kickscore' is listed as higher than any of the other Welsh players', so I don't understand why the model has excluded him from the starting lineup. See also Portugal and Ronaldo (kickscore 100!).

Interesting but perhaps needs some tweaking to match the expected starting lineups.


Hi! I'm Victor, one of the researchers behind this project. We used the most recent lineups (up to yesterday) to do the predictions you see on the web page. We will update them with the latest friendlies (e.g., Portugal indeed) and before every game, as soon as we have the official lineups!


It may be worth weighting competitive matches more heavily than friendlies, as the purpose of competitive matches is to win at all costs. Friendly matches, on the other hand, tend to be used primarily for match fitness (especially leading up to tournaments) and for trialling new tactics before the competitive games begin.


Thanks for the clarification and great project!

It will be interesting to see how things change when you get the official lineups. As I'm sure you are aware, teams often rest their 'star' players in friendlies immediately prior to a tournament in order to minimize the risk of injury.


Where did you get player/club data from?


We scraped it from soccerway.com.


Are there any plans to make the clean dataset public? It would be nice to be able to play with it.


You can use the FIFA data in my soccer GitHub repo - https://github.com/octonion/soccer/tree/master/fifa


Could be, yes! We'll see after the Euro ;)


Has there been any work (by you or others) towards factoring in the performance effect of the coaches on the teams?


We tried to consider it as an extra player, but it didn't help.


What's a reference for the connection between Elo and Gaussian process classification that you mention?


Great base for future development of the idea.

Some things to take into account though:

- Friendly games are sometimes about fitness rather than winning at all costs

- Weather greatly affects results. For example, in heavy rain and high wind, the likelihood of the game yielding more than 2.5 goals drops massively.

- How does the algorithm take into account substitutions?

- If a team has already won its first 2 group games, it may have already secured qualification. In that case it may rest many players or try new tactics

- Not all leagues (and levels of those leagues) are taken into account

Very much looking forward to seeing where this goes!


Thanks a lot for the comments!

- about friendly games: agreed, in the future we'd like to downweight them (as well as downweight older games relative to newer ones) - a rough sketch of one possible weighting scheme follows this list

- about substitutions: at training time, we look at the number of minutes each player has spent on the field. At test time, we assume there will be no substitution. Could certainly be improved :-)

- about teams that are already qualified: true, our model does not encode all these contextual factors, and arguably they are very important for certain games. If the lineup changes (e.g. star players resting) it does impact the prediction.
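For concreteness, one possible weighting scheme (a rough sketch with assumed numbers, not what is currently deployed): decay a match's weight exponentially with its age, downweight friendlies by a flat factor, and scale each player's contribution by minutes played.

    import math

    def match_weight(age_days, is_friendly, half_life_days=365.0, friendly_factor=0.5):
        # Exponential decay with age, plus a flat penalty for friendlies.
        weight = 0.5 ** (age_days / half_life_days)
        return weight * (friendly_factor if is_friendly else 1.0)

    def player_share(minutes_played):
        # Scale a player's contribution to the team strength by time on the pitch.
        return minutes_played / 90.0

    print(match_weight(200, is_friendly=True), player_share(67))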


Cool work! I appreciate that it's not your mission to say 'this will or won't happen' and that it's just a bit of fun.

But the great thing about football is that, interesting as this is, I won't be placing any bets off the back of it. There are too many variables in human nature, playing conditions, external factors, and maybe even luck. It's what keeps the game interesting and millions watching.

It would be interesting to wonder what parts of human behaviour can be predicted accurately in this way, though!


It is my personal opinion that soccer has a very high luck component. Often four or five plays decide the game. Frequently enough it is even decided by a referee call. It makes the game attractive because even an inferior team has a decent chance against a superior team. It may be just a lucky shot and then you close your defense.


> It may be just a lucky shot and then you close your defense.

See Greece in WC 2010 as an example of this as a deliberate strategy.


>> "I won't be placing any bets off the back of it. Too many variables in human nature, playing conditions, external factors, and maybe even luck"

That's what makes betting on it fun! If you can reliably predict who's going to win it's not gambling :)


Betting 2€ with my coworkers on the Euro results. Will be using your predictions because I don't know anything about football.

MAKE ME RICH! \o/


I think the fundamental assumption that a team's performance is the sum of the individual players' past records is wrong.

If you studied how a pundit would judge outcomes for these games, I'm sure they'd be talking about current form, whether the players play well together, tactics, playing styles, etc. Take England for example: they've given Rooney a far higher score than either Vardy or Kane, even though any sane pundit is clamoring for Rooney not to start, saying the other strikers are in far better form and that Rooney's presence detracts from the overall performance of the team.

As a technical project it's pretty cool, I just doubt its accuracy. I guess we'll soon find out!


Man Utd fan here. Agree completely with assessment of Rooney.


Football (it's Euro 2016, after all) is a team sport, so individual players are important, but how they play together as a team is also very important. Sometimes teams with average players win competitions because they help each other much more than teams with star players do (example: Leicester this season).

I don't know if you can model this by taking into account single players, even if their performance in previous games can be an indicator of the strength of the team. We will see soon how the model fares.


"We will see soon how the model fares."

I'm no statistician but "soon" seems unlikely; wouldn't you need a lot of results in order to judge how good these predictions are? I'd like to know if the model can use data capped at any time period, so that matches from the past can be judged against what the model would predict.


If they have data for the last 10 years they could use the model to predict thousands of games. That would start to be statistically meaningful.

However, I'm no statistician either, but if the goal is to predict only 12 results, how do we evaluate the model's predictions? There should be a difference between 0 and 12 successes.
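For what it's worth, probabilistic forecasts are usually evaluated per match with a proper scoring rule such as the Brier score or log loss, rather than by counting successes; a quick illustration with made-up numbers:

    import math

    # Made-up predicted home-win probabilities and actual outcomes (1 = home win, 0 = not).
    predictions = [0.7, 0.4, 0.55, 0.8]
    outcomes = [0, 1, 1, 1]

    brier = sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)
    log_loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for p, y in zip(predictions, outcomes)) / len(outcomes)
    print(brier, log_loss)  # lower is better for both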


They are 5-2 after the first 3 days. The first game was a draw (72%-28%) and the third one a win for Wales (39%-61%). Two days and five games to go.


Here's another prediction data point, based on betting odds:

http://eeecon.uibk.ac.at/wopec2/repec/inn/wpaper/2016-15.pdf


This is similar to the stuff GoalImpact does:

https://twitter.com/Goalimpact/status/740657055841816577

It's true most predictive models work at the team level, but it's interesting to point out that one of the more successful models over the last season was Chad Murphy (@soccermetric)'s MOTSON, an SVM trained over a bunch of stats per player:

https://twitter.com/JamesWGrayson/status/732673663019802625


I can say right off the bat this is completely wrong. The probabilities of each team winning cannot sum to 100%. Why? Because draws exist. Worse yet, there is also the concept of "playing for a draw" based on your current position in the group, and the concept of "nothing to play for", when a team is already guaranteed first place in the group and might not play at full strength in the final game.


Yes, for now we only predict win / lose. It is fairly reasonable to assume that draws are more likely to arise when the winning probabilities are close to 50/50 (and less likely when there is a big difference) - but you are also right that there are some contextual factors that influence this.
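One simple, purely illustrative way to read a draw probability off a binary prediction (not what the site does yet) is to let it peak when the prediction is close to 50/50 and shrink towards the extremes:

    def with_draw(p_home_win, max_draw=0.30):
        # Draw probability reaches max_draw at a 50/50 prediction and
        # vanishes as one side becomes a near-certain winner.
        p_draw = max_draw * 4 * p_home_win * (1 - p_home_win)
        return p_home_win * (1 - p_draw), p_draw, (1 - p_home_win) * (1 - p_draw)

    print(with_draw(0.55))  # (win, draw, loss) summing to 1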

You know the saying, "all models are wrong, but some are useful" - our model is clearly too simple to encode many of these complex patterns, but we believe that's alright!


Yes, draws exist, but every other argument you've posted cannot be applied to the first round of group games. Chances are huge that all of the teams are going to want to start strong.

As the researchers have said multiple times, they're going to keep updating their algorithms and take into account the things they've missed in the current version.


> Chances are huge that all of the teams are going to want to start strong.

Except when you're the lowest-ranked team, and starting your group with a match against the favourite. In that case, you're likely to play for damage control (e.g. aim for a draw, don't concede too many goals).


It's an interesting prediction. The UI design is clear too!

Do you have predictions for the final rankings and each team's winning probability? Some predict France and Germany are the top 2.


The predictions look good to me except Wales vs Slovakia because of Gareth Bale.


You're right. We took Wales' last lineup, but it will very probably be a different lineup against Slovakia. We'll update the prediction shortly before the game, when we know who's going to play.


Are you guys confident that this can beat the market (i.e., Betfair)? I strongly suspect that the answer is "not at all". Does this affect you?


Noice! I'm excited to see how the results turn out. Have you guys thought of incorporating info other than player and match data, like money-lines?


Did you validate your model on past Euro or World Cup data?


Interesting - the model seems slightly confident that Italy will win against Belgium, even though Italy recently lost 3-1, most rankings put the Belgians ahead, and gut feeling says that Italy's current team kinda sucks.

I really hope your model is right :)

EDIT: Could it be that you are underestimating the weight of player age?


We are actually not considering player age as a feature at all.

Concerning the predictions, they depend on the starting 11 for each team. We took the most recent ones, i.e., those from the last friendlies, which are probably not the lineups that will actually start the games!


Did you take home advantage into account? I would be very happy to know more about the math behind your model.


We did, and it helps quite a lot! Concerning the model, we use Gaussian process classification with a logit link function, a linear kernel, and Expectation Propagation as the inference method.
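If you want to play with something in the same family: scikit-learn's GaussianProcessClassifier with a DotProduct (linear) kernel over +1/-1 player-indicator features has a similar model shape, although it uses a Laplace approximation rather than Expectation Propagation. A toy sketch with made-up data, not our code:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import DotProduct

    # Toy matches: columns are players (+1 if in the home team, -1 if away),
    # last column is a constant home-advantage feature; y = 1 means home win.
    X = np.array([[ 1,  1, -1, -1, 1],
                  [ 1, -1,  1, -1, 1],
                  [-1,  1, -1,  1, 1],
                  [-1, -1,  1,  1, 1]], dtype=float)
    y = np.array([1, 0, 1, 0])

    model = GaussianProcessClassifier(kernel=DotProduct()).fit(X, y)
    print(model.predict_proba(X)[:, 1])  # P(home win) for each match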



