
Prediction of the FIFA World Cup 2018 – A random forest approach - ajonnav
https://arxiv.org/abs/1806.03208
======
avip
Projection: An octopus, or possibly another cephalopod, would outperform this
RF predictor.

~~~
mlthoughts2018
In fact, if you can get me (6^8) * (2^15) cephalopods, I am certain of it.

~~~
avip
If you get 2^15 squids, just play it safe and sell them at the fish market. They
run ~$10/kg.

------
everdev
Full article:
[https://www.technologyreview.com/s/611397/machine-learning-p...](https://www.technologyreview.com/s/611397/machine-learning-predicts-world-cup-winner/)

------
caio1982
I always wonder if those people have tried to fit their approach to past
results. Could they simulate past World Cup results from archived player and
team statistics to be sure their method is sane?

~~~
avip

      3.4 Combining methods
      [...] are now compared with regard to their predictive performance.
      For this purpose, we apply the following general procedure:
    
      1. Form a training data set containing three out of four World Cups.
    
      2. Fit each of the methods to the training data.
    
      3. Predict the left-out World Cup using each of the prediction methods.
    
      4. Iterate steps 1-3 such that each World Cup is once the left-out one.
    
      5. Compare predicted and real outcomes for all prediction methods.
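
For anyone who wants to try the procedure above, here is a minimal sketch of leave-one-World-Cup-out validation. It is not the paper's code: the toy data, the two stand-in methods (`fit_mean`, `fit_slope`) and the mean-absolute-error score are all assumptions made up for illustration.

```python
from statistics import mean

def fit_mean(train):
    """Baseline: always predict the average training target."""
    avg = mean(y for _, y in train)
    return lambda x: avg

def fit_slope(train):
    """Least-squares line through the origin: y ~ b * x."""
    b = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda x: b * x

def leave_one_cup_out(cups, methods):
    """Return {method: {held_out_cup: mean absolute error}}."""
    scores = {name: {} for name in methods}
    for held_out in cups:                                  # step 4: rotate
        train = [row for cup, rows in cups.items()
                 if cup != held_out for row in rows]       # step 1: 3 of 4 cups
        for name, fit in methods.items():
            predict = fit(train)                           # step 2: fit
            errors = [abs(predict(x) - y)
                      for x, y in cups[held_out]]          # step 3: predict
            scores[name][held_out] = mean(errors)          # step 5: compare
    return scores

# Made-up toy data: (rating difference, goal difference) per match.
CUPS = {
    2002: [(100, 1), (-50, 0), (200, 2)],
    2006: [(150, 1), (-100, -1), (50, 1)],
    2010: [(-200, -2), (100, 1), (0, 0)],
    2014: [(300, 3), (-150, -1), (50, 0)],
}
METHODS = {"mean": fit_mean, "slope": fit_slope}
scores = leave_one_cup_out(CUPS, METHODS)
```

Any tuning would have to happen inside the three training cups only; otherwise the held-out error comes out too optimistic.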

~~~
nonbel
Do they say they only did this once? Or do they imply they did hyperparameter
tuning, pipeline optimization, etc. using one or more (or all) of the World
Cups included for the CV?

------
ultrasounder
Ctrl-F'd the PDF, searched for the keyword "MESSI", and didn't find any hits.
So obviously "Random" forests.

------
beyondCritics
I once ran my own prediction of the FIFA World Cup. I used Elo ratings
from [http://eloratings.net/Europe](http://eloratings.net/Europe), which I
converted into a model giving probabilities for any outcome between two teams.
For each group A-F and for each possible pair of teams (a,b) in that group, I
estimated the probability of them qualifying in that order using Monte Carlo
simulation. Doing this explicitly should reduce overall estimation variance.
Using that data I calculated the probability of each team becoming world
champion using exact formulas. The results looked reasonable to me. What I
learned was that the chances of extreme outsiders are typically extremely
overestimated by the bookmakers. I have checked this paper for comparison
(Table 8) and think the results are in line with my observation. I vaguely
remember hearing that there is a similar observation for stock options
that are extremely out of the money: they are typically much too expensive to
buy. The bottom line is: never bet on Korea or Japan becoming FIFA world
champion :-)
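
A rough sketch of the group-stage part, in case it is useful. `expected_score` is the standard Elo formula; everything else is an assumption for illustration: the way draws are split off (`draw_share`), the random tiebreak, and the four hypothetical teams and ratings. It is not the exact model I used.

```python
import random
from collections import Counter
from itertools import combinations

def expected_score(elo_a, elo_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def outcome_probs(elo_a, elo_b, draw_share=0.5):
    """(win, draw, loss) for A. The draw split is an assumption, not part of Elo."""
    e = expected_score(elo_a, elo_b)
    p_draw = 2 * min(e, 1 - e) * draw_share
    p_win = e - p_draw / 2          # keeps win + draw/2 equal to e
    return p_win, p_draw, 1 - p_win - p_draw

def simulate_group(ratings, rng):
    """Play one round robin; return (winner, runner-up)."""
    points = {team: 0 for team in ratings}
    for a, b in combinations(ratings, 2):
        p_win, p_draw, _ = outcome_probs(ratings[a], ratings[b])
        u = rng.random()
        if u < p_win:
            points[a] += 3
        elif u < p_win + p_draw:
            points[a] += 1
            points[b] += 1
        else:
            points[b] += 3
    # Ties on points are broken at random (real tiebreakers use goals).
    ranked = sorted(ratings, key=lambda t: (points[t], rng.random()),
                    reverse=True)
    return ranked[0], ranked[1]

def qualification_probs(ratings, n_sims=20000, seed=0):
    """Estimate P(team a finishes 1st and team b 2nd) for every ordered pair."""
    rng = random.Random(seed)
    counts = Counter(simulate_group(ratings, rng) for _ in range(n_sims))
    return {pair: n / n_sims for pair, n in counts.items()}

GROUP = {"A": 2000, "B": 1850, "C": 1750, "D": 1600}  # hypothetical ratings
probs = qualification_probs(GROUP)
```

With the ordered-pair probabilities for every group in hand, the knockout rounds can be computed exactly from the pairwise win probabilities instead of being simulated.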

~~~
hedonistbot
How about betting on Greece winning the Euro?

------
dm8
I don't think any ML model can take into consideration a meltdown like
Brazil vs. Germany in 2014, or a player getting sent off and changing the
dynamics of the game. Football/soccer (along with field hockey) is much more
unpredictable than any other team sport.

~~~
imh
Uncertainty and randomness are absolutely things that models take into
consideration. That's the _statistical_ part of statistical modeling. If
unpredictability were a deal breaker, we wouldn't be able to model coin flips.

~~~
dm8
Yes, even I use random forests for some of the predictions in my new project.
Somehow I feel that doing statistical modeling for the sake of
modeling/predictions is not the right approach. I'm still new (a year or two
of experience) to ML/statistical modeling, so I'm much more conservative in my
approach.

~~~
imh
Predicting things and forecasting is an enormous part of ML and stats.
Especially on the more statistical side of the community, there are very
principled, effective ways to do this, even in the presence of weird outliers
like the Germany/Brazil meltdown.

Take a look at robust regression and influence functions to see some of the
interesting flavor of one way to look at outlier weirdness.
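
To give a taste of the robust-regression side, here is a small sketch comparing ordinary least squares with the Theil-Sen estimator (median of pairwise slopes). Theil-Sen is just one simple robust method picked for illustration, and the data is made up: a clean line with a single wild outlier.

```python
from itertools import combinations
from statistics import mean, median

def ols_fit(xs, ys):
    """Ordinary least squares: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def theil_sen_fit(xs, ys):
    """Theil-Sen: median of all pairwise slopes, then median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

# Made-up data: y = 2x + 1, except the last point is a wild outlier.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[9] = -50

ols = ols_fit(xs, ys)
robust = theil_sen_fit(xs, ys)
```

Here the single outlier drags the least-squares slope below zero, while the median-based fit still recovers slope 2 and intercept 1.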

------
gus_massa
[Somewhat related] I remember that someone posted a World Cup simulator during
the 2014 championship.
[https://news.ycombinator.com/item?id=7941898](https://news.ycombinator.com/item?id=7941898)

After playing a little with it, my takeaway was that the FIFA index is
worthless (too political?) so I'd not pay attention to it. Also, the ESPN
index was one of the better predictors, so I'd look almost only at it.

------
gd2
FIFA has a reputation for favoring the big-market countries, and scandals
happen. Did the model here include the probability of poor and biased
referees? Anyway, I'm looking forward to the World Cup, but wish the matches
weren't so early on the West Coast. The first matches I'm especially looking
forward to are Spain vs Portugal and Mexico vs Germany. Should be fun!

------
OscarCunningham
Compared to betting odds
([https://www.betfair.com/exchange/football/competition/561474...](https://www.betfair.com/exchange/football/competition/5614746))
this approach thinks that Brazil is much less likely to win.

------
merb
I do not think that the Germans are the top favorite.

First of all, not a lot of teams have won the cup twice in a row.

Second, it's way harder if everybody is focused on you (because you won the
last time).

I'm pretty sure barely any data can reflect these two statements. (OK, the
first one can, but the second one is more of an emotional effect.)

~~~
mart187
No team has actually done it before...

~~~
thiagotomei
That is not correct.

* Italy won in 1934 and 1938

* Brazil won in 1958 and 1962

------
adam
In contrast, a fully human approach using a prediction market:
[https://alphacast.cultivateforecasts.com/challenges/167-2018...](https://alphacast.cultivateforecasts.com/challenges/167-2018-fifa-world-cup)

------
alecco
This is a perfect example of abuse of ML. The ML predictions for the last
World Cup were a joke. I guess we won't learn the lesson. And if by chance any
of these predictions work, we'll take it as a valid thing.

------
masteruvpuppetz
For the 2006 World Cup I predicted to colleagues that Italy would win, just
because I liked them. I stood by this throughout and voilà! It happened.

Last WC my prediction was Argentina (reached the final).

I feel intuitions do matter. This time: Argentina again.

------
billforsternz
TL;DR: We don't know who's going to win either, maybe one of the really good
teams?

------
sien
Oh gawd. Please stop.

The Economist is doing it. 538 has done one. Everyone is doing it.

They all wind up with similar predictions.

Germany, Brazil, France and Spain are favourites.

This is what the betting odds, transfermarkt and the mean salaries of the
teams also tell you.

There are limits to prediction. The WC is a good place for people to learn
that.

~~~
vermontdevil
For all the hype around the World Cup (and I love the event), it's pretty much
these 4 (plus Italy, if they had qualified) that have won it. The last time a
country outside the big four (five) won was Argentina back in '86.

Makes for a bit of a dull ending sometimes.

~~~
sien
The winner of an individual WC is hard to pick, and there are usually 4-5 big
teams that could win, but they have changed a little bit over time.

France hadn't won a WC and kept choking until 1998. Spain were the same until
2010.

Italy and Argentina have been in and out of the favourites as well.

England were #1 in Elo rankings in the late 1980s and got to the semis in 1990
and were only knocked out with the help of a handball in 1986.

With a bit of luck Holland might also have been able to win a WC. Finals in
74, 78 & 2010.

But outside a few big teams it is pretty unlikely that there will be a real
surprise winner. There are too many good big teams, so one or two
surprises aren't enough to win. For example, South Korea shocked Italy in 2002
but then Germany beat them. Or Croatia beat Germany in 1998 but then got
beaten.

