
Goldman Sachs model to predict World Cup game results didn’t come close - rodionos
https://www.bloomberg.com/view/articles/2018-07-14/world-cup-goldman-sachs-gs-model-got-it-all-wrong
======
yk
> And in any case, the model only generated probabilities of winning a game
> and advancing, and no team was given more than an 18.5 percent chance of
> winning the World Cup.

> [...]

> But Goldman Sach’s misfire is perhaps the most curious.

The model said that there is a lot of uncertainty, and as it happens, it was
entirely correct. A World Cup chance of 18.5 percent means that 4 out of 5
times the team will not win, and the fact that this was the highest chance
does not say much about the model.
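
A quick sketch of the point (with made-up numbers: one favorite at 18.5%, the
other 31 teams splitting the rest evenly):

```python
import random

# Hypothetical field: one favorite with an 18.5% chance, 31 others sharing the rest.
random.seed(0)
teams = ["favorite"] + [f"team_{i}" for i in range(31)]
weights = [0.185] + [(1 - 0.185) / 31] * 31

trials = 100_000
favorite_wins = sum(
    random.choices(teams, weights=weights)[0] == "favorite" for _ in range(trials)
)
rate = favorite_wins / trials
print(f"favorite wins {rate:.1%} of simulated tournaments")
```

Even when the model is exactly right, the favorite loses the simulated
tournament roughly four times out of five.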

And in general this is one instance of the well-practiced journalistic
technique of waiting for the results first and then defining a bar afterwards
to criticize the results according to standards that did not exist when the
performance happened. (I guess in this case it is even worse: we could
construct a reasonable test of how the model performed. I have the suspicion
that such a test was in the original paper and that the journalist either did
not understand it or, more likely, chose to ignore it in favor of writing a
better story.)

~~~
jbob2000
But Goldman Sachs are the kings of predicting uncertainty! This is their whole
business! They make billions navigating the murky, uncertain waters of the
global economy. Would you argue that the global economy is more uncertain than
soccer? I'd say so. How is it that they can find success in the market but not
in soccer?

I think this is a smoke signal. Soccer is corrupt; you can't predict the
winner unless you know what's being passed around under the table. Goldman
Sachs does these predictions so people can read between the lines and see how
corrupt it is.

My argument is: "Goldman is amazing at statistical analysis and they routinely
practice it on much tougher models (the global economy), so they should have
no problem predicting a simpler model (soccer). But since they drastically
failed at predicting soccer, then there must be an equally drastic variable
missing from their predictions. Since we can trust Goldman to use all
available public information in their analysis, there must be critical
information that is hidden from the public which affects the outcomes". I make
some assumptions, but it's fairly sound, no?

~~~
cepth
Unclear if your comment is tongue in cheek, but assuming that you're serious,
I'd encourage you to give a listen to a podcast episode like this:
[https://soundcloud.com/bettheprocess/episode-35-ted-
knutson](https://soundcloud.com/bettheprocess/episode-35-ted-knutson).

In the world of sports betting/analytics, you have baseball and basketball at
the forefront, and then American football, soccer, and hockey (roughly in that
order).

Off the top of my head, there are several reasons why the latter three sports
have all lagged behind:

-Lack of data

It wasn't until the last 4-5 years that widely available, affordable, and
accurate data for soccer matches existed. Companies like Opta have
accomplished this by outsourcing the watching of games and the manual tagging
of events, which was made possible by the advent of cheap cloud computing.

It should be self-evident why tracking the position and actions of 22 players
is more complicated than something like baseball, where for the most part you
are looking at one pitcher vs. one batter, much of which can be automated with
computer vision that tracks pitch position, speed, and spin.

-Complexity

It's no accident that baseball was the first sport to be revolutionized by
analytics. Most of the time, it's a static game, with a clearly defined action
set: do I swing at the pitch or not? Do I throw a fastball or not? Do I
attempt to steal a base or not?

In games like American football, soccer, and hockey, you have anywhere from
12-22 players on the field at a time. Tracking what the players without the
ball or the puck are doing is a difficult task technically, as is quantifying
their impact. Concepts like expected goals and expected goals added are recent
ones.

-Sample size

Typical elite soccer leagues see each team play each other twice. In England
and Spain, this means you have 38 games per season.

Baseball has a 162-game season plus playoff games, basketball has an 82-game
season plus playoff games, etc. Couple that with the fact that quality data
has only been collected for a few years, and you get further problems.

In basketball and baseball, the effects of aging on player performance and
statistics are fairly well understood now. We can generally calculate the
5-year market value of a player, etc. In the other sports I mentioned, we
don't yet have that kind of time series data to be able to make those
judgements.

--

Specific to the World Cup, there are other reasons why you may find it hard to
predict results.

-Team chemistry and style

Even though the World Cup is the most high-profile soccer event in the world,
most players spend only 1-3 months a year with their national teams. Their
"day jobs" with their club teams take up most of their playing time and
attention.

As anyone who has played the game Football Manager will know, managing a
national team is a tough job. You have no say over how the players are
practicing when they're away from you, and no control over the physical
condition in which they arrive at the World Cup. This year, there was barely a
month between the end of the regular European seasons and the start of the
World Cup.

In that month's time, you have to get at least 11 players who have not played
with each other to learn your style of play. Do you want to play a pressing
style? Are you attempting a slow buildup, or trying long balls? Etc. etc.

-Home field advantage

In baseball and basketball, most modern statistical models account for home
field advantage. Having 60,000 Russian fans chanting and heckling likely
played a role in the team's ability to upset Spain, particularly during
penalty kicks.

This goes back to the sample-size issue. How many times before have Spain
played Russia IN Russia in front of a large crowd? Probably never.

---

All this is to say, cut Goldman some slack. There are a number of non-
nefarious reasons why you may expect a soccer model to produce some
spectacular miscues.

~~~
jbob2000
Ok, I understand this - that soccer has many variables and it is difficult to
create a model with all of these variables. But my point is, the global
economy has _way_ more variables than soccer. Way way way _way_ more
variables. At least 7.5 billion of them.

So would you argue that creating a statistical model of soccer is harder than
creating one for global economies? I think it's harder to model economies.

I'm not even trying to give Goldman a hard time! I'm saying that Goldman
probably put together a very accurate model of "soccer", but we aren't
watching an accurate model of soccer; we're watching the corrupted one where
the players and skills don't matter.

~~~
cepth
I think we have to be very clear on what economic "models" Goldman uses.

If you're talking about GDP growth forecasting, or forecasting unemployment
numbers, these are ultimately questions of aggregation. Yes, there are 7.5
billion people, but at the end of the day each individual agent's actions
don't make a tremendous difference to an aggregate measure like GDP. During
periods of low volatility, as we are currently experiencing, it's really not
all that impressive to forecast the unemployment rate +/- 0.25%, or GDP
growth within 0.5%.

If you're talking about their market-making and trading businesses, they've
had some horrendous quarters recently as well
([http://www.businessinsider.com/goldman-sachs-just-had-a-
hist...](http://www.businessinsider.com/goldman-sachs-just-had-a-historically-
bad-quarter-in-trading-2018-1)). A very small portion of Goldman's business is
taking an opinionated stance; most of their income comes from relatively
low-risk market-making activities.

And let's not forget that during the 2008 financial crisis, certain
departments within the company correctly used credit default swaps to wager
against subprime mortgages, while others had exposure to them. The company
still needed an injection of capital from Warren Buffett and the US Treasury
to weather the crisis. Point being, they aren't clairvoyant oracles.

---

Regarding your last point, which was also made in your original comment, you
seem to be claiming some form of what economists call "omitted variable bias",
and seem to be hypothesizing that the "omitted variable" is corruption or
cheating.

From the purely technical standpoint of building models, the tiny samples
([https://www.theringer.com/soccer/2018/7/11/17557720/world-
cu...](https://www.theringer.com/soccer/2018/7/11/17557720/world-cup-lessons-
talent-sample-size-france-england-croatia)) and the nature of the "data" being
collected means that there are plenty of other explanations, like incorrectly
estimated parameters or measurement error.

If you're trying to suggest that there is corruption or cheating in soccer,
please point to a concrete example of a team in a critical game receiving a
disproportionate number of calls. Unsure if you're aware, but this was the
first World Cup with instant video replays for the referees to use. Had this
replay been in use more widely in international soccer, the US might've
qualified for this World Cup ([https://deadspin.com/u-s-a-out-of-world-cup-on-
phantom-goal-...](https://deadspin.com/u-s-a-out-of-world-cup-on-phantom-
goal-1819343176)), England might've won/tied that pivotal 2010 World Cup game
([https://en.wikipedia.org/wiki/Ghost_goal#England_v_Germany_a...](https://en.wikipedia.org/wiki/Ghost_goal#England_v_Germany_at_the_2010_World_Cup)),
etc.

Soccer may have had a sordid past with the picking of host countries, but the
trends in the actual game itself point to technology reducing the ability of
referees to make blatantly terrible calls.

~~~
jbob2000
Thanks for the replies and the detailed sources, it's interesting to read!

> Point being, they aren't clairvoyant oracles.

Yeah, my argument was weak in that regard. They aren't anywhere close to
perfect or accurate, I'll admit.

> you seem to be claiming some form of what economists call "omitted variable
> bias"

Yes! Is that what it's called?

> please point to a concrete example of a team in a critical game receiving a
> disproportionate number of calls

Corruption doesn't have to be that explicit. Maybe key players or coaches are
paid to perform poorly? It doesn't always come down to the ref. But I admit I
have no examples.

------
jasode
Leonid Bershidsky and a lot of other journalists laughing at Goldman Sachs'
incorrect predictions seem to miss the point.

The World Cup predictions from Goldman Sachs (and also UBS) are a form of
_recreation and entertainment_ with machine learning. It's an expression of
quant nerd humor.

Analogous intellectual games would be engineers devising ridiculous Rube
Goldberg contraptions[1] or programmers building "enterprise" FizzBuzz[2].

(I think it would add to the fun if GS uploaded their raw data and models to
Github for others to play with.)

 _> It certainly didn't predict the final opposing France and Croatia on
Sunday._

True, but it did predict France having a better chance of winning overall,
while being handicapped by a tougher draw. It also predicted France beating
Croatia, but in the round of 16 rather than the final. The pdf says:

 _> While Germany is more likely to get to the final, France has a marginally
higher overall chance of winning the tournament, _

[1]
[https://en.wikipedia.org/wiki/Rube_Goldberg_Machine_Contest#...](https://en.wikipedia.org/wiki/Rube_Goldberg_Machine_Contest#Past_tasks)

[2]
[https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...](https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpriseEdition)

~~~
learnstats2
On the other hand, this is a predictive task that has defined outcomes and
clear historical data - by my understanding, it is easier than commercial uses
of machine learning [at least, easier to measure the effectiveness].

It's also Goldman Sachs and UBS choosing to attach their names to these
predictions and stake some reputation on them. If they had hit the bullseye,
they would be trumpeting these results.

~~~
CoryG89
It may be easier to measure the effectiveness (give a confidence level for the
prediction), but just because there is clear historical data and defined
outcomes, that does not mean you will be able to predict a particular outcome
with any high level of certainty.

For example, imagine a tournament with a large number of participants, where
the winner is picked simply by fairly choosing a single random participant.

If I then gave you all the perfect historical data going back decades, you
could do statistical analysis and determine that the winner is completely
random and therefore the probability of success, for any particular
participant, is p~=(1/n), where n is the number of participants. Your
confidence in correctly predicting any particular outcome will drop as n
rises.

Not everything can be easily predicted just because you have enough data.
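
That thought experiment can be sketched in a few lines (a hypothetical
32-participant tournament with a uniformly random winner):

```python
import random
from collections import Counter

random.seed(1)
n = 32           # hypothetical number of participants
years = 10_000   # decades of "perfect" historical records

# Each year, the winner is a uniformly random participant.
winners = [random.randrange(n) for _ in range(years)]
counts = Counter(winners)

# Even with abundant, perfectly clean data, the best estimate for every
# participant converges to p ~= 1/n: the history carries no predictive signal.
freqs = [counts[i] / years for i in range(n)]
print(min(freqs), max(freqs), 1 / n)
```

All the historical data in the world only tells you, with great precision,
that each participant wins about 1/n of the time.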

------
raverbashing
People conflate statistics with actual results more often than not, and I
think those reporting on such stories, and maybe even the original authors,
might fall for this.

It was not wrong to say Hillary had a 95% chance of winning the presidential
election, but the confidence was low and that value _still allowed for the
opposite result to happen_.

Also, football has a lot of variance between team capability and end results.
The better team might (and does) lose often, especially when it goes to
penalty shootouts.

With basketball, the stronger team will easily score more in most cases.

~~~
kgwgk
> had a 95% chance [...] but the confidence was low

So she had 95% chance of winning with 50% probability or what?

~~~
fny
So this is something that people don't seem to grok quite well, and it really
depends on the type of statistical analysis used.

Say you make the assumption that the quantity being estimated is truly fixed:
that there's some true value for the force of gravity or some true value for
the number of people that vote for X or Y.

The second assumption that comes along is that the stochasticity observed
comes from your perspective of observation, and not from the ground truth. To
be more blunt: of all the observations you could make, 95% of them would have
yielded the result observed... but the ground truth is still fixed. Gravity
has a fixed quantity, despite your experimental error, and you may have been
lucky enough to observe it in your sample.

Predicting elections with frequentist methods has this same characteristic,
except the observed quantity itself shapeshifts and even lies... so then there
are other complications that need to be dealt with.

This is where that 50% feeling comes from. There are two outcomes, and one
will be true. Your data analysis just tells you that if you repeat your
procedure, you'd expect 95% of those results to give you the outcome you
observed.

~~~
kgwgk
If you expect to get it right (in this particular prediction, Clinton to win)
with 95% probability, what does it mean to say that this 95% is with low
confidence or with high confidence?

~~~
thousandautumns
Not OP, but that opens a whole different can of worms. “Confidence” has a
specific meaning in the context of statistical theory, and specifically in a
particular flavor of statistics called “frequentism”. I won’t get into what is
involved in frequentism, and how it differentiates itself from the
alternative, Bayesianism, but essentially “confidence” refers to a measure
that really says more about the statistical methodology used to arrive at the
estimated value (in this case, that Hillary had a 95% chance of winning) than
about the value itself. This makes it a bit esoteric and something that people
misinterpret all the time.

Basically, confidence refers to a hypothetical scenario in which the data-
gathering process were to be repeated and the same analysis done; X% of the
confidence intervals (essentially, the +/- bounds around your estimate) will
contain the true value of what you are trying to estimate.

So in this hypothetical scenario, we say we have the power to go back in time
and recollect the polling data in 2016 and run the same analysis used to
arrive at that 95% number. And let’s say we use this power over and over
again, a very large number of times. Then 95% of the error bounds we construct
should contain the true value of the probability Hillary wins, whatever that
is.

The thing is that those error bounds can be huge. You can have 95% confidence
that the probability that Hillary wins is between 3% and 98%, for example. You
can also have 10% confidence that the probability of a Hillary win is between
94% and 96%. Without the confidence intervals, a “confidence level” doesn’t
say much. It’s also predicated on the assumption you haven’t screwed up your
data collection process or analysis methodology. And if you are predicting
something will occur with a probability of 95%, and it doesn’t, that doesn’t
automatically mean you are wrong, but the likelihood of you having screwed
something up is definitely higher.
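
The repeated-sampling idea can be demonstrated with a toy simulation (made-up
true proportion and poll size, and a simple normal-approximation interval
rather than anything a real pollster would use):

```python
import math
import random

random.seed(2)
p_true = 0.6    # hypothetical true proportion we are trying to estimate
n = 1000        # hypothetical poll size
repeats = 2000  # hypothetical re-runs of the data collection

covered = 0
for _ in range(repeats):
    heads = sum(random.random() < p_true for _ in range(n))
    p_hat = heads / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the estimate
    # 95% confidence interval via the normal approximation
    if p_hat - 1.96 * se <= p_true <= p_hat + 1.96 * se:
        covered += 1

print(covered / repeats)  # should land close to 0.95
```

The "95%" is a property of the procedure: about 95% of the intervals it
produces cover the true value, which says nothing about how wide any single
interval is.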

~~~
kgwgk
I agree that this is a different can of (nasty) worms.

The message I replied to said:

> It was not wrong to say Hillary had a 95% chance of winning the presidential
> election

Frequentist inference cannot be interpreted as a probability unless one goes
through some (often misunderstood, as you pointed out) contortions. In your
scenario where you have 95% confidence of something it would be wrong to say
that Clinton had a 95% chance of winning.

------
boomboomsubban
The World Cup is about the worst sporting event for data-led predictions like
this; far too much can rely on a few events that are basically a coin flip. It
would be interesting to see how the predictions went for something like the
Premier League tables.

~~~
fwdpropaganda
> far too much can rely on a few events that are basically a coin flip

Can you give us some examples?

~~~
gtr
As there are relatively few goals, anything that can turn a goal into not a
goal or vice versa can have a massive impact on the game. For example the
penalty decision against Croatia in the final. Another thing that adds to the
randomness is the chance that a key player may be sent off or injured.

~~~
systoll
To make that specific -- in 44 of 64 games, and in every single penalty
shoot-out, turning one goal into not a goal or vice versa would've changed the
outcome.

~~~
ufo
It is not so simple, because goals in football are not independent of each
other. A team that scores first has the opportunity to play more cautiously
and go for more counterattacks.

------
anonu
People love to beat up on these companies because of this stupid World Cup
prediction. Yes, Goldman is a "great vampire squid wrapped around the face of
humanity" (Matt Taibbi's quote). But it turns out it's really just great
marketing for their research teams.

Also, I've seen some people say (not in this forum) that banks now look stupid
because they're in the business of making predictions and they can't even get
the world cup right. Guess what? Banks make no money on predictions. They make
money on flows and taking spreads on trades they do with clients. Any research
or prediction is meant to be a catalyst for that trade.

~~~
jasode
_> Banks make no money on predictions. They make money on flows and taking
spreads on trades they do with clients._

You're mostly right, but to clarify further: an investment bank like Goldman
Sachs gets most of its revenue from "market making" spreads, but it also has
activities that depend on predictions, such as its proprietary trading desks
(before the Volcker Rule shut them down) and GSAM (Goldman Sachs Asset
Management). GSAM is basically a hedge fund for their wealthy clients' money.
They will run predictions on macro trends using data like interest rates,
commodities, indexes, etc. to help them pick stocks for their portfolio.

As the pdf noted, the World Cup data models and simulations came from Adam
Atkins of GSAM.

~~~
anonu
The Volcker Rule shut down approximately zero proprietary trading on Wall
Street. Any articles you can point me to were merely media stunts by their
respective firms.

The rule was too complex and onerous to be implementable. Case in point: it's
already being rolled back... partly because of the current administration,
but more because it was just a poorly written and poorly thought-out idea to
start with.

------
crispyambulance
I am somewhat shocked that GS would jump into the business of predicting the
World Cup, even as a joke. The risk of people getting the wrong idea about the
prediction, and about GS itself, is too great, even with a perfectly
defensible model.

This is an enterprise for bookies, not Goldman Sachs.

~~~
TuringNYC
FYI - I worked at Goldman Sachs and then a hedge fund for a decade. On the
Capital Markets / Trading side, you are _literally_ a bookie. In fact the
nomenclature is "you have a book." You are setting trading spreads based on
where you think things will go. Depending on the market, your work may be more
or less statistical and you're trying to gain a statistical advantage.

~~~
Maro
Off-topic: were you able to retire after that decade?

~~~
lloyd-christmas
Not OP, but I worked in finance as a trader for more than half a decade. I
make more per hour now that I've switched to software development in NYC.

I think people have a strange view of finance. Most people aren't paid obscene
amounts of money in finance, just like most software developers don't make the
salary of a senior developer at Big Tech. They also work an obscene number of
hours. During earnings season, I would be at my desk by 5am and work 80+ hours
per week. Nowadays, it's a rarity to go more than 50. My brother currently
works at Big Bank, and makes more than I do on an absolute basis, but I
definitely make more than he does hourly. I get to work at 9:30-10, he gets to
work at 7:30-8. I get home at 6:30-7:00, he gets home 8:30-9. He works at
least a half day every Sunday, while I enjoy my hobbies. I'm also commenting
on HN at 11:00...

Most of my college friends still work in finance. I make more money than a few
of them based on overly honest drunken conversations, and we're all more than
10 years into our careers. There is a glass ceiling in tech that is a lot more
all-encompassing, but it's not like it doesn't exist in other industries.
There are only so many higher-up positions, and most people burn out (or
aren't capable of competing) before they even get in position for that
promotion. The running joke when someone was getting poor performance reviews
was "That's it, I'm moving to Vermont to open an antique store".

For some more comparison, I grew up in a 1%er town in the suburbs of NY. The
average lawyer family lived in nicer houses than the average finance family,
who in turn lived in nicer houses than the average medicine family. However,
the most expensive house was owned by the CFO of Big Bank. Income is very
right-skewed in finance.

------
denzil_correa
The "Ludic Fallacy" strikes again [0].

> The ludic fallacy, identified by Nassim Nicholas Taleb in his 2007 book The
> Black Swan, is "the misuse of games to model real-life situations."

...

> The alleged fallacy is a central argument in the book and a rebuttal of the
> predictive mathematical models used to predict the future – as well as an
> attack on the idea of applying naïve and simplified statistical models in
> complex domains. According to Taleb, statistics is applicable only in some
> domains, for instance casinos in which the odds are visible and defined.

Both of Taleb's books, "The Black Swan" and "Fooled by Randomness", are an
interesting take on such models. Meanwhile, most economists know about
"Knightian Uncertainty" [1], which distinguishes risk from uncertainty.

> "Uncertainty must be taken in a sense radically distinct from the familiar
> notion of Risk, from which it has never been properly separated.... The
> essential fact is that 'risk' means in some cases a quantity susceptible of
> measurement, while at other times it is something distinctly not of this
> character; and there are far-reaching and crucial differences in the
> bearings of the phenomena depending on which of the two is really present
> and operating.... It will appear that a measurable uncertainty, or 'risk'
> proper, as we shall use the term, is so far different from an unmeasurable
> one that it is not in effect an uncertainty at all."

[0]
[https://en.wikipedia.org/wiki/Ludic_fallacy](https://en.wikipedia.org/wiki/Ludic_fallacy)

[1]
[https://en.wikipedia.org/wiki/Knightian_uncertainty](https://en.wikipedia.org/wiki/Knightian_uncertainty)

~~~
fwdpropaganda
Damn, do I dislike Nassim Taleb. I don't think I've ever heard him say
anything deep. That Wikipedia article is an excellent example.

In [0] you have the following:

> The ludic fallacy, identified by Nassim Nicholas Taleb in his 2007 book The
> Black Swan, is "the misuse of games to model real-life situations."

And he gives an example of this:

> One example given in the book is the following thought experiment. Two
> people are involved:

> Dr. John who is regarded as a man of science and logical thinking

> Fat Tony who is regarded as a man who lives by his wits

> A third party asks them to "assume that a coin is fair, i.e., has an equal
> probability of coming up heads or tails when flipped. I flip it ninety-nine
> times and get heads each time. What are the odds of my getting tails on my
> next throw?"

> Dr. John says that the odds are not affected by the previous outcomes so the
> odds must still be 50:50.

> Fat Tony says that the odds of the coin coming up heads 99 times in a row
> are so low that the initial assumption that the coin had a 50:50 chance of
> coming up heads is most likely incorrect. "The coin gotta be loaded. It
> can't be a fair game."

> The ludic fallacy here is to assume that in real life the rules from the
> purely hypothetical model (where Dr. John is correct) apply. Would a
> reasonable person bet on black on a roulette table that has come up red 99
> times in a row (especially as the reward for a correct guess is so low when
> compared with the probable odds that the game is fixed)?

So Nassim Taleb wanted to discuss "using games to model real-life situations",
and to demonstrate the pitfalls he uses two characters. He _portrays_ the
characters as "man of logical thinking" vs "man who lives by his wits", but as
we'll see, he's missing one dimension in his characterization.

The first problem here is that he's implicitly suggesting to the reader that
the decisions of the "man of logical thinking" represent the pitfalls of
"applying games to model real-life situations", whereas the other guy's
decisions represent... it's not specified, but clearly something with a better
outcome.

The second problem is that he conflates "applying something you read in some
textbook to real life without thinking" with "modelling real life". He
suggests to the reader that these two people are "logic" vs "instinct", but
they're not. They're a dumb guy who knows math vs a smart guy who doesn't know
math. _Obviously_ real life is more complex than your textbook examples, and
so the smart guy is going to win, because his fuzzy heuristics beat the first
guy's decisions, which are optimal only within his flawed model. An actually
smart and logical person would update his model based on new evidence (i.e. "I
was told that this coin was 50-50, but the chance of what I just saw is so
small that it's more likely I was lied to") and then use math to make
predictions and beat the guy who's smart but doesn't know math.
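
That update is straightforward to compute. A sketch with a made-up prior (say
you're 99.9% sure the coin is fair and allow a 0.1% chance it's double-headed):

```python
from fractions import Fraction

# Hypothetical prior beliefs about the coin.
prior_fair = Fraction(999, 1000)
prior_loaded = Fraction(1, 1000)   # double-headed

heads = 99
like_fair = Fraction(1, 2) ** heads  # P(99 heads | fair coin)
like_loaded = Fraction(1, 1)         # P(99 heads | double-headed coin)

# Bayes' rule: posterior probability the coin is fair after 99 heads.
posterior_fair = (prior_fair * like_fair) / (
    prior_fair * like_fair + prior_loaded * like_loaded
)
print(float(posterior_fair))  # astronomically small: the coin is almost surely loaded
```

Even a tiny prior suspicion of a loaded coin overwhelms the "fair" hypothesis
after 99 heads, which is exactly Fat Tony's conclusion reached with math.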

So ironically, he wants to portray the dangers of using over-simplified
models, and to do that he uses an example in which he obscured one dimension.

Nassim Taleb is really good at rhetoric but light on substance.

[0]
[https://en.wikipedia.org/wiki/Ludic_fallacy](https://en.wikipedia.org/wiki/Ludic_fallacy)

~~~
neuromantik8086
Nassim Taleb is basically a stopped clock. He's big on pointing out how we're
all prone to finding illusory correlations (not his discovery), and he's a
great promoter of Kahneman and Tversky, but there are other areas where he is
clearly out of his depth. It's beyond obvious, for instance, that he's never
gotten past Popper in his studies of the philosophy of science. Unfortunately,
his disciples are (ironically) quite terrible at thinking for themselves and
buy into his demagoguery.

Basically a book by Nassim Taleb is an incoherent summary of the books that
Nassim Taleb has read within the past year, with a few morsels of recycled
insight here and there.

------
lowkeyokay
If anything, this is a clear illustration of poor use of probabilistic
prediction. When used for investments, you have many outcomes; if the model is
any good, you will get most of them right. In the World Cup you have very few,
even if you count all the games played. Definitely not excusing Goldman Sachs
here; they should have known better than to try to predict this. There was
only a tiny chance this could be great advertisement for their model.

~~~
Ntrails
> they should have known better than to try to predict this.

There's no downside, only free publicity. If they, by good fortune and a
following wind, get it right - then the publicity is incredible. If it's wrong
they laugh and say "well, better stick to predicting what we're good at!" and
they _still get a shitload of headlines and awareness of their product_.

This was not a mistake.

~~~
jamespo
Well, there's the downside of articles like this pointing out that they've had
4 years to work on their models and they've gotten worse.

------
geraldbauer
PS: If you want to build or train your own model or make predictions, you can
find open (structured) data about all World Cups at the football.db, see
[https://github.com/openfootball/world-
cup](https://github.com/openfootball/world-cup) and
[https://github.com/openfootball/world-
cup.json](https://github.com/openfootball/world-cup.json) Enjoy the beautiful
game.

------
kgwgk
The predictions were not _so_ bad. At least one of the favourites won in the
end. GS had France winning with 11.3% probability, second to Brazil with
18.5%. UBS was less fortunate, they had Germany (24%), Brazil (19.8%), Spain
(16.1%) and England (8.5%) before France (7.3%).

I compared the logloss for their predictions with the "uniform" benchmark
(giving each team 1/32 probability of winning, 1/16 probability of getting to
the finals, etc) and the results are the following (if I transcribed the data
properly):

Getting to second round:

GS: 0.495 UBS: 0.495 bench: 0.693

Getting to quarter-finals:

GS: 0.463 UBS: 0.459 bench: 0.562

Getting to semi-finals:

GS: 0.310 UBS: 0.327 bench: 0.377

Getting to final:

GS: 0.231 UBS: 0.269 bench: 0.234

World Cup winner:

GS: 0.097 UBS: 0.113 bench: 0.139

The performance of the models was OK until Croatia got to the final. This hurt
UBS especially, which predicted less than a 0.9% probability for such an event
(compared to 2.1% in Goldman's model).

Edit: these would have been the "best case" scores (if the high-probability
teams had advanced to each round, ignoring that this may be impossible due to
the structure of the tournament):

GS: 0.432 0.302 0.220 0.141 0.079

UBS: 0.365 0.251 0.176 0.111 0.070

UBS could potentially achieve lower logloss metrics because it had more
extreme predictions.
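
For reference, here is a sketch of how the uniform benchmark for the winner
can be reproduced (my numbers above come from the same kind of calculation):

```python
import math

def log_loss(preds, outcomes):
    """Mean negative log-likelihood of binary outcomes under predicted probabilities."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(preds, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(preds)

# Uniform benchmark for "wins the World Cup": every one of the 32 teams gets 1/32,
# and exactly one of them actually wins.
teams = 32
bench = log_loss([1 / teams] * teams, [1] + [0] * (teams - 1))
print(round(bench, 3))  # 0.139
```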

------
cascom
Isn’t this a little like flipping a coin four times - getting heads four times
in a row, and looking at your friend and saying “but you told me the odds were
50/50 each flip?!”
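
The arithmetic behind the analogy, for what it's worth:

```python
# Probability of four heads in a row with a fair coin
p = 0.5 ** 4
print(p)  # 0.0625, i.e. 1 in 16
```

A 1-in-16 event is hardly evidence that the coin, or the model, was wrong.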

~~~
thousandautumns
Yes, it is.

------
rcdmd
This article didn't compare the Goldman Sachs model to any other models-- why
not compare it with sports betting odds? Would Goldman have made or lost money
betting their model was better than the crowd?

~~~
sunstone
Or compare it with the fivethirtyeight blog predictions.

------
vl
>Soccer, with the many factors that affect game outcomes — players’ injuries
and intra-team conflicts, the refereeing, the weather, coaches’ errors and
moments of inspiration — remains only a tightly-regulated game involving a
_few dozen people_. The behavior and performance of big corporations, _entire
industries and nations_ is arguably even more difficult to model based on data
about the past.

The author entirely misses how these models work: the larger the entity, the
more statistics and averages kick in, and as a result the better a model can
be built.

~~~
Donald
Depends on the complexity of the interactions between variables. There are
plenty of examples where we have excellent local models, but make
(comparatively) worse prediction at scale. A pretty classic example is biology
- we have excellent knowledge about how genotypes work and their interactions
in cells, but models of phenotypes are typically expensive, error-prone, or
non-existent.

------
dmichulke
I watched quite a few matches and among the things I saw in the matches but
not in any statistics are:

\- motivation (Germany and Croatia were the two extremes here, no idea how to
measure it)

\- team cohesion (number of articles in a few journals questioning the team
cohesion, maybe also articles about individual players)

\- creativity in offense (maybe measurable via "target missed from close
distance" \+ "ball passed front of the goal")

\- number of errors in defense that didn't lead to a goal

\- percentage of times ball possession was lost from own goal to enemy's area
(England was really bad here against Croatia)

~~~
smcl
These kinda show what makes predicting football particularly difficult. I like
the ideas, and I think we (or more likely some ML algorithm) can come up with
the set of conditions that showed why France prevailed against the specific
opposition at this specific World Cup ... but I suspect that the conditions
would be pretty unique and invalid for Euro 2020, WC 2022 etc.

As you identified, motivation could be pretty hard to measure ... but even if
we could it might be a pretty poor predictor anyway. France in the early
stages didn't look very motivated, while England and Colombia looked pretty
lively.

Team cohesion - the German team were pretty consistent (not dazzling, but
consistent) and we know how that ended. Again France didn't really impress
until the latter stages of the WC.

Creativity in offense - I guess it can indicate a sort of calm or confidence
in front of goal, but it can also be seen as pretty negative. For example,
Arsenal a few years back came under fire for having plenty of possession in
the 18-yard box but failing to convert. Spain's confident quick pass-and-move
"tiki-taka" was ever-present but has, in my eyes, been impotent in the last
few years (and, more importantly as a neutral viewer, very frustrating to
watch).

Defensive errors that didn't lead to a goal could be a nice indicator of a
defence's ability to pick up after each other's mistakes - but at the same
time the errors that do lead to goals (e.g. Croatia's second goal in the
final) are relatively rare, and the lack of a goal could just reflect the
opposing team's inability to convert, whether due to poor organisation or a
lack of opportunism from their strikers.

I'm not sure what you mean with the last one, but I think this could be a nice
one - if you mean "times you lost possession in your own half". A profligate
midfield and defence is bound to ship goals, I doubt there are many teams that
can either fight back after trailing by a goal or two or score enough to
maintain a reasonable buffer.

I applaud the effort though - it takes more creativity and care to think of
some new angles (like you did) than to think of some possible counter examples
(like I did)!

~~~
dmichulke
Thank you for the warm words. I guess the reason is my occupation, plus the
fact that I just spent the last few weeks watching many games with family and
friends.

> I'm not sure what you mean with the last one, but I think this could be a
> nice one - if you mean "times you lost possession in your own half"

Almost, England lost the ball frequently (> 50+x% with a large x AFAI could
see) due to the keeper sending out long balls. I'd like to measure that
somehow. Could be done via number of seconds in possession after a goal kick,
an indicator whether a hypothetical 85% marker of the field was reached or
measuring whether the ball was at least 5x successfully passed (or resulted in
a goal).

~~~
smcl
Ahhhh I see. Actually this is something I've really been curious about myself
- whether the better strategy overall for a keeper returning the ball into
open play (from a goal kick or from hand) is to just boot it as far up-field
as possible or to pass it short to one of the defenders or midfielders
sitting deep.

Interestingly something like this is a tactic used in Rugby
([https://www.youtube.com/watch?v=cbti6mLvSJs](https://www.youtube.com/watch?v=cbti6mLvSJs)).
I used to play a lot of football when I was younger and at our level (waaaay
down the scottish league pyramid) against tired, hungover or generally weak
opposition, keeping them under pressure by dominating the territorial game but
sacrificing possession was criminally underrated. Usually if you could keep
hammering them for 60 minutes and had the legs to step up a gear in the last
30 or so you could grab a valuable goal or two :-)

------
iainmerrick
_Thanks to the use of more granular data, made possible by AI, this year’s
model should have worked better than the 2014 one.

If anything, it worked worse._

"If anything"? All the results are available, so it would be easy to put a
precise number on this. Measure the Bayesian regret, or just report the
winnings if you had used the GS model to bet on the outcomes. Unless it
reports some concrete numbers, this article is garbage.

It doesn't report any concrete numbers.

------
corpMaverick
Soccer is a sport with a big random component. This is probably why it is so
exciting. An average team can beat a better team.

The reason is easy to see. The game can be decided by one, two or three key
plays. Compare that to basketball: to win a game you have to consistently
score more and defend better. Rarely is the game decided by one or two plays;
that only happens when the game is already very tight.

------
barrkel
I put money on Belgium (12.0 decimal odds) and Croatia (15.0) after the group
stages, where some form was visible, combined with knowledge that they had
some of the world's best players.

The odds shortened as the tournament progressed, and I was able to hedge as
the shortened odds made lay betting profitable.

(High variance in football outcomes means there's no guarantee of profit, I
don't bet big sums.)

~~~
anoncoward111
This answer is very useful and contains proper strategy advice :)

If someone were to bet during the round of 16 - say, $1 on each of the bottom
8 teams and $2 on each of the top 8 - the strategy would most likely yield a
small profit or a small loss, rather than a total loss.

------
tirumaraiselvan
It's a fool's errand to predict high-variance events like football games.

~~~
pbhjpbhj
Only predict events that are easy to predict, never fail!

------
patagonia
Financial modeling is about risk-adjusted return. Because GS knows it cannot
determine with certainty the outcome of a given investment, it diversifies and
hedges. Most of all, GS is a market maker, the equivalent of a bookie. To say
that GS’s models “didn’t come close” is to ignore all the ways in which such a
grading scheme differs from GS’s actual business model. If their WC
prediction efforts were anything more than a fun-spirited PR project, it was
likely that GS wanted to keep its employees engaged and adding business value
during a World Cup they otherwise would certainly have spent watching all
month.

------
rossdavidh
In addition to the many other problems with this article, I would like to
point out that if, somehow, Goldman Sachs had managed to create a model that
could accurately predict the results, the game of soccer would have to be
changed to make it more unpredictable somehow. It is intrinsic to the nature
of sport that, in order to be entertaining, there has to be a realistic chance
for more than one team to win. Not many people (even from the winning country)
would bother watching if it were accurately predictable.

------
kulu2002
Good... There was this discussion thread few days back on HN

[https://news.ycombinator.com/item?id=17509407](https://news.ycombinator.com/item?id=17509407)

Did this investment bank use the same set of algorithms that they use for
financial predictions?

...And then I remember there was this Octopus[1] who used to predict winners
with 85% accuracy

[1][https://en.wikipedia.org/wiki/Paul_the_Octopus](https://en.wikipedia.org/wiki/Paul_the_Octopus)

------
IkmoIkmo
You'd have to run this World Cup thousands of times by simulation; running it
a single time and concluding that the results are not in line with the model
is meaningless and silly.

It's as silly as saying my claim for the odds of nearly perfectly modelling a
coin toss (approximately 50/50%) is wrong because a series of 10 coin tosses
show different results from my model. The model is not any less correct.
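The point is easy to demonstrate with a quick simulation (a sketch; the seed
and sample sizes are arbitrary): short series of fair-coin tosses routinely
stray far from 50/50 even though the generating model is exactly right.

```python
import random

random.seed(42)

def heads_fraction(n_tosses):
    """Fraction of heads in n_tosses flips of a fair coin."""
    return sum(random.random() < 0.5 for _ in range(n_tosses)) / n_tosses

# Individual runs of 10 tosses scatter widely around 0.5 ...
print([heads_fraction(10) for _ in range(5)])

# ... while a long run converges on the true probability.
print(heads_fraction(100_000))
```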

------
Keyframe
It's as good time as any to plug in EA's simulation results:
[https://www.easports.com/fifa/news/2018/ea-sports-
predicts-w...](https://www.easports.com/fifa/news/2018/ea-sports-predicts-
world-cup-fifa-18)

------
msravi
Duh. Looks like there's a fundamental misunderstanding of how statistics works
all around. The probability of an event does NOT predict a particular outcome.
Ever. It only says that if the experiment is performed again and again and
again, like a few thousand times, then the fraction of trials in which the
event occurs will approach that probability.

If I toss a fair coin you cannot predict the next outcome. You can only say
that if I toss the coin a 1000 times, then close to 500 are going to turn up
heads, and another 500 are going to turn up tails.

It was stupid of Goldman Sachs or whoever to predict an outcome. It was stupid
of anyone else to lend credence to that prediction.

Hopefully, Goldman Sachs is not relying on prediction of singular outcomes to
make their investment decisions. I don't think they are. Probably just
marketing brouhaha to ride the soccer wave. Although I'm not sure if that
worked as expected.

~~~
Sean1708
> It was stupid of Goldman Sachs or whoever to predict an outcome.

If you read the actual report they did[0], they never claimed that any single
outcome was more than 18.5% likely.

[0]: [http://www.goldmansachs.com/our-thinking/pages/world-
cup-201...](http://www.goldmansachs.com/our-thinking/pages/world-
cup-2018/multimedia/report.pdf)

------
hsienmaneja
They don’t have an edge like they do in their bread and butter markets,
combined with a small sample set == high probability of a single year of
sports predictions falling over like this

------
gesman
If GS had to bet money, their actual business model would likely be to sell a
bit of each of the higher-probability losers (less risk) and buy big on the
projected winners (higher risk).

------
blattimwind
This site is a good counter-example for website optimization: While it uses
many assets, so a CDN domain makes sense, it spreads them out thinly. It loads
over 100 CSS files, most of which are below 1K. Similarly it loads
approximately 30 JS scripts, most of which are just a few K each. This is
mitigated to a large extent by using HTTP/2.0, which permits a few dozen or so
parallel requests, but it still means that a repeated load of the page takes
2-3 seconds. (Without HTTP/2.0 this probably takes ages, since browsers open
only a few connections to each origin at most). There is also almost no
difference between reloading with and without the cache.

------
rdlecler1
In the world of models, increasing precision does not necessarily increase
accuracy.

------
Sean1708
In case anyone was interested here is a table of how likely the model thought
each team was to make it through any particular stage[0] along with the stage
that that team went out in and the probability that the model gave for that
particular outcome (i.e. [probability of making it through the final stage
they made it through] - [probability of making it through the stage they went
out in]).

    
    
                    Groups  Round_16  Quarters  Semis  Finals    Out_In  Probability
            Brazil   87.5%     60.8%     42.0%  27.9%   18.5%  Quarters        18.8%
            France   81.4%     58.4%     36.6%  19.9%   11.3%       Won        11.3%
           Germany   80.5%     49.5%     30.5%  18.8%   10.7%    Groups        19.5%
          Portugal   75.2%     52.8%     32.2%  17.3%    9.4%  Round_16        22.4%
           Belgium   78.5%     51.1%     27.7%  15.8%    8.2%     Semis        11.9%
             Spain   72.3%     50.1%     28.8%  15.4%    7.8%  Round_16        22.2%
           England   73.1%     46.6%     24.4%  13.4%    6.5%     Semis        11.0%
         Argentina   79.7%     44.2%     24.1%  11.8%    5.7%  Round_16        35.5%
          Colombia   74.9%     37.3%     17.0%   8.5%    3.7%  Round_16        37.6%
           Uruguay   74.4%     34.6%     17.2%   7.2%    3.2%  Quarters        17.4%
            Poland   68.5%     30.5%     12.8%   5.8%    2.3%    Groups        31.5%
           Denmark   47.8%     26.3%     12.4%   5.2%    2.0%  Round_16        21.5%
            Mexico   52.0%     23.2%     10.5%   4.9%    1.9%  Round_16        28.8%
            Sweden   45.9%     19.4%      8.3%   3.7%    1.3%  Quarters        11.1%
              Iran   35.4%     18.1%      7.2%   2.6%    0.8%    Groups        64.6%
              Peru   37.3%     17.2%      6.8%   2.5%    0.8%    Groups        62.7%
         Australia   33.5%     15.4%      6.3%   2.3%    0.7%    Groups        66.5%
            Russia   47.9%     16.3%      6.0%   2.0%    0.7%  Quarters        10.3%
           Croatia   49.8%     16.9%      6.3%   2.1%    0.6%    Finals         4.2%
       Switzerland   52.8%     15.9%      6.1%   2.0%    0.6%  Round_16        36.9%
           Iceland   45.2%     15.1%      5.6%   1.8%    0.5%    Groups        54.8%
        Costa_Rica   36.8%     13.3%      4.7%   1.6%    0.5%    Groups        63.2%
            Serbia   32.9%     12.1%      4.5%   1.5%    0.5%    Groups        67.1%
             Japan   36.5%     12.8%      3.8%   1.3%    0.4%  Round_16        23.7%
      Saudi_Arabia   43.4%     12.7%      4.2%   1.3%    0.4%    Groups        56.6%
           Tunisia   35.2%     13.3%      4.1%   1.3%    0.4%    Groups        64.8%
             Egypt   34.4%      8.7%      2.5%   0.7%    0.2%    Groups        65.6%
       South_Korea   21.6%      5.9%      7.1%   0.5%    0.2%    Groups        78.4%
           Morocco   17.1%      6.8%      1.8%   0.5%    0.1%    Groups        82.9%
           Nigeria   25.2%      6.5%      1.7%   0.4%    0.0%    Groups        74.8%
           Senegal   20.1%      4.9%      1.2%   0.3%    0.0%    Groups        79.9%
            Panama   13.2%      3.3%      0.5%   0.1%    0.0%    Groups        86.8%
    

[0]: Exhibit 2 in [http://www.goldmansachs.com/our-thinking/pages/world-
cup-201...](http://www.goldmansachs.com/our-thinking/pages/world-
cup-2018/multimedia/report.pdf)

Edit: Fix copy-paste errors and atrocious maths.
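For the curious, the last column can be reproduced from the cumulative
columns using the rule described at the top of this comment (a sketch; the
stage names and function are mine, with Brazil's row transcribed from the
table above):

```python
# Stages in order; each column in the table gives the model's
# probability of a team making it *through* that stage.
stages = ["Groups", "Round_16", "Quarters", "Semis", "Finals"]

# Brazil's row, transcribed from the table above.
brazil = {"Groups": 0.875, "Round_16": 0.608, "Quarters": 0.420,
          "Semis": 0.279, "Finals": 0.185}

def outcome_probability(row, out_in):
    """P(team made it through the previous stage but went out at
    `out_in`); for a team that won, it is simply the last column."""
    if out_in == "Won":
        return row["Finals"]
    i = stages.index(out_in)
    before = row[stages[i - 1]] if i > 0 else 1.0
    return before - row[out_in]

# Brazil went out in the quarter-finals: 60.8% - 42.0% = 18.8%,
# matching the table.
print(round(outcome_probability(brazil, "Quarters"), 3))  # 0.188
```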

~~~
ernesth
Japan went out in Round_16

Croatia went out in Finals

And I do not understand what the last column means (except for France and
teams out in group phase)

~~~
Sean1708
Urgh, I hate that you can't edit HN comments.

First two were just me making a mistake because I write that in manually.

That last column makes no sense. It was supposed to be the probability that
the model gave to the outcome that occurred, but I got the maths wrong.

~~~
sctb
There's an edit window of a couple hours, which has probably just passed.
We've opened it up again so you can go ahead and edit.

~~~
Sean1708
Brilliant, thank you very much. Although it might be a tad late now.

------
known
GarbageIn = ML = GarbageOut

------
known
I worked in GS; Soccer/football prediction is not their forte

------
tomelders
While I agree that it's somewhat silly to try to predict a World Cup winner
like this (and I suspect it was just a bit of fun anyway), there is one other
reason that could explain why all these attempts got it so wrong.

Cheating.

Before people start booing, let's not forget where this tournament is being
held, and all the other nefarious things that country has been up to recently.

~~~
teamk
FIFA has been corrupt for decades. Although supposedly it's been cleaned up
since Blatter was removed, it is doubtful the institutional corruption has
been eliminated completely. The only question is how pervasive it is.

