
An exploratory statistical analysis of the 2014 World Cup Final - epiteton
https://beta.deepnote.com/article/statistical-analysis-of-2014-world-cup-final
======
rjtavares
Ok, so something I wrote years ago hit the HN front page. This is what it
feels like, huh?

I also have a github with more notebooks about football here:
[https://github.com/rjtavares/football-crunching](https://github.com/rjtavares/football-crunching)

If you have any questions about football analytics, hit me up.

(My Gmail and Twitter handles are the same as my HN username, if you prefer
email or DMs.)

~~~
billforsternz
Interesting article thanks. A little constructive feedback on some language in
the conclusion. "Football is a game of space. That's why parking the bus can
actually allow to win a match." The second sentence doesn't make sense
unfortunately. Worse, it's not really possible to work out what it means
either. It sounds like you mean "That's why parking the bus can be a good
strategy" (or something like that). But the first sentence sets up the exact
opposite expectation; a sentence like "That's why parking the bus is never a
good strategy" would fit with the first sentence. Sadly, the reader
is left not knowing whether you're saying one thing, or the complete opposite.

I understand English is unlikely to be your first language. Please read this
as a genuine attempt to be helpful.

~~~
pvg
There's a grammatical hiccup there with the 'allow' but the meaning is quite
clear for soccer-fluent readers.

~~~
billforsternz
Completely unclear to me, and I've been soccer (football) fluent for over 50
years. I know what parking the bus means (for non-football fans, it means
falling back and defending in numbers). If football really is a game of space,
it should be a poor strategy. Is the author saying parking the bus is a good or
a bad strategy?

~~~
pvg
It's a strategy that gives up possession in exchange for denying offensive
space to the opponent and relies on exploiting (through counter attacks) the
defensive space the attacker opens. When executed well, it can win games, but
it's not amenable to the type of analysis presented in this write-up. That's
what the bit is about.

~~~
billforsternz
Aha, so essentially he meant "football is a game of space, that's why parking
the bus is an _interesting_ strategy". Thank you. Too much binary thinking
from me. I often struggle to discern the meaning of unclear writing, which is
probably why I spend a disproportionate amount of time polishing and rewriting
my own writing for clarity. But I still fail regularly; it's not an easy problem.

~~~
pvg
Glad it helped! I brain-pretzeled myself over _soundness_ in this very forum
just last year:

[https://news.ycombinator.com/item?id=20196575](https://news.ycombinator.com/item?id=20196575)

~~~
billforsternz
An interesting and educational thread.

> "I've now come to understand the two chief weapons of the JIT remain
> surprise, fear, ruthless efficiency and an almost fanatical devotion to the
> Pope."

I like this very much. Even adding nice red uniforms, it's still two chief
weapons.

------
LanceH
Germany 2014 was about the best I've seen the game played.

Something I watch that seems to differentiate players and teams at every level
is what happens on the first touch when receiving a pass. When you first start
watching this, you'll notice that as you move up the ranks, the ball just
sticks to a pro's feet, preferably in front of them a couple feet away where
they are ready to play it again. Watching Germany that year, I noticed their
first touch was not merely excellent, but aggressively so. Instead of settling
the ball, they turned it into a rolling ball in the direction they wanted to
play, or played a first-time pass. I'm speaking in wild generalities here, and
I don't have numbers to back it up.

Statistically I would guess that their second touch was on average farther
away from them than other teams of comparable level, while still being under
control.
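If someone wanted to check this guess, a sketch of the measurement might look like the following. It assumes a hypothetical touch-level dataset with each touch's pitch coordinates (the keys `team`, `possession`, `touch`, `x` and `y` are made up for illustration), measures how far the ball moved between a player's first and second touch, and averages that per team. The numbers in the example are invented.

```python
import math
from collections import defaultdict


def mean_second_touch_distance(touches):
    """Average distance between the first and second touch, per team.

    `touches` is an iterable of dicts with the hypothetical keys 'team',
    'possession', 'touch', 'x' and 'y'. Each possession id is assumed to
    identify one reception by one player; 'touch' is 1 for the receiving
    touch and 2 for the next touch. Units are whatever x/y use (e.g. metres).
    """
    # Record the location of touch 1 and touch 2 for each (team, possession).
    locations = defaultdict(dict)
    for t in touches:
        if t["touch"] in (1, 2):
            locations[(t["team"], t["possession"])][t["touch"]] = (t["x"], t["y"])

    distances = defaultdict(list)
    for (team, _possession), pair in locations.items():
        # Possessions without a second touch (e.g. a first-time pass) are skipped.
        if 1 in pair and 2 in pair:
            (x1, y1), (x2, y2) = pair[1], pair[2]
            distances[team].append(math.hypot(x2 - x1, y2 - y1))

    return {team: sum(d) / len(d) for team, d in distances.items()}


# Invented toy numbers, only to show the shape of the input.
touches = [
    {"team": "A", "possession": 1, "touch": 1, "x": 50.0, "y": 30.0},
    {"team": "A", "possession": 1, "touch": 2, "x": 53.0, "y": 34.0},
    {"team": "B", "possession": 2, "touch": 1, "x": 40.0, "y": 20.0},
    {"team": "B", "possession": 2, "touch": 2, "x": 41.0, "y": 20.0},
]
print(mean_second_touch_distance(touches))  # {'A': 5.0, 'B': 1.0}
```

A larger average on its own wouldn't show that the ball stayed under control; you'd still need to check what happened after the second touch (e.g. whether possession was kept).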

It reminded me of Tiger Woods when he burst onto the scene. He played the game
far more aggressively and relied on his skills to keep him safe rather than
traditional shot selection. Germany 2014 decided that these slightly riskier
touches were consistently achievable and that, in the long run, the benefit
outweighed the risk.

It also seemed that with the aggressive play, there is just _more_ football
played -- more chances. The more chances that are generated, the more it
favors the better team.

~~~
Mikeb85
Watch Germany vs. France and say that. The Germans stalled and fouled the
whole game and scored off a set piece. In the final versus Argentina too, the
game was scoreless until extra time. They also needed extra time versus
Algeria. Their only real display of attacking was versus Brazil, who
self-destructed after having their best player (a top-three player in the
world) horrifically injured.

~~~
dekervin
And yet, perhaps the most breathtaking display of football dominance I saw
from that Germany team was in their Euro 2016 semi-final against France. It
was way more impressive than their dismantling of Brazil. Their control of the
game was out of this world.

Football is "unstable" as a game. To dominate in a consistent way, you
actually need to be overwhelmingly superior to your opponent in many different
stages/aspects of the game. Any weak link and you are at the mercy of fate.

~~~
Mikeb85
Ummm, they lost... I was actually in France that year. France advanced, then
lost the final to Portugal.

------
thom
If anyone's interested in doing similar analysis, my company StatsBomb makes a
lot of event data available on GitHub, including the last World Cup, lots of
Champions League finals, NWSL and FAWSL etc:

[https://github.com/statsbomb/open-data/](https://github.com/statsbomb/open-data/)
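A minimal sketch of working with that data, assuming you've downloaded one match's events file (as far as I know the repo stores these as JSON arrays, one file per match under `data/events/`; check its README). The field names used here (`type.name`, `team.name`, and a `pass.outcome` key that is only present for unsuccessful passes) are my reading of StatsBomb's event spec, not something stated in this thread, and the sample events are hand-written placeholders rather than real match data.

```python
import json
from collections import Counter


def load_events(path):
    """Load one match's events file (e.g. a local copy of data/events/<match_id>.json)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def pass_completion_by_team(events):
    """Return {team_name: (completed, attempted)} for one match.

    Assumes StatsBomb-style events: each event has a 'type' dict with a
    'name', a 'team' dict with a 'name', and passes carry a 'pass' dict
    whose 'outcome' key is present only for unsuccessful passes.
    """
    attempted = Counter()
    completed = Counter()
    for ev in events:
        if ev.get("type", {}).get("name") != "Pass":
            continue
        team = ev["team"]["name"]
        attempted[team] += 1
        # No 'outcome' on the pass is taken to mean it reached a teammate.
        if "outcome" not in ev.get("pass", {}):
            completed[team] += 1
    return {team: (completed[team], attempted[team]) for team in attempted}


# Hand-written placeholder events, not real data.
sample = [
    {"type": {"name": "Pass"}, "team": {"name": "Team A"}, "pass": {}},
    {"type": {"name": "Pass"}, "team": {"name": "Team A"},
     "pass": {"outcome": {"name": "Incomplete"}}},
    {"type": {"name": "Shot"}, "team": {"name": "Team B"}},
    {"type": {"name": "Pass"}, "team": {"name": "Team B"}, "pass": {}},
]
print(pass_completion_by_team(sample))  # {'Team A': (1, 2), 'Team B': (1, 1)}
```

With a real file you'd call `pass_completion_by_team(load_events(path))`; loading into pandas with `json_normalize` is the natural next step once you want more than counts.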

~~~
levidxyz
Are you able to share how StatsBomb collects event data? I'd love to build
something to capture stats for local club teams but probably not if it
involves computer vision. Thanks!

~~~
torvaney
If you're interested in collecting small-scale/simple event-data, I put
something together for my own use here:
[https://torvaney.github.io/projects/tracker.html](https://torvaney.github.io/projects/tracker.html)
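If you want something even simpler than a dedicated tool, the core of a hand-collected event log is just a list of records you can tally afterwards. The sketch below assumes a made-up plain-text format, one event per line as `<minute> <team> <player> <action>`; it is not how torvaney's tracker or StatsBomb store data, and the names in the example are placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    minute: int
    team: str
    player: str
    action: str  # e.g. "pass", "shot", "tackle"


def parse_log(text):
    """Parse a hypothetical hand-typed log with one event per line:
    '<minute> <team> <player> <action>'. Blank lines are ignored;
    a line without exactly four fields raises ValueError."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        minute, team, player, action = line.split()
        events.append(Event(int(minute), team, player, action))
    return events


def tally(events):
    """Count actions per (team, action) pair."""
    return Counter((e.team, e.action) for e in events)


log = """
3 Home Silva pass
5 Away Jones shot
7 Home Silva shot
"""
counts = tally(parse_log(log))
print(counts[("Home", "shot")])  # 1
```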

------
ankit219
I'm one of those who don't like the use of advanced stats in football, though I
do watch a lot of football. My reason is simple:

The advanced stats as they are called (xG, xA, xPA, offense actions, defense
actions) all end up as a mere tool for disregarding the actual result and
providing arguments about which was the better team. For example, team A won
the match 1-0, but team B was the better team since they had more
<<insert favorable stat>>. The idea behind stats should be to add a bit more
context to the game I watched, not to replace watching it.
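To make the complaint concrete: the "team B was better" argument usually comes from summing each side's per-shot xG and setting it against the scoreline. A minimal sketch, assuming the per-shot xG values already come from some model (none is implemented here) and using invented numbers where team A wins 1-0 from a single good chance while team B piles up low-quality shots:

```python
from collections import defaultdict


def summarize(shots):
    """Per-team (total xG, goals) from a list of (team, xg, is_goal) shots.

    The per-shot xG values are assumed to come from an existing model;
    this only aggregates them.
    """
    xg = defaultdict(float)
    goals = defaultdict(int)
    for team, shot_xg, is_goal in shots:
        xg[team] += shot_xg
        goals[team] += int(is_goal)
    return {team: (round(xg[team], 2), goals[team]) for team in xg}


# Invented numbers for illustration only.
shots = [
    ("A", 0.35, True),
    ("B", 0.10, False),
    ("B", 0.08, False),
    ("B", 0.30, False),
    ("B", 0.12, False),
]
print(summarize(shots))  # {'A': (0.35, 1), 'B': (0.6, 0)}
```

Here team B "wins" on xG (0.6 vs 0.35) while losing the match, which is exactly the kind of comparison the comment objects to.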

Then there is also comparing the stats across games. In football (and every
other game) there is a different difficulty level associated with each game
(fatigue, condition, team condition, opposition condition, tactics, opposition
tactics, teammates, chemistry, opposition mistakes, opposition players,
adjustments, pressure, and even sheer luck on occasions), which makes the
comparison meaningless. An offensive team will always have more shots and more
possession than a defensive-minded, park-the-bus kind of team. Like you said,
right now no stat is good enough to give us an idea of how well a team played.
Football is a game of spaces, and a lot of things depend on player movements
(or lack thereof) and vision. I do find reports helpful, though, when they tell
us where a certain team had the advantage and how they maximised it.

~~~
imrankhan17
Do you think the game of football is more complex than, say, the global
economy? I doubt it, yet we still use data and statistics to analyse the
economy, so why not football or any other sport?

~~~
ankit219
It's not an exact comparison. We use models to analyse the economy but not
human behavior. Any sport is a lot about human intelligence, behavior, and
skill. It's easy to quantify the final output (numbers that matter, such as
goals) and compare those. Right now, other than that, we don't have good enough
models for any action in football. The context is very important, but that is
completely ignored by statistics. Until we get to where those things are given
due consideration, there is no point quoting or debating over vanity metrics
like xG, offensive actions etc. There is a very American tendency to reduce a
game to just stats, but a game is infinitely more complex. You may have just
one offensive action in the game and still win it if you are well drilled
defensively and can stop opposition counters.

------
stephenhuey
Your browser is not yet supported

We're sorry about this, but we don't fully support your browser yet. Let us
know at help@deepnote.com which browser you're using and we'll make sure to
prioritize it. We currently recommend to use the latest version of Chrome,
Safari or Firefox and you should be all set. Thanks!

\- - -

I’m using Firefox 26 on the latest iOS.

~~~
rnicholus
Since Firefox mobile doesn’t use the same versions as desktop, I’m guessing
Deepnote has just incorrectly implemented the browser support logic. This is a
very common issue (with Firefox specifically).

------
cambalache
Nice as an exercise in Pandas and the whole Python data analysis tool chain;
the graphics are awesome. As an exercise in football analysis it's pretty
useless and feeble, and even worse, judging by the author's name (most
probably Brazilian or Portuguese), he let his dislike of Argentina cloud his
judgement.

You can perorate all you want about possession and pass completion, but if I
am a manager and you offer me a first half like the one the Argentinians had
(which, according to the author, was a German slaughter fest), I will take it
every single time. Also absent from the analysis are the clear-cut chances of
Higuain and Messi (far better than anything Germany had, despite all the
possession) and the controversial 50-50 ball between Neuer and Higuain.
Anyway, it was nice to see the exercise.

------
s14ve
Pretty interesting read! I can also recommend the author's Medium account[1],
full of similar articles (albeit in a _slightly less hands-on_ format).

[1] [https://medium.com/football-crunching](https://medium.com/football-crunching)

~~~
rjtavares
They're similar indeed. I wrote the linked post and that blog is mine.

------
s1t5
Pandas code with long lines can look so so ugly. Not a dig towards the
original post at all, it's just that the pandas API isn't the most pleasant to
work with.

------
Equiet
Hi, the author of Deepnote here, this is cool!

~~~
s14ve
Not sure if you can influence this, but I have two platform suggestions that
crossed my mind while reading it:

1\. I would make the "Run this article as a notebook" prompt more visible. On a
first read, I completely skipped that part, as it looks _very similar_ to the
pop-ups on Medium. Having an option to directly run/modify this blog would be
pretty amazing.

2\. The chosen color scheme of code formatting is a bit odd, but that might be
just my subjective preference :-)

~~~
Equiet
Of course, thanks for the suggestions.

1\. I agree, but there is a reason for this. We'll soon be adding the ability
to run/modify articles even without signing up, so the whole thing will go
away.

2\. Thanks! We are experimenting with this (the default scheme is different,
but a lot of people requested dark mode, so we're trying out different things
for the published articles).

~~~
lowdose
I really love the dark theme of Sourcegraph in VS Code.

[https://marketplace.visualstudio.com/items?itemName=sourcegr...](https://marketplace.visualstudio.com/items?itemName=sourcegraph.vscode-sourcegraph-theme)

------
SkyMarshal
Very cool. For anyone interested in this kind of tactical analysis, another
(though non-computational) one I've enjoyed is
[http://www.zonalmarking.net/](http://www.zonalmarking.net/).

~~~
dekervin
In the same vein, but with a more statistically oriented take on European
football, you can check
[https://www.alfadata.xyz/blog](https://www.alfadata.xyz/blog).

And here is an interview with the founder about his experience building the
service and data infrastructure behind it
[http://datapeek.org/interview/alfadata](http://datapeek.org/interview/alfadata)

------
lordgrenville
Nice visualisations! Note that (if you're a little OCD about this like me) you
can append a ; to the last matplotlib command in a block to suppress the text
that appears before notebook inline plots.

------
helloiloveyou
What's the function that returns, from the pandas dataframe, the foul the
German goalkeeper committed on Higuain?

Jokes aside, pretty good article. I am Argentinean, so that's saying a lot.

~~~
georgiecasey
How about Higuain's miss from the German header back to Neuer?

------
set92
I don't know why this is here and not on
[https://datatau.net/](https://datatau.net/)

------
zwaps
Out of interest, do you guys know of other such great data sources for other
sports, for example American football?

Or are those things mostly proprietary?

------
ppod
It's surprisingly hard to find practical in-depth pandas code examples like
this.

------
hoerzu
How would you turn a team into a vector?
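One simple answer, offered as a sketch rather than the approach anyone in this thread uses: describe each team by a handful of per-match averages, standardize each feature across teams (z-scores) so that large-valued stats like pass counts don't dominate, and compare teams with cosine similarity. The feature list, team labels and numbers below are hypothetical.

```python
import math


def standardize(vectors):
    """Z-score each feature across teams. `vectors` maps team -> list of floats."""
    teams = list(vectors)
    n_features = len(vectors[teams[0]])
    out = {t: [] for t in teams}
    for i in range(n_features):
        column = [vectors[t][i] for t in teams]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        for t in teams:
            # A feature with no spread carries no information; map it to 0.
            out[t].append((vectors[t][i] - mean) / std if std else 0.0)
    return out


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical per-match features: possession %, passes, shots, fouls.
raw = {
    "Possession side": [65.0, 650.0, 15.0, 10.0],
    "Similar side": [60.0, 600.0, 14.0, 11.0],
    "Bus parkers": [35.0, 300.0, 6.0, 18.0],
}
z = standardize(raw)
print(cosine(z["Possession side"], z["Similar side"]))  # close to 1
print(cosine(z["Possession side"], z["Bus parkers"]))   # close to -1
```

Standardizing matters because raw pass counts are in the hundreds while shots are in the teens; without it, the similarity would mostly compare pass volume.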

------
femg
Thanks for sharing this. I want to learn more about this kind of thing.

