
The impact of Prince’s death on Wikipedia - The_ed17
https://blog.wikimedia.org/2016/04/22/prince-death-wikipedia/
======
semi-extrinsic
For others who were left scratching their heads at what exactly this pop-sci-
explained PoolCounter mechanism actually is:

[https://wikitech.wikimedia.org/wiki/PoolCounter](https://wikitech.wikimedia.org/wiki/PoolCounter)

TL;DR:

It's a limiter on how many workers start rendering the new page version when
the old page version in cache has been invalidated.
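
A minimal sketch of that idea, not MediaWiki's actual PoolCounter code: `RenderLimiter`, `get_page`, and the cache layout are hypothetical names made up for illustration. It assumes a worker that cannot get a render slot simply serves the stale copy, whereas the real PoolCounter can also make workers wait in a queue.

```python
import threading
from collections import defaultdict


class RenderLimiter:
    """Caps how many workers may render the same key at once (hypothetical helper)."""

    def __init__(self, max_workers_per_key=1):
        self._max = max_workers_per_key
        self._lock = threading.Lock()
        self._active = defaultdict(int)  # key -> workers currently rendering

    def try_acquire(self, key):
        """Take a render slot for key; return False if all slots are in use."""
        with self._lock:
            if self._active[key] >= self._max:
                return False
            self._active[key] += 1
            return True

    def release(self, key):
        """Give back a slot previously taken with try_acquire."""
        with self._lock:
            self._active[key] -= 1
            if self._active[key] == 0:
                del self._active[key]


def get_page(key, cache, limiter, render):
    """Serve a fresh copy if cached; otherwise render only if a slot is free."""
    entry = cache.get(key)
    if entry is not None and not entry["stale"]:
        return entry["html"]
    if limiter.try_acquire(key):
        try:
            html = render(key)
            cache[key] = {"html": html, "stale": False}
            return html
        finally:
            limiter.release(key)
    # No slot free: fall back to the stale copy (or None if nothing is cached).
    return entry["html"] if entry is not None else None
```

With `max_workers_per_key=1`, a burst of requests for an invalidated page triggers one render while everyone else gets the old version.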

~~~
atdt
Yeah, I am the engineer mentioned in the article, and I agree the explanation
doesn't really work. The pieces of the explanation that ended up in the
article itself don't add up to a coherent explanation. The fault for that is
mostly mine. In hindsight, my original explanation was too long and too
elaborate to be helpful. It's a good reminder that it is easy to go too far
with an analogy and end up complicating the thing you were trying to simplify.
Oh well, live and learn :)

There are more (coherent and to-the-point) details about PoolCounter in the
prologue to PoolCounter.php in MediaWiki's source tree:

[https://github.com/wikimedia/mediawiki/blob/1617e7822eaf7426...](https://github.com/wikimedia/mediawiki/blob/1617e7822eaf74261687d2d2c3148ce10ab5da69/includes/poolcounter/PoolCounter.php#L3-L43)

And in a short blog post by Domas Mituzas, who is the original author of
PoolCounter:

[https://dom.as/2009/06/26/embarrassment/](https://dom.as/2009/06/26/embarrassment/)

~~~
hedgehog
There is a stochastic approach that can be adapted to address this problem. I
think I first saw it at IMVU in 2009, but conveniently Wikipedia has a good
reference now:
[https://en.wikipedia.org/wiki/Cache_stampede#Probabilistic_e...](https://en.wikipedia.org/wiki/Cache_stampede#Probabilistic_early_expiration)

The advantage is less coordination is necessary and you should be able to get
down to a single concurrent rerender per page.
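
A rough sketch of the check described in that Wikipedia section, assuming a time-based expiry. The function name and the `rng`/`now` parameters are illustrative additions so the behaviour can be tested deterministically; `delta` is how long the last recomputation took and `beta` tunes how eagerly to refresh.

```python
import math
import random
import time


def should_recompute(expiry, delta, beta=1.0, now=None, rng=random.random):
    """Probabilistic early expiration: occasionally refresh before expiry.

    expiry: absolute time (seconds) at which the cached value expires
    delta:  duration (seconds) of the last recomputation
    beta:   values > 1 favour earlier refreshes, values < 1 later ones
    """
    now = time.time() if now is None else now
    # random.random() returns [0, 1), so 1 - rng() lies in (0, 1] and log() is safe.
    u = 1.0 - rng()
    # -log(u) is >= 0 and only rarely large, so few callers refresh early.
    return now - delta * beta * math.log(u) >= expiry
```

Each request runs this check independently, so no lock or shared counter is needed; the closer the entry gets to expiry, the more likely some single request is to trigger the rerender.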

~~~
DanWaterworth
Wow, that's a nice technique, but in this case, isn't the rendered page
invalidated by changes to the source, rather than over time?

I suppose, in this case you could use the time since invalidation as your
input. The downside is that changes in the source aren't immediately reflected
in the rendered output, especially for infrequently updated pages.

------
Buge
Interesting how in the graph it looks like some people found out about 25
minutes before it was more publicly found out.

~~~
erelde
I'm seeing an exponential curve; isn't that what we should expect?

~~~
duskwuff
Look very closely at the section of the graph starting around 4:20 PM. There's
a small but significant increase in hits before the big spike starts around
4:50.

It's easier to see on the full resolution graph:

[https://upload.wikimedia.org/wikipedia/commons/f/f2/Prince_a...](https://upload.wikimedia.org/wikipedia/commons/f/f2/Prince_article_minute_by_minute_en.wikipedia_page_views.jpg)

------
lordnacho
How are Wikipedia articles kept consistent with each other? Say someone like
Prince dies. His page will instantly change, seemingly while his portrait is
still in the sky and the cannon fires.

But with certain people there's a variety of connected items that need
referential integrity. For instance, I can imagine Prince being on one of
those lists (e.g. highest grossing) that use bold text for still-living artists.
For office holders, they need to be moved from "incumbent" to a box with dates
and the new incumbent needs to be updated. And then there's text snippets that
are in present tense ("Prince and David Bowie are among the greatest living
artists").

And then there's the corresponding pages in other languages.

How's it done?

~~~
CydeWeys
> How are Wikipedia articles kept consistent with each other?

They aren't. There is no transactional/referential integrity on Wikipedia.
When someone famous dies a pretty common pattern is that first the death date
is added, and then in a subsequent edit someone gets around to changing
the present tense verbs into past tense verbs ("Prince _is_ a singer..." -->
"Prince _was_ a singer...").

I can tell you all about how category normalization is maintained, though. It
starts with this process:
[https://en.wikipedia.org/wiki/Wikipedia:Categories_for_discu...](https://en.wikipedia.org/wiki/Wikipedia:Categories_for_discussion)

------
chris_wot
I don't think WMF staff are credited enough for the work they do in keeping
Wikipedia running. They seriously know how to scale; I think the only ones
better than them are honestly Facebook and Twitter!

~~~
zappo2938
Can't caching a page with varnish and memcache handle this?

~~~
Washuu
Varnish does handle the mass of cached requests from logged-out users. However,
because there were so many page edits per second, the cache in Varnish would
only be valid for about a second. Then a flood of logged-out users would hit
the servers at the same time, requesting that the uncached page be rendered.
The PoolCounter extension keeps the web servers under control by throttling
requests for page rendering.

~~~
danielrhodes
Correct me if I am wrong, but I thought Varnish has support for limiting
concurrent backend fetches to the same resource.

~~~
brianwawok
Across a cluster of varnishes? I think that limit is per varnish.

~~~
danielrhodes
No, definitely not across a cluster (although that would be quite nifty). Even
on a single node that would reduce the thundering herd effect substantially.

~~~
chris_wot
Maybe that's a new feature request!

------
JBReefer
This is so impressive, to see behind the curtains of what has become the
central repository of humanity's knowledge, during a moment of loss of one of
humanity's greats.

~~~
aaron695
> during a moment of loss of one of humanity's greats.

Compared to, let's say, Bill Gates, who's saved millions of lives?

Even artistically, I'm not sure Prince was up there in the top 1%

The power of marketing.....

~~~
quaristice
I totally agree with the sentiment of this comment (though I'm not sure how
much marketing is involved.) Prince was very good at music and was, well, kind
of a dick. I don't quite get the massive outpouring of grief that has ensued.

~~~
ams6110
"Kind of a dick" I think could easily apply to Gates as well, no?

~~~
quaristice
Well, I wasn't really responding to the Bill Gates part in particular. Gates
was a bit of a megalomaniac with MS. But he's doing awesome stuff with the
money now. As a person, I'm not aware of him ever being a jerk.

~~~
michaelmrose
So if you gain lots of money via semi nefarious means what fraction do you
have to dedicate to good works before the earlier wrong is cancelled?

I know that he didn't gas 6 million people but letting people buy their way
out of moral debt with a fraction of the money they gained still seems
horribly repugnant.

~~~
quaristice
First, what moral debt? Second, I imagine the sum total of his humanitarian
efforts are greater than the total charity if all of those dollars remained in
the pockets of each person who bought windows 95 et al. So repugnant seems
like a real stretch.

------
yeukhon
They mentioned 5M views within 24 hours of Michael Jackson's death. With over
3B Internet users out there, I am actually a little surprised how small the
spike was. Did they only count English Wikipedia? Even so I am quite
surprised. I would expect 10-20M at least. Similarly, many young people like
myself have never heard of Prince; I had to look him up to find out who he
truly was.

~~~
cooper12
They recently overhauled it, [0] but back then the pageviews [1] wouldn't
count mobile users. You can see the old stats for Jackson's page here:
[http://stats.grok.se/en/200906/Michael%20Jackson](http://stats.grok.se/en/200906/Michael%20Jackson).
The actual number is 5,875,404 views within that day (in whichever time zone)
and is for the English version of the article specifically.

[0]: [https://blog.wikimedia.org/2015/12/14/pageview-data-easily-a...](https://blog.wikimedia.org/2015/12/14/pageview-data-easily-accessible/)

[1]: [https://en.wikipedia.org/wiki/Wikipedia:Pageview_statistics](https://en.wikipedia.org/wiki/Wikipedia:Pageview_statistics)

~~~
yeukhon
Thanks. Yeah, I think mobile viewers would be a substantial amount, but the
desktop user count is still below my personal expectation. We are talking about
850M English-speaking Internet users :( Only 6M page views within 24 hours is
really quite low.

------
cmdrfred
What happened at 7:15?

~~~
teamhappy
End of a news broadcast maybe (end of the 10 o'clock news on the west coast
would work I think)

------
EugeneOZ
Even at peak it's just ~800 hits per second - it shows how irrelevant the
C10k problem is (yes, I know it's not exactly about hits per second, but still).

~~~
jsmthrowaway
The C10k problem is absolutely not irrelevant, and certainly not because one
page only saw 800rps during a worldwide event. Two things there:

(1) 800rps TO THAT PAGE is the metric. The entire rest of Wikipedia was still
getting traffic, and as an educated guess I would estimate raw traffic to be
on the order of magnitude of 3-4krps (across editing and views). They are
quite open with operations and if I weren't mobile I could probably find the
accurate answer.

(2) There are much higher traffic properties. I'm aware of one property beyond
200krps in aggregate.

If you had said "most people don't have to worry about C10k," then I'm on
board. That's true. Irrelevant? Far from it.

And yes, query rate and connection count have a complicated relationship. You
need three or four other metrics to explain their relationship, but raw query
rate is a good yardstick for active connections when combined with quantiled
request latency. (Not averaged.) Simple example: a 750ms 95th% page hit 10,000
times per second is almost certainly far > C10k because of the outliers.
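
One way to put rough numbers on the rate/latency relationship is Little's law: average requests in flight equals arrival rate times mean time in the system. The sketch below is illustrative only. The mean latencies are made-up values (the example above gives a 95th percentile, not a mean), and in-flight requests are only a lower bound on open connections, since idle keep-alive connections count too.

```python
def concurrent_requests(rate_per_s, mean_latency_s):
    """Little's law: average requests in flight = arrival rate * mean latency."""
    return rate_per_s * mean_latency_s


# 10,000 requests per second at a few hypothetical mean latencies (seconds).
for mean_latency in (0.05, 0.75, 1.5):
    in_flight = concurrent_requests(10_000, mean_latency)
    print(f"mean latency {mean_latency}s -> about {in_flight:.0f} requests in flight")
```

The slow tail matters because a small fraction of long-lived requests can hold a disproportionate share of the connections that are open at any instant.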

Now I will grant that C10k itself is somewhat irrelevant, yes, but not for the
reason you are saying. It was defined in an age when 10,000 active connections
was pretty surprising (didn't it come from FTP or some other heavy eyeball
protocol?). These days with long poll apps, long-running protocols, and so on,
millions of open connections are quite common at consumer scale. I find C1M
far more interesting these days. C10M is still kinda nuts, but does exist in
the magical world of metal and fiber and hot aisles and all those great things
that nobody uses anymore (depressingly).

~~~
yuvipanda
[https://grafana.wikimedia.org/dashboard/db/varnish-http-erro...](https://grafana.wikimedia.org/dashboard/db/varnish-http-errors) (and
grafana.wikimedia.org in general) have more stats. It stated "13.38 Million
req/min", which I think is ~223,000 req/s.

~~~
jsmthrowaway
Ah. So my hunch was that the published number was what is making it through
cache, and that's where my estimate comes from too. That sounds about what I'd
expect for the cached side.

Nice find!

~~~
_joe
The total number of requests that get through the cache to the application
layer can be seen here

[https://ganglia.wikimedia.org/latest/stacked.php?m=ap_rps&c=...](https://ganglia.wikimedia.org/latest/stacked.php?m=ap_rps&c=Application%20servers%20eqiad&r=week&st=1461478823&host_regex=)

which you can see is not showing any substantial increase due to the passing
of Prince. The big hole of two days that ends just before the news broke is
due to Wikimedia switching traffic to a second datacenter for two days, see
[http://blog.wikimedia.org/2016/04/18/wikimedia-server-switch...](http://blog.wikimedia.org/2016/04/18/wikimedia-server-switch/)

------
xrstf
Finally a replacement for "Site got slashdotted": "Site got Prince'd". I like
it.

------
tgb
"He was ... known for, among many other things, ... a performance at Super
Bowl XLI in a raining downpour in front of over a hundred million people."

Typo and/or I call bullshit.

~~~
Amorymeltzer
This[1] seems to indicate viewership of the halftime show peaked at 140M.

1:
[https://web.archive.org/web/20090412054158/http://www.suntim...](https://web.archive.org/web/20090412054158/http://www.suntimes.com/sports/football/bears/243107,CST-FTR-super05.article)

~~~
tgb
Okay, I can see where you're coming from but the sentence was "raining
downpour in front of over a hundred million people." How does that NOT imply a
live audience of a hundred million? I can't believe people are downvoting me
for this. The sentence is absurd as written.

~~~
jessriedel
Downvotes aren't punishment. Up/down votes indicate "this comment ought to be
more/less prominent on the page". The sentence was somewhat ambiguous and your
misunderstanding is understandable. But your comment is not useful to the
discussion because most people understood and in any case it's tangential.

~~~
tgb
Okay honestly am I the only one who reads "in front of a hundred million
people" and thinks it actually means he was in front of a hundred million
people? I don't see it as an ambiguity, it's simply a wrong statement. I'm
okay with people being uninterested in (and even downvoting) a correction, but
I'm baffled that anyone could read the sentence as anything other than
incorrect. I mean, they even describe the weather of the event as if to make
it sound even more impressive that all these people showed up!

~~~
The_ed17
I've reworded the sentence in the post—does it work better that way? Many
thanks for the feedback, everyone. :-)

~~~
tgb
Thanks, I really do appreciate it. The thought that it could have meant
televised audience never crossed my mind and surely wouldn't have for some
other portion of the audience as well. I was out googling for largest concert
size records to make sure I wasn't crazy.

