
Wikimedia Foundation's runaway spending growth - hk__2
https://en.wikipedia.org/wiki/User:Guy_Macon/Wikipedia_has_Cancer
======
jessriedel
See extensive discussion on HN (1054 points, 406 comments) of this issue back
in May.

[https://news.ycombinator.com/item?id=14287235](https://news.ycombinator.com/item?id=14287235)

~~~
hk__2
Thank you; I looked up that post’s title on HN before posting it but got no
result.

~~~
elect_engineer
I am the author of the Wikipedia essay "Wikipedia has Cancer". The previous
discussion led to a version of my essay that was published with my permission
in the Wikipedia Signpost. That version was dumbed-down a bit by the Signpost
editors. In particular they removed most of the references that document that
my claims are factual. This version is the version I keep updated as errors
are found.

For the record, I prefer that this thread have the same title as my essay. The
current title misses several important aspects of my essay, such as
transparency and the dependency on revenues always increasing.

------
film42
> The modern Wikipedia hosts 11–12 times as many pages as it did in 2005, but
> the WMF is spending 33 times as much on hosting...

Wikipedia also keeps "revision history" and a "talk" section for each page.
Those original pages created before 2005 can still be modified, potentially
adding to the overall cost. Breaking news stories can cause massive amounts of
churn which I imagine has increased with the site's popularity [1]. So really
11x pages = 33x higher hosting costs doesn't seem unrealistic considering how
much metadata is associated with each page. That's not to say there isn't a
problem, but "page count" might not be the best metric. I wonder what the
average number of revisions per page looks like over the last 12 years.

[1]
[https://xtools.wmflabs.org/articleinfo/en.wikipedia.org/Talk...](https://xtools.wmflabs.org/articleinfo/en.wikipedia.org/Talk:2017_Las_Vegas_Strip_shooting)
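As a rough check on that ratio, here is a minimal sketch that uses only the two growth multiples quoted from the essay. The function name is made up for illustration, and it deliberately ignores revisions, talk pages and traffic, which is exactly the data the comparison is missing.

```python
# Growth multiples quoted from the essay (2005 -> today).
PAGE_GROWTH_LOW, PAGE_GROWTH_HIGH = 11, 12
HOSTING_GROWTH = 33


def per_page_cost_multiple(hosting_multiple: float, page_multiple: float) -> float:
    """Return how many times more is spent per page, given both growth multiples."""
    return hosting_multiple / page_multiple


# Dividing by the larger page multiple gives the lower bound, and vice versa.
low = per_page_cost_multiple(HOSTING_GROWTH, PAGE_GROWTH_HIGH)
high = per_page_cost_multiple(HOSTING_GROWTH, PAGE_GROWTH_LOW)
print(f"hosting spend per page grew roughly {low:.2f}x to {high:.2f}x")
```

So the open question is whether about three times as much stored and served data per page is plausible, not whether a 33x total is.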

~~~
jchw
Since 2005, hosting has gotten much cheaper, and techniques for storing
extremely large amounts of data have gotten much more efficient. But as far as
I know, MediaWiki is no better positioned to begin taking advantage of those
facts. It's still a PHP app accompanied by a huge, vertically scaling MySQL.
There's no telling what might work better for Wikipedia, but there are plenty
of avenues for experimentation. Maybe revision history could be more
effectively stored in Amazon S3-like storage, instead of MySQL. Maybe not, but
there are many such possible optimizations that I doubt have ever seriously
been considered.

~~~
toomuchtodo
I think the savings would be in having an efficient core with lots of caching,
similar to the Stackoverflow architecture.

Push as much off to the CDN as possible, and dynamically reduce the cache time
as edit velocity increases on a page (with cache time increasing again as
edits fall off).
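A minimal sketch of that idea, assuming the edit count for the last hour is available from somewhere. The function name, the halving rule and the default limits are all hypothetical choices, not anything MediaWiki or a particular CDN provides.

```python
def cache_ttl_seconds(edits_last_hour: int,
                      max_ttl: int = 86_400,
                      min_ttl: int = 30) -> int:
    """Return a CDN cache lifetime that shrinks as a page is edited more often.

    The TTL starts at max_ttl for a quiet page and is halved for every edit
    seen in the last hour, never dropping below min_ttl.
    """
    halvings = max(0, min(edits_last_hour, 32))  # clamp so the shift stays sane
    return max(min_ttl, max_ttl >> halvings)
```

Because the input is a sliding window, the TTL climbs back toward the maximum on its own once edits fall out of the window.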

~~~
zrm
You can do even better than that with something read-mostly like Wikipedia.

Have a bunch of caches (or a cache hierarchy) which request pages on first use
and then cache them _indefinitely_. Then when a page actually changes, you
push notify all the caches to immediately evict that page.
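Roughly what I have in mind, as an in-process sketch. `Origin` and `EdgeCache` are hypothetical names, and a real deployment would send the eviction over the network rather than call a method directly.

```python
from typing import Callable, Dict, List


class EdgeCache:
    """Caches pages indefinitely; entries leave only when explicitly evicted."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._pages: Dict[str, str] = {}

    def get(self, title: str) -> str:
        # Request the page on first use, then serve it from memory.
        if title not in self._pages:
            self._pages[title] = self._fetch(title)
        return self._pages[title]

    def evict(self, title: str) -> None:
        self._pages.pop(title, None)


class Origin:
    """Holds the canonical pages and notifies every registered cache on save."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}
        self._caches: List[EdgeCache] = []
        self.fetch_count = 0

    def register(self, cache: EdgeCache) -> None:
        self._caches.append(cache)

    def read(self, title: str) -> str:
        self.fetch_count += 1
        return self._pages[title]

    def save(self, title: str, text: str) -> None:
        self._pages[title] = text
        for cache in self._caches:  # push the eviction to every cache
            cache.evict(title)
```

Reads hit the origin once per page per cache, and an edit costs one eviction message per cache.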

~~~
mpartel
Sounds error-prone if the notification ever fails to reach some subset of the
caches. A long expiration time (e.g. several hours) is probably almost as
good, but also "self-healing".

~~~
zrm
Giving out the wrong page for several hours is obviously better than giving
out the wrong page forever, but it's still broken.

Better would be to re-request the page after several hours and compare it to
the cached copy. Then if they're different, update the cache, but also
generate an alert to notify the administrator that something "impossible"
happened and there is a bug somewhere that needs to be fixed.
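A sketch of that audit step, under the assumption that a periodic full re-fetch is affordable. `AuditedCache`, its parameters and the four-hour default are hypothetical, and the alert is just a callback here.

```python
import time
from typing import Callable, Dict, Tuple


class AuditedCache:
    """Serves cached pages, but re-checks each one against the origin periodically.

    A mismatch means an invalidation was missed, so the entry is repaired
    and an alert is raised instead of the problem being silently absorbed.
    """

    def __init__(self,
                 fetch: Callable[[str], str],
                 recheck_after: float = 4 * 3600,
                 alert: Callable[[str], None] = print,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._recheck_after = recheck_after
        self._alert = alert
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # title -> (text, checked_at)

    def get(self, title: str) -> str:
        now = self._clock()
        entry = self._entries.get(title)
        if entry is None:
            text = self._fetch(title)
            self._entries[title] = (text, now)
            return text
        text, checked_at = entry
        if now - checked_at >= self._recheck_after:
            fresh = self._fetch(title)
            if fresh != text:
                self._alert(f"stale cache entry for {title!r}: missed invalidation?")
                text = fresh
            self._entries[title] = (text, now)  # restart the audit timer
        return text
```

The re-fetch gives the same self-healing as a plain expiry, and the alert is what turns a silent stale page into a bug report.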

------
atdt
I worked for Wikimedia from 2012 to 2016. During that time I successfully
agitated for the creation of a dedicated performance team, which I led for its
first two years. In that period we cut median page load time by over 40%
([http://observer.com/2015/08/how-wikipedia-upped-its-page-loa...](http://observer.com/2015/08/how-wikipedia-upped-its-page-load-speed-by-roughly-40-percent-and-why/))
and brought median page save time down from over 6 seconds to 800ms.

My salary was in the 100k to 150k range. That is not a princely sum for
software developers in the Bay Area, but it is a lot of money, especially when
you consider that it came primarily from donations of $50 or less that donors
could have spent elsewhere, on other causes. I was humbled by that fact and
made a point of reflecting on it daily to make sure I was doing my best to
maximize value for donors and users.

------
njharman
"Sounds like cancer doesn't it"

Causes me to immediately stop listening and start doubting everything you say
and claim, questioning your motive.

It is an emotional appeal, which 1) insults my intelligence by assuming I'm
stupid and unaware enough to fall for it, and suggests one or more of the
following: 2) you're too stupid or poor at arguing to know you're doing it, 3)
the facts don't support your side so you try to rile up the ignorant and
gullible to win, 4) you don't care about the argument / have other motives
like fame, clicks, or distraction from other issues.

~~~
JackFr
It's not an emotional appeal, it's simultaneously a rhetorical device and a
pretty neat analogy.

~~~
thinkloop
I don't like X, and it is growing = cancer, is the full depth of the analogy.
Pretty weak. Unlike cancer, we wouldn't want to eliminate 100% of expenses.
Unlike cancer, expenses are necessary for survival. Unlike cancer, it's
possible expenses should be growing faster, to fuel growth or whatever - this
is the argument that needs to be made. Calling something cancer this loosely
is very close to the annoyance of invoking Hitler so quickly.

"I don't know about all dem der hosting costs you speak of, but I sure as hell
don't like that cancer"

~~~
aunty_helen
>I don't like X, and it is growing unabated like cancer

I think this is much closer to the actual point he's making.

Also I'm not sure that condescending to the author, when they've actually put
a lot of effort into citing their claims and spelt out their point clearly and
coherently, is necessary.

~~~
elect_engineer
I am the author of the essay "Wikipedia has Cancer". I did put a lot of effort
into documenting my claims, and I would encourage anyone interested to click
on the links in the essay. In particular, I have documented what others have
said about Wikipedia financials, and some of those pages are rather good.

Whether you agree with me or not, I would ask everyone to keep an eye out for
any factual errors so that I can correct them.

~~~
thinkloop
Your article is informative and thought-provoking; only the analogy to cancer
is a bit hyperbolic and unnecessary.

------
labster
I wish that the WMF spent some of their mountain of cash on programmer time
for security reviews of the extensions in the MediaWiki ecosystem. Instead, it
feels like it's mainly just me in my free time, as part of my efforts for the
nonprofit wiki farm, miraheze.org. The WMF does a great job on any code used
on or produced for WMF servers, but for the other extensions listed on
mediawiki.org, it's a smorgasbord of XSS and SQLi. A lot of those are git
hosted by the WMF too, so don't use the hosting location as any indication of
security.

------
cagenut
~$160K/month for hosting for a site that big[1] is pretty damn good. I'm sure
they're getting a lot of favors and discounts to get there.

[1] -
[http://www.comscore.com/Insights/Rankings](http://www.comscore.com/Insights/Rankings)

I've worked with a bunch of sites farther down that list that are spending an
order of magnitude more than that.

~~~
tryingagainbro
_I've worked with a bunch of sites farther down that list that are spending
an order of magnitude more than that._

If the site was repeatedly down or slow people would complain even more.
Wikipedia is truly global and that adds to costs.

------
TrickyRick
> If we do these things now, in a few short years we could be in a position to
> do everything we are doing now, while living off of the endowment interest,
> and would have no need for further fundraising. Or we could keep
> fundraising, using the donations to do many new and useful things, knowing
> that whatever we do there is a guaranteed income stream from the endowment
> that will keep the servers running indefinitely.

This is probably the best point in the entire essay: if they reduced spending,
fundraising would essentially become obsolete. That kind of clashes with the
begging banners which have made me donate to WMF the last couple of years.
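The arithmetic behind "living off the endowment interest" is the standard perpetuity formula: the endowment needed is the annual spending divided by the annual return. A minimal sketch, where the 4% return is purely a hypothetical assumption and the spending figure is the ~$160K/month hosting cost quoted elsewhere in this thread (hosting only, not the whole budget):

```python
def endowment_needed(annual_spending: float, annual_return_rate: float) -> float:
    """Endowment whose yearly return covers annual_spending indefinitely.

    Ignores inflation and fees; annual_return_rate is a fraction (0.04 = 4%).
    """
    if annual_return_rate <= 0:
        raise ValueError("annual_return_rate must be positive")
    return annual_spending / annual_return_rate


hosting_per_year = 160_000 * 12  # hosting figure quoted in the thread
assumed_return = 0.04            # hypothetical, not a number from the essay
print(f"${endowment_needed(hosting_per_year, assumed_return):,.0f}")
```

Under those assumptions, the size of the endowment scales linearly with spending, which is why the spending level matters so much to the essay's proposal.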

------
Animats
Wales wants a private jet.[1] Not going to happen from Wikia.[2]

[1]
[http://gawker.com/5192786/jimmy-wales-definitely-not-getting...](http://gawker.com/5192786/jimmy-wales-definitely-not-getting-his-wikipedia-jet-now)

[2]
[http://www.hoovers.com/company-information/cs/revenue-financ...](http://www.hoovers.com/company-information/cs/revenue-financial.wikia_inc.7c0c60deea305554.html)

~~~
snakeanus
I had no idea that he was also the owner of Wikia. Quite disappointed.

~~~
dandare
Didn't he establish Wikia after he donated Wikipedia to a non-profit? I see
nothing bad in Wales making a few bucks from Wikia after what he helped to
create. The Wikipedia spending problem is a different story.

~~~
FireBeyond
In itself, no. But there were a lot of issues and conflict in relation to how
the creation of Wikia affected policy, and suddenly a lot of things that had a
"home" on Wikipedia were now being pushed to Wikia for a variety of reasons,
some valid, and some a lot less so.

~~~
dandare
Can you share more details or resources? I would like to learn more about
this.

------
II2II
The numbers are alarming, but the argument needs to be more persuasive.

For a proposal like this, I would expect to see a breakdown of where the money
has gone and realistic projections of future expenses. Simply providing
alarmist statements based upon linear extrapolations (after being adjusted for
inflation alone) then proposing an even more alarmist buyout (when the only
thing of tangible value is the domain name) is not enough when proposing
potentially crippling changes to funding.

~~~
hk__2
> For a proposal like this, I would expect to see a breakdown of where the
> money has gone and realistic projections of future expenses

The Foundation’s financial statements are online:
[https://en.wikipedia.org/wiki/User:Guy_Macon/Wikipedia_has_C...](https://en.wikipedia.org/wiki/User:Guy_Macon/Wikipedia_has_Cancer#References)

~~~
elect_engineer
I am the author of the Wikipedia essay "Wikipedia has Cancer". I have included
links to all of the financials that I could find in my essay, and I have
repeatedly asked for more transparency regarding how the money has been spent.

Just to take one example, Sue Gardner, Wikimedia Foundation executive director
from 2007 to 2014, received a $100,000 pay rise and was secretly kept on as a
"special advisor" after we were all told that she stepped down in 2014. This
was kept secret for years, until it turned up buried in a required financial
disclosure.

I also would like a breakdown of where the money has gone and realistic
projections of future expenses. What you see is my best effort to provide
that. If anyone has more information, I will be glad to add it to my essay.

------
ShinTakuya
I knew a guy who was a software engineer for Wikipedia back in uni. At least
back then (around 2011 or so) they were flying him and other software
engineers working for them across the world to conferences in Europe. Not only
that, but the guy booked the tickets in business class on a premium airline.
He said that everyone did that. You'd think that working for a non-profit
you'd be looking at discount airline economy class tickets.

~~~
nebolo
Why? Why would you expect a software engineer working for a nonprofit to look
at discount airline economy class tickets?

~~~
fdej
As an academic who travels a fair amount, I try to pick the cheapest available
flight (within reason), and I've certainly never even considered flying
business class. I think flying business class is permitted under certain
conditions if there's still money in our travel budget, but it wouldn't feel
like a good way to use taxpayer money. I think I would feel the same way
working for a nonprofit funded by donations.

~~~
praneshp
> but it wouldn't feel like a good way to use taxpayer money

Thanks. Curious, because this comes up on HN now and then: Do you make your
research public? I understand this might not just be in your hands though.

~~~
fdej
I do publish in paywalled journals but all my papers are freely available as
preprints from arXiv. Also, all software is open source.

~~~
praneshp
Hey, thanks!

------
tyingq
Wikipedia is also a bit cagey about whether their overall traffic is growing
or not.

Specifically, Google's changes to how they display Wikipedia-derived info have
driven traffic down. Links that used to go to Wikipedia often now link to
Google's own sites with copies of the data.

[https://searchengineland.com/wikipedia-confirms-they-are-ste...](https://searchengineland.com/wikipedia-confirms-they-are-steadily-losing-google-traffic-228237)

~~~
atdt
What do you mean by "a bit cagey"? The Foundation is pretty open with its
data. [https://analytics.wikimedia.org/](https://analytics.wikimedia.org/) and
[https://stats.wikimedia.org/](https://stats.wikimedia.org/) are good starting
points.

~~~
tyingq
I was referring to this:
[http://www.businessinsider.com/wikipedias-jimmy-wales-slams-...](http://www.businessinsider.com/wikipedias-jimmy-wales-slams-silly-claim-of-lost-google-traffic-2015-8)

Both of your links seem to confirm declining page views.

------
trothamel
Isn't it as much a matter of what the community gets out of the spending that
the Wikimedia Foundation does? The problem is that the spending seems divorced
from the community of content creators, such that it's hard to see how the
spending has made Wikipedia better.

You have things like the comments system, Flow or Structured Discussions,
which is a step back from the current system and yet still has money being
spent on it.

------
matt4077
The financial statements linked in this rant don't support this sort of
panic...

The WMF's assets increased by $20,000,000 over the last year, up to reserves
of now $90,000,000.

I'm not in a position to judge the effectiveness of their spending, but this
completely antiquated paragraph about the waterfall model of software
engineering makes me doubt the author's judgement.

~~~
elect_engineer
I am the author of the Wikipedia essay "Wikipedia has Cancer".

No, you are missing the point. I did not imply that Waterfall is better than
Agile (I have managed several Agile projects and really like it when it is
done correctly). My point is that the WMF is failing to do things that Agile
requires -- things that have been around so long that Waterfall required them.
Nothing about Agile allows you to build software in secret with no input from
the people who will be expected to use it and then throw it over the wall.

I would again stress that this is not the fault of the developers. Their
management forbids them from talking to the ordinary Wikipedia editors who
will use the software.

------
tryingagainbro
IMO a lot of these expenses are because they have the money. I am sure they
could cut xx% of their staff plus travel expenses and the average user
wouldn't notice it. Eventually they might have to do it, but Wikipedia is
unique and it would be a shame if it weren't funded, especially by those
making tens of billions online.

------
elect_engineer
I am the author of the Wikipedia essay "Wikipedia has Cancer". For the record,
I would prefer that this thread have that title (it was recently changed to
"Wikimedia Foundation's runaway spending growth").

I would also note that focusing on hosting costs pretty much ignores the point
of my essay, which is the exponential growth of spending and the dependence on
future exponential growth in revenue.

------
acd
What would happen if Wikimedia switched to primarily using IPFS to distribute
the content, to reduce bandwidth costs?

------
esturk
I try to spare Wikipedia as much bandwidth as I can by using the Google cache
version whenever I can. I'm not sure if that's a good or bad thing.

~~~
brador
Don't do that. Your time is more valuable than the artificial micro-cent the
Wikimedia Foundation will save.

~~~
ZenoArrow
Why is using the Google Cache version not as productive as using the main
Wikipedia content?

~~~
brador
It might be equivalent, even faster, but that was not the problem he suggested
he was solving for.

~~~
ZenoArrow
To summarise, you suggest it's a waste of time, I ask how is it a waste of
time to click on one version of a link over another, you suggest there's no
difference in efficiency. The point I'm making is, GP is no worse off for
using the Google Cache version. They might miss out on a few edits made after
the page was cached, but in the grand scheme of things that's unlikely to
matter.

------
SomeStupidPoint
Isn't that what kills most charities that don't starve?

~~~
Alex3917
Wikipedia isn't a charity, it's a non profit. Their financials look fine, they
have over a year's budget in reserves. Not seeing the issue at all here.

~~~
fatbird
The issue is that the organization and spending have grown, but the mission
hasn't, and they're not obviously doing better or more than they were a decade
ago, except for a list of failed projects like Knowledge Engine that appear to
be badly managed internal efforts that had little hope of succeeding.

And all this is against a backdrop of continuing to present themselves as a
scrappy do-gooder surviving on fundraising and volunteer efforts.

The threat Macon identifies is that the bottom will drop out of the
exponential growth in revenue and spending, leaving them no option but to sell
out to Google, Facebook, or another party that will simply buy and monetize
them, destroying the basic value that Wikipedia _does_ still offer.

I don't know how worried to be, but I was genuinely shocked to find out how
much money they take in and how many staff they have, without fundamentally
being much different than I remember from a decade ago.

~~~
nqzero
the project that i'd like to see is a beta area where the notability
requirements are relaxed, with some sort of semi-automated process for
promotion, with the ultimate goal of covering a much larger corpus

wikipedia is great for mainstream stuff, but even technologies with vibrant
communities often lack a wikipedia page. and i'm sure the problem extends well
past tech

~~~
WikipediasBad
There's a site that tries to do that exact thing called Everipedia, but they
have been frequently lambasted in the press for being wrong in their articles.
I assume you can't relax notability while at the same time selecting for
accuracy. Still though, they are getting pretty decent traffic, and I like
their site design. Valiant effort, not that great execution.

~~~
Max_Mustermann
Everipedia is a blatant copycat site run by a guy who doesn't let anyone
correct the BS in the article about him on his site.

See
[https://www.reddit.com/r/wikipedia/comments/74ba05/slug/dnxd...](https://www.reddit.com/r/wikipedia/comments/74ba05/slug/dnxddsy)

~~~
WikipediasBad
What do you mean? I went to that thread and came out more confused than
enlightened. And ya, I know they get stuff wrong from the news, I already said
that in my original post, I wasn't trying to shill for them, I made that
pretty clear.

------
transverse
Wikipedia should be replaced by an AI generated encyclopedia. This is the one
and only answer.

------
snakeanus
Related:
[http://wikipediocracy.com/2014/09/21/wikipedia-keeping-it-fr...](http://wikipediocracy.com/2014/09/21/wikipedia-keeping-it-free-just-pay-us-our-salaries/)
and
[https://www.theregister.co.uk/2012/12/20/cash_rich_wikipedia...](https://www.theregister.co.uk/2012/12/20/cash_rich_wikipedia_chugging/)

Also, look at how much they spend for nothing
[https://meta.wikimedia.org/wiki/Wikimedia_Foundation_salarie...](https://meta.wikimedia.org/wiki/Wikimedia_Foundation_salaries)

~~~
stinkytaco
How are salaries nothing? Those don't even seem that extreme compared to what
someone with that skill set might make elsewhere in the private sector.

