
How we made editing Wikipedia twice as fast - ecaron
http://blog.wikimedia.org/2014/12/29/how-we-made-editing-wikipedia-twice-as-fast/
======
bhauer
Great write up. This will be a good point of reference for future debates
concerning the value of selecting high-performance platforms for web-
applications. A common refrain among advocates of slower platforms is that
computational performance does not matter because applications are invariably
busy waiting on external systems such as databases. While that may be true in
some cases, database query performance is often only a small piece of an
overall performance puzzle. Blaming external systems is a too-convenient
umbrella to avoid profiling an application to discover where it's actually
spending time. What do you do once you've made external systems fast and your
application is still squandering a hundred milliseconds in the database driver
or ORM, fifty milliseconds in a low-performance request router, and another
250 milliseconds in a slow templater or JSON serializer?

Yes, three seconds for a page render is still uncomfortably slow, but it's
substantially faster than the original implementation, and unsurprisingly
frees up CPU for additional concurrent requests. It's a shame Wikimedia didn't
have this platform available to them earlier.

Today web developers have many high-performance platform options that offer
moderate to good developer efficiency. Those who use low-performance platforms
may do their future selves a service by evaluating (comfortable) alternatives
when embarking on new projects.

~~~
balls187
> This will be a good point of reference for future debates concerning the
> value of selecting high-performance platforms for web-applications.

Keep in mind, Wikipedia is at a scale that most web applications will never,
ever get to.

It's important to think about performance and scale, but it's not the only
important trade-off engineers should concern themselves with.

~~~
kmavm
The win here wasn't about scale, though. A tiny fraction of Wikipedia users
are logged in.

The win here was about individual page load time. And page load time is just
as important, if not more so, for something new trying to vigorously grow as
it is for the big sites.

(Disclaimer: HHVM alum.)

~~~
sbov
Yeah, this. We helped my boss' brother out with his WordPress e-commerce site.

His machine could easily handle the traffic thrown at it, but the page load
time was slow. 2 seconds at best, with all machines involved being almost 100%
idle. With various common tweaks such as caching, we got it down to about
800ms. We eventually replaced it with our own solution and got it to 50ms.

Scale never entered the picture because, from our testing, we had the
machinery in place to handle enough traffic to be wildly profitable. The user
experience at 2-second page load times was vastly different from the user
experience at 50ms, though.

~~~
lonnyk
Could you describe what your own solution was?

~~~
sbov
It wasn't anything special. We used Java because it's what we knew, with a mix
of Spring and some home-grown framework stuff that we've developed over the
last 10 years.

MySQL over Postgres because that's what we had experience with at the time.
Redis as both a cache and ephemeral store.

------
tokenadult
Several of the previous comments here have quoted a key, interesting fact from
the submitted article: "Between 2-4% of requests can’t be served via our
caches, and there are users who always need to be served by our main
(uncached) application servers. This includes anyone who logs into an account,
as they see a customized version."

That's my experience when I view Wikipedia. I am a Wikipedian who has been
editing fairly actively this year, and I almost always view Wikipedia as a
logged-in Wikipedian. I see the Wikimedia Foundation tracks the relevant
statistics very closely and has devoted a lot of thought to improving the
experience of people editing Wikipedia pages. I can't say that I've noticed
any particular improvement in speediness from where I edit, and I have
definitely seen some EXTREMELY long lags in edits being committed just in the
past month, but maybe things would have been much worse if the technical
changes this year described in this interesting article had not been made.

From where I sit at my keyboard, I still think the most important thing to do
to change the user experience for Wikipedia editors is to change the editing
culture a lot more to emphasize collaboration in using reliable sources over
edit-warring around fine points of Wikipedia tradition from the first decade
of Wikipedia. But maybe I feel that way because I have worked as an editor in
governmental, commercial, and academic editorial offices, so I've seen how
grown-ups do editing. I think the Wikimedia Foundation is working on the issue
of editing culture on Wikipedia too, but fixing that will be harder than
fixing the technological problems of editing a huge wiki at scale. Human
behavior is usually a tougher problem to solve than the scalability of
software.

By the way, the article illustrates the role for-profit business corporations
like Facebook have in raising technical standards for everybody through direct
assistance to nonprofit organizations running large websites like the
Wikimedia Foundation. That's a win-win for all of us users.

~~~
kanamekun
<< From where I sit at my keyboard, I still think the most important thing to
do to change the user experience for Wikipedia editors is to change the
editing culture a lot more to emphasize collaboration in using reliable
sources over edit-warring around fine points of Wikipedia tradition from the
first decade of Wikipedia. >>

Completely agree with you. Also agree that fixing culture is often harder than
fixing technological scaling problems!

All that said, kudos to Wikimedia Foundation for addressing the speed issues
for uncached pages. Great work!

------
jedberg
> and there are users who always need to be served by our main (uncached)
> application servers. This includes anyone who logs into an account, as they
> see a customized version of Wikipedia pages that can’t be served as a static
> cached copy

I keep hearing this, but it isn't true anymore. For something like wikipedia,
even when I'm logged in, 95% of the content is the same for everyone (the
article body). You can still cache that on an edge server, and then use
JavaScript to fill in the customizations afterwards. This will get you two
wins: 1) The thing the person is most likely interested in will load quickly
(the article) and 2) your servers will have a drastically reduced load because
most of the content still comes from the cache.

The tradeoff, of course, is complexity. Testing a split-cache setup is
definitely harder and more time-consuming, as is developing towards it. But
given the page views of Wikipedia, it would be totally worth it.
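
A minimal sketch of that split, assuming a hypothetical /api/user-state
endpoint and a #user-bar placeholder element in the cached page (both names
are illustrative, not anything Wikipedia actually exposes):

    // Runs in the browser after the edge-cached article HTML has loaded.
    // /api/user-state and #user-bar are illustrative placeholders.
    interface UserState {
      loggedIn: boolean;
      username?: string;
      notifications?: number;
    }

    async function personalize(): Promise<void> {
      const res = await fetch("/api/user-state", { credentials: "include" });
      if (!res.ok) return; // the cached page already works for anonymous users

      const state: UserState = await res.json();
      if (!state.loggedIn) return;

      const bar = document.getElementById("user-bar");
      if (bar) {
        bar.textContent = `${state.username} (${state.notifications ?? 0} notifications)`;
      }
    }

    // The article body is usable immediately; personalization arrives later.
    document.addEventListener("DOMContentLoaded", () => { void personalize(); });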

~~~
twic
You don't even need to use JavaScript - you can do it in the cache with edge-
side includes:

[https://www.varnish-cache.org/docs/3.0/tutorial/esi.html](https://www.varnish-cache.org/docs/3.0/tutorial/esi.html)

~~~
natrius
After using ESIs with Varnish for a project, I'd never do it again. Cached
static pages with JavaScript that pulls in the dynamic parts are an easier
solution to maintain and have better failure modes.

~~~
twic
Interesting! Could you tell me more about the problems you found with ESI?

------
leeoniya
Worth mentioning: phpng (PHP 7) has cut CPU time in half over the past year
[1] (scroll down); I don't know what the memory situation is. I don't know if
HHVM has additional advantages over plain PHP, but the list of major benefits
will certainly be smaller by the next major version.

[1] [https://wiki.php.net/phpng](https://wiki.php.net/phpng)

~~~
jaredmcateer
The main advantage is that HHVM is available now; PHPNG won't be stable until
later this year at best. HHVM isn't compatible with my code base, so we're
eagerly awaiting the stable release of NG, but damned if we didn't try to
upgrade to HHVM.

~~~
psykovsky
The advantage of PHPNG is that it has a dedicated development team who will
not go away when Facebook goes the way of Myspace. And, yes, I know it is open
source and development can continue. My point still stands.

~~~
renaudg
This is cute wishful thinking, but what exactly makes you think that whatever
company is behind PHPNG will outlive friggin' Facebook?

~~~
nhtechie
PHPNG is actually the official next version of PHP. If the PHPNG team is
dead, that means no active development is being done on PHP (period).

~~~
detaro
If PHPNG dies they simply could elect a different environment to be the
official one. Just because PHPNG is "the future" right now doesn't mean the
fate of the language is tied to it forever.

~~~
TazeTSchnitzel
It's all well and good labelling HHVM the new official environment, but a lot
of existing code does not work on HHVM.

------
Shish2k
> Between 2-4% of requests can’t be served via our caches, and there are users
> who always need to be served by our main (uncached) application servers.
> This includes anyone who logs into an account, as they see a customized
> version

I run a similar site (95% read-only), and have been pondering whether it would
make sense to use something like Varnish's Edge Side Includes (like SSI,
combining cached static page parts and generated dynamic page parts) -- I
wonder if they've considered that and what the results would be like?

~~~
dugmartin
I've played with this configuration but never put it into production:

Varnish (with ESI enabled) -> Nginx (with memcached module enabled) -> PHP-
FPM.

Most of the page is served by Varnish, with the ESI directives hitting the
Nginx server, which serves the fragments from memcached if present; otherwise
the PHP-FPM server is hit, which returns the result and sets the fragment in
memcached with a low TTL.
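
A rough sketch of that fragment flow, with an in-memory Map standing in for
memcached and renderFragment() standing in for the PHP-FPM backend (in the
setup above Nginx's memcached module plays this role, so the code is only
illustrative):

    // Serve a fragment from cache if present; otherwise render it and store
    // it with a short TTL. The Map and renderFragment() are stand-ins for
    // memcached and the PHP-FPM backend.
    const cache = new Map<string, { body: string; expires: number }>();
    const TTL_MS = 30_000; // low TTL so dynamic fragments stay reasonably fresh

    async function renderFragment(key: string): Promise<string> {
      // Placeholder for the expensive dynamic render done by the backend.
      return `<div>fragment ${key} rendered at ${new Date().toISOString()}</div>`;
    }

    async function serveFragment(key: string): Promise<string> {
      const hit = cache.get(key);
      if (hit && hit.expires > Date.now()) return hit.body; // cache hit

      const body = await renderFragment(key);                // cache miss
      cache.set(key, { body, expires: Date.now() + TTL_MS });
      return body;
    }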

------
nosage
[https://ganglia.wikimedia.org/latest/](https://ganglia.wikimedia.org/latest/)

ooo pretty!

~~~
yuvipanda
And [http://gdash.wikimedia.org/](http://gdash.wikimedia.org/) :)

------
laurencerowe
It's great to see such a significant improvement, but it goes to show just how
limiting the CGI-era architecture really is.

A modern persistent web app running in Python/Java/Ruby/etc. is able to
perform preparatory work at startup in order to optimize for runtime
efficiency.

A CGI or PHP app has to recreate the world at the beginning of every request.
(Solutions exist to cache bytecode compilation for PHP, but the model is still
essentially that of CGI.) Once your framework becomes moderately complex, the
slowdown is painful.
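
A toy illustration of that difference, assuming a persistent Node/TypeScript
process (buildRouteTable() is a stand-in for template compilation, config
parsing, cache warming, and so on):

    // In a persistent process, expensive setup runs once at startup and is
    // reused by every request; in the CGI/classic-PHP model the equivalent
    // work is redone on each request.
    type Handler = (path: string) => string;

    function buildRouteTable(): Map<string, Handler> {
      // Imagine template compilation, config parsing, cache warming, etc.
      const routes = new Map<string, Handler>();
      routes.set("/hello", () => "hello, world");
      return routes;
    }

    // Paid once per process, not once per request.
    const routes = buildRouteTable();

    function handleRequest(path: string): string {
      const handler = routes.get(path);
      return handler ? handler(path) : "404 not found";
    }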

~~~
eru
If you write it right, you should even be able to cache the state of the whole
execution right until your code sees the first byte that depends on the user
request or the environment. Sort of like a copy-on-write, but it's copy-on-
read.

------
ExpiredLink
BTW, what's the relationship between Facebook and Zend? Is PHP now 'owned' by
Facebook?

~~~
smsm42
No, PHP is not owned by Facebook, and wasn't owned by Zend either. Zend
contributed a lot to PHP development (the phpng effort is largely sponsored by
Zend), but that doesn't mean it "owns" PHP. Nobody really owns PHP. Facebook owns
HHVM and Hack (which I guess was one of the reasons they created them: with an
organization this large, it makes sense to have a platform that is predictable
for them, i.e. owned by them). Some people from Facebook also contribute to
PHP, including those working on the HHVM/Hack team.

~~~
TazeTSchnitzel
Also note that the Zend Engine, confusingly, isn't made by Zend; it's just a
component of PHP. Zend Technologies, Inc. came after the Zend Engine, but both
were created by the same duo (_Ze_ev Suraski and A_nd_i Gutmans).

------
virtuabhi
Facebook is doing a lot of awesome open-source work. They already have Open
Compute, many projects on GitHub -
[https://github.com/facebook](https://github.com/facebook) - and sent a
developer to MediaWiki to help with the migration to HHVM. I hope Facebook
keeps open-sourcing their internal projects (in addition to contributing to
existing ones)!

------
B-Con
> Between 2-4% of requests can’t be served via our caches, and there are users
> who always need to be served by our main (uncached) application servers.
> This includes anyone who logs into an account, as they see a customized
> version of Wikipedia pages that can’t be served as a static cached copy,

Is this why we get logged out every 30 days, to boost cache hits for users who
rarely need to be logged in? (It seems like every time I want to make an edit
I have to log in again.)

------
drzaiusapelord
Curious why they're using Squid and not Varnish for caching. Weird how they're
progressive with PHP but still sticking with the antiquated Squid.

~~~
neilk
I think Gabriel was referring to a historical moment -- Wikimedia does use
Varnish now.

"We currently use Varnish for serving bits.wikimedia.org,
upload.wikimedia.org, text of pages retrieved from WMF projects, and various
miscellaneous web services. Nothing uses Squid."

--
[https://wikitech.wikimedia.org/wiki/Varnish](https://wikitech.wikimedia.org/wiki/Varnish)

------
PanMan
Honest question: what kind of benchmark do others use for a 'reasonable'
response time? Of course it fully depends on the use case (rendering a video
can be hard in 500ms), but for user-facing stuff? In my previous startup we
tried to stay within 500ms. Not saying this isn't a great improvement, but to
me 3s still sounds quite long. (Not saying it's easy to do quicker!)

~~~
sparkman55
There's a difference between requests that the user "expects" to take a long
time, and those that can never be fast enough. For example, POSTs, credit card
transactions, and things like Wikipedia edits generally have lengthy forms
prior to the request, and the user can tolerate a correspondingly-lengthy
response time. I prefer 2s as a target for anything like that, and rely on a
queue for asynchronously processing anything that takes longer.

For GET requests, particularly those reached by clicking a link from elsewhere
on the site, faster is better... Luckily, many of these types of requests can
leverage a cache.

------
illumen
= How to make it take 0.0 seconds to 1.0 seconds.

Save early, before the Editor actually presses save. Only commit the change if
the Editor actually presses save.

This will improve the Editor experience by making save faster at the expense
of CPU time. If you predict well enough, or do enough processing client-side,
you won't need extra server-side CPU.
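
A small sketch of the idea in the browser, with /draft, /commit, #editor, and
#save as hypothetical names (MediaWiki's actual API and element IDs differ):

    // Stream the draft to the server (or localStorage) while the editor
    // types; only commit when Save is pressed. The endpoint and element
    // names here are hypothetical, not MediaWiki's real API.
    const textarea = document.querySelector<HTMLTextAreaElement>("#editor")!;
    let timer: number | undefined;

    textarea.addEventListener("input", () => {
      clearTimeout(timer);
      timer = window.setTimeout(() => {
        // Cheap, idempotent draft write; nothing is published yet.
        void fetch("/draft", { method: "PUT", body: textarea.value });
      }, 2000); // debounce so we don't hit the server on every keystroke
    });

    document.querySelector("#save")!.addEventListener("click", () => {
      // Only now is the edit actually committed.
      void fetch("/commit", { method: "POST", body: textarea.value });
    });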

What is more precious to you? The human Editors or some dumb pieces of
silicon?

~~~
RobAley
When you don't have much funding, both are extremely important, and when you
can't buy more silicon, you either need to optimise the code or let the human
wait.

~~~
illumen
Sure.

However, the method I mentioned can be done with zero extra server load if
done correctly. I've done this before with other editors, and it definitely
did not take a team 12 months to complete.

It is a full-stack optimisation, however, from UI all the way through the DB
and the rest of the stack. Now that they're using only 10% load, there's
plenty of room to breathe.

Saving a draft of what someone may have spent hours typing into a crash-prone
browser is good UI practice regardless.

~~~
RobAley
Again, "full stack optimisations" cost money, if there's not the money to
throw more machines against it (which is often, though not always, cheaper
than dev time) then there's not likely to be the money for extensive dev time
either.

Saving a draft is indeed good UI practice, but best practice is never
"regardless" in the real world, particularly in cash- and manpower-strapped
non-profits.

~~~
illumen
There was over a year of dev time effort put into this optimisation by a team
of people.

I think my optimisation would cost less time and effort, and give better
results.

It would also waste less Editor time.

MediaWiki already allows drafts to be saved.

------
putlake
Here's a video of the author's presentation at Scale Conf on migrating
Wikipedia to HHVM: [http://www.dev-metal.com/migrating-wikipedia-hhvm-scale-
conf...](http://www.dev-metal.com/migrating-wikipedia-hhvm-scale-
conference-2014/)

------
Cherian
Are the Wikipedia server and network performance graphs public?

~~~
mtmail
Yes, you can access them at
[https://wikitech.wikimedia.org/](https://wikitech.wikimedia.org/)

~~~
yuvipanda
[http://ganglia.wikimedia.org/latest/](http://ganglia.wikimedia.org/latest/)
and [http://gdash.wikimedia.org/](http://gdash.wikimedia.org/) too

------
dynjo
Still slower than [https://slimwiki.com](https://slimwiki.com)

------
vladmk
Ward Cunningham, now there's a guy who should have played on an NFL team.

------
programminggeek
I feel like 3 seconds to load a page is still slow. I assume that is the time
to build a page that doesn't hit the cache, and that Wikipedia is using
something like Varnish to cache pages most of the time.

Still, 3 seconds to load a page feels slow, and it should be a lot faster.

~~~
sp332
Wikipedia only caches the latest version of each page. Older versions have to
be rebuilt on the fly every time.

~~~
sarciszewski
Hmm, this sounds remarkably like a possible DoS vector.

~~~
Zikes
I would be surprised if the servers/instances responsible for rebuilding old
pages had any other responsibilities. If that's the case, DoSing by flooding
requests for old page versions would likely only bring down that particular
site feature.

