
How Facebook pushes updates to the site - creativityhurts
http://www.facebook.com/video/video.php?v=10100259101684977&oid=9445547199&comments
======
rasmus4200
I loved this video. Gatekeeper blew me away.

Summarized some of the highlights here if you don't have time to watch:

[http://agilewarrior.wordpress.com/2011/05/28/how-facebook-pu...](http://agilewarrior.wordpress.com/2011/05/28/how-facebook-pushes-new-code-live/)

~~~
ronnier
I thought he said Perforce and Git, not Subversion and Git.

~~~
daveman692
He then corrected himself to Subversion and Git.

~~~
ronnier
Ahh, thanks. I wasn't listening that closely.

------
akent
I'm pretty impressed by the "push karma" system for gauging how risky
individual engineers' commits are on average.
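For anyone curious what "gauging how risky commits are on average" might look like mechanically, here is a toy sketch. It assumes karma is nothing more than each engineer's historical rate of problem pushes on a hypothetical 0-5 scale. This thread doesn't describe the real system's inputs, and `PushKarma`, `record`, and `needs_extra_review` are made-up names.

```python
from dataclasses import dataclass, field


@dataclass
class PushKarma:
    """Toy per-engineer risk score; a hypothetical model, not Facebook's."""

    max_karma: float = 5.0  # assumed scale: 5.0 = never caused a problem
    # engineer -> list of push outcomes (True = the push caused a problem)
    history: dict = field(default_factory=dict)

    def record(self, engineer: str, caused_problem: bool) -> None:
        """Log the outcome of one push by this engineer."""
        self.history.setdefault(engineer, []).append(caused_problem)

    def karma(self, engineer: str) -> float:
        """Scale max_karma down by the engineer's historical failure rate."""
        pushes = self.history.get(engineer, [])
        if not pushes:
            return self.max_karma  # no track record yet: start at the top
        failure_rate = sum(pushes) / len(pushes)
        return round(self.max_karma * (1.0 - failure_rate), 2)

    def needs_extra_review(self, engineer: str, threshold: float = 3.0) -> bool:
        """Flag engineers whose average push risk is high (threshold is arbitrary)."""
        return self.karma(engineer) < threshold
```

With three clean pushes and one bad one, `karma("alice")` comes out to 3.75; a release engineer could then look harder at changes from anyone below the threshold.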

Here's another (much shorter!) video where Chuck Rossi talks about push karma
very briefly:
[https://www.facebook.com/video/video.php?v=778890205865&...](https://www.facebook.com/video/video.php?v=778890205865&oid=9445547199&comments)

------
madamepsychosis
I like how their entire development cycle revolves around people getting drunk
on weekends.

~~~
kurtsiegfried
It's their entire business model as well.

~~~
cpeterso
JWZ suggests:

 _"How will this software get my users laid"_ should be on the minds of anyone
writing social software (and these days, almost all software is social
software). "Social software" is about making it easy for people to do other
things that make them happy: meeting, communicating, and hooking up.

<http://www.jwz.org/doc/groupware.html>

~~~
e40
This coming from a single person who owns a bar.

There are lots of other types in the world. Married, with children, etc.

------
abi
Anyone know of a more automated system to handle forward/backward
compatibility? Obviously there's a lot of manual coding work that has to be
done, but is there a system that categorizes these various changes (a schema
change, a new URL for a page, a change in a backend service interface) and
automatically tracks these dependencies and gets rid of them after a certain
period of time? To give a concrete example, let's say I switched the Facebook
messages URL to "/mail" from "/messages". I would mark the old handler as
deprecated, and eventually, after the new changes have been pushed to everyone,
the system would prompt the developer to remove what is essentially dead code.
This is a very simplified example, but I believe such deprecation tracking
would be useful for more complex changes too.
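One way such deprecation tracking could work, sketched in Python under made-up names: a hypothetical `deprecated` decorator records each old handler with its replacement and a removal date, and an `overdue` check (something a CI job could run) lists the handlers that are now safe to delete. The handler name and the date are invented for the example.

```python
import datetime as dt
import functools
import warnings

# Hypothetical registry: handler name -> (replacement, date after which to delete it)
DEPRECATIONS: dict[str, tuple[str, dt.date]] = {}


def deprecated(replacement: str, remove_after: dt.date):
    """Mark an old handler as deprecated and record when it may be deleted."""
    def decorator(func):
        DEPRECATIONS[func.__name__] = (replacement, remove_after)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Still serve old clients, but make every call visible in logs/tests.
            warnings.warn(f"{func.__name__} is deprecated; use {replacement}",
                          DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def overdue(today: dt.date) -> list[str]:
    """Handlers whose removal date has passed: dead code to prompt someone about."""
    return sorted(name for name, (_, when) in DEPRECATIONS.items() if today > when)


@deprecated(replacement="/mail", remove_after=dt.date(2011, 7, 1))
def messages_handler(request):
    # Old "/messages" handler, kept only until every client has the new URL.
    return "redirect:/mail"
```

Running `overdue(dt.date.today())` in CI and failing (or filing a task) when it is non-empty would give the "prompt the developer" behaviour.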

------
chuckr
I just wanted to mention one thing about this video. It's missing the first 3
minutes or so where I introduce myself and my team. It's also missing the part
where I gave credit for some of these slides to John Allspaw and Paul Hammond
from Flickr, who gave an awesome talk at the 2009 O'Reilly Velocity Conference.
Their talk inspired me to put together this presentation.

------
baby
So Facebook is programmed in PHP, but everything on the server is in C++ thanks
to "HipHop"? mind=blown

~~~
getsat
PHP is actually one of the slowest mainstream interpreted languages. At
Facebook levels of scale, that becomes a serious problem.

~~~
prodigal_erik
It doesn't even take a billion hits. I'd say it's a problem as soon as you
scale to a couple dozen servers. At that point you're wasting enough money
running extra cores that you could have hired another dev instead. We picked
up something like 6x web frontend performance by doing a pretty
straightforward, if tedious, Java port (64-bit Sun JVM), and it also put us in a
position to start using NIO (with Netty) much more heavily.

~~~
getsat
Interesting, thanks. I don't have experience running PHP apps beyond a few
servers, and the database was always the bottleneck.

------
shin_lao
What he says at the beginning is, to me, the most important part. Having great
tools is great (!), but the most important thing is to have the right culture
around QA and releases.

------
avar
It's interesting that their entire release architecture seems to be focused on
never pushing bad things out to production, whereas given their traffic they
could probably push things out much sooner (minutes after they're committed)
to small parts of their overall traffic, and slowly increase the traffic on
those pieces of code as they prove themselves to be stable, or quickly revert
them if they're not.

That would mean having a lot of versions of Facebook live at any one point,
but as those parts prove themselves stable they'd gradually be rolled out to
all of their traffic.
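The gradual-ramp idea usually comes down to deterministic bucketing: hash each user into a fixed bucket so the same person keeps seeing the same version while the percentage grows. A minimal sketch in Python, assuming integer user IDs and SHA-256 hashing; `bucket`, `in_rollout`, and the 10,000-bucket granularity are illustrative choices, not anything Facebook described.

```python
import hashlib

BUCKETS = 10_000  # gives 0.01% granularity for the rollout percentage


def bucket(user_id: int, feature: str) -> int:
    """Map (feature, user) to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_rollout(user_id: int, feature: str, percent: float) -> bool:
    """True if this user should get the new code at the given traffic percent.

    Raising `percent` only adds users; nobody who already had the change
    loses it, so it can be ramped 1% -> 10% -> 100% as it proves stable.
    """
    return bucket(user_id, feature) < percent * BUCKETS / 100
```

Salting the hash with the feature name means the 1% who get one experimental change aren't the same 1% who get every other one. Reverting is just setting `percent` back to 0.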

One point that also wasn't covered is that as they're pushing things out
they'll only cherry-pick parts of their codebase depending on which engineers
they have around. I wonder if they have a lot of hairy merge conflicts around
release time due to that, and bugs in production resulting purely from those
merge conflicts. Or worse, subtle bugs resulting from change A going out while
being programmed against a function that was changed in change B, which is not
going out because the author of change B isn't in today.
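The change A / change B problem is at least detectable if the tooling knows which changes depend on which. A rough Python sketch, assuming a hypothetical `depends_on` map from change ID to the changes it builds on (how you'd derive that map, from diffs or from authors declaring it, is left open):

```python
def missing_dependencies(selected: set[str],
                         depends_on: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each change picked for the push, list transitive deps left behind."""
    missing = {}
    for change in selected:
        # Depth-first walk over everything this change (indirectly) relies on.
        stack, seen = list(depends_on.get(change, ())), set()
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            stack.extend(depends_on.get(dep, ()))
        gaps = seen - selected
        if gaps:
            missing[change] = gaps
    return missing
```

An empty result means the cherry-picked set is self-contained; anything else lists which changes would go out without the code they were written against.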

~~~
jakevoytko
_"they could probably push things out much sooner (minutes after they're
committed) to small parts of their overall traffic, and slowly increase the
traffic on those pieces of code as they prove themselves to be stable, or
quickly revert them if they're not."_

The risk to user data is way too high.

This could have serious consequences. You could push client bugs with
erroneous API calls, or server-side bugs that cause data loss. Rolling back
isn't enough to fix the damage. The user's data has reached a permanent bad
state that they didn't intentionally reach. You could roll back the data of
every person who used the change, which would undo all of their work. You
could analyze the data and try to fix it. This might work, or it might get the
user data into a different bad state.

Plus, bugs in the view of the site might not cause errors that pop up in your
error console, since it's hard to write tests for "looks wrong." Obvious
errors - "when I click on my profile picture my name disappears" - are caught
by external people instead of internal people, which adds a level of
indirection between a problem appearing and a fix being written.

That being said, there are great uses for gradual rollouts. The video mentions
that they do this for mature features with Gatekeeper - the developer can
conditionally enable a Prod feature, and see what it does.
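For comparison with percentage ramps, a gate in the spirit of what's described here can be a set of rules evaluated per request. The sketch below is hypothetical: the `Gate` and `User` fields, the `GATES` table, and `feature_enabled` are invented for illustration and are not Gatekeeper's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    is_employee: bool = False
    country: str = "US"


@dataclass
class Gate:
    """Hypothetical Gatekeeper-like gate: the feature is on if any rule matches."""

    employees: bool = False                      # dogfood: all employees see it
    user_ids: set = field(default_factory=set)   # explicit allowlist
    countries: set = field(default_factory=set)  # e.g. launch in one market first

    def check(self, user: User) -> bool:
        return (
            (self.employees and user.is_employee)
            or user.id in self.user_ids
            or user.country in self.countries
        )


# Hypothetical gate table; unknown feature names are treated as off.
GATES = {"new_profile": Gate(employees=True, user_ids={1234})}


def feature_enabled(name: str, user: User) -> bool:
    gate = GATES.get(name)
    return gate is not None and gate.check(user)
```

Keeping a table like `GATES` in config rather than in code would let someone widen a feature from employees to a country to everyone without another push.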

~~~
fizx
This is correct, especially for a company under the level of government
privacy scrutiny that Facebook faces. An erroneous push that exposes private
user data could lead to a _very_ heavy fine.

------
cpg
Anyone know the URL for the code review tool they said they use and open
sourced? He said "fabrication," but I can't seem to find it. It's also not
listed in <http://developers.facebook.com/opensource/>

~~~
bbatsell
<http://phabricator.org/>

------
swah
This is how I push updates for now:

        git pull
        lein uberjar
        sudo restart myprj

~~~
pornel
I like

        git push web

<http://toroid.org/ams/git-website-howto>

~~~
smilliken
Fabric is a nice tool for automating deployments:
<http://docs.fabfile.org/en/1.0.1/index.html>

For example:

        fab production deploy

or:

        fab web-servers deploy
        fab database backup

...etc.

------
brown9-2
Does anyone know if a non-video summary or set of slides for this video
exists?

~~~
abi
Hacker Newser rasmus4200 took notes here:
[http://agilewarrior.wordpress.com/2011/05/28/how-facebook-pu...](http://agilewarrior.wordpress.com/2011/05/28/how-facebook-pushes-new-code-live/)

------
maeon3
Was able to download this video with savevideo.me

~~~
rhizome
This actually wound up being a helpful comment. After switching to HD in order
to see his text examples more clearly, I found out that the session starts
over from the beginning and _you can't fast forward the video_ at all.

This meant that I had to either stop watching halfway through so that I didn't
have to sit through the same half-hour again, or rip the video so that I can
watch it like a human.

~~~
diN0bot
I fast-forwarded with no problem; I just clicked where I wanted to start
watching in the video.

Not in HD mode. Stable Chrome on Mac OS X 10.6.

------
ignifero
So they don't really test the code, they just push it out, and fix the bugs on
Thursdays. Genius strategy. No wonder their platform breaks so often.

~~~
jpulgarin
The code is tested. There are unit tests, Watir tests, tests written by the
developer for that specific change, and all changes require a test plan from
the engineer.

~~~
aristus
We prefer the term "watirboarding".

