

Functional Programming at Facebook. (30 min video, CUFP talk.) - eelco
http://www.vimeo.com/6699769

======
davidw
Will trade upvote for a summary :-) I know they use Erlang for their chat
system. What else?

~~~
wallflower
Here you go...

It was an excellent presentation by "the Erlang guys" at Facebook. Makes me
wish I had a scaling problem to throw Erlang at. (I wonder if Google App
Engine will ever support Erlang!)

"We like to think of Facebook as a communications platform, enabling
communications between users". Facebook chat lets users talk with each
other in real time (vs. the asynchronous, slower Inbox, Wall, and Comments).

"Erlang is the right tool for the job" (they would not change it if they had
to do it all over again). "Engineers are uncomfortable with functional
programming. Facebook does not hire specifically for FP." Internal tech talks
have gotten people excited, even if other FB engineers are not comfortable
diving in, and they advertise the Chat guys as the go-to people for FP.

Divisive in the company: the majority of the code and coders are PHP/C++, and
the FP people are a minority (they are not outcasts but isolated; adoption
outside chat is nonexistent). However, projects at FB tend to be
self-contained, so the language choice is fine as long as it works. They are
also the people who maintain the chat, so there is not enough time to move on
to cool stuff with FP (but there is new stuff coming out with Jabber in
Erlang).

Erlang and Haskell in use (~2:00)

Polyglot programming: Use the right language for the job

They use Thrift to write interoperable servers and clients (an IDL like
CORBA's, but simpler). The Thrift bindings for Erlang were written in-house,
as a prerequisite for the FB chat servers.

Code gets out to production within a week, if not within the same day

Haskell doesn't really run on FB but in a sense it generates the code that
runs FB (PHP transforms) (via audience question from Haskell language user -
'now I can impress my kids!')

Chat Development Timeline (~5:30)

2007: There weren't many if any _English_ Erlang resources (blogs, message
boards etc.)

Jan 2007: Chat prototyped

Fall 2007: 4 engineers, 0.5 designers

Winter 2007-08: Code, code, code (Learn Erlang - wasn't easy to learn)

Feb 2008: Dark Launch - Simulate load on the Erlang servers

April 6, 2008: First user message

April 23, 2008: 100% roll-out (at the time, 70M users)

Today: 1M messages/day

Peak: 10M active users (Gigabyte of traffic @ peak)

"Workload has increased dramatically. Refined the servers such that we've been
able to outpace the growth of the users with the growth of the efficiency of
the servers"

Chat Architecture (~7:00 on)

Channel servers deliver messages to browsers. 1 active request/browser tab

The greatest sweet spot of Erlang is the ability to harness concurrency and
parallelism on a massive scale.

Very easy to model concurrent interactions: each chat user is an independent,
concurrent entity (~9:17)

Create an Erlang process for every user
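
As a rough illustration of the process-per-user idea (not code from the talk),
here is a minimal Python sketch. `UserProcess`, `send`, and `demo` are
hypothetical names, and an OS thread with a queue is only a heavyweight
stand-in for an Erlang process and its mailbox:

```python
import queue
import threading


class UserProcess:
    """Hypothetical stand-in for one Erlang process: private state plus a mailbox."""

    def __init__(self, name):
        self.name = name
        self.mailbox = queue.Queue()
        self.received = []  # state touched only by this user's own thread
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, sender, text):
        # Asynchronous send: enqueue the message and return immediately.
        self.mailbox.put((sender, text))

    def stop(self):
        self.mailbox.put(None)  # sentinel asking the loop to exit
        self._thread.join()

    def _loop(self):
        # Handle one message at a time, in arrival order.
        while True:
            msg = self.mailbox.get()
            if msg is None:
                break
            self.received.append(msg)


def demo():
    users = {name: UserProcess(name) for name in ("alice", "bob")}
    users["bob"].send("alice", "hi bob")
    users["alice"].send("bob", "hi alice")
    users["bob"].send("alice", "lunch?")
    for user in users.values():
        user.stop()
    return {name: user.received for name, user in users.items()}
```

Each user owns its state and only reacts to messages, so users never share
memory. Erlang processes are far cheaper than threads, which is what makes one
per user practical at the scale described here.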

Distribution (~10:00)

Connected network of nodes. Remote processes look just like local processes.
Naive load balancing: the rest of the nodes pick up work. All nodes are
trusted, local, and behind the firewall.

Fault Isolation (~11:00)

Had horrible bugs (process leaks, unintended multicasting, bad return states)
in the initial version of chat, but they didn't take down the working system:
a bad Erlang process doesn't crash the system; it notifies its linked
processes, and they are able to rebuild the state _right_ away
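
A minimal Python sketch of that containment idea, under loud assumptions:
`Session` and `Node` are hypothetical names, and a try/except around one
session only imitates what Erlang gets from isolated processes and links. It
is not how the Facebook code works.

```python
class Session:
    """Hypothetical per-user session whose handler may crash."""

    def __init__(self, user):
        self.user = user
        self.delivered = []

    def handle(self, text):
        if text is None:  # simulated bug: bad input crashes the handler
            raise ValueError("bad message")
        self.delivered.append(text)


class Node:
    """Holds many sessions; a crash in one is contained and reported."""

    def __init__(self):
        self.sessions = {}
        self.crash_log = []

    def deliver(self, user, text):
        session = self.sessions.setdefault(user, Session(user))
        try:
            session.handle(text)
        except Exception as exc:
            # Stand-in for the link notification: record the crash,
            # then rebuild that one session with fresh state.
            self.crash_log.append((user, type(exc).__name__))
            self.sessions[user] = Session(user)
```

Delivering a bad message to one user logs the crash and resets only that
user's session; every other session keeps its state.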

Error logging

Stack traces from crash reports are very nice.

Since it's a functional language, you have _all_ of the arguments that went
into the execution (so you can find the bug and _simulate_ it if necessary)

Too much error logging can run the Erlang node out of memory

Hot-code swapping. (~13:00)

Can push new functional code in about 20 seconds without anyone losing
state/kicking anyone off-line. Can push old code over new code quickly if
something doesn't work

Also, if code crashes one server (they have 10 servers) - the work can be
pushed to the other servers so they have more confidence in the hot code
swapping features.
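
To make the swap-and-roll-back idea concrete, here is a toy Python sketch. It
only replaces a function reference on a long-lived object, whereas the Erlang
VM swaps whole modules and keeps the previous version loaded. `ChatServer`,
`format_v1`, and `format_v2` are hypothetical names.

```python
class ChatServer:
    """Long-lived state plus a replaceable message handler."""

    def __init__(self, handler):
        self.handler = handler
        self.previous = None
        self.online = set()  # state that must survive a code push

    def swap(self, new_handler):
        # Push new code: remember the old handler so it can be restored.
        self.previous, self.handler = self.handler, new_handler

    def rollback(self):
        # Push old code back over new code if something doesn't work.
        if self.previous is not None:
            self.handler, self.previous = self.previous, self.handler

    def handle(self, user, text):
        self.online.add(user)
        return self.handler(user, text)


def format_v1(user, text):
    return f"{user}: {text}"


def format_v2(user, text):
    return f"[{user}] {text}"
```

Calling `swap(format_v2)` changes the behaviour of later messages while
`online` is untouched, and `rollback()` restores `format_v1` the same way.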

Monitoring and Error Recovery (~13:41)

Supervision hierarchy lets you control your processes

Extended the Supervisor so they can quickly look up the process ID for each
child name
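
The summary doesn't say how that extension was built, so the following is only
a guess at the shape of the idea in Python: a supervisor that keeps a name to
id table and restarts a single crashed child. `Supervisor`, `whereis`, and
`child_crashed` are hypothetical names, and the integer ids stand in for
Erlang pids.

```python
import itertools


class Supervisor:
    """Tracks named children and replaces one when it crashes."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._children = {}  # name -> (child_id, factory, instance)

    def start_child(self, name, factory):
        child_id = next(self._ids)
        self._children[name] = (child_id, factory, factory())
        return child_id

    def whereis(self, name):
        # Direct lookup: child name -> current id.
        return self._children[name][0]

    def child_crashed(self, name):
        # Restart only the crashed child; siblings keep their ids.
        _, factory, _ = self._children[name]
        return self.start_child(name, factory)
```

After a restart the name stays the same but the id changes, which is why a
fast name lookup is handy.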

Hibernation (~14:30)

Long running, idling HTTP request handlers don't use any memory when they are
idling

Erlang hack: Cheating single assignment to allow direct sharing of arrays/mem
structs with C++? (~15:33)

~~~
vicaya
I found 1M messages/day rather too low. So I checked the video, sure enough
they have 1B messages/day using close to 10Gbps at peak, with 100+ servers. A
correctly built (epoll/reactor based) C/C++ based system should be able to
handle that with ~10 servers.
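
For a sense of scale, here is the back-of-the-envelope arithmetic behind those
figures as a small Python sketch. It assumes exactly 100 servers (the comment
says 100+), an even spread of load, and a daily average for messages rather
than a peak; `per_server_load` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_server_load(messages_per_day, peak_bits_per_second, servers):
    """Average message rate and peak bandwidth per server, spread evenly."""
    avg_msgs_per_second = messages_per_day / SECONDS_PER_DAY
    return {
        "avg_msgs_per_second": avg_msgs_per_second / servers,
        "peak_megabits_per_second": peak_bits_per_second / servers / 1e6,
    }


# 1B messages/day and 10 Gbps at peak, as quoted above.
erlang_fleet = per_server_load(1_000_000_000, 10_000_000_000, servers=100)
claimed_fleet = per_server_load(1_000_000_000, 10_000_000_000, servers=10)
```

That works out to roughly 116 messages/s and 100 Mbps per server across 100
servers, versus roughly 1,157 messages/s and 1 Gbps per server across 10. The
numbers ignore peak message rates and the cost of holding open connections, so
they frame the claim rather than settle it.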

I suspect that Erlang is successful despite the language itself, mostly
because it's also a mature product: a well-tested, production-quality actor
system with decades of tweaking.

I liked the Q&A especially whether a better type system would catch the
"horrible bugs" they had :)

~~~
mononcqc
Presence probably takes more juice than the billion messages, and a good chunk
of these servers can probably be there as fallbacks in case of failures. Just
a supposition.

They do use C++ for the chat system; it's mainly the channels/presence (if
memory serves me right) that are implemented in Erlang.

