
Scaling Facebook Chat to 70 Million Active Users Almost Overnight  - edw519
http://www.facebook.com/notes.php?id=9445547199
======
ComputerGuru
As previous news articles state, Facebook has some implementation of XMPP
going on. XMPP was designed from the ground up to deal with _exactly_ the
issues that he highlights, and is the ideal real-time implementation for any
system where everyone is expected to be aware of the statuses of all others on
the network (versus the traditional "poll the server every x seconds"
methods).
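
A rough way to see the trade-off being described: with polling, load grows with the number of online clients and how often they poll; with push, it grows with how often statuses change and how many online friends each change fans out to. The Python below is a back-of-the-envelope sketch only; the function names and sample inputs are hypothetical, not figures from Facebook or any XMPP deployment.

```python
def polling_requests_per_sec(online_users, poll_interval_s):
    """Every online client asks "who's online?" once per interval,
    whether or not anything has changed."""
    return online_users / poll_interval_s


def push_messages_per_sec(status_changes_per_sec, avg_online_friends):
    """Each status change is pushed once to every online friend who is
    subscribed to that user's presence (XMPP-style presence broadcast)."""
    return status_changes_per_sec * avg_online_friends


# Hypothetical inputs, chosen only to show the shape of the trade-off.
print(polling_requests_per_sec(online_users=1_000_000, poll_interval_s=60))
print(push_messages_per_sec(status_changes_per_sec=500, avg_online_friends=20))
```

Polling can be made cheaper by lengthening the interval, but only at the price of staler status; push cost tracks actual activity, which is why presence-heavy protocols favor it.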

Even if Facebook isn't using XMPP per se, they surely have full access to its
implementations and source code for internal use.

Granted, Facebook _does_ have the "slight" challenge of having 70 million
active users, next to which nearly everyone else's IM/XMPP networks are a
mere pittance; but the core framework and algorithms are wholly addressed and
implemented in the XMPP standard.

It's one thing to make a more efficient implementation of an already-existing
standard that scales damn decently; it's another thing entirely to _design_ a
whole new system to serve their needs.

Note that the article doesn't once mention XMPP though.

~~~
huhtenberg
Err... as far as I understood their Jabber/XMPP announcement, these are used
for interoperability and integration with 3rd-party products only, and _not_
for the internal implementation.

So it's only natural that the article doesn't mention XMPP.

------
9oliYQjP
I can't say that I prefer in-browser chats, but I can appreciate the
complexity of the solution. Scalability is the new manual memory management. I
can't help but think that somebody's going to come up with the scalability
equivalent of a garbage collector and make our lives a lot easier.

~~~
reitzensteinm
That's Erlang in a nutshell, though, isn't it?

~~~
systems
I think instant scalability will come from an application server, something
like GlassFish, for example.

Deploy your application to GlassFish and let your AS scale it.

~~~
aaronblohowiak
That doesn't really handle data access / storage bottlenecks, unless I'm
missing something.

------
alex_c
_The naive implementation of sending a notification to all friends whenever a
user comes online or goes offline_

Did I miss it, or does the note not mention how they actually implemented the
notification?
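
For reference, the naive approach being quoted can be sketched like this. `NaivePresence` and its in-memory outbox are hypothetical stand-ins for real message delivery; the point is only that every login or logout costs one message per friend, online or not.

```python
from collections import defaultdict


class NaivePresence:
    """Hypothetical naive presence service: every login or logout is pushed
    immediately to every friend, whether that friend is online or not."""

    def __init__(self, friends):
        self.friends = friends            # user -> set of friend ids
        self.outbox = defaultdict(list)   # friend -> [(user, status), ...]
        self.messages_sent = 0

    def set_status(self, user, status):
        # One message per friend per event: each login/logout costs
        # O(number of friends), and offline friends get messages too.
        for friend in self.friends.get(user, set()):
            self.outbox[friend].append((user, status))
            self.messages_sent += 1


friends = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": {"alice"}}
svc = NaivePresence(friends)
svc.set_status("alice", "online")
svc.set_status("alice", "offline")
print(svc.messages_sent)  # 4: two events x two friends
```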

~~~
dabeeeenster
I didn't see an explanation for an alternative solution. I've been trying to
think of one all morning and can't!

Maybe he just means all friends whether they are logged in or not?

~~~
carterschonwald
No, they're not sending notifications at every event. As far as I can tell,
they're using an asynchronous algorithm that lazily propagates events and
provides no responsiveness guarantees (sort of ultra-mushy, stretchy,
unreal-time guaranteed).

~~~
dabeeeenster
Sorry, but how is that different to sending all notification events to all
users? You are still sending all notification events to all users, whether you
do it lazily or not!

~~~
huhtenberg
I guess they meant they don't send a notification per message, but rather
batch them somehow.
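
One guess at what "batch them somehow" could look like, as a hypothetical Python sketch (the note does not describe Facebook's actual mechanism): buffer status changes, keep only each user's latest status, and on a timer send each online friend a single message carrying everything that changed.

```python
from collections import defaultdict


class BatchedPresence:
    """Hypothetical batched presence: status changes are buffered and
    coalesced, then delivered to online friends on each flush()."""

    def __init__(self, friends):
        self.friends = friends   # user -> set of friend ids
        self.online = set()
        self.pending = {}        # user -> latest status since last flush
        self.messages_sent = 0

    def set_status(self, user, status):
        if status == "online":
            self.online.add(user)
        else:
            self.online.discard(user)
        # Overwrite: a user who flaps online/offline between flushes
        # is reported only once, with their final state.
        self.pending[user] = status

    def flush(self):
        """Meant to run on a timer; returns {recipient: {user: status}}."""
        batches = defaultdict(dict)
        for user, status in self.pending.items():
            for friend in self.friends.get(user, set()):
                if friend in self.online:   # skip offline recipients
                    batches[friend][user] = status
        self.pending.clear()
        # One message per recipient, however many updates it carries.
        self.messages_sent += len(batches)
        return dict(batches)
```

The cost is staleness: between flushes, friends can see out-of-date availability, in exchange for fewer messages and no online/offline flapping.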

~~~
carterschonwald
exactly! This leads to people having inconsistent (with reality) information
about the availability of other people who have just logged on or off!

------
aschobel
The dark launch idea is neat:

"The secret for going from zero to seventy million users overnight is to avoid
doing it all in one fell swoop. We chose to simulate the impact of many real
users hitting many machines by means of a "dark launch" period in which
Facebook pages would make connections to the chat servers, query for presence
information and simulate message sends without a single UI element drawn on
the page. With the "dark launch" bugs fixed, we hope that you enjoy Facebook
Chat now that the UI lights have been turned on"

Oh, they spilled the beans on using Erlang two weeks ago:

<http://news.ycombinator.com/item?id=179064>

------
dhotson
This is quite a remarkable piece of software engineering, very impressive
stuff. I'm really glad they're open enough about it to share their techniques.

Also, I'd never heard of doing a "dark launch" before, but it sounds like a
fantastic way to get early feedback from users.

~~~
tlrobinson
It sounds like the "dark launch" wasn't visible to users, it was simply to
test the capacity of their servers. It's a very interesting idea.

They did roll out the UI over the course of several hours though.

~~~
ryan
Also, they did stage the launch over a few weeks. I noticed it appear on my
Facebook page (because I'm in the Stanford network) well before it appeared on
most of my friends'.
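
Staged rollouts like this are often gated by network plus a percentage ramp. Here is a hypothetical sketch of such a gate; nothing in the thread says this is how Facebook chose who got chat first, and `in_rollout` and its parameters are illustrative. Hashing the user ID keeps each user's bucket stable, so raising `percent` only ever adds users.

```python
import hashlib


def in_rollout(user_id, network, enabled_networks, percent):
    """Hypothetical staged-rollout gate.

    Users in an allow-listed network always get the feature; everyone else
    is admitted if their stable hash bucket (0-99) is below `percent`.
    """
    if network in enabled_networks:
        return True
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < percent


# Usage: one early network first, then ramp everyone else from 0% upward.
early = {"Stanford"}
print(in_rollout("user42", "Stanford", early, percent=0))  # True
print(sum(in_rollout(f"user{i}", "Other", early, percent=10) for i in range(1000)))
```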

------
siculars
I think the takeaway here is the "dark launch" mentioned in the last
paragraph, not necessarily the behind-the-scenes tech. Although, nice win for
Erlang here.

First time I have heard a company mention publicly that it pushes features
behind the scenes and tests them in real time. Ajax makes this functionality
possible nowadays.

Good job, Facebook.

~~~
interknot
Dark launching is cool, but Google did it last year with Gmail Chat:
<http://video.google.com/videoplay?docid=6202268628085731280>

------
tlrobinson
Very interesting stuff. I wish more companies would post details like this.

~~~
lyime
That indeed is amazing innovation. Who would have thought Zuckerberg's team
would be open about their innovations? They are usually quite secretive
about their future plans. I think having this kind of conversation with their
user/developer community is amazing. More companies need to do this and get
their hands dirty with the technical stuff, not just give high-level talks.

------
neilk
This development story is so awesome it saddens me that it's such a terrible
idea for a product.

The world really doesn't need another proprietary chat standard, especially
one that locks people into a _website_.

~~~
kobs
[http://developers.facebook.com/news.php?blog=1&story=110](http://developers.facebook.com/news.php?blog=1&story=110)

------
tdavis
I correctly guessed at the use of Erlang for the web servers; persistent
connections and pushing are a must, and Apache is hardly designed for so many
persistent processes. Thrift was also a pretty easy call considering it's an FB
project; I still want to check that out, too.
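
One common way to get "persistent connections and pushing" over plain HTTP is long polling: the server holds a request open until there is something to send or a timeout passes, and the client then reconnects. Below is a minimal asyncio sketch of one per-user channel; `Channel` and `demo` are hypothetical names, and the HTTP server that would sit in front of it is omitted.

```python
import asyncio


class Channel:
    """Hypothetical per-user message channel for HTTP long polling: the
    client's request parks here until a message arrives or it times out."""

    def __init__(self):
        self.queue = asyncio.Queue()

    def push(self, message):
        self.queue.put_nowait(message)

    async def long_poll(self, timeout_s):
        try:
            first = await asyncio.wait_for(self.queue.get(), timeout_s)
        except asyncio.TimeoutError:
            return []  # nothing arrived; the client simply reconnects
        messages = [first]
        # Drain anything else already queued so one response carries it all.
        while not self.queue.empty():
            messages.append(self.queue.get_nowait())
        return messages


async def demo():
    ch = Channel()
    waiter = asyncio.create_task(ch.long_poll(timeout_s=1.0))
    await asyncio.sleep(0.01)  # the "request" is now parked
    ch.push("hi")
    ch.push("there")
    got = await waiter
    empty = await ch.long_poll(timeout_s=0.05)
    return [got, empty]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Each parked request costs only a suspended coroutine rather than a whole process or thread, which is the property that makes holding huge numbers of idle connections affordable.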

The information wasn't incredibly in-depth, but it's very cool and useful
nonetheless to read about implementations like this on such a large scale. The
chances of me ever creating something with the scale and resources that FB
requires are pretty slim, but it's gratifying to know I've at least got a rough
idea of some good ways to do it.

Now, if we could just get Twitter to do the same, perhaps someone could give
them a few pointers... ;)

 _"Did I miss it, or does the note not mention how they actually implemented
the notification?"_

No, it doesn't go over that implementation, though it piqued my curiosity
nonetheless. I would assume it's a time-based check on status rather than a
real-time representation.
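
A time-based check could look like the hypothetical sketch below: clients send periodic heartbeats, and a friend counts as online if their last heartbeat is within a TTL. `HeartbeatPresence` and the 30-second default are assumptions for illustration, and the clock is injectable so the behavior is easy to test.

```python
import time


class HeartbeatPresence:
    """Hypothetical time-based presence: clients send a heartbeat every few
    seconds; a user counts as online if the last one is recent enough."""

    def __init__(self, ttl_s=30.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock      # injectable for testing
        self.last_seen = {}     # user -> time of last heartbeat

    def heartbeat(self, user):
        self.last_seen[user] = self.clock()

    def online_friends(self, friends):
        now = self.clock()
        return {
            f for f in friends
            if f in self.last_seen and now - self.last_seen[f] <= self.ttl_s
        }


# Usage with a fake clock (seconds):
t = [0.0]
presence = HeartbeatPresence(ttl_s=30.0, clock=lambda: t[0])
presence.heartbeat("bob")
presence.heartbeat("carol")
t[0] = 20.0
presence.heartbeat("carol")  # carol is still active; bob went quiet
t[0] = 40.0
print(presence.online_friends({"bob", "carol", "dave"}))  # {'carol'}
```

Logging off sends nothing here; a user simply drops off friends' lists once the TTL lapses, which is one way presence ends up inconsistent with reality for a while.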

------
brlewis
As I look at this article now, it reads "userbase" not "active users." Did an
earlier version try to claim Facebook has 70 million active users?

------
axod
Considering they have 10,000+ servers, I don't think "scaling" is that big a
feat.

So say they have every active user on at the same time (70 million), and say
they have 10,000 servers. That's only 7k users per server??? Not great IMHO.

Maybe if they only had 1,000 servers, then it'd be a little more impressive.
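
The arithmetic above as a tiny helper, with one added knob: `concurrent_fraction`, a hypothetical parameter for the share of the user base online at once, since 70 million is a user-base figure rather than a count of simultaneous logins. With the default of 1.0 it reproduces the worst case in the comment.

```python
def users_per_server(user_base, servers, concurrent_fraction=1.0):
    """Back-of-the-envelope load per server.

    concurrent_fraction is the share of the user base assumed to be online
    at the same moment; 1.0 means everyone is on at once (the worst case).
    """
    return user_base * concurrent_fraction / servers


print(users_per_server(70_000_000, 10_000))  # 7000.0
print(users_per_server(70_000_000, 1_000))   # 70000.0
```

Any realistic `concurrent_fraction` below 1.0 shrinks the per-server number further, which cuts both ways: the load is lighter than the headline figure, but so is the load being bragged about.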

~~~
fendale
Yea, 10,000 servers is a serious number - is that actually an accurate figure?

~~~
axod
"Facebook does not disclose the number of servers it operates. But research
firm Data Center Knowledge puts the tally at about 10,000. The slug of cash
will help Facebook buy approximately 50,000 more servers"

60,000 servers? Jesus Christ. Are they planning to scale to take account of
alien users or something?

~~~
fendale
What I don't understand is why Facebook feels slow each time I log on with
these 10K servers; it's often one of the slowest sites I visit!

If they are intending to have 60K servers, they'd better keep a slug of that
cash around to pay the electric bill!

------
volida
Also, the three guys in a garage don't have the luxury of access to so many
servers and money...

------
keating
The only reason this is considered impressive is because so many other
services have set the bar so _low_. This isn't rocket science, it just
requires thinking ahead and designing for scale.

~~~
neilk
What would you call rocket science? This is pretty impressive.

Erlang does a lot of the heavy lifting but even if it did everything out of
the box (and it didn't), gluing it all together is no small feat.

~~~
keating
A chat server? _Yawn_. And it's not for 70 million people at once, that's
_user base_, not _simultaneous logins_.

AIM, Bonjour, Gadu-Gadu, Google Talk, GroupWise, ICQ, IRC, MSN, QQ, SILC,
SIMPLE, SameTime, XMPP, Yahoo, _this is a solved problem_.

~~~
ggrot
Agreed. I find it ironic, since a piece like this is likely written at least
partially in an effort to attract programmers to Facebook. To me it reads:
"Come work for Facebook and re-invent online chat. Again." Now, if the article
had been about how they pushed the state of the art, I would be pretty
interested, but none of this is new technology.

As for the dark launch thing, it is a fancy trick, but there are ways of doing
load testing in an automated system by having test servers simulate the load
from real users. This usually can give you much better data without wasting
bandwidth, slowing down users' experiences, etc.
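
A minimal sketch of that kind of synthetic load test, using asyncio to simulate many concurrent users against one call. `fake_presence_query` is a hypothetical placeholder for whatever request the real service would receive, the user counts and think times are arbitrary, and the 95th-percentile figure is a simple nearest-rank approximation.

```python
import asyncio
import random
import time


async def simulated_user(call, user_id, requests, think_time_s, latencies):
    """One synthetic user: make `requests` calls, pausing between them."""
    for _ in range(requests):
        start = time.perf_counter()
        await call(user_id)
        latencies.append(time.perf_counter() - start)
        await asyncio.sleep(random.uniform(0, think_time_s))


async def load_test(call, users, requests_per_user, think_time_s=0.01):
    """Run `users` simulated users concurrently and summarize latencies."""
    latencies = []
    await asyncio.gather(*(
        simulated_user(call, u, requests_per_user, think_time_s, latencies)
        for u in range(users)
    ))
    if not latencies:
        return {"requests": 0, "p95_s": None}
    latencies.sort()
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return {"requests": len(latencies), "p95_s": p95}


async def fake_presence_query(user_id):
    # Placeholder for a real request to the service under test.
    await asyncio.sleep(0.001)


if __name__ == "__main__":
    print(asyncio.run(load_test(fake_presence_query, users=50, requests_per_user=10)))
```

The catch is that synthetic users only reproduce the traffic patterns you thought to model, which is the gap a dark launch with real page views is meant to close.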

