
Scaling to millions of simultaneous connections (2012) [pdf] - pitchups
http://www.erlang-factory.com/upload/presentations/558/efsf2012-whatsapp-scaling.pdf
======
biokoda
Video of talk [http://vimeo.com/44312354](http://vimeo.com/44312354)

~~~
ddoolin
Thanks for sharing this.

------
ballard
Given the same problem, I'd look at a reactor-pattern, MQTT/mosquitto-like
pubsub protocol in Go, Erlang or something JVM/CLR. The beauty of
publish/subscribe with many low-bandwidth clients is that each needs only tiny
bits of low-latency work within a sizable-bandwidth app. Could run BINC or CBOR
(RFC 7049) over zeromq on lower-end infiniband between boxen. Millions of IM
users are just similar enough to tons of smart meters that the solution overlap
is enough, but obviously people are more bursty with jagged sinewave daily
traffic loads.
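
For anyone who hasn't used pubsub, here is a rough in-process sketch of the fan-out idea, using MQTT-style topic filters (`+` matches one level, `#` matches the rest). It is only an illustration: `topic_matches` and `Broker` are made-up names, there is no networking or reactor loop, and it is not how mosquitto or any real broker is implemented.

```python
from collections import defaultdict


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a simplified MQTT-style filter matches a concrete topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches everything from here on.
            return True
        if i >= len(topic_levels):
            # Filter is longer than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # Literal level that does not match ("+" accepts any single level).
            return False
    # Without a "#", both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


class Broker:
    """Hypothetical in-memory broker: maps topic filters to subscriber callbacks."""

    def __init__(self):
        self.subscriptions = defaultdict(list)

    def subscribe(self, topic_filter, callback):
        self.subscriptions[topic_filter].append(callback)

    def publish(self, topic, payload):
        """Deliver payload to every matching subscriber; return the delivery count."""
        delivered = 0
        for topic_filter, callbacks in self.subscriptions.items():
            if topic_matches(topic_filter, topic):
                for callback in callbacks:
                    callback(topic, payload)
                    delivered += 1
        return delivered
```

Each publish is a tiny amount of work per subscriber, which is why smart meters and IM clients end up looking similar from the broker's side.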

Let the clients handle end-to-end encryption with something zk DH/OTR/SMP.

I'm curious how they later monitored this and did other common SRE stuff. Also
deployments.

~~~
lmb
You could also use Apollo, ActiveMQ's successor, which comes with an MQTT
adapter and runs on the JVM. Has the added benefit of supporting other
messaging protocols as well. I'm not sure how well mosquitto would hold up, do
you have any experience with it?

As an aside, MQTT has a constrained device optimized sibling MQTT-SN, which
works over UDP and supports sleepy nodes, etc. Unfortunately that one's stuck
in licensing hell and does not yet have good tooling, at least last time I
checked.

~~~
nivertech
My company sells a telecom-grade distributed MQTT broker and an MQTT-SN gateway.
Both are written in Erlang/OTP and come with a 1M concurrent clients SLA (they
were tested with even more).

I gave a talk about MQTT-SN Gateway at Erlang User Conference 2013:

[http://www.erlang-factory.com/conference/ErlangUserConferenc...](http://www.erlang-factory.com/conference/ErlangUserConference2013/speakers/ZviAvraham)

[http://www.erlang-factory.com/upload/presentations/807/ZviMQ...](http://www.erlang-factory.com/upload/presentations/807/ZviMQTTS_for_EUC2013.pdf)

------
skywhopper
Truly impressive scaling and efficiency. This presentation is two years old.
I'd love to hear how far they've come since. My limited experience dealing
with many many orders of magnitude fewer users has found the same thing this
presentation stresses, though: today's hardware is outrageously fast and
powerful. The slowdowns are all contention contention contention.

------
ajtulloch
An interesting historical note is that Facebook's chat system was originally
written in Erlang
([https://www.facebook.com/note.php?note_id=14218138919](https://www.facebook.com/note.php?note_id=14218138919)),
before being replaced with a C++/Java system.

~~~
gaius
I am more interested in why they switched away from Erlang, that is the
opposite of the usual story.

~~~
biokoda
They had trouble finding/training enough Erlang engineers.

~~~
vladimirralev
It's hard to believe Facebook can't find Erlang engineers. Maybe their
interview process got in the way. If you use standard algorithm/data-structure
interview questions to hire Erlang devs you will have a hard time. All normal
data structures in Erlang are immutable and algorithm complexity is a
controversial topic there. Using Mnesia/ETS is a bit of a taboo for interviews
(like using SQL on your coding task in a C/Java interview). Many Erlang devs
would dismiss such questions as irrelevant and try to move on asap which is a
big no-no in interviewing.

~~~
rudiger
Sorry, but any good programmer will know the conventional data structures and
algorithms, regardless of whether they write programs in Erlang, C or some
other language.

~~~
mamcx
That is a bad bias. "Data structures & algorithms" could be a footnote in the
history of a developer, plus, which "data structures and algorithms" will you
use to hire one? The ones relevant to an Erlang dev are not the same as those
for a C/embedded one.

------
gordonguthrie
From [http://sequoiacapital.tumblr.com/post/77211282835/four-numbe...](http://sequoiacapital.tumblr.com/post/77211282835/four-numbers-that-explain-why-facebook-acquired)

32\. Even by the standards of the world’s best technology companies, WhatsApp
runs lean. With only 32 engineers, one WhatsApp developer supports 14 million
active users, a ratio unheard of in the industry. (WhatsApp’s support team is
even smaller.) This L E G E N D A R Y crew has built a reliable, low-latency
service that processes 50 billion messages every day across seven platforms
using Erlang, an unusual but particularly well-suited choice. All that, while
maintaining greater than 99.9% uptime, so users can rely on WhatsApp the way
they depend on a dial-tone.
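
A quick back-of-the-envelope check on what those figures imply, using only the numbers quoted above. The derived values are averages; peak load would be higher, and the variable names are just for illustration.

```python
# Figures taken from the quote above.
engineers = 32
users_per_engineer = 14_000_000
messages_per_day = 50_000_000_000
uptime = 0.999

# Derived values (simple averages, not measurements).
implied_active_users = engineers * users_per_engineer
avg_messages_per_second = messages_per_day / (24 * 60 * 60)
messages_per_user_per_day = messages_per_day / implied_active_users
max_downtime_hours_per_year = (1 - uptime) * 365 * 24

print(f"implied active users:      {implied_active_users:,}")
print(f"average messages/second:   {avg_messages_per_second:,.0f}")
print(f"messages per user per day: {messages_per_user_per_day:.1f}")
print(f"downtime budget (h/year):  {max_downtime_hours_per_year:.2f}")
```

So the quote works out to roughly 448 million active users and an average of about 580 thousand messages per second.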

~~~
Aloisius
Is it really that unheard-of a ratio for the backend? Some of those developers
must be client engineers, so their ratio is probably a wee bit better than 14
million to 1, but really the backend service is rather simple and shouldn't
change much whether you have 5 million or 50 million users. Further when you
control the client and the server, hiding downtime is stupidly easy.

The same is true of client engineers too I suppose. One developer putting out
an app to 1 person is identical to putting one out for 100 million. The work
is related to the number of clients, not the number of installs.

~~~
gordonguthrie
When the guy from Sequoia, who has the inside numbers on the top companies,
says it is unheard of, it's probably unheard of.

The backend service is rather simple for 50 billion messages a day? hmmm not
so much, I'm thinking.

~~~
Aloisius
With the right architecture, it is largely a horizontal scaling problem.

I ran and scaled the Napster server to 80 million users. We pushed about 70
billion "index file" commands an hour alone at peak. We had a chat and
messaging system that also pushed millions of messages an hour.

There were some pieces of data that had to live on all the servers (what user
was connected to what server) which would have caused some problems eventually
(though with how much memory you can stick in today's machines, not for a
_very_ long time), but from experience, 10k really was the same as 80 million.
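
To illustrate the "what user was connected to what server" table, here is a toy sketch. It is not the Napster code (that was C++); `Cluster`, the hash-based server assignment, and the single shared dict standing in for a table replicated to every server are all assumptions made for the example.

```python
import zlib


class Cluster:
    """Toy model: a user -> server directory plus per-server inboxes."""

    def __init__(self, server_names):
        self.server_names = list(server_names)
        # Stands in for the table every server would hold a copy of.
        self.directory = {}
        # Per-server state: only the users connected to that server.
        self.inboxes = {name: {} for name in self.server_names}

    def connect(self, user):
        """Assign the user to a server (by stable hash here) and record it."""
        index = zlib.crc32(user.encode("utf-8")) % len(self.server_names)
        server = self.server_names[index]
        self.directory[user] = server
        self.inboxes[server][user] = []
        return server

    def disconnect(self, user):
        server = self.directory.pop(user, None)
        if server is not None:
            del self.inboxes[server][user]

    def send(self, recipient, message):
        """Route a message via the directory; return the server used, or None if offline."""
        server = self.directory.get(recipient)
        if server is None:
            return None
        self.inboxes[server][recipient].append(message)
        return server
```

Everything except `directory` is local to one server, so adding servers spreads the load; the directory is the one piece that grows with the total user count.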

The code to do this was a relatively small amount of C++ code. Most of our
problems with scaling had to do with bugs in Linux when it was younger and
"line-speed" Cisco switches that would blow up just because you pushed a few
million packets per second through them.

~~~
azth
Wow, you worked on Napster? You mention the server was written in C++. Out of
curiosity, do you know what the client was written in? I found a random forum
post by an anonymous user that said it was written in Delphi.

~~~
Aloisius
The Windows client was written in C++ (MFC iirc).

------
sandGorgon
Would something like golang even need lock-counting like BEAM ? I think the
idiomatic way to work with golang is goroutines rather than mutex, etc.
However, I'm not sure if any goroutine mailbox profiling tools are available.

~~~
masklinn
> Would something like golang even need lock-counting like BEAM ?

BEAM locks are within the VM's implementation[0]:
[http://www.erlang.org/doc/apps/tools/lcnt_chapter.html](http://www.erlang.org/doc/apps/tools/lcnt_chapter.html).
BEAM lock counting is about introspecting runtime state and capabilities.

I doubt Go's runtime is lockfree. The answer would thus be "yes, most likely".

> I think the idiomatic way to work with golang is goroutines rather than
> mutex

1\. depends: [http://stackoverflow.com/questions/10728863/how-to-lock-sync...](http://stackoverflow.com/questions/10728863/how-to-lock-synchronize-access-to-a-variable-in-go-during-concurrent-goroutines)
[https://code.google.com/p/go-wiki/wiki/MutexOrChannel](https://code.google.com/p/go-wiki/wiki/MutexOrChannel)

2\. I doubt the channel implementation is lock-free, though I could well be wrong

[0] and maybe in NIF[1] as well

[1]
[http://www.erlang.org/doc/tutorial/nif.html](http://www.erlang.org/doc/tutorial/nif.html)

~~~
AndreasFrom
You're right and they're not currently a goal:

"Non-goals: make channels completely lock-free (this would significantly
complicate implementation and make it slower for common cases)"

[http://talks.golang.org/2014/go1.3.slide#8](http://talks.golang.org/2014/go1.3.slide#8)

------
peterwwillis
571k pkts/s is pretty respectable, probably nearing or at the bandwidth limits
of their network interface. but i'd like to know how many new connections per
second can they do (avg), and how long does it take to reach 2.8M connections?
(if it takes a long time to open a connection, it could take a while to
establish all those open connections, and a reset of all conns could create a
long wait for new users to connect)
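
The ramp-up question is simple division once you pick an accept rate. The rates below are made-up placeholders, not figures from the slides, and `ramp_up_seconds` is just an illustrative helper.

```python
def ramp_up_seconds(target_connections: int, accepts_per_second: float) -> float:
    """Time to reach the target, assuming a constant accept rate and no disconnects."""
    if accepts_per_second <= 0:
        raise ValueError("accepts_per_second must be positive")
    return target_connections / accepts_per_second


TARGET = 2_800_000  # connection count mentioned above

# Hypothetical accept rates, purely for illustration.
for rate in (1_000, 10_000, 50_000):
    seconds = ramp_up_seconds(TARGET, rate)
    print(f"{rate:>6} conn/s -> {seconds:,.0f} s ({seconds / 60:.1f} min)")
```

At a thousand new connections per second a full reset would take the better part of an hour to recover from, which is why the accept rate matters as much as the steady-state count.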

------
ralphc
Could similar scaling be achieved with Scala & Akka, or did I just make Erlang
devs snicker?

------
Helianthus
If you want evidence that HN has a fetish for money, it's that the technical
details of Whatsapp's success are only closely inspected now that gratuitous
cash has been thrown around.

You may view this comment as too cynical for this _particular_ topic, but the
ridiculous surge in Whatsapp stories doesn't lie. And of _course_ we should
inspect the action of successful companies.

I am merely pointing at the crude vulgarity of the Valley's barometer for
success.

~~~
creamyhorror
Is it really vulgar to desire money and take it as indicator of success, or
are you holding people here to rather high standards?

~~~
Helianthus
>Is it really vulgar to desire money and take it as indicator of success

Yes? It is not only vulgar, it is utterly clueless.

~~~
x0054
Money is certainly not an indicator of true success, other than, of course,
success of acquiring money. However, why is money vulgar? Sure, there are many
vulgar things one can purchase with money. But money in and of itself isn't
vulgar.

Many people here want to understand how WhatsApp managed to convince someone
to pay so much money for it, perhaps so they can replicate this success and
also become rich. What's so wrong with that?

~~~
toomuchtodo
> perhaps so they can replicate this success and also become rich. What's so
> wrong with that?

My theory: Perhaps because an ever growing population of tech professionals
are disgusted with the race for more wealth?

I will be the first to admit, I'm an Elon Musk fanboy, although I think it's
justified. Instagram. Whatsapp. Messaging apps! Billions of dollars! For
messaging! With that same amount of cash, Musk has delivered goods to an
orbiting space station and has succeeded in changing the direction of
transportation, something an entire auto industry wouldn't (not couldn't,
wouldn't) do.

I could be wrong. Perhaps money is still the end goal for everyone. If it's
just to have money for money's sake, I feel sad for you. If it's to pursue your
passion, I think we can all empathize with that.

~~~
gaius
Musk had to make his fortune first...

~~~
toomuchtodo
I get making your fortune. My point was: Use the opportunity you get from it
wisely.

