
Ejabberd Massive Scalability: 1 Node – 2+ Million Concurrent Users - kungfudoi
https://blog.process-one.net/ejabberd-massive-scalability-1node-2-million-concurrent-users/
======
losvedir
Impressive! One thing that caught my eye was:

> _In the process, we also optimized our XML parser, released now as Fast XML,
> a high-performance, memory efficient Expat-based Erlang and Elixir XML
> parser_

The shout out to Elixir is interesting. I glanced through the linked GitHub
code and as far as I can tell nothing is written in Elixir, although it's
still a trivially true statement since Elixir can easily use Erlang libraries.
I guess Elixir is a big enough force in the Erlang community now that they
want to highlight that their software works fine for Elixir users as well!

~~~
mickael
Actually, Elixir support comes from API changes that make it more Elixir
friendly (for example, ordering parameters to be "pipe" friendly). It is also
about publishing the component on the hex.pm package manager so it integrates
easily with the Elixir mix tool. And finally, it is about writing some parts
in Elixir. For some projects we are even starting to write code directly in
Elixir.

------
tiffanyh
No surprise here.

Whatsapp is based on ejabberd.

Whatsapp has repeatedly documented how they achieve 2-3M connections per node.

http://www.erlang-factory.com/upload/presentations/558/efsf2012-whatsapp-scaling.pdf

~~~
zwily
I believe they use Erlang, but not ejabberd.

~~~
tiffanyh
Whatsapp uses a modified version of ejabberd. ejabberd is written in Erlang.

http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html

------
phillu
Can someone point me to information on how to tune the Linux kernel or OS to
handle such a large number of TCP connections?

I ran into problems with the default kernel configuration while writing my
bachelor's thesis and never managed to read up on what could have been done
to solve them. For example, I observed high CPU usage only on the first core
(similar to what the author describes in the article) due to a high number of
network interrupts. I tried to have those interrupts handled by all cores,
which was not recommended (as far as I remember) and did not really help
either.
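
A quick way to confirm that kind of skew is to sum the per-CPU columns of
/proc/interrupts. Below is a minimal sketch of that, not a tuning recipe. The
function name and the sample text are made up for illustration, and the sample
is not measured data.

```python
def per_cpu_interrupt_totals(text):
    """Sum interrupt counts per CPU from /proc/interrupts-style text."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        fields = rest.split()[:len(cpus)]
        # Rows such as ERR carry a single counter, not one per CPU; skip them.
        if len(fields) < len(cpus) or not all(f.isdigit() for f in fields):
            continue
        for i, f in enumerate(fields):
            totals[i] += int(f)
    return dict(zip(cpus, totals))


# Hypothetical sample input (made up, not real measurements).
SAMPLE = """\
           CPU0       CPU1
 24:     900000          0   PCI-MSI  eth0
 25:        100        100   PCI-MSI  ahci
ERR:          0
"""

if __name__ == "__main__":
    print(per_cpu_interrupt_totals(SAMPLE))
```

On a Linux machine you would pass in the contents of /proc/interrupts instead
of SAMPLE. If one CPU's total dwarfs the others, the interrupts are landing on
a single core.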

~~~
tostitos1979
I suggest you look for slides from the Whatsapp folks on the kernel and BEAM
VM tuning they did to get similar numbers. I don't have the link handy, but
they are fairly easy to find.

What is more problematic IMHO is generating the load to saturate said server.
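
One reason, as far as I understand it: every TCP connection needs a unique
(source IP, source port, destination IP, destination port) tuple, so a single
client address can only hold as many connections to one server endpoint as it
has source ports available. A rough sketch of the arithmetic follows. The port
ranges are assumptions about the client machines, not figures from the
article, and the function name is made up for illustration.

```python
import math


def source_ips_needed(target_connections, ports_per_ip):
    """Minimum number of source IPs to reach one server IP:port."""
    return math.ceil(target_connections / ports_per_ip)


# Assumption: 32768-60999 is a common Linux default ephemeral port range.
DEFAULT_RANGE_PORTS = 60999 - 32768 + 1
# Assumption: the range has been widened to 1024-65535 on the load generators.
WIDE_RANGE_PORTS = 65535 - 1024 + 1

if __name__ == "__main__":
    for label, ports in [("default", DEFAULT_RANGE_PORTS),
                         ("wide", WIDE_RANGE_PORTS)]:
        print(label, ports, source_ips_needed(2_000_000, ports))
```

Either way you end up needing dozens of client addresses (or several listening
ports on the server) before the server itself becomes the limit.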

~~~
davidw
IIRC, the Whatsapp folks use FreeBSD.

------
ex3ndr
It's no surprise that just moving packets from one socket to another is fast.
In 2016 you need to store messages on the server and handle big file
transfers. Presence needs to be updated every couple of seconds (like
Whatsapp). You need large groups (500+ members) in your tests.

How will it work then?

~~~
mickael
You will know soon, as we are working on such a higher-load benchmark. From
existing production services, though, we expect very good results, especially
as the test server was not fully loaded.

------
xnyhps
It would be nice if they documented whether they used TLS and, if so, how it
was configured. Even if the 28kb per user is a typo and they mean 28kB, that
would be impressive with TLS enabled.
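
For scale, here is the back-of-the-envelope multiplication for both readings
of the unit. This is only a sketch: it assumes a round 2 million users and
decimal units (1 kB = 1000 bytes), and it says nothing about what the per-user
figure actually covers.

```python
def total_memory_gb(users, per_user_bytes):
    """Total memory in decimal gigabytes for a given per-user footprint."""
    return users * per_user_bytes / 1e9


USERS = 2_000_000                 # assumption: round figure from the title
PER_USER_KILOBYTES = 28 * 1000    # reading "28kB" as kilobytes
PER_USER_KILOBITS = 28 * 1000 / 8 # reading "28kb" as kilobits, in bytes

if __name__ == "__main__":
    print("as kB:", total_memory_gb(USERS, PER_USER_KILOBYTES), "GB")
    print("as kb:", total_memory_gb(USERS, PER_USER_KILOBITS), "GB")
```

Read as kilobytes that comes to 56 GB in total, and read as kilobits it comes
to 7 GB.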

~~~
mickael
This benchmark was done without TLS because we wanted to demonstrate the
performance of ejabberd and not the performance of OpenSSL itself. TLS is
still something you can offload to a load balancer, however.

I updated the post to fix the typo and mention that the test was not run over
TLS.

------
tim333
Similar to other Erlang implementations like Phoenix, I guess:
http://www.akitaonrails.com/2015/10/29/phoenix-experiment-holding-2-million-websocket-clients

~~~
hackerboos
Phoenix is Elixir, not Erlang, although they both run on the BEAM VM.

