
Achieving 100k connections per second with Elixir - slashdotdash
https://stressgrid.com/blog/100k_cps_with_elixir/
======
dzik
This article is quite good, especially the part about the bottleneck caused by
a single supervisor in ranch. However I have to say the title is a bit
misleading, because all of this has nothing to do with Elixir; it's all about
the Linux kernel and Erlang, and cowboy and ranch are written in Erlang.

Having said that, I will add that I think it is good to have Elixir.

~~~
csisnett
"is a bit misleading because all of this has nothing to do with Elixir"
Stressgrid is written in Elixir, though:
[https://gitlab.com/stressgrid/stressgrid](https://gitlab.com/stressgrid/stressgrid)

~~~
dzik
Point taken, and I am already looking at Stressgrid; "millions of users" is
definitely a selling point to me. It is actually quite hard to generate enough
correct traffic to stress-test large distributed systems.

------
rargulati
I'd love to see data on the average on-call incidents for an application
written in language X (say Go) vs those written in Elixir.

Concretely, is it the case, for an application where Elixir/Erlang/BEAM is a
great choice but another language would also be fine, that the equivalent
Elixir application results in less downtime and fewer pages than the
alternative? Anything from the perfect app to something with a ton of
races/leaks.

Is this a fair question? (Maybe I'm presuming too much of the BEAM/supervisor
pattern; I have zero experience with it.)

~~~
rdtsc
> I'd love to see data on the average on-call incidents

Don't have any hard data to compare, but I have been involved in debugging
running Erlang systems. It's very nice having the ability to restart separate
supervisors while the rest of the processes handle requests. Being able to do
hot code loading to, say, fix bugs or add extra logging. And my all-time
favorite -- live tracing after connecting to a VM's remote shell. You can just
pick any function, args, and process and say "trace these for a few seconds if
a specific condition happens". None of those individually are earth-shattering,
but taken together they are just so pleasant to use. I wouldn't enjoy going
back to anything that didn't have those capabilities.

And yes, that restarting of sub-systems (supervision trees) happens
automatically as well. There were a number of cases where it turned a potential
"wake up at 4am and fix this now, because everything crashed" into a "meh, it's
fine until I get to it next week" kind of problem.
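
The restart-with-limits behaviour can be sketched outside of Erlang too. Here is a toy Python stand-in (not the OTP API; `supervise`, `max_restarts`, and `window` are made-up names) showing a worker being restarted until it succeeds, with escalation if it crashes too often:

```python
import time

def supervise(worker, max_restarts=3, window=5.0):
    """Call worker(); restart it on crashes, escalating if the crash
    rate exceeds max_restarts within a sliding window of seconds."""
    crashes = []
    while True:
        try:
            return worker()
        except Exception:
            now = time.monotonic()
            crashes = [t for t in crashes if now - t < window] + [now]
            if len(crashes) > max_restarts:
                raise          # give up: escalate to the parent supervisor

attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient crash")
    return "ok"

result = supervise(flaky)      # two automatic restarts, then success
print(result)                  # ok
```

In OTP the same two knobs appear as a supervisor's restart intensity and period.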

~~~
brightball
Is there a good write up of how to do that somewhere?

~~~
rdtsc
Which part or just in general ops with Erlang?

Overall I would say this book is a good start:
[https://www.erlang-in-anger.com/](https://www.erlang-in-anger.com/)

Supervisors are just a general pattern in Erlang. Any book will have something
about it. I like this one:
[https://learnyousomeerlang.com/supervisors](https://learnyousomeerlang.com/supervisors)

Restart frequency and limits are just parameters you specify, so you don't
need to do anything fancy or special there.

Hot code loading might not be as obvious:
[http://erlang.org/doc/reference_manual/code_loading.html](http://erlang.org/doc/reference_manual/code_loading.html)
but it is essentially just compiling the module on the same VM version (or
close by, no more than 2 versions away), then copying it to the server in the
same path as the original. The original can be saved to a backup file. Then do
`l(modulename)` to load it.
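
The workflow is Erlang-specific, but the shape of it can be illustrated with Python's `importlib.reload` (a rough analogue only; `hotmod` is a made-up module name): edit the source in place on the running machine, then reload it in the live process.

```python
import importlib, os, sys, tempfile

sys.dont_write_bytecode = True          # no .pyc, so reload re-reads the source
tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
path = os.path.join(tmp, "hotmod.py")   # "hotmod" is a made-up module name

with open(path, "w") as f:              # deploy version 1
    f.write("def answer():\n    return 1\n")
import hotmod
first = hotmod.answer()

with open(path, "w") as f:              # "fix the bug" in place on the server
    f.write("def answer():\n    return 2\n")
importlib.reload(hotmod)                # the rough analogue of l(modulename)
second = hotmod.answer()
print(first, second)                    # 1 2
```

Unlike Erlang, Python has no notion of old and new code versions coexisting, so this is only the surface of what the BEAM does.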

For tracing I recommend
[http://ferd.github.io/recon/](http://ferd.github.io/recon/). The Erlang in
Anger book also has examples of tracing.
[http://erlang.org/doc/man/dbg.html](http://erlang.org/doc/man/dbg.html) has
some nice shortcuts too, but be careful using it in production, as it doesn't
have any overload protection. So if you accidentally trace all the messages on
all the processes, you might crash your service :-)
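
For readers without an Erlang shell handy, here is a rough Python analogue of that conditional tracing idea, using `sys.settrace` (this is not how recon or dbg work internally; `make_tracer` and `handle` are made-up names): record calls to one target function only when a predicate on its arguments holds.

```python
import sys

def make_tracer(target_name, condition, log):
    """Record the arguments of calls to target_name when condition holds."""
    def tracer(frame, event, arg):
        if event == "call" and frame.f_code.co_name == target_name:
            args = dict(frame.f_locals)
            if condition(args):
                log.append(args)
        return None                    # don't trace lines inside the call
    return tracer

def handle(request_id):                # stand-in for the function of interest
    return request_id * 2

log = []
sys.settrace(make_tracer("handle", lambda a: a.get("request_id", 0) > 10, log))
for rid in (3, 42, 7):
    handle(rid)
sys.settrace(None)                     # stop tracing after a bounded window
print(log)                             # [{'request_id': 42}]
```

Erlang's tracing is far more capable -- it works on a live node, across processes, without instrumenting anything in advance -- but the pattern of "match this function, under this condition, for a bounded time" is the same.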

~~~
brightball
Tracing is mainly what I was going for. I'm very familiar with the various
patterns and the run time, but I haven't seen the tracing aspects referenced
in as much detail.

Thanks!

------
vasilia
I can handle 120k connections per second with my custom-made, highly optimized
multiprocess C++ server. But the main problem is business logic. Just make 2
SQL queries to MySQL on each HTTP request and watch how it degrades.

~~~
repsilat
There are simple tricks to make those queries not kill performance. Here is a
dumb proof-of-concept I made a few months ago:
[https://github.com/MatthewSteel/carpool](https://github.com/MatthewSteel/carpool)

The general idea is combining queries from different HTTP requests into a
single database query/transaction, amortising the (significant) per-query cost
over those requests. For simple use-cases it doesn't add a whole lot of
complexity, can reduce both load and latency significantly, and doesn't lose
transactional guarantees.

Not 100k/sec writes on my laptop, mind you :-).
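
For the curious, the coalescing idea can be sketched in a few lines. This is a toy Python illustration (not the carpool repo's actual API; `Batcher`, `enqueue`, and `flush` are made-up names): several requests arriving in the same window each want one row, and a single `IN (...)` query serves all of them.

```python
import sqlite3

class Batcher:
    """Collect keys from concurrent requests, satisfy them in one query."""
    def __init__(self, conn):
        self.conn = conn
        self.pending = []              # user ids waiting for the next batch

    def enqueue(self, user_id):
        self.pending.append(user_id)

    def flush(self):
        """One IN (...) query answers every request in the batch."""
        ids, self.pending = self.pending, []
        placeholders = ",".join("?" * len(ids))
        rows = self.conn.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
        ).fetchall()
        return dict(rows)              # each request looks up its own id

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ada"), (2, "bob"), (3, "eve")])

b = Batcher(conn)
for uid in (1, 3):                     # two requests land in the same window
    b.enqueue(uid)
results = b.flush()                    # one round trip instead of two
print(results)
```

A real implementation would flush on a timer or batch-size threshold, hand each waiting request its slice of the result, and could group dependent writes into one transaction to keep the transactional guarantees.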

~~~
Perseids
Since looking into modern concurrency concepts, I've always thought such (in
my opinion obvious) batching should be part of sophisticated ORM frameworks
such as Rails' Active Record. Alas, their design decisions always seem to
cater to making dumb usages more performant (sometimes automagically,
sometimes by adding huge layers of cruft) rather than rewarding programmers
who are willing to learn a few concepts, by creating interfaces with strong
contracts and better safety and performance.

E.g. please give me guidance on how to better structure my database model so
that it doesn't effectively end up as a huge spaghetti heap of global
variables. My personal horror: updating a single database field spurs 20
additional SQL queries, creating several new rows in seemingly unrelated
tables. Digging in, I find this was due to an after_save hook in the database
model, which caused an avalanche of other after_save/after_validation hooks to
fire. The worst of it: asking how this came to be, I find out that each step
of the way was an elegant solution to some code duplication in the controller,
some forgotten edge case in the UI, or some bug in the business logic.
Basically, ending up with extremely complex control flows is the default.

So of course, if your code has next to no isolation, batching up queries
produces incalculable risks.

/rant, sorry.

~~~
repsilat
I agree that with that kind of complexity (or with the belief that that kind
of complexity is inevitable) it isn't a great idea. You lose isolation, and if
you can't predict which rows will be touched you're hosed.

One mitigating factor: this sort of optimisation should be applied to
_frequent_ queries more than _expensive_ queries. In some use-cases the former
kind may be simple ("Is this user logged in?") even if the latter is not.

And on keeping that complexity down: the traditional story has been "normalise
until you only need to update data in one place," but often requirements don't
line up well to foreign-key constraints etc. The newer story can work, though:
"Denormalise until you only have to update in one place, shunt the complexity
to user code, and serialise writes." It's anathema to many, but it is becoming
more common (usually in places that don't use RDBMSs, though).

------
dnekencjfkerf
> What this means, performance-wise, is that measuring requests per second
> gets a lot more attention than connections per second. Usually, the latter
> can be one or two orders of magnitude lower than the former.

Does anyone know how 100k connections per second compares with other servers?

~~~
Thaxll
It's probably easy to do with Java, C#, or Go. They're using a 36-core machine
with a fast CPU to achieve that, meaning you need about 3,000 conn/sec per
core, which is very doable with recent frameworks.

~~~
ralusek
Should be possible just fine with NodeJS, as long as it's clustered to run an
instance per core.

The order-of-magnitude differentiator for server performance really comes
down to whether the architecture is blocking or non-blocking.

~~~
holoduke
We run about 20k connections per second with Node.js on a 12-core machine. All
Node is doing is parsing cached JSON, modifying it, and serving it back to the
client. One server has an uptime of 560 days without any memory/performance
issues.

------
holtalanm
im a simple man. i see Elixir, i upvote.

that being said, this article was pretty informative. The bit about the
proposed SO_REUSEPORT socket option was really interesting. Really fun to read
about performance bottleneck detection and improvement.
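
For context on what that option does: `SO_REUSEPORT` lets several sockets bind the same address/port, and the kernel then distributes incoming connections among them instead of funneling every accept through one listener. A minimal Python sketch (Linux-specific; without the option the second `bind` would fail):

```python
import socket

def reuseport_listener(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)  # before bind()
    s.bind(("127.0.0.1", port))
    s.listen(128)
    return s

a = reuseport_listener(0)              # port 0: kernel picks a free port
port = a.getsockname()[1]
b = reuseport_listener(port)           # second bind to the same port succeeds
same_port = (b.getsockname()[1] == port)
print(same_port)                       # True
a.close()
b.close()
```

In the article's setting, this is what would let each acceptor own its own listen socket rather than contending on a single shared one.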

edit: wow, downvoting for making a simple joke about liking elixir. Cool.

~~~
mrinterweb
I've found that humor in comments on HN is usually not well received. Not sure
why, just an observation.

~~~
dang
[https://news.ycombinator.com/item?id=18817249](https://news.ycombinator.com/item?id=18817249)

Maybe we should add something about this to
[https://news.ycombinator.com/newsfaq.html](https://news.ycombinator.com/newsfaq.html).

------
supermatt
I'd like to see memory consumption charts for this. It seems you miss this in
all your posts. Not a criticism (and thank you for what you have done); it's
just something I (and others) would like to see, and if you are running the
tests it's just another metric to log :D

Also, any update on your previous article?
[https://news.ycombinator.com/item?id=19094233](https://news.ycombinator.com/item?id=19094233)

~~~
kt315_
We are preparing a new benchmark test for major platforms. Among other
suggestions, it will include memory consumption.

------
Leace
ejabberd [0], an XMPP server written in Erlang, powers chat in some of the
biggest MMORPGs [1].

[0]:
[https://github.com/processone/ejabberd](https://github.com/processone/ejabberd)

[1]: [https://xmpp.org/uses/gaming.html](https://xmpp.org/uses/gaming.html)

~~~
77pt77
By "used in MMORPGs" you mean the chatting component, not the actual gameplay
network protocol.

~~~
yawaramin
That's exactly what they said.

------
confounded
Is Elixir/Erlang considered superior to Go for writing high concurrency web
servers?

~~~
rakoo
Not an expert in either language by any means, but Go and Erlang/Elixir
focus on different things:

\- Go wants to be performant at high concurrency scales

\- Erlang/Elixir wants to keep running at high concurrency scales, whatever
the issues in your application code. Performance comes second.

There's no clear cut answer to your question; I guess if you trust yourself to
write servers that will hold a large number of connections while doing a lot
of processing then Go has an advantage, otherwise you should probably trust
the man-centuries behind the BEAM VM and follow the various blog
posts/presentations explaining how you can fine-tune your machine to get to
super large scales.

~~~
anthony_doan
> Performance comes second.

I want to point out that "performance" is too general here.

The BEAM VM also has a goal of low latency, which can be considered
performance. I'm not entirely sure whether Go is aiming for that or not. I
would never do any numerical stuff on BEAM, though; it's very slow.

This article is a bit dated but is interesting between Go and Erlang:

[https://www.theerlangelist.com/article/reducing_maximum_late...](https://www.theerlangelist.com/article/reducing_maximum_latency)

~~~
rakoo
Very true, thanks for the article. Go also wants to minimize GC duration by
making it per-goroutine and using some fancy algorithms to make it as short as
possible, so I'd say it's part of its goals too.

------
muststopmyths
>Finally, the connections per second rate reaches 99k, with network latency
and available CPU resources contributing to the next bottleneck.

Can someone educate me on what they might be talking about here? CPU is ~45%
in their final graph. I don't know what network latency means in this context,
though. Round-trip time for a TCP handshake? That seems unlikely.

~~~
Qwertystop
The CPU graph peaks near 97% (teal line) at the time when connections-per-
second are highest. Are you looking at the red? That's the version without the
two patches.

~~~
muststopmyths
oh yeah, you're right. I reversed the two in my head somehow.

------
makkesk8
Even if connections per second can be a magnitude or two lower than requests
per second, this result is still quite far off today's alternatives.

A 14-core machine comparing .NET Core with other top web servers:
[https://www.ageofascent.com/2019/02/04/asp-net-core-saturating-10gbe-at-7-million-requests-per-second/](https://www.ageofascent.com/2019/02/04/asp-net-core-saturating-10gbe-at-7-million-requests-per-second/)

~~~
benwilson-512
A lot of folks are failing to read the article. They're intentionally holding
each connection open for 1 whole second. This is a whole different ballgame
than benchmarks where each connection is allowed to terminate as rapidly as it
can send back a plain text response.

~~~
muststopmyths
Good point. At first glance, holding the connection open for one second seemed
a bit meaningless if they're touting connections/sec.

But since they are benchmarking Elixir, there is some amount of overhead
involved in that framework's management of connections and requests. If I knew
Erlang/Elixir, that would be a fascinating thing to explore.

Edit: I'm assuming the saturated CPU comes from Elixir and not the OS. It
would be strange for 100k/sec to saturate the TCP stack with 36 cores.

------
dclusin
Would be helpful to know the hardware/instance size they used for these tests.
TFA doesn't explicitly state it.

~~~
zambal
_We used Ubuntu 18.04 with the 4.15.0-1031-aws kernel, with sysctld overrides
seen in our /etc/sysctl.d/10-dummy.conf. We used Erlang 21.2.6-1 on a 36-core
c5.9xlarge instance._

 _To run this test, we used Stressgrid with twenty c5.xlarge generators._

~~~
lstodd
omg.

100K/sec was achieved by yours truly 10 years ago on a contemporary Xeon with
nothing but nginx and Python 2.6, with gevent patched to not copy the stack,
just switch it. (EDIT: and also a FIFO I/O scheduler)

Why does this require 36 cores today?

~~~
rozap
If you read the article, it's in the third or so paragraph:

> What this means, performance-wise, is that measuring requests per second
> gets a lot more attention than connections per second. Usually, the latter
> can be one or two orders of magnitude lower than the former.
> Correspondingly, benchmarks use long-living connections to simulate multiple
> requests from the same device.

~~~
lstodd
Your point being? I was talking about single-request connections.

~~~
jasonlotito
> I was talking of single-request connection.

Yes. Which is not what's being discussed here.

~~~
lstodd
Yeah, what's being discussed here are connections without any I/O over them.
Just an fd lingering somewhere in an epoll pool. Which obviously is even less
taxing. So what's your point?

~~~
kierenj
..that you are comparing apples and oranges, like he said

------
fabioyy
Opening a connection and closing it after a while is not a very good example
of "scalability"... the kernel does the opening part.

------
cutler
Great, I can use this for that blogging app I've been meaning to write and
sleep at night knowing I won't run out of connections.

