
How Discord handles over a million requests per minute with Elixir’s GenStage - Sikul
https://discord.engineering/how-discord-handles-push-request-bursts-of-over-a-million-per-minute-with-elixirs-genstage-8f899f0221b4#.lb9vf1xt5
======
jtchang
The most important part of this article is the concept of back pressure and
being able to detect it. It's common in a ton of other engineering disciplines
but especially important when designing fault tolerant or load balancing
systems at scale.

Basically it is just some type of feedback so that you don't overload
subsystems. One of the most common failure modes I see in load-balanced
systems is when one box goes down and the others try to compensate for the
additional load. But nothing tells the system overall "hey, there is less
capacity now because we lost a box", so you overwhelm all the other boxes and
then you get this crazy cascade of failures.
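The feedback loop described here can be sketched as a bounded inbox that refuses work when full, so callers learn about lost capacity instead of piling load onto the survivors. A minimal Python sketch; the class and exception names are illustrative, not from any real system:

```python
from queue import Queue, Full

class BackpressureError(Exception):
    """Raised so callers know the system is at capacity."""

class BoundedWorkerPool:
    def __init__(self, capacity):
        # The bounded queue *is* the backpressure signal: once it is
        # full, enqueue fails instead of silently growing the backlog.
        self.inbox = Queue(maxsize=capacity)

    def submit(self, job):
        try:
            self.inbox.put_nowait(job)
        except Full:
            # Tell the caller "there is less capacity now" instead of
            # letting the load cascade onto the remaining workers.
            raise BackpressureError("at capacity, back off")

pool = BoundedWorkerPool(capacity=2)
pool.submit("a")
pool.submit("b")
try:
    pool.submit("c")
except BackpressureError:
    print("rejected")  # the producer now knows to slow down
```

The key design point is that rejection is explicit and cheap, so producers can react (retry later, shed load) rather than discovering the overload via timeouts.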

~~~
user5994461
Yes. You need to size the capacity of the system to handle the full load with
N boxes dead.

Corollary: If you have 2 boxes, each of them has to be able to handle all the
traffic, so you can't save money by using smaller boxes :D

Corollary #2: If you have 2 datacenters, each of them has to be able to handle
all the traffic, so you burn a lot of money :D

~~~
rahimnathwani
Instead of 2 boxes that can handle all your load, why not 4 that can each
handle 1/3rd of the load?

~~~
user5994461
Let's say that one box is down (don't care why, boxes die all the time).

You have 3 boxes, they're all running at 100%, or worse if the load is not
perfectly balanced (it never is). You've got bad latency and you're up for
cascading failures. Your system is in bad shape.

Now suppose you have to deploy your application. You've got to take a server
offline temporarily, so the load falls 100% on the 2 servers left... which
together can only take 66%. Booya! You're down :p

That's what happens when people cheap out on servers in high-performance
systems.

~~~
vidarh
Yes, but his point is still valid: You can to an extent reduce the
overprovisioning you need by adding more, smaller units.

In your example, two boxes would have meant downtime no matter the capacity of
each box too. All your example demonstrates is a reason why you might want to
have capacity that allows taking more than one box offline....

Which could be handled with bigger servers, or more smaller servers.

~~~
user5994461
Yes to all of this.

I wrote this to illustrate that you need a sizeable pool of servers
(about 5) in order to achieve cost savings (by using smaller servers) WHILE
not affecting reliability.

Well, that is, if you care about rare cascading failures and performance
issues.
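The arithmetic behind this: with n boxes sized to survive k failures, each box must carry 1/(n-k) of peak load, so you buy n/(n-k) times peak capacity in total. A quick back-of-the-envelope sketch (the helper name is invented):

```python
def overprovision_factor(n_boxes, n_failures=1):
    """Total capacity purchased, relative to peak load, if the
    surviving boxes must still carry 100% of the traffic."""
    survivors = n_boxes - n_failures
    if survivors < 1:
        raise ValueError("not enough boxes left to serve traffic")
    return n_boxes / survivors

# 2 boxes: each must handle everything -> pay for 2x peak capacity
# 5 boxes: only 1.25x overprovisioning to survive one failure
for n in (2, 3, 5, 10):
    print(n, round(overprovision_factor(n), 2))
```

Which is exactly why the savings only show up once the pool is around 5 boxes: the overprovisioning tax drops from 100% to 25%.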

------
jondot
Hate to be a party pooper, but I'd like to give people here a more generic
mental tool to solve this problem.

Ignoring Elixir and Erlang - when you discover you have a backpressure
problem, that is, any kind of throttling - connections or req/sec - you need
to immediately tell yourself "I need a queue", and more importantly "I need a
queue that has prefetch capabilities". Don't try to build this yourself. Use
something that's already solid.
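The "queue with prefetch" idea amounts to this: the consumer declares how many unacknowledged messages it will hold, and the broker stops delivering past that point. A toy Python illustration of the mechanism only - real systems would use something like RabbitMQ's basic.qos, and the broker class here is made up:

```python
from collections import deque

class PrefetchBroker:
    """Delivers at most `prefetch` unacked messages to the consumer."""
    def __init__(self, prefetch):
        self.prefetch = prefetch
        self.backlog = deque()
        self.unacked = 0

    def publish(self, msg):
        self.backlog.append(msg)

    def deliver(self):
        # Stop delivering once the consumer holds `prefetch`
        # unacknowledged messages: the excess backs up in the broker
        # instead of overwhelming the worker.
        if self.unacked >= self.prefetch or not self.backlog:
            return None
        self.unacked += 1
        return self.backlog.popleft()

    def ack(self):
        self.unacked -= 1

broker = PrefetchBroker(prefetch=2)
for i in range(5):
    broker.publish(i)
assert broker.deliver() == 0
assert broker.deliver() == 1
assert broker.deliver() is None  # consumer is saturated
broker.ack()
assert broker.deliver() == 2     # capacity freed by the ack
```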

I solved this problem 3 years ago, pushing 5M msg/minute _reliably_ without
loss of messages, and each of these messages was checked against a couple of
per-user rules (to not bombard users with messages, when is the best time to
push to a user, etc.), so this adds complexity. Approved messages were then
bundled into groups of 1,000 and passed on to GCM HTTP (today,
Firebase/FCM).

I've used Java and Storm and RabbitMQ to build a scalable, dynamic, streaming
cluster of workers.

You can also do this with Kafka but it'll be less transactional.

After tackling this problem a couple times, I'm completely convinced Discord's
solution is suboptimal. Sorry guys, I love what you do, and this article is a
good nudge for Elixir.

The second time I solved this, I used XMPP. I knew there were risks, because
essentially I was moving from a stateless protocol to a stateful one.
Eventually it wasn't worth the effort and I kept using the old system.

~~~
Vishnevskiy
I think you misunderstand the problem we are solving here. We are not trying
to solve this because our system can't handle it. We are protecting it from
when Firebase decides to slow down in a way that causes data to back up and
OOM the system. Since these push notifications have a time bound on
usefulness, we don't care about dumping to an external persisted queue like
RabbitMQ or Kafka (we'd rather deliver newer notifications faster than wait
for the backed-up buffer to flush). Firebase also only allows 1,000 concurrent
connections per senderId with 100 in-flight pushes (that have not received an
ack), which means only 100,000 can be in flight at once. Ultimately, if a
remote service is applying backpressure because it is struggling, no amount of
auto-scaling on your end is going to help you.

This service buffers potential pushes for all users being messaged, then
watches the presence system to determine whether they are on their desktop or
mobile (this is millions of presence watchers and tens of millions of buffered
messages). Users are constantly clearing these buffers by reading on their
clients, and finally, when a user is offline or goes offline, we emit their
pushes to them (which is what this article talks about). This service evolved
from the push system of the game we worked on; when it only did pushes and no
other logic it could push at 1M/sec in batches, but its responsibility has
changed.

Context matters :)

~~~
metafunctor
Could you not reach pretty much the same result with a queue, though?

For example, workers could discard messages older than some threshold, quickly
emptying the queue if there are expired messages. Clients might not even queue
messages if the queue is currently too long, perhaps even providing a
convenient signal for them to back off from their most chatty behaviour.

Some messages will not be delivered on time if there is significant
backpressure. There is not much you can do about it, apart from avoiding
choking yourself.

Perhaps the queue could use a LIFO policy, so at least some messages go
through in time instead of most messages being delayed to near the expiration
threshold.
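Both policies proposed above (drop expired messages, serve newest first) fit in a few lines. A hypothetical sketch, with the clock passed in by the caller rather than read from the system:

```python
from collections import deque

class FreshFirstQueue:
    """LIFO queue that silently drops messages older than `ttl`."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.items = deque()  # (enqueued_at, message) pairs

    def push(self, now, message):
        self.items.append((now, message))

    def pop(self, now):
        # Newest first (LIFO), so at least some notifications go out
        # on time; anything past its ttl is dropped on the way, which
        # quickly drains a backed-up queue.
        while self.items:
            enqueued_at, message = self.items.pop()
            if now - enqueued_at <= self.ttl:
                return message
        return None

q = FreshFirstQueue(ttl=10)
q.push(0, "old")
q.push(9, "new")
assert q.pop(now=12) == "new"   # newest served first
assert q.pop(now=12) is None    # "old" expired (12 - 0 > 10)
```

The trade-off is explicit: under backpressure the oldest messages are sacrificed, which matches the "newer notifications matter more" policy described upthread.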

~~~
rozap
Erlang _is_ a series of queues.

~~~
jondot
You are hand-waving away many properties of purpose-built queue systems.

~~~
rozap
Erlang _is_ a purpose built queue system.

------
coverband
Quick serious question: How does this company plan to make money? They're
surely well funded[1], but what's their end game?

[1] "We've raised over $30,000,000 from top VCs in the valley like Greylock,
Benchmark, and Tencent. In other words, we’ll be around for a while."

~~~
meddlepal
They have an awful lot of information about video gamers in conversation
history. They could mine that data for game companies and sell it as a way to
help companies build better, more addictive and mechanically pleasing games.

~~~
b1naryth1ef
We've adamantly stated many a time that we will never sell users' data or put
ads in the app.

~~~
heroprotagonist
Isn't it a little irresponsible to make that claim while operating on VC
funding with no monetization strategy?

I don't mean to imply that you yourself are being dishonest or that you will
go back on your word at some point. I'm sure you have noble intentions and
want the best for your users. But it might be beyond your control.

If the company gets sold or goes public, you could have drastic changes in
management and their philosophy. You acknowledge this outright in your privacy
page, under the "OUR DISCLOSURE OF YOUR INFORMATION" heading:

[https://discordapp.com/privacy](https://discordapp.com/privacy)

Discord has a policy in place that explicitly says they can do whatever they
want with the information without limit, including selling it or transferring
it without consent. This directly conflicts with employees' public claims
about intentions.

The difference is that one of them is a legal acknowledgement end users must
make before they use the software and the other is a 'feel good' thing to hear
on someone's blog or forum post.

If they just wanted protection against an accidental slippage of data then
that privacy page could be changed substantially. Instead, they pave the road
for the explicit sale of data at a later time.

Or, if they wanted to leave the option open in the future they could say "This
privacy policy is subject to change" and give users an opportunity to opt-out
when it changes without historical data being subject to undisclosed future
use. But this weakens their value in an eventual acquisition by someone who
wants to monetize the data.

As it stands right now, in an eventual acquisition or even just some internal
shifts of philosophy in the organization, all historical data is up for grabs
for any potential use.

Discord really shouldn't have employees state that they will never sell users'
data when it explicitly allows and plans for that option. It may not be
intentional dishonesty, but it comes close.

~~~
lightedman
"The difference is that one of them is a legal acknowledgement end users must
make before they use the software and the other is a 'feel good' thing to hear
on someone's blog or forum post."

No, if it's from a known employee, it counts as a legal advertisement.

~~~
lmm
Only if that employee is a VP or above AIUI.

------
poorman
That's awesome, and it goes to show how simple something can be that would
otherwise involve a fair amount of concurrent (and distributed) programming.

GenStage has a lot of uses at scale. Even more so will GenStage Flow
([https://hexdocs.pm/gen_stage/Experimental.Flow.html](https://hexdocs.pm/gen_stage/Experimental.Flow.html)).
It will be a game changer for a lot of developers.

------
hotdogs
"Obviously a few notifications were dropped. If a few notifications weren’t
dropped, the system may never have recovered, or the Push Collector might have
fallen over."

How many is a few? It looks like the buffer reaches about 50k, does a few mean
literally in the single digits or 100s?

~~~
Sikul
Good question. We don't have metrics on the exact number dropped. We're using
an earlier version of GenStage that doesn't give any information about dropped
events. Once we upgrade we'll have a better idea.

~~~
Matthias247
There's another important question: how will clients deal with a notification
that was never delivered? Does that mean they might never receive a chat
message? That could in some cases be catastrophic for the user. Or does it
only mean they may not get something instantly, which would not be too bad if
the client also polls the server or catches up on notifications on reconnect.

~~~
estel
When push notifications hit FCM, Firebase does not guarantee delivery of
those messages to clients (usually iOS or Android devices). There are quite a
few reasons FCM/APNS might fail to deliver a message, so applications almost
never have functionality depend on them.

As you say, you might not get the notification pushed to the device, but you
should still see the message if you open the messaging app as normal.

~~~
jhgg
This is indeed the case. Our real-time system is outside of Firebase and
APNS, and it handles the actual real-time updates of chat state once the app
is launched. We also have a delivery system that accounts for network
cuts/switches and the like.

------
erikbern
"requests per minute" is such a useless unit of measurement. Please always
quote request rates per second (i.e. Hz).

Makes me think of the Abraham Simpson quote: "My car gets 40 rods to the
hogshead and that's the way I likes it!"

~~~
hueving
Here's a cool trick I figured out. If you have something measured in units per
minute, you can divide it by 60 to get units per second. I won't even charge
you to use the method even though I'm in the process of patenting it.

~~~
user5994461
Actually, the conversion doesn't work.

The requests-per-minute number is an average.

The requests-per-second number should be given for peak load. That is a very
important metric: a system has to be scaled to sustain the peaks, not the
average.

We'd need to know the traffic pattern to know the multiplier, and it is
certainly not 60 :p

~~~
hueving
Units per minute gives you a lower bound on peak units per second. You can't
reach an average of X/min without achieving at least (X/60)/sec at some point.
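Both points are compatible: per-minute averages hide burstiness, yet they still bound the peak from below. A quick illustration with invented numbers:

```python
# One minute of traffic: 60 per-second request counts, very bursty.
# 12 seconds carry all the load; the other 48 are idle.
per_second = [5000] * 12 + [0] * 48

per_minute = sum(per_second)   # what "requests per minute" reports
average_rps = per_minute / 60  # the naive /60 conversion
peak_rps = max(per_second)     # what you actually provision for

print(per_minute)    # 60000 requests/minute
print(average_rps)   # 1000.0 average rps
print(peak_rps)      # 5000 peak rps, 5x the average

# The lower bound: the peak second can never be below the average.
assert peak_rps >= average_rps
```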

------
pwf
50k seems like a low bar to start losing messages at. If this was done with
Celery and a decently sized RabbitMQ box, I would expect it to get into the
millions before problems started happening.

~~~
Vishnevskiy
These machines do more than just push. They also buffer messages for each
individual user to "potentially" push if they don't read them on the desktop
client. This happens before the flow this article talks about.

We currently have 3 machines doing this for millions of concurrent users. At
the time this article was written, it was 2 machines.

~~~
jsjohnst
What size machines are these? I'm shocked that this volume is your max with
Erlang, unless you're using a smaller T-series AWS instance.

~~~
Vishnevskiy
These are n1-standard-8 on GCE.

Each easily handles over 30,000 requests a second updating queues for new
messages, and they are also subscribed to presence events for millions of
people from our presence system. It is a very busy service, ensuring we only
deliver messages to people who are not at their computer.

~~~
jsjohnst
So if they are delivering 30k a second per box and the max "backlog" you allow
to build is ~50k, then you cap your backlog at under 2sec worth of delivery?
Or am I missing something?

------
bpicolo
I love Discord, and love Elixir too, so this is a pretty great post.

Unfortunate that the final bottleneck was an upstream provider, though it's
good that they documented the rate limits. I feel like my last attempt to find
documented rate limits for GCM/APNS was fruitless; perhaps Firebase messaging
has improved that?

~~~
chatmasta
It's not the final bottleneck, it's the first constraint. ;)

~~~
bpicolo
Hah, fair. It's always unfortunate when it's hard to address the real
limitation though :)

------
dimino
What is up with Discord? I feel like it's quietly (maybe not so quietly) one
of the bigger startups to come out in the last two years.

It seems to have totally taken over a space that wasn't even clearly defined
before they got there.

~~~
HCIdivision17
It does, doesn't it? I used to use Ventrilo, but then they screwed our small
group out of our server connection. We happily used Dolby Axon for a while. We
tried Google Hangouts... for a while, until it just really didn't work well
(it disconnected and crapped out a lot). We tried the Steam client's chat, but
while it was OK for screensharing, it wasn't so great for chat.

But at some point we heard of Discord, which posed itself as a chat/vent
replacement, started using it, and it _just works_. Which is huge, since the
other stuff generally didn't (Axon was actually good).

~~~
Fnoord
Ventrilo is laggy compared to TS and Mumble. It won't show the lag as ms, but
it is there, and it is real. It's due to the way the protocol works, or to the
server. It is no longer in development, and you can't even run your own
server on your own hardware. The interface is from the 90s. You don't want to
use Ventrilo for gaming in 2016.

TS supports plugins. No lag issues, can run on your own server. Closed source.

Mumble is open source, no lag issues. Interface is slightly less good than TS.
Supports SSL.

Discord, like you say, Just Works (tm). It is very easy to use, the interface
is amazing, it's in active development, and setting up a server is free. It
also works in the web browser.

If you're into Blizzard games, Battle.net recently added native VoIP to its
client. The advantage for a Blizzard gamer is that you don't have to install
any 3rd-party software.

------
user5994461
I'd like to point out that the official performance unit is the "request per
second". And its cousin, requests per second at peak.

The per-minute average only gets used because many systems have so little
load that the per-second number is negligible.

------
AgentK20
Anyone know of equivalent libraries to GenStage for other languages (Java,
NodeJS, etc.)?

I'd definitely be able to put to use things like flow limiters and queuing and
such, but none of my company's projects use Elixir :(

~~~
bpicolo
ReactiveX seems to have documented notions for it:
[https://github.com/ReactiveX/RxJava/wiki/Backpressure](https://github.com/ReactiveX/RxJava/wiki/Backpressure)

Highly recommend the Reactive series of libs. They're typically very well
done.

The guy below is right that Akka is perfectly suited.

------
mevile
I spend a lot of time in the PCMR Discord, which is pretty lively. The
technology seems solid, but the UI has issues (notifications from half a day
ago are really hard to find, for example, on mobile devices). Otherwise I'm on
Discord every day and love using the service. I miss some Slack features, but
the VoIP is very good.

~~~
b1naryth1ef
What features in particular? The most common one we hear is search, which is
actually implemented and undergoing internal testing before a public preview
soon.

~~~
mevile
It's just what I mentioned. I'll get a notification, and I just can't find
where I was notified from. On Android, if I tap a notification I would expect
it to take me to the conversation where I was notified. It would take a really
long time of scrolling to find the notification given the volume of
discussion that happens. Can I just tap something to see all my notifications
on Android, pick one, and go to the conversation?

------
snambi
A million requests per minute: is this a big deal?

~~~
user5994461
16.7k per second; 83k per second during peak (assuming the default 80/20
traffic rule).

\- 100/s = typical limit of a standard web application (Python/Ruby), per
core

\- 1,000/s = typical limit of an application running on a full system

\- 10,000/s = typical limit for fast systems (load balancers, DBs, haproxy,
redis, tomcat...)

\- Over 10,000/s you gotta scale horizontally, because a single box can't (or
shouldn't) take it.

The difficulty depends on the architecture and what the application has to do
(dunno, didn't go through the article). If you make something that can scale
by just adding more boxes, then it's trivial: just add more boxes. Well, it's
gonna cost money and that's about it.

So no. Not a big deal at all... if you've done it before and you've got the
experience :D

~~~
manigandham
While 1k/sec seems to be an average throughput for most web apps due to all
the logic, 10k/sec is nowhere near the limit for fast systems; many can do
well into 6 figures per second, with some now doing millions/sec.

~~~
user5994461
Right. 10k is not a hard limit. It's the standard I expect for real-world
applications, on classic server hardware, with limited tweaking.

The 6-figure benchmarks that send/receive 1 byte of data with all unsafe
flags enabled are not representative of real usage.

~~~
manigandham
Your description for fast systems refers to high-performance software like
load balancers and databases. In this case, 10k is nowhere near the limit on
modern machines, they all do 6 figures per second.

~~~
user5994461
With good hardware and good tuning, possibly. Again, I'm not saying it's a
hard limit, I'm saying it's a reasonable expectation for real-world
production usage ;)

At 100k requests/s on a load balancer, the HTTP headers alone (450 bytes)
with zero content are already ~45 MB/s of traffic. And the CPU will
bottleneck on parsing long before that request rate.

------
manigandham
Akka(.NET) or any actor system is a perfect fit for this and brings the same
functionality to other languages and frameworks.

~~~
brightball
Not exactly. Without running on the BEAM you're left with cooperative
scheduling of processes (each process hands control back to the scheduler)
instead of preemptive scheduling (the scheduler can stop you).

That makes it possible for one processor-heavy operation to take over and
slow down everything else. BEAM ensures that if you have millions of requests
coming through and suddenly 1,000 four-day-long operations kick off on the
machine, the millions of normal, smaller operations continue responding and
performing as expected.

Fairly critical for the stability of real time systems.

The other piece here is that these processes are cheaper on the BEAM than any
other platform in terms of RAM cost.

~0.5 KB per process on the Erlang VM. A goroutine in Go is the next closest
at 2 KB.

The two combined are one of the big reasons why benchmarks don't tell the
whole story with Erlang/Elixir: it's harder to measure consistency in the
face of bad actors.
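BEAM aside, the cooperative-scheduling hazard is easy to demonstrate in any run-to-completion runtime. A Python asyncio sketch (timings invented): one CPU-bound task that never awaits delays every other task on the event loop:

```python
import asyncio
import time

async def cpu_hog():
    # Busy "work" that never awaits, so it never yields to the
    # scheduler -- the cooperative-scheduling failure mode.
    time.sleep(0.2)

async def latency_probe():
    start = time.monotonic()
    await asyncio.sleep(0)  # should resume almost immediately...
    return time.monotonic() - start

async def main():
    probe = asyncio.create_task(latency_probe())
    await asyncio.sleep(0)  # let the probe start and park at its await
    await cpu_hog()         # the hog monopolizes the loop for 0.2s
    return await probe

delay = asyncio.run(main())
print(f"probe resumed after {delay:.2f}s")  # ~0.2s, not ~0s
```

On a preemptive scheduler like BEAM's, the equivalent hog would be suspended after its reduction budget and the probe would resume on time.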

------
sbov
Is the number of Push Collectors to Pushers constant or can it vary based upon
notification load?

~~~
jhgg
It is constant - but iirc, it'd be trivial to make a dynamically scaling pool.
At the end of the day, a pusher is just a TCP connection. Keeping a pool of
fixed size and planning capacity around scaling horizontally is a perfectly
acceptable approach - given you know the potential throughput for each pusher.

------
rv11
Just wondering: what is the difference if I use two kinds of [producer,
consumer] message queues (say RabbitMQ) instead of this? Does GenStage being
an Erlang system make a difference?

~~~
di4na
RabbitMQ is written in Erlang. So basically you use it natively instead of
bringing in and configuring a big dependency. It just comes with your
language for free, without needing another process, etc.

------
sandGorgon
How does one achieve this in Celery 4? I remember there was a Celery "batch"
contrib module that allowed this kind of batching behavior, but I don't see
it in 4.

------
IOT_Apprentice
Why not use Kafka for back pressure?

------
imaginenore
> _" Firebase requires that each XMPP connection has no more than 100 pending
> requests at a time. If you have 100 requests in flight, you must wait for
> Firebase to acknowledge a request before sending another."_

So... get 100 firebase accounts and blast them in parallel.

