I see https://blog.discordapp.com/scaling-elixir-f9b8e1e7c29b and https://blog.discordapp.com/how-discord-handles-push-request... ... Interesting.
We've been able to handle millions of concurrent HTTP(!) connections on a single machine for years; it feels like a pretty solved problem. Although, a lot of that involved userspace TCP stacks and really high-end networking hardware, so if you want to stay within saner territories you can scale that number back a bit.
> Have we really regressed to the point where simply relaying data with reasonable performance is considered impressive?
Sure, but that's like saying Facebook is just a silly PHP app for sharing posts with friends and family, or that Tesla is just an electric car, and those have been around for 100 years.
If you read their page, they do more than just serve static pages to users. It is a distributed systems problem, and solving that in a performant and cost-effective way is not as easy.
Their blog (the part GP was referring to anyway), which I'll admit I've only read a portion of, seems to mostly talk about the message-shuffling portion of it, and a lot of it just discusses working around their architecture being utterly ridiculous. Once you've figured out where the messages actually need to go, though, chucking them out (or the fanout, if you want to call it that) is pretty clearly a trivial operation. And, at least in theory, the routing would only change when a user/node joins/leaves, so the volumes involved there aren't quite as heroic as the message volumes. Handling a few thousand joins/leaves per second doesn't sound quite as... scaling, though. I don't think they even bother trying to keep them in perfect order.
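To make the "routing only changes on join/leave" point concrete, here is a rough single-node sketch in Go. Everything in it (`Router`, `Join`, `Leave`, `Publish`, the 16-slot mailbox, dropping messages for slow readers) is made up for illustration and has nothing to do with how Discord actually does it; it ignores the multi-node part entirely.

```go
package main

import (
	"fmt"
	"sync"
)

// Router is a hypothetical in-memory routing table: room -> user -> mailbox.
type Router struct {
	mu    sync.RWMutex
	rooms map[string]map[string]chan string
}

func NewRouter() *Router {
	return &Router{rooms: make(map[string]map[string]chan string)}
}

// Join mutates the routing table (write lock) and returns the user's mailbox.
func (r *Router) Join(room, user string) <-chan string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.rooms[room] == nil {
		r.rooms[room] = make(map[string]chan string)
	}
	ch := make(chan string, 16) // assumed buffer size
	r.rooms[room][user] = ch
	return ch
}

// Leave also mutates the table; these are the only two write paths.
func (r *Router) Leave(room, user string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if ch, ok := r.rooms[room][user]; ok {
		close(ch)
		delete(r.rooms[room], user)
	}
}

// Publish is the fanout: it only reads the table, so many publishers can
// run concurrently. Messages for a full mailbox are dropped (an assumption).
func (r *Router) Publish(room, msg string) int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	delivered := 0
	for _, ch := range r.rooms[room] {
		select {
		case ch <- msg:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	r := NewRouter()
	alice := r.Join("general", "alice")
	bob := r.Join("general", "bob")
	fmt.Println(r.Publish("general", "hello"), <-alice, <-bob)
	r.Leave("general", "bob")
	fmt.Println(r.Publish("general", "bye"))
}
```

The hot path (`Publish`) only takes the read lock, while the comparatively rare joins and leaves take the write lock, which is the asymmetry I was getting at.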
Again though, I'm not trying to say that it's not impressive that they got it to work, I just wanted to point out how we seem to have gone backwards / forgotten in terms of handling large volumes of traffic.
E: You're definitely right that HTTP connections are a pretty poor choice of comparison for messaging though.
That becomes an even more difficult problem with deployments, which in any other environment would disconnect and reconnect everyone from the nodes in the middle of active conversations.
Elixir/Erlang give you the ability to solve both of those problems in this domain: no central relay point, and hot upgrades to live servers without any disruption to the millions of existing connections, in-progress messages, or en route audio conversations.
Doing all that while being able to dedicate a process to each one of those millions of users that both maintains their state between messages and handles monitoring their connection on reconnect attempts is also non-trivial. This is possible with Elixir and Erlang because those processes cost 0.5 KB of RAM and the BEAM ensures responsiveness to all of them in the face of a piece of heavier/runaway code that would otherwise monopolize resources on the machine.
Go is the next closest option at 2 KB of RAM per goroutine, but Go doesn't provide any type of ID mechanism for those goroutines, so the closest equivalent you'd get for sending a message from one to another would be creating a channel for each goroutine to listen on.
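For anyone who hasn't written this in Go, the workaround looks roughly like the sketch below: a registry that hands out integer IDs and keeps one channel per goroutine. The `Registry` type, its `Spawn`/`Send`/`Stop` methods, and the buffer size are all hypothetical names chosen for the example, not anything from the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry emulates process IDs by mapping an integer ID to a mailbox channel.
type Registry struct {
	mu     sync.RWMutex
	nextID int
	boxes  map[int]chan string
}

func NewRegistry() *Registry {
	return &Registry{boxes: make(map[int]chan string)}
}

// Spawn starts handler in a new goroutine with its own mailbox and returns
// the ID other goroutines can use to address it.
func (r *Registry) Spawn(handler func(id int, inbox <-chan string)) int {
	r.mu.Lock()
	r.nextID++
	id := r.nextID
	inbox := make(chan string, 8) // assumed mailbox size
	r.boxes[id] = inbox
	r.mu.Unlock()
	go handler(id, inbox)
	return id
}

// Send delivers msg to the goroutine registered under id. It reports false
// if the ID is unknown or the mailbox is full.
func (r *Registry) Send(id int, msg string) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	inbox, ok := r.boxes[id]
	if !ok {
		return false
	}
	select {
	case inbox <- msg:
		return true
	default:
		return false
	}
}

// Stop closes the mailbox and forgets the ID; the handler's range loop ends.
func (r *Registry) Stop(id int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if inbox, ok := r.boxes[id]; ok {
		close(inbox)
		delete(r.boxes, id)
	}
}

func main() {
	reg := NewRegistry()
	var wg sync.WaitGroup
	wg.Add(1)
	id := reg.Spawn(func(id int, inbox <-chan string) {
		defer wg.Done()
		for msg := range inbox {
			fmt.Printf("goroutine %d got %q\n", id, msg)
		}
	})
	reg.Send(id, "ping")
	reg.Stop(id)
	wg.Wait()
}
```

Note that this only covers addressing within a single OS process. Monitoring, links, and sending to an ID on another node are things you would have to build yourself on top of it.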
Beyond code, OS-level threads start at about 1 MB, so the entire architecture has to change in order to even attempt to accomplish the same thing.
Please try again, and stop spouting obviously false facts.