Hacker News new | comments | show | ask | jobs | submit login

It's cool seeing an Elixir company have such explosive growth.

Interesting that 2 success stories in this space (the other being WhatsApp) are built on Erlang/OTP.

Elixir is part of their great performance! Their engineering blog is really interesting and full of neat information scaling Elixir (which is really hard to find since Elixir scales for a VERY long time).

They also have open sourced a number of tools that offer tremendous value to anyone scaling Elixir.

Absolutely! For example: https://github.com/discordapp/fastglobal

Have we really regressed to the point where simply relaying data with reasonable performance is considered impressive? Figuring out where to relay everything and keeping it all in sync is obviously hard, but that's a distributed systems problem, not (strictly) a performance problem.

We've been able to handle millions of concurrent HTTP(!) connections on a single machine for years; it feels like a pretty solved problem. Although, a lot of that involved userspace TCP stacks and really high-end networking hardware, so if you want to stay within saner territories you can scale that number back a bit.

> We've been able to handle millions of concurrent HTTP(!) connections on a single machine for years;

> Have we really regressed to the point where simply relaying data with reasonable performance is considered impressive?

Sure but it's like saying Facebook is just a silly PHP app to share posts with friends and family, and Tesla is just an electric car those have been around for 100 years.

If you read their page, they do more than just serve static pages to users. It is a distributed systems problem, solving that in a performant and cost-effective way is not as easy.

I guess what I'm really trying to say is that the performance of the "distributed" part has more to do with the design of the algorithm and less to do with the implementation (i.e Elixir). It's like saying Teslas have really impressive electric motors and entirely ignoring the rest.

Their blog (the part GP was referring to anyways), which I'll admit I've only read a portion of, seems to mostly talk about the message-shuffling portion of it though, and a lot of it is just discusses working around their architecture being utterly ridiculous. Once you've figured out where the messages actually need to go though, chucking them out (or the fanout, if you want to call it that) is pretty clearly a trivial operation. And, at least in theory, the routing would only change when a user/node joins/leaves, so the volumes involved there aren't quite as heroic as the message volumes. Handling a few thousand join/leaves per second doesn't sound quite as... scaling, though. I don't think they even bother trying to keep them in perfect order.

Again though, I'm not trying to say that it's not impressive that they got it to work, I just wanted to point out how we seem to have gone backwards / forgotten in terms of handling large volumes of traffic.

E: You're definitely right that HTTP connections are a pretty poor choice of comparison for messaging though.

The flip side is that scaling the problem requires central relay points or lookups when the natural ability to cluster isn't there.

That becomes an even more difficult problem with deployments that in any other environment would disconnect and reconnect everyone from the nodes in the middle of active conversations.

Elixir/Erlang give you the ability to solve both of those problems in this domain. No central relay point and hot upgrades to live servers without any disruption to the millions of existing connections, in progress messages or in route audio conversations.

Doing all that while being able to dedicate a process to each one of those millions of users that both maintains their state between messages and handles monitoring their connection on reconnect attempts is also non-trivial. This is possible with Elixir and Erlang because those processes cost 0.5kb of RAM and the BEAM ensures responsiveness to all of them in the face of a piece of heavier/runaway code that would otherwise monopolize resources on the machine.

Go is the next closest option at 2kb / RAM per goroutine but Go also doesn't provide any type of ID mechanism for those routines so the closest equivalent that you'd get to be able to send a message from one to the other would be creating a channel for each routine to listen on.

Beyond code, OS level threads start at about 1mb, so the entire architecture has to change in order to even attempt to accomplish the same thing.

Actually, I have to call this out as false - you can't handle millions of HTTP connections on a single machine, because there are only 65,535 available TCP ports. I think you meant that you can handle millions of HTTP connections on a few dozen machines...

Please try again, and stop spouting obviously false facts.

Ports are not used up by connections. One port can support 65K simultaneous connections from a single IP. If a thousand machines connect, each with their own IP, one port can handle 65 million connections. If you decide to accept HTTP requests on all ports, then each port of your 65K ports supports millions of connections or more. Suddenly you're talking about a total of billions or trillions of connections all going to one machine. And that's still only using a thousand machines to connect. Open up to the entire internet and you can push that to quintillions of theoretically possible simultaneous connections.

A port doesn't actually get consumed by a TCP connection. Connections are uniquely identified by a (host, port, host, port) combination; i.e. my machine with IP and yours with can have only 2^32 connections between us (slightly less in practice). Even assuming I have a webserver, so my port is fixed to 443, your machine can connect from each of those 2^16 ports - as can every other of the 2^32 machines on the network. In practice, you're limited by memory, not port numbers. Many machines I know of regularly have connection counts in the hundreds of thousands.

You can. Have many IPs that all "route" to the same machine. You get 65535 ports per IP, not per kernel or whatever. It's the reason why ex. hosting companies can give virtual servers on their dedicated hosts and not worry about port conflicts or anything.

Unless, somehow, every one of those sockets happened to share the same port on said machine, say... port 80. So yes, you need to use the mythological TCP server to pull this one off; it's true that you can't have more than 2^15 outbound connections from the same IP though.

I forgot about that part, but that just makes them even more dear to my heart.

Applications are open for YC Summer 2018

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact