How Discord Scaled Elixir to 5M Concurrent Users (discordapp.com)
802 points by b1naryth1ef on July 11, 2017 | 251 comments

This writeup makes me even more convinced that Elixir will become one of the large players when it comes to hugely scaling applications.

If there is one thing I truly love about Elixir, it is how easy it is to get started while standing on the shoulders of a giant, the Erlang VM. You can start by building a simple, not very demanding application with it, yet once you hit a large scale, there are plenty of battle-proven tools to save you massive headaches and costly rewrites.

Still, I feel that using Elixir today is a large bet. You need to convince your colleagues as much as your bosses / customers to take the risk. But you can rest assured it will not fail you when you need to push it to the next level.

Nothing comes for free, and at the right scale, even the Erlang VM is not a silver bullet and will require your engineering team to invest their talent, time and effort to fine-tune it. Yet, once you dig deep enough into it, you'll find plenty of ways to solve your problem at a lower cost than with other solutions.

I see a bright future for Elixir, and a breath of fresh air for Erlang. It's such a great time to be alive!

I read it a little differently. The whole article is about how Erlang/Elixir fails at its core reason for existence (fast message passing between distributed processes) and all the complicated work-arounds they had to implement to avoid actually using this core feature of Erlang.

People tend to forget that scalability is not a binary property. You always scale up to some users, up to some architecture, up to some amount of nodes. There is no system that will scale to infinity without requiring developer intervention once business needs and application patterns start to settle in.

Distributed Erlang/Elixir has known limitations. For example, the network is fully meshed, which limits you to roughly 60 to 200 nodes in a cluster. Or you shouldn't send large data over distribution, as that delays the other messages, etc. Some of those are easily solvable. For example, you can rely on your orchestration tools to break your clusters into groups. Or you can set up an out-of-band tcp/udp socket for large data. Others may be more complex.
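
One way to see why a full mesh puts a ceiling on cluster size: every node keeps a connection to every other node, so the number of links grows quadratically. A rough sketch in Python (the function name is made up for illustration, and it only counts connections, nothing else the distribution layer does):

```python
def mesh_connections(nodes: int) -> int:
    """Number of node-to-node links when every node connects to every other node."""
    return nodes * (nodes - 1) // 2


for n in (10, 60, 200):
    print(n, mesh_connections(n))
# 10 45
# 60 1770
# 200 19900
```

Going from 60 to 200 nodes is a bit more than 3x the machines but over 11x the connections, which is why splitting a cluster into groups helps.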

The important question, however, is how far you can go without having to tweak, and, once you reach those roadblocks, how well you can address them. In many platforms, writing a distributed system is a no-no or, at best, they require you to assemble and tweak from day one. In this case, the ability to start with Erlang/Elixir and tweak as you grow is a feature.

And if you never run into those roadblocks, then you can happily continue running on the default stack. Just look at the many companies using Phoenix PubSub and Phoenix Presence, both distributed, without having to worry about fine-tuning the distribution.

Thanks a lot for the very insightful reply. I'm learning a lot about Erlang from the responses, and gaining a lot of respect for the community as well. Not bad for what was, on reflection, a somewhat inflammatory comment :)

This is a great point! What would you say is the best book to really learn OTP?

Designing for Scalability with Erlang/OTP is pretty good so far.


I'd read up on the Elixir language itself from the official website, then look into the "Little Elixir and OTP Guidebook".


Shut up and take my money! I'm sold. Thank you for this explanation.

What difference does it make? Erlang/OTP distribution doesn't have a pluggable architecture. Sooner or later you will reach a point where you have to modify it. Then you are diverging from the original branch, which makes it even more difficult to maintain. You have to merge your additions into every release (minor or major) and test it thoroughly.

A better architecture for a distributed system has a strong composability property. It should be possible to modify every possible aspect of it on a running cluster without introducing downtime.

Write your own standard well documented distribution layer and become independent of underlying technologies.

What do you mean by architecture here? If you mean the roles different nodes take and their topology, I actually doubt you can be decoupled from your architecture in a distributed system because they directly affect your design and capabilities.

You can move away from a fully meshed topology, but how does this choice affect node ups and node downs? Rebalancing can affect how you store data in the cluster.

How many connections should you have between nodes? A single connection makes the ordering guarantee straightforward. Multiple connections are more performant but require care if you need ordering, and that is more efficiently handled alongside your application.
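
To make the ordering point concrete: once messages travel over several connections they can arrive out of order, so the receiver needs something like sequence numbers to put them back. A minimal sketch in Python, assuming the sender stamps each message with a sequence number starting at 0 (the class and method names are hypothetical, not anything from OTP):

```python
class ReorderBuffer:
    """Delivers messages in sequence order even if they arrive out of order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number we are allowed to deliver
        self.pending = {}   # seq -> payload, for messages that arrived early

    def receive(self, seq, payload):
        """Accept one message; return the payloads that can now be delivered in order."""
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


buf = ReorderBuffer()
print(buf.receive(1, "b"))  # [] because message 0 has not arrived yet
print(buf.receive(0, "a"))  # ['a', 'b']
```

With a single connection none of this bookkeeping is needed, which is the trade-off being described.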

And what about process placement? On which side of CAP do you want your registries to sit?

If you and your team are capable of "writing your own standard well documented distribution layer" upfront, then you are in a better position than most to take those decisions. But writing a distributed system is hard, so I will gladly start with a well-designed system, especially at the beginning of the project, when it may be unclear which patterns I will need as my application and business grow.

And most times, it will be good enough.

As far as OTP goes, you can plug your own discovery/topology mechanism as well as your own module for handling the connection between nodes. But, as mentioned in my previous reply, some of those issues may be better solved on the side, e.g. a different tcp/udp connection for data transmitting.

I did also mention that for fast time to market it is indeed a good idea to use already available tools. By the way, writing a distribution layer is not that difficult. Many companies/startups with scalable back-ends that are not using Erlang/OTP have already done it. The point is that scalable software requires knowledge, and Erlang/OTP is not going to magically solve that. But it seems many fans are promoting it as a magic tool that will turn shit software into hyper-scalable software. Just look at the comments below to see how people have gone crazy.

Erlang (and by extension Elixir) definitely provides a set of tools which are good for building highly concurrent distributed systems. And your systems are likely to have few errors as well as being resilient, if you know what you are doing.

But indeed, you need to know your tools like every other part of computer science. Storing 9 billion elements in an array, doing linear search and then complaining your linear search is too slow and needs to run faster will be disastrous to your architecture. Likewise, assuming communication is free is equally disastrous.

The problem with building a distribution layer in another language is the effort involved. Most of the companies which pull that off and get stability in addition are usually large, multinational, and have ample engineering resources to throw at the effort required.

Consider for some reason (either technical or political), this company decides to migrate their web socket servers to Akka/Rust/Go/NodeJS. Integrating these new servers into their core cluster is going to be deadly painful. They rely heavily on Erlang/OTP internal clustering. This is not even considered as a challenge in distributed systems but still really painful to implement because of their design decision. This is what they did wrong. Clustering and message routing part of the application should be technology agnostic.

It isn't that hard. I have code which lets an OCaml program speak the Erlang distribution protocol. It can more or less stand-in for an Erlang node in the distribution layer. Writing the same for another language shouldn't be that hard: the protocol spoken is a simple prefix-protocol which can be parsed with LL(1) from the head plus an atom cache that is easy to maintain.

That said, good Erlang architectures definitely know when you should fan out and use a common layer for distribution. At $work we have a Kafka cluster for some event handling simply because it is the right tool for the job, and because it makes it easier to interface other programs on top of it.

In my experience, the distribution features are best used as orchestration however. Your distribution layer is Erlang, and then you use other languages for "leaves" in the architecture. If the right library is present in e.g., Go, then by all means use it. Erlang easily speaks protobuf for instance.

That depends on what you're going for. Moving web sockets elsewhere is pretty simple regardless of whether you're using Erlang/OTP or not because it's a frontal layer. If you do that, all you are doing is creating another layer to hold connections that needs to relay to the rest of the application. The application behind it still has the responsibility to receive and relay everything from those web sockets, determine what message goes to whom, etc.

You can move it out just the same as you can elsewhere, you just add an unnecessary layer of complexity and lose a lot of the capabilities already built into the language. It boils down to a simple question of, what would be the purpose behind that decision? What improvement would you get from deciding to move any of that elsewhere? Do you gain speed at the cost of complexity?

Clustering is a non-trivial problem. It can't happen naturally in any language that includes mutable data without explicit oversight and thus most of the needed tooling can't be built into the system and guaranteed to work everywhere. As soon as some part of the system goes from "send message, get response" to "send message, change variable referenced in memory on this machine, get response" you break your ability to naturally cluster.

That leads to a dependency on central relay points for things like setting up web sockets and then having code behind them communicate. The code behind them, in whatever language, is usually going to be talking to other central relay points like load balancers or pub sub databases rather than directly to other specific servers.

As soon as you have central relay points, they become something else that you have to monitor and scale even if they are very high volume.

There's nothing wrong with this approach and it will certainly work but if you want to avoid that and build a distributed system in another language, you've got to invest a lot in other areas.

> Moving web sockets elsewhere is pretty simple regardless of whether you're using Erlang/OTP or not because it's a frontal layer.

It is supposed to be easy. But they are not using language-agnostic messaging. Instead they rely on the Erlang distribution protocol. This is what makes it difficult to integrate.

> what would be the purpose behind that decision?

1. To achieve composability. I can replace one web socket server with a C++ implementation to see if I can handle 10 million connections per server. If I fuck it up, I go back to the Erlang implementation; otherwise, fuck BEAM, I'm moving to native code. The beauty of it is that I can do it one step at a time. I can even implement the web socket layer in multiple languages and experiment with all possible options in production without anyone even noticing.

2. I don't have to pay for 20 Erlang guys who are mostly senior devs. Get 5 of them to write the core and make the rest of the team NodeJS guys.

You're acting like the distribution and serialization formats of Erlang are not open or documented.

If you're going for your C++ layer, try https://github.com/saleyn/eixx for example.


You're talking about undoing an architecture that lifts the benefits of an entire platform having done the work for you, in order to pick one that is probably easier to migrate to in the future.

You prefer doing all the work mostly from scratch, upfront and straight away, instead of having to maybe adapt a library later if you actually need to migrate.

I think your cost analysis is off.

I think,

1) You are misunderstanding the grandparent's points. Better to have primitives that scale you to X rather than having to get to X on your own.

2) The grandparent wrote the Elixir language.

3) Making sweeping statements like "A better architecture for a distributed system has a strong composability property" is very easy.

> 1) Better to have primitives that scale you to X rather than having to get to X on your own.

For a fast time to market, I do agree with you. But if you are a well-established company in the market, then in most cases "do it yourself" is the best idea. Unless you have the money to call for experts to fix the problem for you.

> 2) The grandparent wrote the Elixir language.

I didn't know him, nor do I care.

> 3) Making sweeping statements like "A better architecture for a distributed system has a strong composability property" is very easy.

I consider this an excuse rather than a technical argument.

> But if you are a well-established company in the market, then in most cases "do it yourself" is the best idea. Unless you have the money to call for experts to fix the problem for you.

This requires having distributed system experts on your team or having the money to call them from day one, before you even reach the market and before you are even sure you will have custom needs. If you have the need, the expertise and the time, then sure. But writing a distributed system from scratch should certainly not be the first choice (IMO).

> A better architecture for a distributed system has a strong composability property.

Sure, you want a beautiful lover, but is your object of desire within your reach? One needs to be practical when playing engineer.

Let's try this: Composability is a highly desired property for distributed systems; however, correctly scaling composable semantics is a hard problem.

This sort of duality crops up all over the place in general engineering, and certainly in software systems engineering.

Good luck writing something on your own that scales better.

Erlang's core reason for existence is to control telephone switches, which had two independent general purpose computers connected to the physical switch. So reliability, redundancy, recovery, and fault isolation were the core needs; that drove the design for isolated processes with message passing between them. Because Erlang was in the control plane, and only managing the signal path, not passing the signals itself, there wasn't a big need for speed, as long as it wasn't too slow.

Fast forward several years, and isolated processes turn out to be a great fit for large SMP systems, and Erlang/BEAM is now doing signal path work in a lot of places. Erlang tends not to put explicit limits, but some techniques are going to fail at large scale; ex: if you have 50,000 processes across many nodes, sending the same message to each of those processes is going to be slow; sending one message to each node and fanning out from there is going to be faster, in no small part because you've reduced the network bandwidth you're using.
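
A back-of-the-envelope sketch of that fan-out difference, in Python rather than Erlang. The 20-node, round-robin layout is an assumption picked for illustration, and the function names are made up; it only counts messages that have to cross the network:

```python
def direct_sends(placement):
    """Network messages when the sender messages every process individually."""
    return len(placement)


def fanout_sends(placement):
    """Network messages when the sender messages each node once and the node relays locally."""
    return len(set(placement.values()))


# Hypothetical layout: 50,000 processes spread round-robin over 20 nodes.
placement = {pid: f"node{pid % 20}" for pid in range(50_000)}

print(direct_sends(placement))  # 50000
print(fanout_sends(placement))  # 20
```

The local relay on each node still has to deliver to its 2,500 processes, but that is in-memory work rather than traffic on the wire.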

The nice thing when you hit Erlang scaling limits is that almost everything you need to fix is going to be in a pretty simple state. You're not going to find many things that are layers of optimizations on top of hacks on top of optimizations --- they do a good job of keeping things simple, and not optimizing until it's needed (and even then, they usually pick simple optimizations). Keeping things simple goes a really long way (especially with today's enormous servers).

Edited to add: I don't think they've even needed to tweak the vm yet either, just their user space code. That's pretty huge too.

That, indeed. When I compare Elixir/Erlang to some other systems I worked on, "shallow" is the word that pops up. You hit a limitation, you dig into some source code, and you find out that it's pretty simple to understand and to fix. It feels manageable, I have yet to run into frustrating roadblocks, and that all gives me the confidence that when I do need to scale up, I have a system I will understand and will be able to adapt. It looks like Discord's story confirms that.

It sounds like the main benefit to Elixir is that message handling is built into the language. How does that compare to using a message queue service like zeromq?

As macintux said, they don't really compare. Messaging is everywhere in Erlang, in a way that nobody would do with a message queue. For example, you don't read or write to a tcp socket; you receive and send messages to a 'port'. The same is true for file i/o. Rather than calling a method on a shared object, you generally would send a message to a process that owns the state (or a process that manages the state in a database).
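
For readers coming from other languages, here is a rough imitation of "send a message to the process that owns the state", using a Python thread and queues. This is only an analogy under my own assumptions (the message shapes and names are invented); real Erlang processes are far lighter and come with links, monitors and supervision:

```python
import queue
import threading


def counter_process(mailbox):
    """Owns `count`; the only way to read or change it is to send a message."""
    count = 0
    while True:
        msg = mailbox.get()
        if msg[0] == "incr":
            count += msg[1]
        elif msg[0] == "get":
            msg[1].put(count)  # reply on the caller's own mailbox
        elif msg[0] == "stop":
            break


mailbox = queue.Queue()
worker = threading.Thread(target=counter_process, args=(mailbox,))
worker.start()

mailbox.put(("incr", 2))
mailbox.put(("incr", 3))
reply = queue.Queue()
mailbox.put(("get", reply))
total = reply.get()  # blocks until the owner answers
mailbox.put(("stop",))
worker.join()
print(total)  # 5
```

No lock is needed because nothing outside `counter_process` ever touches `count`.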

Sending messages to processes on other nodes has the same syntax as sending to a process on your node, which makes it easy to run a distributed system. (Ports are different: you'd have to set up a proxy process on the remote node in order to send/receive from that.)

Of course, with the base of process-to-process messaging you can build a higher level message queue (see RabbitMQ for a popular message queue built in Erlang).

Messaging is implicit in everything Erlang & Elixir do. Bolting on a message queue to software written in another language isn't really comparable (not a value judgement, it's just not really useful to compare them).

>I don't think they've even needed to tweak the vm yet either, just their user space code.

We haven't really had to. Really the only args we use are "+sbt db +zdbbl 32000 +K true" and increasing the default process limit.

I have never seen a language or framework used at a large scale that did not require digging, understanding, and tweaking of the underlying machinery for its specific use case.

The true failure of the platform happens when you cannot do these tweaks and adaptations, or when their cost exceeds what it would have taken in time and effort to write the system in a more appropriate technology.

That's a good point. The article is certainly a good summary of what's needed to make Erlang/Elixir scale and a reminder that there are no "magic bullets".

The solutions they presented were all Elixir + Erlang. It just took some rewrites to get there.

In truly amazing open source fashion, they also made some libraries for other companies to leverage!!! Super big props to Discord for that one. Seriously can't thank their team enough for going above and beyond.

I'm an Erlang fan but I wouldn't claim message passing is "fast" in the sense that you might be thinking of (though copying data does have some GC and CPU cache performance benefits that are harder to reason about). There's no magic in Erlang; copying data is copying data. The benefits of Erlang lie elsewhere, and I've heard Joe Armstrong, when asked about BEAM performance, say something along the lines of "why do you care about performance [in this day and age]?"

With OTP20, copying data isn't always necessary anymore :)

Copying data is usually necessary in OTP 20 as well I'm afraid. That new optimization doesn't trigger for most of this. But binary data is not copied and hasn't been since at least OTP11 :)

The release says "Erlang literals". Wouldn't that be things like atoms, integers, booleans, tuples, and the like? That plus binary data should cover a good deal, unless I'm reading too much into the blurb on the release notes.

Under what circumstances does it get triggered (since you seem to be more knowledgeable about this than I am!)? I expect records would not fall under this.

A "literal" in this case is a constant value defined in a module. Those live in a separate space in the VM and are referenced directly because they are immutable and can be shared. If you sent such a literal before OTP20, it would be copied into the heap of the target process. Not anymore.

But it doesn't help with cases where you are constructing a term (dynamically) in a process and sending that term. There is more meat in the blog post of mine: https://medium.com/@jlouis666/an-erlang-otp-20-0-optimizatio...

I read it as "Elixir was a good choice, but it isn't a silver bullet." The fact that they spend most of the article talking about the gotchas makes their statement that they would go with Elixir again, given the opportunity, more credible in my mind.

For a scalable system with 5M connected users, no language is a silver bullet. Some tools are just better suited for some jobs. Elixir/Erlang OTP just happen to be suited very well for these kinds of jobs.

I don't think that's what Erlang was made for at all?

Erlang was never about speed, it's about reliability, availability, and ease of concurrency. And Elixir just makes it easier to access those tools.

It may well be a misconception on my part - I assumed that, since message passing is such a core feature of using Erlang for distributed systems, one should be able to treat it as a low-cost operation.

You generally can; it's just that Discord is getting to a level of scale where the issue has more to do with architecture than language. All of their solutions in this post were Elixir-based solutions, and have very clean, easy APIs (and so far not very much code - all those libraries that solved their problems came out to ~400 LOC).

I guess the message here is, "There's never a magic bullet, but you've got everything you need to make yours."

Me too. I mean, 5 million is not that many users. It requires work, but it's not Google size by any means. It's just a successful service.

Given that the whole selling point of Erlang/Elixir is scalability at the price of the rest, the article is really telling me to avoid the tech.

5 million concurrent users, connected at the same time, sending and receiving data through a persistent connection. Phoenix itself got 2 million on a single node: http://www.phoenixframework.org/blog/the-road-to-2-million-w...

They likely have much more than 5 million users.

To put this amount in perspective, you get 5 million connections after receiving 3000 connections per second, which are never dropped and remain connected, for ~28 minutes. The majority of websites do not get even 300 requests per second.
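
For anyone who wants to check that arithmetic, here it is spelled out in Python (the variable names are mine; the numbers are the ones from the comment above):

```python
target_connections = 5_000_000
rate_per_second = 3_000

seconds = target_connections / rate_per_second
minutes = seconds / 60
print(f"{seconds:.0f} s = {minutes:.1f} min")  # 1667 s = 27.8 min
```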

It's 5M Concurrent Users. And Google-size services can't be built with just a language. That takes much more work.

> I see a bright future for Elixir, and a breath of fresh air for Erlang.

Is Erlang really that much of a barrier? Erlang was a touch odd, but I didn't find the language itself that mind-bending. Wrapping my head around the proper way to structure things and the proper use of the OTP libraries was much more time-consuming. That level of architectural thought doesn't magically go away because you changed the language.

About the only thing I found irritating was a lack of language support for mutable hash tables, but I thought they fixed that at some point.

And Erlang's bit packing/unpacking borders on magical in terms of expressiveness and speed. I wish other languages would adopt it.

It became open source in the 90s. It's been used by several high-traffic sites/apps to support chat clients (Riot Games, WhatsApp, and I think Fetlife), but outside of that has seen very little adoption.

I think it's a lot harder to sell people on a language that looks syntactically unintelligible to a lot of web developers. Most people I know working with Elixir now are/were Rubyists, so it's easy for them to make the leap from that and start to understand all of the awesome things they get from BEAM and OTP.

Elixir/Erlang is great, but in addition to knowing the language, there is a huge library of tools within OTP/ERTS that you need to understand to effectively use it and skills learned from other languages don't map too well to Erlang/Elixir as they would to other languages (like going from Python to Go for example). In addition to learning OTP, you also have to learn how a functional language works if you're not familiar. A lot of things to learn to adopt Elixir/Erlang, but it's well worth it! I just don't think there's much reason for people to make that leap as there are much more familiar languages that can be used to solve the problems most developers are trying to solve.

I mean, personally I'm using Elixir more for the language itself than OTP. I have a basic knowledge of OTP now after using the language for 2 years, but FP was my main reason for coming and staying, which for me has been the real fun of it.

Then again, FP for me was not difficult at all, and actually made several of the issues I had with my Ruby code not being expressive/obvious enough, or difficulty in composition, disappear overnight.

> and actually made several of the issues I had with my Ruby code not being expressive/obvious enough, or difficulty in composition, disappear overnight.

I've noticed that pattern-matching and guards just completely eliminate a stupid* amount of boilerplate logic in Ruby code

* "stupid" in hindsight, of course

I fell in love with the language, which is why I kept using it to begin with - working in a functional language is great, and it's very readable and powerful (pattern matching <3).

With that said, I could write basically anything Elixir gives me with Ruby. The real value I get from elixir is the blazingly fast performance of BEAM and the speed at which I can build complex and stable distributed systems with OTP.

The value to me (other than OTP + friends) is not what Elixir gives you, but what it takes away. Not being able to monkey patch things and change mutable state is really freeing. You can do it if you're diligent, of course (I code Java with as many static methods and final, immutable fields as I can, and from what I know about Ruby, it has things like freeze), but it's hard to never reach for a tool "just this once" if you're in a time crunch.

Syntax is a funny thing. It seems that some developers (perhaps most) care about it (disclaimer: myself included) but there's a large number I've encountered who do not. It seems to be a subjective preferential thing, because most rational discussions I've attempted to have about it result in circling around a drain and coming to no conclusion. I'd say that they're simply blind to it (similar to how a colorblind person can't tell when their clothing color choices just "look wrong"), but they could (perhaps justifiably) argue that they're simply agnostic to it. It's just an aspect of a language that they don't care about and which doesn't influence whether they wish to work with the language or not.

... I'm not like that. I took one look at Elixir and said "HOLY CRAP IT'S A FUNCTIONAL SCALEABLE RUBY!" (of course that was merely scratching the surface, but...)

I tried translating some of the book "The Handbook of Neuroevolution Through Erlang" into LFE (Lisp Flavored Erlang), and Elixir (somebody's done this already I believe).

Robert Virding, a co-creator of Erlang, creator of LFE, also wrote Luerl - Lua implemented in Erlang.

So if syntax is important you have many choices to use the BEAM VM/OTP in a Ruby-like language (Elixir), a Lisp (LFE) or in Lua (Luerl)!

I am now looking at Robert's videos of a game whose logic is written in Lua, but it runs an Erlang process for each ship of the thousands he spawns in the demo. Pretty cool. There's even Torchcraft - Luerl and Torch ML for learning AI/ML in Starcraft.

I still prefer LFE because I like Lisp syntax, biased as yourself about such things, however, I still think Elixir macros are not as far reaching as LFE's since they are handled after parsing in Elixir vs. LFE [1], although I wouldn't mind being proved incorrect on this.

  [1]  https://groups.google.com/forum/#!topic/lisp-flavoured-erlang/ensAkzcEDQQ

How far did you get with the "Neuroevolution Through Erlang" book? I happen to own it (there must be like 10 of us!) but haven't gotten around to it yet...

I am more than a third of the way through. I got distracted, and need to pick it up again. I like the writing style, and that it focuses on TWEANNs and practical examples rather than broad ML. I like the BEAM/OTP but Erlang syntax rubs me the wrong way. I've tried Elixir and LFE, and now I might try Luerl. Not enough time in the day...

For the record, Luerl is more of a library for Erlang/Elixir/BEAM and not a full language, as it is basically just Lua's interpreter rewritten in Erlang with some modifications to make it fit the system a bit better than normal Lua (there is normal Lua available via a Port though).

Syntax (and to some extent, expressiveness/conciseness) is superficial. How the code feels under change matters more.

> Syntax (and to some extent, expressiveness/conciseness) is superficial

I think this kind of attitude is unhelpful at best, and alienating at worst. There are a ton of "serious" programmers, myself included, who care quite deeply about the enjoyableness of the tools we use, and aesthetic/expressive qualities absolutely come into that.

I believe the recent popularity of Elixir kind of proves the case. There are many improvements to the package managers & tooling, etc, but the most obvious is to the syntax - which transforms what previously seemed unapproachable into a genuine option. Dismissing any and all such interest as merely "superficial" seems uncharitable, to say the least.

I feel they both matter (and more specifically I don't think a well-thought-out syntax is merely superficial), but as this is just a feeling with an anecdotal datapoint of quantity 1, and there being a dearth of evidence for or against "good" syntax (highly subjective, of course)... we're probably at an impasse lol

The problem is not and has never been the syntax. It is the tooling. No build tool. No way to generate a new project. Do it yourself docs. No templating system. Macros ala C. No package manager. Building releases was a dark art.

It is now getting better. But it was a really steep curve to adoption.

> The problem is not and has never been the syntax

Call me unserious or shallow or whatever, but for me - it was a tiny bit about the syntax. More than a tiny bit, actually. The other things you mention, all valid, were just additional barriers to something already fairly aesthetically unpleasing.

It's improved and improving rapidly, as you say, which is really great to see.

Aside from macros, pretty much all of these have been covered by build tools for a few years now.

I know, but iirc your own talk on tooling said "we had to steal elixir package management to get one"

Huh? rebar, and now rebar3, have existed for a long while. rebar 2.0 was released in 2012.

> Is Erlang really that much of a barrier?

It's not. Not for me at least. I prefer Erlang. Maybe I am strange like that.

I found that initially the core concepts are the hard part - that is, using processes for concurrency, functional (immutable) data structures, functional patterns like recursion instead of for loops, and the library ecosystems. Those are the harder things, and they are the same in Elixir as well.

Erlang the language itself is also simple. Think a bit like C and C++ if you're familiar. Erlang is like C: the language spec is small. Elixir has additional features which make it more expressive but also more complicated (macros, pipes). It's a bit like C++ having templates and classes. You can do more with them, but it's also a bit more to learn. I am exaggerating, as Elixir is a lot more elegant and consistent than C++; I am just using the analogy to illustrate the idea of simplicity vs power.

Unfortunately, a lot of people appear to hate Erlang's syntax to the point that it's a barrier to adoption - or Elixir never would've picked up at all. I don't understand it myself, but there we have it.

I've seen this argument about the syntax countless times, and during my first looks at Erlang code I couldn't understand much of it, but after sitting down and deciding to learn it I found it's an extremely simple syntax. The hard/long thing to fully understand is truly OTP.

I have never had an easier job writing an MPEG-TS parser than when I used Erlang. It was incredibly easy and concise.

The challenge is going to be adoption, plain and simple. I worked two years on an Erlang project, and most of our frustration came from the lack of community to lean on when we needed help understanding best practices. Also, finding that the only library that solves your problem has been abandoned for at least two years is never fun.

Few projects actually need the level of scalability that Erlang brings, and the cost of building and running an Erlang project as an enterprise application is non-trivial. People need to understand this if they want Elixir to succeed where Erlang hasn't.

The key benefit of Erlang isn't the scalability in my experience. It is resilience. Once your system is deployed, it tends to run with 0 fatalities for months on end. We have systems where we've had 0 maintenance days on them for months and even years.

In turn, you can focus your development effort on new systems, new features, and scaling existing infrastructure since you are not bogged down in maintenance mode. Even better, you can often postpone errors in a system which are non-fatal. I often wait until I have a couple of bugs in a system and then I'll work on that system for a few days fixing them all. In systems where even the smallest amount of brittleness destroys them, this isn't really an option.

Another experience of mine is that Erlang systems tend to take a bit longer to write compared to cranking out a fast solution in another language. But the payoff is that your Erlang solution tends to have better robustness as load increases and people start using the system.

The community is small however, so you'll need to write more code yourself in-house as a result. For some problem spaces, this will hurt a lot because there are off-the-shelf solutions in other languages. On the other hand, larger projects which are more specialized can benefit from you knowing your own code better.

For anyone reading this, I worked in a shop where we built an Erlang system, choosing the language specifically for the high uptimes we needed. We nailed it so well that IT proactively reboots the system just because they don't trust that it's still working properly when it runs without issue for months on end. Done right, Erlang can be insanely resilient.

I agree with what you said. What I was really trying to get across was the need for a real community and wider adoption.

We're an Elixir shop and I wouldn't change it for the world. We're betting on it heavily.

I think the next big push in elixir is tooling around metrics. Once there are hard numbers to market to non-techies I think we'll see a large shift towards elixir.

I've recently launched https://pryin.io, an application performance monitoring tool made for Elixir and Phoenix. It hooks into Phoenix and gives you insights into how long your request / channels take and what Ecto queries are run / how long those take. You can also manually augment pretty much anything else (background jobs, API calls, ...). Plus it keeps track of some important BEAM metrics like memory consumption.

Looks cool. Might want to have a demo site running for people to poke around.

Just look for Erlang tools, there are loads available. The only problem with Elixir right now is sometimes people forget to look at Erlang for already solved problems/tooling/etc.

Erlang/BEAM has been alive and well in some very serious mission critical applications for 20 years. Just look at how telecoms used it (its origin).

We use Elixir at work, and Erlang tools have a bit of a discovery problem. You can peek at download counts on Hex, if the author uses it, but that doesn't take into account CI systems and such that inflate DL counts, or GitHub-only repos that are not on Hex (see things like e.g. shell history prior to OTP20).

That said, when there is no Elixir tool to do what I want, the very next thing I do is look for an Erlang tool, and there is often one that gets me most of the way to where I need to go. Wrapper libs are nice sugar, but they're almost always unnecessary when you can just call Erlang directly (String v charlist conversions notwithstanding!).
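To make the "just call Erlang directly" point concrete, here is a minimal sketch (my own illustration, not taken from any particular library). Erlang modules are plain atoms in Elixir, so the only friction is converting between Elixir strings and charlists at the boundary.

```elixir
# :io_lib.format/2 expects a charlist format string and returns iodata,
# so we flatten the result into an Elixir string (a binary).
formatted =
  :io_lib.format(~c"~.2f", [3.14159])
  |> IO.iodata_to_binary()

IO.puts(formatted)

# Explicit round trip: Elixir string -> charlist -> Erlang call -> string.
reversed =
  "hello"
  |> String.to_charlist()
  |> :lists.reverse()
  |> List.to_string()

IO.puts(reversed)
```

A wrapper library would mostly hide the two conversion calls, which is why it is often optional.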

Download count is a weak metric for evaluating the quality of a library. I don't think it's a good idea to skip the few minutes it takes to read parts of the documentation and critical parts of the code.

I agree completely. I never said that it should be used for evaluation or that you shouldn't read the code or docs.

But then the question is... how do you find high quality libraries without something like download count? The issue is signaling good libraries, with poor discovery. I subscribe to many "libraries worth looking at" mailing lists, the awesome lists, etc. but that is still a pretty coarse net. Searching for libraries on GitHub if you don't know the name already or don't really know what you're specifically looking for is an even worse experience.

Combine that with relatively poor fuzzy search on Hex, where even if you DO know what you're looking for, it leads to a very subpar experience. For example, "ecommerce" leads to different results than "commerce". [0][1] These then exclude [2], which is an ecommerce library available on Hex without that keyword.

[0] https://hex.pm/packages?_utf8=%E2%9C%93&search=ecommerce&sor...

[1] https://hex.pm/packages?search=commerce&sort=downloads

[2] https://hex.pm/packages/ryal_core

Search on hex.pm is not great and can definitely be improved, but we are currently at the limit of postgres full text search afaik, unless there are some magic things to tweak. This means we need to switch to other technologies to improve the search which is a bigger task.

But regardless of how good search is it wouldn't find ryal_core when searching for "commerce" if that word or any similar is nowhere in the metadata. The description for the package ryal_core is "The core of Ryal." which doesn't do the search any favours.

Absolutely, and I'm thankful for all the hard work you all put in. I sound more annoyed than I am, and I know it was worse years ago before hex came around. Why can't Hex mind read yet?! :P

I think indexing GH READMEs would go a long way to helping the discovery problem, perhaps utilizing the new GH tags thing in some capacity, even letting authors tag their own packages. That would help in this specific case though this is of course a pathological example I happened to find recently. Maybe also having a "people who have X as a dependency also tend to download Y", though I'm not sure if you have that information outside of direct dependency graphs on Hex packages.

I feel like many newer languages have this problem, but the weird thing about Elixir is the 20-30 years of Erlang libs that are out there that are great that people have issues finding. It's worse if the libs aren't on Hex, and anecdotally I think people have been moving to Hex more lately for Erlang libs.

I don't think that's the problem for metrics. Exometer is the go-to Erlang library, but its dependencies are a nightmare (e.g. https://github.com/Feuerlabs/exometer/issues/154 or https://github.com/Feuerlabs/exometer#dependency-management), it's essentially unmaintained, and integrating it with Ecto or Phoenix is roll-your-own. I think there is very much a need for a more modern library that plays nicely with Hex.

> I think there is very much a need for a more modern library [...]

Building a solid standalone monitoring/metrics package is apparently next up on the "todo list" for Chris McCord (creator of Phoenix): https://www.youtube.com/watch?v=pfFpIjFOL-I

We've had success with Elixometer, though the dependency pinning was a pain. All but two are fully on Hex now.


Thanks, I might have to use this soon :)

Truly the strength of Elixir comes from Erlang. I wonder if Scala is also as good as Erlang for creating robust and fail-proof distributed systems?

No, because the JVM currently lacks features to truly isolate threads of execution, which is important for both performance and reliability. AFAIK, the JVM folk aren't even talking about addressing this, which is disappointing.

Scala paired with Akka is definitely good for that, I don't know how it compares performance wise with Elixir, would be interested to know. We're building a game server backend with lots of messaging in Akka.

Scala/Akka is a lot faster than Erlang, but it also lacks per-actor GC, which means that latency is going to be higher in a Scala/Akka system. Also, the per-actor heaps in BEAM mean there's no risk of two Erlang actors having a reference to the same memory, whereas that's easy to do in Scala/Akka.

Actually you'd be surprised at just how fast the BEAM passes messages, especially as it also includes linking, monitoring, cross-process, cross-system, among other features. When creating a multi-thread atomic messaging system in C++ long ago I was able to out-perform the BEAM (though I'd not rank it as significantly out-perform), but once I added failure handling features among others then it never got close to BEAM's speed again in raw message passing.

However, the BEAM is not very fast on executing actual code, it is like Python in that way where it is good to slave out CPU-heavy work; the BEAM is built for async IO, and yes, running a JVM (or C++) system as a node on a BEAM mesh is fantastic for that (or a port, or NIF for small work).

The point of Erlang is that it makes writing robust distributed systems easy and fast. The language itself is tiny.

Scala on the other hand is not remotely small and easy to understand.

It convinces me that with a selection of good devs you can hugely scale anything.

I'm continually impressed with Discord and their technical blogs contribute to my respect for them. I use it in both my personal life (I run a small server for online friends, plus large game centric servers) and my professional life (instead of Slack). It's a delight to use, the voice chat is extremely high quality, text chat is fast and searchable, and notifications actually work. Discord has become the de facto place for many gaming communities to organize which is a big deal considering how discriminating and exacting PC gamers can be.

My only concern is their long term viability and I don't just mean money wise. I'm concerned they'll have to sacrifice the user experience to either achieve sustainability or consent to a buyout by a larger company that only wants the users and brand. I hope I'm wrong, and I bought a year of Nitro to do my part.

A closed source walled garden chat service that survives purely on the free flow of VC capital 100% has no long term viability. No federation means as soon as the "next thing" shows up and wins away the VC dollars they'll disappear. It's so incredibly frustrating that so many companies and users make/support these closed environments even as we enter a new golden age of open sourced and federated technologies.

If there was an open source and federated equivalent to the features Discord provides I'd use it. There is no such product. Matrix is interesting, but the experience is nowhere near as polished as Discord and friends and that matters for mass adoption.

>If there was an open source and federated equivalent to the features Discord provides I'd use it.

But will you pay for Discord though? The features and quality Discord is able to provide are artificially propped up by VC funding. When it runs dry, we will be left with open source offerings, or the next product to take its place and repeat the cycle.

I already pay $5/mo for Discord. Animated avatars and cross-server custom emoji were enough to entice me

I would if they made the corporate / generic edition ;) It's better than slack

We are working as hard as we can on polishing Matrix. Making it mass user friendly is simply our #1 priority. Folks can help directly (beyond PRs, bug reports and feature requests) by donating at https://patreon.com/matrixdotorg.

Maybe Matrix can learn from Discord's technology choices? :)

They could side step to corporate slack on-prem chat style solutions and be a pretty strong contender. They were scaling huge chat rooms and such way better than slack was a year ago.

But they seem to refuse suggestions to do so continuously, so they seem to have some business plan somewhere.

I discussed moving my company's chat to Discord because it's a better experience than Slack in every way. They wouldn't go for it because its marketing is so gamer oriented and wouldn't look professional when we invite clients onboard :(

I have the same problem. It would be great if they released the same product without the gaming brand. I would use that for all my personal and professional chats.

Slack has great UI but the app just feels sluggish compared to Discord, esp initial load times.

The real concern should be an acquisition by a large tech company. Reminds me of Microsoft buying Skype. A lot of enterprise companies could be interested in selling discord to their corporate clients...

I know that the JVM is a modern marvel of software engineering, so I'm always surprised when my Erlang apps consume less than 10MB of RAM, start up nearly instantaneously, respond to HTTP requests in less than 10ms and run forever, while my Java apps take 2 minutes to start up, have several hundred millisecond HTTP response latency and horde memory. Granted, it's more an issue with Spring than with Java, and Parallel Universe's Quasar is basically OTP for Java, so I know logically that Java is basically a superset of Erlang at this point, but perhaps there's an element of "less is more" going on here.

Also, we're looking for Erlang folks with payments experience.


> Java is basically a superset of Erlang at this point

It's not a superset until it has non-sharable memory heaps between threads, complete and easy hot code reloading, dynamic tracing (being able to log into a node and update code at will as the application is running).

The safety and fault tolerance is the #1 advantage Erlang has and that it's hard to get with other frameworks that claim to be Erlang-like. Almost all of them focus on "We have a thread and a queue and we send messages between them so we have 90% of Erlang but faster". Sure they can spawn OS processes to get the same effect or even whole servers but it's more awkward and can only be done so many times before memory or CPU resources are exhausted.

Oh you can think of it another way. There is no point in having a distributed system with 5M concurrently connected users if it crashes 2 or 3 times per day and it has to be restarted and all those users lose their connections. So as the system gets more distributed and more scalable the fault tolerance aspect starts to move to the front alongside speed and performance. And that's just where Erlang starts to shine so to speak.

> It's not a superset until it has non-sharable memory heaps between threads, complete and easy hot code reloading, dynamic tracing (being able to log into a node and update code at will as the application is running).

The first two I can definitely see -- particularly for robustness and debugging -- but I'm a bit surprised by the last one. Do people actually really log into running production systems and update code like this? It seems like it would be an incredibly dangerous thing to do. (Akin to using direct DB connections and typing in DELETE statements directly rather than e.g. putting them in SQL scripts first.) It could potentially also make it extremely hard to know what's actually running in production.

> Do people actually really log into running production systems and update code

Yes, I've done it once in a while. One case is deploying a fix while the customer's system is up and running, if, say, it's something urgent that can't wait until it goes through the full deployment pipeline. Because hot code reloading works so well in Erlang, it's not as risky as doing it in Java, for example.

In fact upgrading by hot code reloading is also a common thing in the Erlang world. So there are cases where it is done routinely. It takes some preparation and so on:


Another case is if you see an issue happening but don't have enough logging or tracing ability in that part of the code. You can upgrade the code with an additional log statement or save extra info to a file for debugging. Then remove the patch. The alternative is to try to replicate that on a separate system, which sometimes might not be easy: you don't have the exact access pattern, the exact data and the other factors that would duplicate the original environment.

But you're right, doing it haphazardly and just sprinkling hot patched code updates everywhere is a path to disaster. So it's possible to monitor and record these updates to make them visible and better managed. It's up to the team / organization to handle that.

The bottom line: don't do it routinely, but when you have to, it can really save the day. And it's something that many (most!) frameworks / runtimes / languages don't support as well as Erlang does.
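As a rough illustration of the mechanism (not of a real release upgrade, which involves more preparation), here is a small sketch that loads a new version of a module into a running VM. The module name `HotDemo` is made up for the example, and `Code.compile_string/1` stands in for whatever deployment tooling you would actually use.

```elixir
# Allow redefining a module without a compiler warning, for this demo only.
Code.compiler_options(ignore_module_conflict: true)

v1 = """
defmodule HotDemo do
  def greeting, do: "hello from v1"
end
"""

v2 = """
defmodule HotDemo do
  def greeting, do: "hello from v2"
end
"""

Code.compile_string(v1)
before_patch = apply(HotDemo, :greeting, [])

# Loading the second version replaces the first while the VM keeps running.
Code.compile_string(v2)
after_patch = apply(HotDemo, :greeting, [])

IO.inspect({before_patch, after_patch})
```

The same idea covers the "add a log statement, then remove the patch" case: load a patched version, observe, then load the original again.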

You mentioned tracing. I'd like to expand on that a little.

What many people may be interested to know about Erlang is that you can log in to a production system, start a new shell running a tracer that listens on a localhost TCP socket[1], and use the dbg module in the production VM to trace calls and messages (and more) between any functions in any processes - in the running production system - and send them to the tracer node.

Done judiciously, the overhead is negligible, and the benefits are great. You can zoom in on bugs in real time.

I find the syntax of dbg match specs to be ugly, but it has saved my bacon so often it is so worth it, and it doesn't get mentioned that much, even though I feel it is almost as much a superpower as hot code loading.

[1] You use the separate shell to avoid accidentally crashing the production VM; if you do something boneheaded in the port-based shell, you can kill it and the production VM will just stop sending trace data to the dead TCP socket.
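For anyone who hasn't seen dbg, here is a minimal sketch of call tracing from Elixir. It is deliberately simpler than the setup described above: instead of a TCP port tracer feeding a separate shell, it uses an in-VM tracer process with a handler fun, and it traces `:lists.seq/2` purely as a stand-in for a function you care about.

```elixir
parent = self()

# The handler receives each trace message plus an accumulator; here it
# simply forwards the message to this process so we can look at it.
handler = fn msg, acc ->
  send(parent, {:traced, msg})
  acc
end

{:ok, _tracer} = :dbg.tracer(:process, {handler, nil})

# Trace function calls (:c) made by this process only, to limit overhead.
{:ok, _} = :dbg.p(self(), :c)

# Trace pattern: calls to :lists.seq/2, with an empty match spec.
{:ok, _} = :dbg.tp(:lists, :seq, 2, [])

:lists.seq(1, 3)

trace =
  receive do
    {:traced, msg} -> msg
  after
    1_000 -> :no_trace
  end

IO.inspect(trace)

# Stop the tracer when done.
:dbg.stop()
```

On a production node you would narrow the process and pattern selection in the same way, and, as the footnote says, keep the tracer outside the production VM.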

Ah, right, I guess the "add logging" case is a pretty compelling and not-too-scary (for my sensibilities) one. Point also well taken about it being safer because of the shared-nothing nature of processes.

Thanks for the perspective!

> Parallel Universe's Quasar is basically OTP for Java, so I know logically that Java is basically a superset of Erlang at this point

It's not really a superset. The JVM has a single global heap, whereas BEAM has per-process heaps. Global state is the enemy of concurrency, and the best GC algorithm is one you don't need to run at all. The "several hundred millisecond HTTP response latency and horde (sic) memory" problems you note are in part a result of having a global heap.

I agree. There is some work to go off heap with Java and supposedly you can get incredible results. Unfortunately the libraries that seem to help with this are either proprietary, not very well supported, or stalled.

Of course I think part of the problem is most people in Java land just don't have the 5m concurrency requirements so it doesn't get the love it probably should.

> while my Java apps take 2 minutes to start up, have several hundred millisecond HTTP response latency and horde memory

To be fair, there's no reason JVM servers can't have sub 10ms responses, that sounds like a problem at the application level like you mentioned (spring). Nothing wrong with a JVM language if it solves your problem.

> Java is basically a superset of Erlang at this point

The two VMs are completely different beasts.

Erlang is one of the few languages that do preemptive scheduling.


It was built from the ground up to do concurrency in an arguably superior way than Java. Java can't do this.


Wow, Erlang has a really active community, judging from the notable people that are commenting here.

> while my Java apps take 2 minutes to start up

This is the classic case (and I'm/we are guilty of it as well) of Java apps that are probably traditional monolithic apps that use frameworks with extensive reflection and class loading (ie spring component scanning).

You can get ridiculously fast loading time with Java if you use Dagger2, a reasonable web framework and no ORMs. I'm talking 500ms... sort of depends on your JVM settings. Sure not 10ms but still way faster than 2 minutes.

500ms is not "ridiculously fast".

For service startup time? Sure it is.

How often are you booting your webservers that 500ms isn't fast enough?

in the JVM starting world it is..

> while my Java apps take 2 minutes to start up, have several hundred millisecond HTTP response latency and hoard memory

yeah this is pretty much exactly why I turned away from Clojure and Scala and went for Elixir. Also, super ugly Java stacktraces that were not hard to trigger at all. The vast majority of errors in Elixir are wonderfully explanatory.

A lot of the Erlang-y error messages that bubble up into Elixir can get pretty gnarly. Earlier today I accidentally tried doing `Map.get(key, map)` (map should be first arg) in a `&handle_call/3` callback on a GenServer module, and it got ugly fast. I knew what the problem was from experience, but the Erlang-y error can be intimidating to see for the first time.
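For reference, a tiny sketch of that mistake outside of a GenServer (the map contents are made up). Inside a `handle_call/3` the same exception takes the whole process down, which is why the logged error is so much noisier.

```elixir
map = %{name: "discord"}

# Correct order: the map comes first, then the key.
value = Map.get(map, :name)

# Swapped order: the first argument is not a map, so Map.get raises
# BadMapError instead of returning nil.
result =
  try do
    Map.get(:name, map)
  rescue
    e in BadMapError -> {:error, Exception.message(e)}
  end

IO.inspect({value, result})
```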

Sometimes you have to jump into the source of some dependency (or even OTP itself) to really understand what's causing the error.

Thankfully, Erlang and Elixir tend to make this a lot less painful, in no small part due to their respective declarative/functional programming traits.

Off topic, but what's the story behind base-64 encoding your email? Is that just a spam prevention measure?

I think it is also sometimes a simple barrier to entry to weed out some candidates (think of it as a super simple fizzbuzz).

I also see a lot of job postings asking people to decode some string for bonus "points"... 90% of the time it's just rot13.


I actually really like this idea and am going to steal it. :)

> I know that the JVM is a modern marvel of software engineering, so I'm always surprised when my Erlang apps consume less than 10MB of RAM, start up nearly instantaneously, respond to HTTP requests in less than 10ms and run forever, while my Java apps take 2 minutes to start up, have several hundred millisecond HTTP response latency and horde memory.

That's not the platform's problem, you're doing something wrong. I run my web services on Spray[1] and they start in maybe 10s, average HTTP request latency is 4ms, and they run forever at ~100-200MB RAM. That's without putting any effort into tuning.

[1] now replaced by akka-http but I haven't ported yet, and it works

Good stuff. Erlang VM FTW!

> mochiglobal, a module that exploits a feature of the VM: if Erlang sees a function that always returns the same constant data, it puts that data into a read-only shared heap that processes can access without copying the data

There is a nice new OTP 20.0 optimization - now the value doesn't get copied even on message sends on the local node.

Jesper L. Andersen (jlouis) talked about it in his blog: https://medium.com/@jlouis666/an-erlang-otp-20-0-optimizatio...

> After some research we stumbled upon :ets.update_counter/4

Might not help in this case, but 20.0 adds select_replace, so you can do a full-on CAS (compare-and-swap) pattern http://erlang.org/doc/man/ets.html#select_replace-2 . So something like acquiring a lock would be much easier to do.
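A small sketch of both ETS calls, in case it helps. The table, keys and owner names are invented for the example; the compare-and-swap match spec follows the generic pattern shown in the ets documentation for `select_replace/2` (match the exact old object, replace it with a constant new one).

```elixir
table = :ets.new(:demo, [:set, :public])

# update_counter/4: atomically add 1 to element 2 of the {:online, count}
# tuple, inserting the default {:online, 0} first if the key is missing.
first_count = :ets.update_counter(table, :online, {2, 1}, {:online, 0})
second_count = :ets.update_counter(table, :online, {2, 1}, {:online, 0})

# Compare-and-swap with select_replace/2 (OTP 20+): replace the object only
# if it still equals `old`. The call returns the number of objects replaced.
cas = fn old, new ->
  :ets.select_replace(table, [{old, [], [{:const, new}]}]) == 1
end

true = :ets.insert(table, {:lock, :free})

# Two competing attempts to take the lock; only the first sees :free.
first_try = cas.({:lock, :free}, {:lock, :owner_a})
second_try = cas.({:lock, :free}, {:lock, :owner_b})

IO.inspect({first_count, second_count, first_try, second_try})
IO.inspect(:ets.lookup(table, :lock))
```

Note that the new object has to keep the same key as the old one, otherwise `select_replace/2` rejects the match spec.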

> We found that the wall clock time of a single send/2 call could range from 30μs to 70μs due to Erlang de-scheduling the calling process.

There are few tricks the VM uses there and it's pretty configurable.

For example sending to a process with a long message queue will add a bit of a backpressure to the sender and un-schedule them.

There are tons of configuration settings for the scheduler. There is one to bind schedulers to physical cores to reduce the chance of scheduler threads jumping around between cores: http://erlang.org/doc/man/erl.html#+sbt Sometimes it helps, sometimes it doesn't.

Another general trick is to build the VM with the lcnt feature. This will add performance counters for locks / semaphores in the VM. Then you can check for the hotspots and know where to optimize:


It isn't that likely the OTP20 optimization helps here. If the process never sends a message containing the literal value, then there is no benefit in the optimization. What `mochiglobal` and friends are good at is when you have a large set of data (a ring, say) which updates rarely, so you can treat it as semi-static data in the system. But then you shouldn't really send that ring data around in the system too much, although it will now be free. [There is a nice subscription-based approach to updates which is now feasible in OTP20, but that is more for convenience]

If send/2 takes 30us to 70us, I'm guessing it is blocking as well, either on distributed communication or something else along those lines. For local message passes to take that long, my something-is-amiss-sixth-sense is tingling.
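For readers unfamiliar with the trick, here is a rough sketch of the idea in Elixir. This is not mochiglobal's actual code; `ConstStore` and `Demo.Ring` are hypothetical names, and the approach only works for values that can be escaped into a literal (so no pids, refs or funs).

```elixir
defmodule ConstStore do
  # Compile `module` so that module.get/0 returns `value` as a literal
  # baked into the compiled module, readable without copying.
  def put(module, value) do
    body =
      quote do
        def get, do: unquote(Macro.escape(value))
      end

    {:module, ^module, _bytecode, _result} =
      Module.create(module, body, Macro.Env.location(__ENV__))

    :ok
  end
end

# Semi-static data: read very often, replaced very rarely.
ring = %{nodes: [:a, :b, :c], version: 1}
:ok = ConstStore.put(Demo.Ring, ring)

stored = apply(Demo.Ring, :get, [])
IO.inspect(stored)
```

Every update means compiling and loading a module again, which is why this only pays off for data that changes rarely.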

> It isn't that likely the OTP20 optimization helps here

Ah good point. I didn't look at the code much. I was thinking of cases of passing any of those literals in gen_server calls and such and just getting extra performance from upgrading to OTP20 as a side-effect.

Are there good resources / books on advanced BEAM? I'd love to know more of the nitty-gritty details of the VM, and how to make the best use of it.

https://github.com/happi/theBeamBook for the VM

http://erlang-in-anger.com/ for introspecting it in production

I would like to subscribe to your newsletter

Seriously, though, I might need your services in the future, if you're available.

Also, I'm pretty sure the security of your career is pretty guaranteed at this point lol (looking at Indeed data, interest in Elixir has risen 20fold in the past 3 years and the slope of that line is far steeper than all other languages in that space)

I don't have a newsletter. I thought of having a blog for a bunch of stuff like this but I like writing software more than blog posts and I knew I'd abandon it after a while. Maybe there is a "turn all your hn posts higher than X upvotes into a blog post series" script somewhere.

However you can subscribe to the Erlang mailing list. That's often a good place to start for tricky or interesting questions you might have:


To contact me directly check my profile.


This is one of those few instances where getting the technology choice right actually has an impact on cost of operations, service reliability, and overall experience of a product. For like 80% of all the other cases, it doesn't matter what you use as long as your devs are comfortable with it.

Not sure why this comment saw a couple downvotes earlier. mbesto is correct: for most startups, most of the time, competitive advantage doesn't come from the underlying tech stack. To make a general statement, most things could be done similarly on any of several platforms. However, when product requirements match exceptionally well with a specialized technology, you can see things that would simply be infeasible or extremely tough using a different stack.

WhatsApp + Erlang was one of those cases (watch this talk and imagine trying to recreate that system with only a handful of server engineers using any other tech: https://www.youtube.com/watch?v=c12cYAUTXXs). Discord + Elixir appears to be another.

Curious if anyone has any examples that spring to mind from outside the highly concurrent messaging space.

A fun idea is to do away with the "guild" servers in the architecture and simply run message passes from the websocket process over the Manifold system. A little bit of ETS work should make this doable and now an eager sending process is paying for the work itself, slowing it down. This is exactly the behavior you want. If you are a bit more sinister you also format most of the message in the sending process and make it into a binary. This ensures data is passed by reference and not copied in the system. It ought to bring message sends down to about funcall overhead if done right.
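A hedged sketch of the "format once into a binary" part (the event shape and the subscriber processes are invented for the example, and this is plain `send/2`, not Manifold). Binaries larger than 64 bytes live off the process heap and are reference counted, so local receivers share them instead of getting a copy each.

```elixir
parent = self()

# Hypothetical subscribers: each waits for one push and reports its size.
subscribers =
  for id <- 1..3 do
    spawn(fn ->
      receive do
        {:push, payload} -> send(parent, {:got, id, byte_size(payload)})
      end
    end)
  end

# Encode once in the sender; the work is paid by the eager process.
event = %{type: :message_create, content: String.duplicate("x", 1_000)}
payload = :erlang.term_to_binary(event)

Enum.each(subscribers, &send(&1, {:push, payload}))

sizes =
  for id <- 1..3 do
    receive do
      {:got, ^id, size} -> size
    after
      1_000 -> :timeout
    end
  end

IO.inspect(sizes)
```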

It is probably not a solution for current Discord as they rely on linearizability, but I toyed with building an IRCd in Erlang years ago, and there we managed to avoid having a process per channel in the system via the above trick.

As for the "hoops you have to jump through", it is usually true in any language. When a system experiences pressure, how easy it is to deal with that pressure is usually what matters. Other languages are "phase shifts" and while certain things become simpler in that language, other things become much harder to pull off.

The true evil approach is to send the socket around, not the message, so that there is no copying required no matter what ;)

Wah. Easy there, Satan :-)

That is a cool trick though. So it's basically sending the port itself around and changing its ownership, with something like port_connect(Port,NewOwner)?

And btw, thank you for writing https://www.erlang-in-anger.com and http://learnyousomeerlang.com !

The trick is more commonly used when writing to sockets. A socket owner is required for reading, not for writing.

The trick then is that, when you need to write lots of data to a socket, you just send a copy of the socket to the writer so they can dump all their data for cheap, without changing ownership (which is costly).

Also recently I've gotten http://propertesting.com/ out, you might enjoy it :)

Thanks for explaining. I'll have to remember the socket trick.

> Also recently I've gotten http://propertesting.com/ out, you might enjoy it :)

It might be just what I need to understand and start using property tests. I've tried twice and gave up.

Oh and recon! Thanks for that too. Use it almost every day.

According to Wikipedia, Discord's initial release was March 2015. Elixir hit 1.0 in September 2014 [0]. That's impressively early for adoption of a language for prototyping and for production.

[0] https://github.com/elixir-lang/elixir/releases/tag/v1.0.0

If I remember right, they were using Erlang at the beginning, and moved slowly to Elixir as they got more comfortable, and the ecosystem built up around them.

We started with Elixir, some of our engineers have used Erlang prior though.

So, at this point, every language was scaled to very high concurrent loads. What does that tell us? Sounds to me like languages don't matter for scale. In fact, that makes sense: scale is all about parallel processes, and horizontally distributing work can be achieved in all languages. Scale is not like performance, where if you need it, you are restricted to a few languages only.

That's why I'd like to hear more about productivity and ease now. Is it faster and more fun to scale things in certain languages than others? Beam is modeled on actors, and offers no alternatives. Java offers all sorts of models, including actors, but if actors are the currently most fun and productive way to scale, that doesn't matter.

Anyways, learning how the team scaled is interesting, but it's clear to me now languages aren't limiting factors to scale.

> So, at this point, every language was scaled to very high concurrent loads. What does that tell us?

Just like any language vs language debate each one has benefits for various particular use-cases. Any meaningful comparison of languages must be prefaced with the use-case scenario.

One of the strongest use-cases of Erlang/Elixir has always been building large distributed apps that need to scale (async web apps, telecom, chat servers, messaging mobile apps, etc). The ability to build these large distributed systems is baked into the very primitive parts of the language and standard library, to a degree that few other languages can compare to, if any.

With Erlang/Elixir you design ALL applications in a way where scaling is rarely an afterthought but rather a natural extension of the program.

> Beam is modeled on actors, and offer no alternatives. Java offers all sorts of models, including actors, but if actors are the currently most fun and productive way to scale, that doesn't matter.

People often make the mistake of trivializing Erlang/Elixir as merely programming with actors. Its development proceeded independently of the actor model, and it goes well beyond that: it is the standard programming style you use when developing any program in the language, the same way Rails embraces MVC. When this is a fundamental part of every Erlang application, the means of scaling to a large distributed system are also a fundamental part of each program.

This built-in scaling comes without any significant cost in development time, and it provides many benefits beyond scaling, such as highly modular and extensible code. There are real benefits even if you don't plan to scale to a large distributed system. Similar to Rails, it creates a predictable program design, which makes joining new projects easier and deters the NIH syndrome that is far too common in Java/C++/etc. And ultimately, regardless of what you are building, it provides very high performance by default for the type of async-style applications that are popular on the web today.

So the key point here is not that the end goal was achieved (that you can scale) but how you get there.

While you don't need language X to scale, certain languages can definitely make it _easier_ and more cost-effective to scale. So it can matter depending on what you're trying to achieve.

Yes, but I can run an Elixir app at scale far, far, far cheaper than I can a Ruby app.

> Sounds to me like languages don't matter for scale.

If by languages you mean syntax, perhaps. If you mean platforms, then it does matter. And it's the non-obvious things, such as fault tolerance: the ability not just to have lots of small concurrent processes, but for them to have isolated heaps. That's not just a gimmick; it allows designing and operating systems in a different way. For example, having whole subsystems crash and restart safely without affecting the rest of the service.

Or even silly things like being able to hot reload code or log into a live node and add a dynamic trace or hotpatch a module to get extra debugging info without stopping.

Now you can sort of do that with other frameworks, it's just it's much nicer in Erlang because it comes built-in and it just feels like using the right tool for the job. As in using a hammer to hammer nails vs say using the pliers to hammer nails.

If you have a share-nothing architecture, then yes, any language can scale to any load, some with more hardware than others.

Great to see more posts like this promoting Elixir. I've been really enjoying the language and how much power it gets from BEAM.

Hopefully more companies see success stories like this and take the plunge - I'm working on an Elixir project right now at my startup and am loving it.

Thanks for putting this writeup together! I use Elixir and Erlang every day at work, and the Discord blog has been incredibly useful in terms of pointing me towards the right tooling when I run into a weird performance bottleneck.

FastGlobal in particular looks like it nicely solves a problem I've manually had to work around in the past. I'll probably be pulling that into our codebase soon.

Note that Erlang 20 may have solved the problem that FastGlobal tries to fix (that of not copying large amounts of data unnecessarily).

Erlang 20 fixes the case where you're copying a constant literal, but unfortunately won't help if you're sharing a dynamically generated, but infrequently modified, term; like Discord does in this post.

Elixir was one of the reasons I started using Discord in the first place. I figured if they were smart enough to use Elixir for a program like this then they would probably have a bright future ahead of them.

In practice, Discord hasn't been completely reliable for my group. Lately messages have been dropping out or being sent multiple times. Voice gets messed up (robot voice) at least a couple times per week and we have to switch servers to make it work again. A few times a person's voice connection has stopped working completely for several minutes and there's nothing we can do about it.

I don't know if these problems have anything to do with the Elixir backend or the server.

EDIT: Grammar

The message struggles have sadly been due to issues with Cassandra and GC pauses caused by bugs within it. We have been trying to work with the Cassandra developers to resolve these.

Voice issues should not be happening. Please contact our support with more information and we will gladly investigate.

Thanks for the response, it's good to know what's causing the problems with messages and that it's being worked on. I'll try to contact support next time I have voice issues with my group.

:) np

We love Cassandra, and hate it at the same time.

Check out this nasty bug we got. https://issues.apache.org/jira/browse/CASSANDRA-13004

Two things are true about Cassandra:

It is by far the best at doing what it does

It has plenty of room for improvement

You are 100% right that it is the best at what it does.

And thank you again for helping us with that bug :) we really appreciate it.

I'm currently far down the database rabbit-hole and have to ask: What's so great about Cassandra that you can't get with CouchDB or other AP (yeah, I know...) databases?

Solid ingestion story. Very, very good write throughput. Linear scaling. Easy expansion / contraction. Complete flexibility in the consistency vs availability tradeoff.

And most importantly:

It actually works at scale. Huge scale. Thousand-node-cluster, hundreds-of-thousands-of-instances scale.

Because a good chunk of the active maintainers actually run this shit in prod.

You might want to have a look at ScyllaDB.

We are currently in the process of testing ScyllaDB with double writes for our fixed data clusters. It is very scary to transition to something so new :)

Our message storage clusters have a very large set of data that keeps increasing and using Scylla without incremental repair will suck so we are waiting on that.

I don't see anything Elixir-specific there; it is all basically Erlang/Erlang VM/OTP stuff. When you're using Erlang, you think in terms of actors/processes and message passing, and this is (IMHO) a natural way of thinking about distributed systems. So this article is a perfect example of how simple solutions can solve scalability issues if you're using the right platform for that.

You're right. Elixir doesn't pretend to do anything except make using the wonderful erlang VM/OTP stuff easier.

The VM is an absolute marvel of engineering, and it's insane to me that it doesn't have more adoption yet in big tech companies.

My best theory is that engineers in top engineering companies are actually not the best engineers but simply career engineers that learn one skill (python/java/C++) and then explain to every employer that this technology is the best for the problem they have, over and over again.

Coordination problems get more difficult in larger teams/companies. Getting everyone to use a particular non-standard language is a coordination problem. Thus large companies are unlikely to experiment with languages.

It makes sense too - say it's 10x easier to write something in language X than Y. If there's only 10 other people that might interact with the thing / have to read the sources, that's a great tradeoff. If there's a thousand other people that might have to at some point understand how some part of your code works, suddenly all of them have to learn the new language X.

> and it's insane to me that it doesn't have more adoption yet in big tech companies.

elixir (and winter) is coming

> My best theory is that engineers in top engineering companies are actually not the best engineers but simply career engineers that learn one skill (python/java/C++) and then explain to every employer that this technology is the best for the problem they have, over and over again.

you conflated "top" with "large".

It seems awkward to me. What if the Erlang/OTP team cannot guarantee message serialization compatibility across a major release? How are you going to upgrade a cluster one node at a time? What if you want to communicate with other platforms? How are you going to modify the distribution protocol on a running cluster without downtime?

As soon as you introduce a standard message format, all the nice features such as built-in distribution, automatic reconnect, ... are almost useless. You have to do all of these manually. Maybe I'm missing something. Correct me if I'm wrong.

For a fast time to market it seems like quite a nice approach. But for a long-running, maintainable back-end it is not enough.

You are missing that major-release compatibility is something which is taken seriously. Usually you have opcode stability for at least one release so OTP20 can run OTP19 bytecode. But not vice versa, naturally.

The same is true for serialization: new features are often introduced and then put to use a couple of major releases later. This ensures backwards compatibility. If you couldn't upgrade your cluster one node at a time, safely, then you would have to stop the system. The serialization format is also built to be machine-agnostic: You can run data from a 32bit little endian windows machine to a 64bit big endian sparc machine if you want, and it will work seamlessly. Of course that flexibility doesn't come for free and has an overhead. Another benefit is that data-at-rest can be safely decoded for every Erlang version back to at least release 6. This is quite useful in many situations.
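To make the machine-agnostic point concrete, here is a minimal Python sketch of how small integers look in the Erlang external term format: a version byte (131), a type tag, and a payload in a fixed big-endian byte order, so the bytes are the same regardless of the host's word size or endianness. Only the two integer tags are covered, and the function names `encode_int` and `decode_int` are made up for this illustration; it is not a full codec.

```python
import struct

ETF_VERSION = 131        # leading version byte of the external term format
SMALL_INTEGER_EXT = 97   # tag: unsigned 8-bit integer
INTEGER_EXT = 98         # tag: signed 32-bit integer, big-endian


def encode_int(value: int) -> bytes:
    """Encode an int as a small Erlang integer term (illustrative subset only)."""
    if 0 <= value <= 255:
        return struct.pack(">BBB", ETF_VERSION, SMALL_INTEGER_EXT, value)
    if -(2 ** 31) <= value < 2 ** 31:
        # ">" forces big-endian with no padding, whatever the host CPU is.
        return struct.pack(">BBi", ETF_VERSION, INTEGER_EXT, value)
    raise ValueError("bignums are outside the scope of this sketch")


def decode_int(data: bytes) -> int:
    """Decode the two integer encodings produced by encode_int."""
    version, tag = data[0], data[1]
    if version != ETF_VERSION:
        raise ValueError("not an external term format payload")
    if tag == SMALL_INTEGER_EXT:
        return data[2]
    if tag == INTEGER_EXT:
        return struct.unpack(">i", data[2:6])[0]
    raise ValueError("unsupported tag %d" % tag)


if __name__ == "__main__":
    for n in (1, 1000, -5):
        print(n, list(encode_int(n)))
```

Because the byte order is pinned in the format itself, a decoder on any architecture reads back the same value, which is the property the comment above relies on.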

So the path is usually to upgrade the OTP version first and then start using new features once the cluster is upgraded.

In the OTP20 release, a change happened in late RCs of the release because serialization stability was brought into jeopardy. It was reported by RabbitMQ and the serializer was changed so it is properly backwards compatible. Upgrade paths are important.

The erlang team has been through 20 major releases, I think I've run all of them since r13 or r14? Being able to upgrade your distributed system is important and they care about making it possible. Generally you upgrade one node at a time, until they're all upgraded and only then can you use new language features. Sometimes you find a build you like and stick with it until something comes up that causes you to upgrade.

There are many ways to interoperate with other languages, including libraries that let programs in other languages act as Erlang nodes (they will have to upgrade too when you want to upgrade to a newer version of Erlang with distribution protocol changes, for example when maps were introduced in R17).

You can easily do standard dist messaging within your Erlang cluster and serialize to whatever makes sense at the border. You can serialize to Erlang term format, if you like; it's well specified, but not terribly compact.

You're going to have the same questions with any other language too. Very few companies get to write clients, servers, and everything else in one language that never updates.

> You're going to have the same questions with any other language too. Very few companies get to write clients, servers, and everything else in one language that never updates.

Then I don't see much difference between Go, Java, ... vs Erlang except they are simpler to learn plus finding Java, NodeJS, ... devs is much easier. What was the point of using Erlang? It was supposed to solve a problem for us, but we end up solving the problem ourselves.

What I'm really trying to say is: integrating a mainstream language like Go, Java, C++, ... with a messaging layer like ZeroMQ (or something else) and adding some reliability features is going to be easier than introducing a totally new language (with a totally different paradigm) into the stack.

> Then I don't see much difference between Go, Java, ... vs Erlang except they are simpler to learn plus finding Java, NodeJS, ... devs is much easier.

For the first several years, my team of Erlang devs had zero Erlang experience prior to being hired; they were just smart and flexible developers. I was ahead of the curve because I sort of remembered seeing a Slashdot post about Ericsson opening the language way back when.

A lot of people end up using RabbitMQ as their distributed message queue, which is built in Erlang. If you went with that instead of ZeroMQ, and then slowly added more things, there's a reasonable path to writing more Erlang.

I'm not sure if you can really bolt on messaging and reliability features and get the same results; just like bolting on security later, if it's something that you need, it works better if you have it from the beginning.

But certainly, if you're happy with your stack, don't change it.

Distributed Erlang compatibility is guaranteed for not just one, but two major versions. You can do a standard rolling upgrade and the newer nodes will talk to older nodes just fine.
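As a rough illustration of the rolling-upgrade rule described above, here is a minimal Python sketch. The two-major-version window is taken from the comment; the integer release numbers and the function names `compatible` and `rolling_upgrade_steps` are hypothetical, and a real upgrade involves far more than a version check.

```python
from typing import List

MAX_MAJOR_GAP = 2  # assumption from the comment above: two major versions


def compatible(a: int, b: int) -> bool:
    """Two nodes can talk if their major releases are at most MAX_MAJOR_GAP apart."""
    return abs(a - b) <= MAX_MAJOR_GAP


def rolling_upgrade_steps(cluster: List[int], target: int) -> List[List[int]]:
    """Upgrade one node at a time, checking every intermediate mixed cluster.

    Raises ValueError as soon as a step would leave two nodes that cannot
    talk to each other.
    """
    current = list(cluster)
    steps = []
    for i in range(len(current)):
        current[i] = target
        if not all(compatible(a, b) for a in current for b in current):
            raise ValueError("incompatible mix of releases: %r" % current)
        steps.append(list(current))
    return steps


if __name__ == "__main__":
    for step in rolling_upgrade_steps([19, 19, 19], 20):
        print(step)
```

Jumping more than two majors in one go fails at the first step, which is why the usual advice is to upgrade the runtime first and only then start using new features.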


Foreign nodes in C, Java, Python, etc. can also join an Erlang cluster:



That being said, a more typical architecture is to have Erlang spawn external processes and talk to them via stdio.
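For the stdio approach, the external program's side can be very small. The Python sketch below assumes the Erlang port was opened with the `{packet, 4}` option, so every message in both directions is prefixed with a 4-byte big-endian length. The `serve` loop and the handler are hypothetical names for this example; a real program would pass `sys.stdin.buffer` and `sys.stdout.buffer` instead of the in-memory streams used in the demo.

```python
import io
import struct


def write_packet(stream, payload: bytes) -> None:
    """Write one message with a 4-byte big-endian length header."""
    stream.write(struct.pack(">I", len(payload)) + payload)


def read_packet(stream):
    """Read one length-prefixed message, or return None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None  # the other side closed the pipe
    (length,) = struct.unpack(">I", header)
    return stream.read(length)


def serve(inp, out, handler) -> None:
    """Answer each request packet with handler(request) until EOF."""
    while True:
        request = read_packet(inp)
        if request is None:
            break
        write_packet(out, handler(request))
        out.flush()  # do not leave replies sitting in a buffer


if __name__ == "__main__":
    # In-memory demo standing in for the pipe to the Erlang side.
    requests = io.BytesIO()
    write_packet(requests, b"ping")
    requests.seek(0)
    replies = io.BytesIO()
    serve(requests, replies, bytes.upper)
    replies.seek(0)
    print(read_packet(replies))
```

If the external program crashes, the pipe closes and the Erlang side can simply restart it, which is a large part of why this layout is popular.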

It's great that they support the C and JVM stacks. But for other major platforms such as NodeJS, Go, ... we still need to approach the goal with some C extensions integrated in both platforms. Erlang/OTP distribution/clustering is really not designed for a heterogeneous environment, which is fine, since it is used in the telecom industry with nice commercial support contracts backed by Ericsson. However, I don't consider it an amazing piece of engineering from the perspective of a system designer who is dealing with many teams with different language/stack preferences.

Erlang is what you would use to implement a single service (which may run on a cluster of multiple nodes). It's not a generic messaging layer to use between services, you can use RabbitMQ or ZeroMQ or whatever you like for that.

Really liked the blog post. Elixir and the capabilities of the BEAM VM seem really awesome, but I can't find an excuse to actually use them anywhere in my day to day.

Whatsapp's story is somewhat similar. Relevant read to this subject.


I love Discord's posts; they are very informative and easy to read.

5 million concurrent users is great and all, but it would be nice if Discord could work out how to use WebSockets without duplicating sent messages.

This seems to happen a lot when you are switching between wireless networks (e.g., my home router has 2.4 GHz and 5 GHz wireless networks) or when you're on mobile (it seems to happen regularly, even if you're not moving around).

It's terribly annoying though, and makes using the app via the mobile client very tedious.

Coming from IRC with netsplits and stuff, this seems like a first world problem to me hehe

You know there are discord bots for bridging between IRC and Discord?

Pretty funky :-)

I really like Elixir the language, but find myself strangely hamstrung by the _mix_ tool. There is only an introduction to the tool, but no reference to all of its bells and whistles. I'm not even looking for extra bells and whistles, just simple stuff like pulling in a module from GitHub and incorporating it. Is there such documentation? How do you crack Mix?

The docs for Mix are decent. You can start here:


When you are trying to get help with a specific task, you can check out the Mix tasks docs:


And honestly, I often just look at the source code:


The Elixir Slack channel is pretty amazing, too:


It looks like they have built an interesting, robust and scalable system which is perfectly tailored to their needs.

If one didn't want to build all of that in house though, is there anything they've described here that an off-the-shelf system like https://socketcluster.io doesn't provide?

Yes. Discord is served over HTTPS. :P (your link is broken; socketcluster.io doesn't serve over HTTPS)

But seriously, Discord actually benchmarked 5 million concurrent users, horizontally distributed, and having to ferry messages across the cluster, with specifically tailored fanout patterns (rather than just a global pub/sub. I.e., who a message goes to varies, rather than just "everyone").

Socketcluster.io only has benchmarks for a single machine, capped at 42k concurrent connections (though to be fair that was due to them running a single client, rather than a limitation of the server). They don't out of the box support horizontal scaling; you're required to spin up your own message queue solution for that.

So, basically, you're advocating a technology that solves -the simplest part of the problem-, and nothing else. Whereas Phoenix + Elixir, even without any of the custom tweaking Discord describes, solve that AND more of the actual problem Discord had. So...yes, and no. Yes, there is plenty here they've described that is not available in socketcluster.io, but no, nothing they've done here is not generally solved by an off-the-shelf system, because they're -using- an off-the-shelf system, Elixir + Phoenix.

Hi, main author of SocketCluster here.

SC does support automatic horizontal scaling across any number of machines out of the box if you're running it on Kubernetes.

There's also a CLI tool to deploy it automatically to any Kubernetes cluster: https://www.npmjs.com/package/baasil

See https://github.com/SocketCluster/socketcluster/blob/master/s...

I just meant you still need a third party MQ to be spun up (per docs here - http://socketcluster.io/#!/docs/scaling-horizontally). Without that, there is no distribution happening.

From my understanding, you're basically saying "You can combine SocketCluster with the MQ of your choice (the installation and configuration of which is left as an exercise to the reader) and then between Docker, Kubernetes, and Baasil you can orchestrate and deploy it across a cluster". That sounds a bit more complex than just using SocketCluster, which is what the OP seemed to be indicating was all you needed, and is also including the DevOps story, which I don't think either he or I was intending to include.

I was not trying to indicate that SocketCluster can't be -used- to scale websockets horizontally, but that it's not just an off the shelf solution that would have solved Discord's problem either. It requires other parts, as both the docs and you mention.

I'll also reiterate from my post, SocketCluster has no benchmarks pertaining to what happens when you -do- scale horizontally (per docs here - http://socketcluster.io/#!/performance ). That lack alone would kill my interest in it (as would scc-state being a single instance, which would make fault tolerance a real concern to me, but it looks like you know that already). Is performing horizontal scalability tests on the roadmap?

If you use SCC, then you don't need a separate MQ - That is only if you want to do things yourself manually. I will update the docs to make that clearer.

It should only take a few minutes to deploy a cluster across hundreds of machines. The only limit is the maximum number of hosts that Kubernetes itself can handle (which I think is over 1000 now). SCC is self-sharding and runs and scales itself automatically with no downtime.

You can easily handle 5 million concurrent users with a small cluster. SC's problem isn't scalability, it's marketing.

That's perfectly fair; fix the marketing then. :P In evaluating a solution, the marketing is the -first thing- anyone looks at. And how it currently reads, "SocketCluster only provides you the interface for synchronizing instance channels - It doesn't care what technology/implementation you use behind the scenes to make this work" definitely reads as "You need a technology/implementation behind the scene" rather than "we provide you a default one, and you can swap it out".

For me to pick Socketcluster for a distributed solution (or more broadly, what I'd want for -any- technical solution) I'd want to know what else I need to pair it with (which the docs actually mislead me on), what else I can benefit from (which the docs don't tell me, but which does exist per your links), and what benefits I stand to get from using it (the docs tell me only marketing claims, but with no metrics, performance, data, etc, for what happens in a distributed context, well, I would avoid it).

Ideally, set up a clustered performance test, and then make as many of the artifacts (docker images, configs, readme, etc) available so others can conduct the same performance test themselves (as well as have a reference architecture for their own solution). Heck, if you're doing it in AWS, consider making the AMIs available along with whatever modifications need to happen. -That- would be very convincing for someone looking to adopt a solution in this space, if they could literally just spin up some EC2s and immediately start throwing load at a fully configured cluster.

Also, to make it clear, is this handling message passing between instances in the cluster?

Thanks for the advice.

Yes, it handles message passing between instances in the cluster. That means if you publish a message on a channel whilst connected to one host, the message will also reach subscribers to that channel which are on any other host in the cluster. It shards all channels across available brokers; when you scale up the number of brokers, it will automatically migrate the shards across available brokers with no downtime.
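To picture what sharding channels across brokers and migrating them on scale-up means, here is a generic Python sketch. It is not SocketCluster's actual algorithm, which this thread does not describe: the hash-modulo placement, the broker names, and the functions `broker_for` and `migrations` are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Tuple


def broker_for(channel: str, brokers: List[str]) -> str:
    """Pick a broker for a channel from a stable hash of the channel name."""
    digest = hashlib.sha256(channel.encode("utf-8")).digest()
    return brokers[int.from_bytes(digest[:8], "big") % len(brokers)]


def migrations(channels: List[str],
               old_brokers: List[str],
               new_brokers: List[str]) -> Dict[str, Tuple[str, str]]:
    """Return {channel: (old_broker, new_broker)} for channels that must move."""
    moved = {}
    for channel in channels:
        before = broker_for(channel, old_brokers)
        after = broker_for(channel, new_brokers)
        if before != after:
            moved[channel] = (before, after)
    return moved


if __name__ == "__main__":
    channels = ["room-%d" % i for i in range(1000)]
    old = ["broker-1", "broker-2"]
    new = ["broker-1", "broker-2", "broker-3"]
    moved = migrations(channels, old, new)
    print("%d of %d channels change broker" % (len(moved), len(channels)))
```

With plain modulo placement a large share of channels moves whenever the broker count changes, so how cheaply a system performs that migration is exactly the kind of thing a clustered benchmark would need to show.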

Okay, nice. Then yeah, given some performance benchmarks showing some numbers at increasing number of nodes, with messages being broadcast across 1-to-1 channel pairings (i.e., direct message), 1-to-100 (for groups), and 1-to-all, I think it could sell itself as a pretty compelling turnkey solution (barring the scc-state concern which you're aware of).

Possibly also consider some testing and documentation around geographic distribution; what happens if the nodes are located in different datacenters with non-trivial latency between them? Is that an issue? In the event of netsplits, does it split brain (probably not, given scc-state, but addressing that might cause it to)? That might be fine, it might not, depending on the use case, and just documenting what happens (by default, at least, if it's to be tunable) would be helpful as well.

Reading posts like this about widely distributed applications always gets me interested in it as a career path. Currently I'm working as a front-end dev with moderate non-distributed back-end experience. How would someone in my situation, with no distributed back-end experience, break in to a position working on something like Discord?

I think while this is great, it is good to remember that your current tech stack may be just fine! After all, Discord started with MongoDB [0].

[0] https://blog.discordapp.com/how-discord-stores-billions-of-m...

Is there any update on BEAMJIT?

It was super promising 3 or so years ago. But I haven't seen an update.

Erlang is amazing in numerous ways but raw performance is not one of them. BEAMJIT is a project to address exactly that.


Still ongoing work. My personal bet is a bit more on modernizing HiPE however (by using the LLVM backend more).

Any ETA on when we can start using BEAMJIT?

Given that it has been postponed a couple of times, no. JITs are hard to pull off and it will probably have a period of worse stability as well before it matures. Another problem is getting a JIT to be faster than the interpreter. Erlang's BEAM is threaded code and also macro-instructions, so it almost looks like a JIT internally.

The big gains would be in inlining across module boundaries and type speculation. But I hold that if we could compile bundles of modules in HiPE, we would have the same gain for a fraction of the development and maintenance effort.

The biggest lure of native code generation would be that we could get rid of a lot of C code in the system as the native cogen would be able to rival the C code in speed. Many Erlang programs spend shockingly little time in the emulator loop, especially if they are communication heavy.

If you need speed today, don't underestimate a port-program. My test is that you can pipeline about a million requests back and forth to an OCaml program per second per core. So if your work is on the order of 1+ milliseconds, this is usually a feasible strategy. Especially because OS isolation means you can handle exceptions in the OCaml program from the Erlang side by restarting the port.
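The arithmetic behind that rule of thumb is short. The Python sketch below takes the one-million-round-trips-per-second figure from the comment as an amortized cost of about 1 microsecond per request and compares it with the work done per request. The function name is made up for this example, and the model ignores latency and serialization cost, so treat the output as a ballpark.

```python
def port_overhead_fraction(work_seconds: float,
                           round_trips_per_second: float = 1_000_000) -> float:
    """Fraction of total time spent on port round trips rather than real work."""
    overhead = 1.0 / round_trips_per_second  # amortized cost of one round trip
    return overhead / (work_seconds + overhead)


if __name__ == "__main__":
    for work in (1e-6, 1e-5, 1e-4, 1e-3):
        share = port_overhead_fraction(work)
        print("work per request %.0e s: %.2f%% overhead" % (work, 100 * share))
```

At 1 ms of work per request the round trip is roughly a tenth of a percent of the total, while at 1 microsecond of work it is half, which is why the strategy only pays off for reasonably chunky requests.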

Would you say the OCaml-port-as-an-optimization-strategy only makes sense if the program is compute bound, though?

The reason I ask is because we're running an Erlang+JInterface program and the performance advantage the JVM has over BEAM is less than I would have expected. Even batching requests up into big pieces, we still see it's about 30% slower than running the same stuff in Elixir, without so much copying. But the reason we're doing JVM stuff at all is so we can re-use a whole bunch of code we already had written, and I would have expected it would have been a marginal performance win as well, but it's not.

Perhaps we're doing it wrong, too.

Very interesting article! One thing I'm curious about is how to ensure a given guild's process only runs on one node at a time, and the ring is consistent between nodes.

Do you use an external system like zookeeper? Or do you have very reliable networking and consider netsplits a tolerable risk?

We use etcd.
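For readers wondering what the ring in the question looks like, here is a minimal consistent hash ring in Python. It is a generic sketch, not Discord's implementation: the node names, the virtual-node count, and the `HashRing` class are assumptions. The point it illustrates is that two nodes compute the same owner for a guild only if they build the ring from the same membership list, which is the kind of shared state a coordination store can hold.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, guild_id) -> str:
        """Return the node owning guild_id: the first ring point clockwise."""
        point = self._hash(str(guild_id))
        index = bisect.bisect_right(self._keys, point) % len(self._ring)
        return self._ring[index][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    for guild in (1, 2, 3):
        print(guild, ring.node_for(guild))
```

Removing a node only reassigns the guilds that node owned; every other guild keeps its owner, which keeps process churn low when membership changes.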

It's interesting how on StackOverflow Jobs Elixir knowledge is required more often than Erlang.


Despite its appeal to HN geeks I doubt if Elixir will ever achieve mainstream adoption. Searching Indeed.co.uk's API by title, there are only 5 Elixir jobs in London, compared with 445 Python and 171 Ruby. I also attended a Silicon Roundabout jobs fair recently and was disappointed to find Elixir wasn't even listed in the literature.

It's still early. I've bet on the wrong horse (stack) before, so take my comments with a grain of salt, but I know several companies that are currently adopting Elixir/Phoenix with the same level of excitement that I recall from the early days of Ruby/Rails. It may never be "Rails big", but there is definitely some momentum building.

Erlang has been around a lot longer, and is much more established. That could both support Elixir and work against it -- there are probably more Erlang jobs, but Elixir will benefit from the long-term demonstrated competence of the Erlang VM. I still wonder if it's just a fad. Personally I prefer Erlang, though I don't have really deep experience with either language.

When Rails arrived on the scene it was very different from everything else, but also there weren't 1000 new languages/ frameworks popping up all at once.

I'm not implying in any way that Elixir is bad. I just think there are too many horses these days to know which to bet on. Elixir? Node? Go? Rust? Something else?

Depends on what you're doing, really.

Rust is lovely for security-sensitive code - I'm writing a customer identity management system with Rust+Postgres+Redis.

Go is... acceptable... for "glue" code where PHP would've been used a decade ago and Perl before that - all the successes of it I've seen fall into that sort of pattern.

Elixir is great when you need to think about networked, stateful systems on the scale of a rack of machines - it provides many of the components to help you design systems at that scale.

So... as ever, they all do quite seriously different things. I don't think many people need to build the sorts of systems Elixir is good for - it'll always have its niche in large-scale communications systems, though. A good many webapps fall into the patterns that Go is good for - take user input, munge it, send it to some backend system. A fair bit of code that drives the foundations of what those webapps are built on will eventually be written in Rust.

I understand where they are supposed to fit, but the problem is there are so many existing tools to fill the same needs already.

As an example Erlang and Elixir both fit essentially the same bill so it seems to me what was already a small niche is just getting fractured.

Elixir is just alternative (Ruby-ish) syntax for Erlang, as much as people have hyped it up - Erlang code can call Elixir code and vice versa with essentially no abstraction cost. In fact, the most popular web framework for Elixir is heavily built on Erlang code.

That was my point exactly. Before if you really needed the benefits of running on BEAM then you would have chosen Erlang. Now that subsection will fragment between Erlang and Elixir. Today Elixir developers still will lean heavily on the interaction of Erlang libraries until someone in Elixir-land needs something that isn't supported so then they will re-invent that particular wheel in Erlang and thus the cycle continues ever on.

Then, as you stated yourself, Go is being used in areas where PHP & Perl were used before (as were Ruby, Python, and Lua). So now there is just one more fracture there. Of course, this is especially interesting because Go was supposed to be a better C or C++, which should have placed it pretty squarely in systems programming land, yet it has managed to gain more traction as a glue and web services language. Ergo, it ends up competing in this space when it probably should have competed more with C/C++/Rust.

Rust, as a systems programming language, still has to compete with C/C++/ObjC in this space, and considering that "all the systems" already run on these languages, changing the guard there is one huge challenge. I'm not saying that Rust is bad, just that I think its long-term outlook may actually be rather bleaker than other languages' due to these challenges. Quite frankly, D had some great concepts and improvements as a systems language over C++, but it is still barely a blip on anyone's radar.

If an individual is just trying to learn and expand their horizons then any of these languages are great to pick up. If, on the other hand, they are banking their future career on one then none is a sure fire bet right now. One would honestly still have more luck with a tried and true like Java.

Personally I enjoy picking up new languages just to see things from a different perspective and continue learning. Heck I spend much of my free time coding in Crystal which hasn't even hit a 1.0 launch so that adds pretty much zero improvement to my career prospects. :)

I wouldn't bank a career on a language anyway, so.

However, as Elixir is just another skin for Erlang, switching from Elixir to Erlang or LFE or whatever the next language for that runtime is will be simple - the issue is really BEAM and OTP, as few of the skills learned around them are cleanly transferrable to other runtimes and frameworks.

I suspect that Go will be around for a while by sheer force of inertia at this point - we're 5 years since 1.0 now and it still seems like there's new major projects being started in it every day.

Rust is... interesting, but I think it'll always have its niche. As much as people like to riff on the RIIR crowd, there's actually a somewhat decent ecosystem of reimplemented libraries and systems in Rust already, and now that Rust code is part of Firefox it seems unlikely that Mozilla will stop supporting it for a long time to come. It also seems to not be a zero-sum game - rather than stealing devs from C++, it's brought a lot of developers from "non-systems" languages into the "systems" space.

Exactly. If Elixir didn't have to compete with Clojure, Rust, Go, Node, Scala and Kotlin it might have had a chance to become mainstream. Beyond the hype job stats are the ultimate indicator.

I was not talking about the job opportunities per se; I just commented on how, in just about two years, Elixir surpassed Erlang in its own niche. It just goes to show how an alien (Prolog-like) language syntax can really cripple the spread of state-of-the-art technology. That's why we're never going to see Scala or Clojure be more popular than Java on the JVM, or Elm/PureScript/etc. in the browser: Java and JS are tolerable and have familiar Algol syntax... Not to mention that most of the alternatives I have listed have unusual syntax.

Just saying, but most companies looking for Elixir devs don't go through this website. They go straight to the community. It's faster, easier and you get more consistent results.

It's the difference between hiring "someone that knows it" and "a good dev interested in it".

That's probably also true of Ruby and Python re hiring so the numbers are still relevant.

Following this logic, it would only ever be possible for already-mainstream technologies to achieve mainstream adoption.

Just as an aside, how would people build something like this if they were to use, say, Python and try to scale to this sort of user level? Has anyone succeeded? I'd say it would be quite a struggle without some seriously clever work!

Yeah, pretty much impossible with Python.

Hi community, Let me share my experience with you. I'm a hardcore Rails guy and I've been advocating and teaching Rails to the community for years.

My workflow for trying out a new language involves using it for a small side project and then gradually trying to scale it up. So, here's a summary of my experience with all the languages so far:

Scala - It's a vast academic language (the official book runs to ~700 pages) with multiple ways of doing things, and its attraction for me was the JVM. It's proven, robust and highly scalable. However, the language was not easy to understand, and the frameworks that I tried (Play 2, Lift) weren't easy to transition to for a Rails developer like me.

Nevertheless, I did build a simple calendar application, but it took me 2 months to learn the language and build it.

GoLang - This was my next bet. Although I didn't give up on Scala completely (I know it has its uses), I wanted something simple. I used Go and had the same experience as I had when I used C++. It's a fine language, but for a simple language I had to fight a lot with configuration to get it working for me (for example, it has this crazy concept of GOPATH, where your project should reside, and if your project isn't there it'll keep complaining). Nevertheless, I built my own (simple) Rails clone in Go and realized this isn't what I was looking for. It took me about a month to conquer the language and build my (simple) side project.

Elixir - Finally, I heard of Elixir on multiple HN Rails release threads and decided to give it a go. I started off with Phoenix. The transition from Rails was definitely way smoother, especially considering the creator of the language was a Rails dev himself (the author of the "devise" gem). At first some concepts seemed different (like piping), but once I got used to them, there was no looking back for me.

All was fine until they released Phoenix 1.3, where they introduced the concept of contexts and (re)introduced Umbrella applications. Basically they encourage you to break your application into smaller applications by business function (similar to microservices), except that you can do this however you like (unopinionated). For example, I broke down my application by business units (Finance, Marketing, etc.). This forced me to re-think my application in a way I never would have thought of, and by this time I had finished reading all 3 popular books on the topic (Domain-Driven Design). I loved the fact that Elixir's design choices are really well suited to DDD. If you're new to DDD I suggest you give it a shot, it really can force you to re-think the way you develop software.

Within two weeks of being introduced to Elixir, I had picked up the language. In a month and a half, I built a complete Salesforce clone just working on the weekends. And that includes the UI. And I love how my application is always blazing fast, picks up errors at compile time and warns me if I'm not using a variable I defined somewhere.

P.S. There IS a small learning curve involved if you're starting out fresh:

1) If you're used to the Rails asset pipeline, you'll need to learn some new tools like Brunch / Webpack / etc.
2) You'll want to understand contexts & DDD (optional) if you want to better architect your application.
3) There is no return statement in Elixir!

As a Ruby developer, here are my thoughts:

1. So, will I be developing with Rails again? Probably yes, for simpler applications / API servers. 2. Is Ruby dying? No. In fact, I can't wait for Ruby 3.

Some drawbacks of Elixir:
1. Relatively new, so sometimes you'll be on your own and that's okay.
2. Fewer libraries as compared to the Ruby ecosystem. But you can easily write your own.
3. Fewer developers, but it should be fairly easy to onboard Ruby developers.


When have you ever read, "How Acme scaled J2EE to 5M Concurrent Users"? I became an IT architect in 1998 at IBM, the year Sun released J2EE and IBM released WebSphere. I have experienced 20 years of enterprise Java and object-oriented computing, and I was thrilled when Elixir came out. I was a mainframe programmer before OO became all the rage, so I never really felt at home doing objects. Functional programming feels completely natural to me though.

What I like about this article is that they shared everything they learned with the community. Thank you for that excellent experience report.

"Discord clients depend on linearizability of events"

Could this possibly be the cause of the message reordering and dropping that I experience when I'm on a spotty connection?

I realize this is off topic but how does Discord make money? I can't figure out their biz model (I'm not a gamer so I didn't even know about them).

Anyone know if Phoenix/Elixir have something similar to Ruby's better_errors gem? I see Phoenix has a built-in error stack trace page which looks like a clone of better_errors, but it doesn't have the real-time console inside of it.

Also, I wish they had an ORM like Sequel. These two are really what is holding me back from going all in on Elixir. Anyone care to comment on this?

Ecto is hands down the best "ORM" I've used. At no point has it ever gotten in the way of us dynamically creating queries on the fly, even when that meant following the Ecto internals (not recommended, but totally doable), and it has an extremely expressive syntax. The team behind it has been rolling out new features and is extremely responsive to requests on the mailing list. I encourage you to give it a chance.

The mental shift is that "schemas" are not objects in an ORM sense, but instead are just data, or views over data. The functions come from taking in data or a Changeset and manipulating those, versus calling a function on a class.
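
To make the "just data" point concrete, here is a minimal sketch of the idea in Python rather than Elixir. It is not Ecto's API: `Changeset`, `cast`, `validate_required` and `apply_changes` are hypothetical names that only loosely echo Ecto's vocabulary, and the `user` record is made up for illustration. The record is a plain dict, and every step is a function that takes data in and returns new data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Changeset:
    """A proposed change to a record; the record itself is never mutated."""
    data: dict
    changes: dict = field(default_factory=dict)
    errors: dict = field(default_factory=dict)

    @property
    def valid(self) -> bool:
        return not self.errors


def cast(data: dict, params: dict, allowed: list) -> Changeset:
    """Keep only the allowed keys whose values differ from the current data."""
    changes = {k: v for k, v in params.items()
               if k in allowed and data.get(k) != v}
    return Changeset(data=data, changes=changes)


def validate_required(cs: Changeset, fields: list) -> Changeset:
    """Return a new changeset with an error for every missing field."""
    merged = {**cs.data, **cs.changes}
    errors = dict(cs.errors)
    for name in fields:
        if merged.get(name) in (None, ""):
            errors[name] = "can't be blank"
    return Changeset(cs.data, cs.changes, errors)


def apply_changes(cs: Changeset) -> dict:
    """Build the updated record as a new dict, leaving the original alone."""
    return {**cs.data, **cs.changes}


# Hypothetical record and input; "admin" is dropped because it is not allowed.
user = {"name": "Ada", "email": None}
params = {"email": "ada@example.com", "admin": True}

cs = validate_required(cast(user, params, ["name", "email"]), ["name", "email"])
updated = apply_changes(cs)
```

Nothing here calls a method that mutates `user`. The changeset is a separate value describing what would change and whether that change is acceptable, which is roughly the shift away from "call a function on a class" described above.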

As far as errors go, Elixir 1.4.5 has much better error messages, specifically printing the args to crashes and such, and OTP 20 / Elixir 1.5 should drastically improve Dialyzer error messages. I am not a Ruby guy, so perhaps you can say what you feel is missing from Elixir's messages and how the Ruby gem improves on it. Also, you can literally inspect running state on the fly with the BEAM tooling, so the need for things like debuggers goes down.

Could you explain which changes to Elixir 1.5 and OTP 20 should result in better Dialyzer error messages? I didn't find anything relevant to that in the respective changelogs.

(For context, Dialyzer is a static analysis tool detecting type errors which is part of the core Erlang distribution.)

It's kind of cryptic, but the Erlang release notes say:

  OTP-14369    Application(s): compiler, dialyzer, stdlib
               Related Id(s): PR-1367

               The format of debug information that is stored in BEAM
               files (when debug_info is used) has been changed. The
               purpose of the change is to better support other
               BEAM-based languages such as Elixir or LFE.

               All tools included in OTP (dialyzer, debugger, cover,
               and so on) will handle both the new format and the
               previous format. Tools that retrieve the debug
               information using beam_lib:chunk(Beam, [abstract_code])
               will continue to work with both the new and old format.
               Tools that call beam_lib:chunk(Beam, ["Abst"]) will not
               work with the new format.

               For more information, see the description of debug_info
               in the documentation for beam_lib and the description
               of the {debug_info,{Backend,Data}} option in the
               documentation for compile.

There aren't specific notes in the Elixir 1.5 release notes, but given that it is OTP 20 compatible, I assume it would be able to leverage this, perhaps with some work by Dialyxir.


> Also, I wish they had an ORM like Sequel. These two are really what is holding me back from going all in on Elixir. Anyone care to comment on this?

May be a matter of semantics, but the concept of ORM simply can't exist in a functional language as there aren't objects. That said, Ecto is pretty much the go-to in Elixir. It's extremely powerful, and in my opinion provides the right amount of abstraction without going too far.

Curious about your thoughts on Ecto: https://github.com/elixir-ecto/ecto. Seems to be the most popular solution for database access.

Phoenix/Elixir will likely never have an ORM because, well, the O... But I think Ecto is pretty cool, working with it right now on my first Phoenix website, actually

> like Sequel

Ecto can do a reasonable impression with its DSL & direct use of tables. You don't need to go through the whole stack of schemas and such.

What in particular do you like about Sequel (over what Ecto is providing)?

Just use IEx.pry if you want a debug console.

Compared to Slack, Discord is a much better service for large groups. Facebook uses it for React.

I just wish they were more professional in their communications. Service went down hard this weekend and their response was a cat GIF. Hard to justify to anyone in business that they should take it seriously. A shame when their tech is such a great match for professional collaboration, and all it would take to grow it organically would be less marketing copy, not more.

We take outages seriously internally, but users generally don't care about the nitty-gritty. For a full, detailed postmortem on that outage you can check out our status site: https://status.discordapp.com/incidents/ywdwttd6b0hg

Sad to see some people taking raw and insignificant benchmarks to evaluate a language[0].

[0] https://news.ycombinator.com/item?id=14479757

Really lovely post!

I wonder how Cloud Haskell would fare in such a scenario

I so appreciate write ups that get into details of microsecond size performance gains at that scale. It's a huge help for the community.

Erlang and Elixir measure in microseconds, so it'd be them throwing away information if they did otherwise!

What is the business model behind Discord? They boast about being free multiple times, how do they make money? Or plan to make money?

They currently have a Nitro service that's $5/month for little features, but I don't know how far that takes them towards profitability.


What are Discord and Elixir?

"How Discord Scaled Elixir to 5M Concurrent Users"

click link

[Error 504 Gateway time-out]

only on Hacker News

Unlike Discord's design team who seem to just copy all of Slack's designs and assets, the Engineering team seems to have their shit together, it is delightful to read your Elixir blogposts. Good job!

yeah go on with the downvotes, you know everyone's thinking what I said! Copying your way to success is a strategy and it's not the first time either way :)

Problem is that Discord sucks since it does not have a dedicated server. Sorry, move along.

That's actually why it doesn't suck for the vast majority. Not everyone wants to pay $ every month so they could have their own voice / chat server.
