
Why Discord is switching from Go to Rust - Sikul
https://blog.discordapp.com/why-discord-is-switching-from-go-to-rust-a190bbca2b1f
======
the-alchemist
Looks like the big challenge is managing a large LRU cache, which tends to be
a difficult problem for GC runtimes. I bet the JVM, with its myriad tunable GC
algorithms, would perform better, especially Shenandoah and, of course, Azul's
C4.

The JVM world tends to solve this problem by using off-heap caches. See Apache
Ignite [0] or Ehcache [1].

I can't speak for how their Rust cache manages memory, but the thing to be
careful of in non-GC runtimes (especially non-copying GC) is memory
fragmentation.

It's worth mentioning that the Dgraph folks wrote a better Go cache [2] once
they hit the limits of the usual Go caches.

From a purely architectural perspective, I would try to put cacheable material
in something like memcache or redis, or one of the many distributed caches out
there. But it might not be an option.

It's worth mentioning that Apache Cassandra itself uses an off-heap cache.

[0]: [https://ignite.apache.org/arch/durablememory.html](https://ignite.apache.org/arch/durablememory.html)

[1]: [https://www.ehcache.org/documentation/2.8/get-started/storag...](https://www.ehcache.org/documentation/2.8/get-started/storage-options.html#bigmemory-\(off-heap-store\))

[2]: [https://blog.dgraph.io/post/introducing-ristretto-high-perf-...](https://blog.dgraph.io/post/introducing-ristretto-high-perf-go-cache/)

~~~
stingraycharles
> The JVM world tends to solve this problem by using off-heap caches. See
> Apache Ignite [0] or Ehcache [1].

For those who care: I was interested in how off-heap caching works in Java, so
I did some quick searching around the Apache Ignite code.

The meat is here:

- GridUnsafeMemory, an implementation of access to entries allocated
off-heap. This appears to implement some common Ignite interface, and invokes
calls to a “GridUnsafe” class:
[https://github.com/apache/ignite/blob/53e47e9191d717b3eec495...](https://github.com/apache/ignite/blob/53e47e9191d717b3eec495e6246cd957a8d33c7d/modules/core/src/main/java/org/apache/ignite/internal/util/offheap/unsafe/GridUnsafeMemory.java)

- This class is the closest to the JVM’s native memory, and wraps
sun.misc.Unsafe:
[https://github.com/apache/ignite/blob/53e47e9191d717b3eec495...](https://github.com/apache/ignite/blob/53e47e9191d717b3eec495e6246cd957a8d33c7d/modules/core/src/main/java/org/apache/ignite/internal/util/GridUnsafe.java)

- And this, sun.misc.Unsafe, is what it’s all about:
[http://www.docjar.com/docs/api/sun/misc/Unsafe.html](http://www.docjar.com/docs/api/sun/misc/Unsafe.html)

It’s very interesting because I did my fair share of JNI work, and context
switches between the JVM and native code are typically fairly expensive. My
guess is that this class was likely one of the reasons why Sun ended up
implementing their (undocumented) JavaCritical* functions and the like.

~~~
sreque
Unsafe lets you manipulate memory without any JNI overhead, other than when
allocating or de-allocating memory, and that is usually done in larger chunks
and pooled to avoid the overhead at steady state. Netty also takes advantage
of Unsafe to move a lot of memory operations off the Java heap.

Unsafe was one of the cooler aspects of Java, and Oracle is actively killing
it for, well, no good reason at least.

~~~
jayd16
C# seems to have a neat middle ground for this kind of stuff with their
Span<T> api.

~~~
to11mtm
True, but we had our own version of unsafe for a much longer time. MS was just
pragmatic enough to allow it across the ecosystem.

I'm guessing at least some of that was a side effect of wanting to support
C++; not having pointers as an option would have killed C++/CLI from the get
go.

------
rvcdbn
Seems like you were hitting: runtime: Large maps cause significant GC pauses
#9477 [0]

Looks like this issue was resolved for maps that don't contain pointers by
[1]. From the article, it sounds like the map keys were strings (which do
contain pointers), so the map would need to be scanned by the GC.

If pointers in the map keys and values could be avoided, it would (if my
understanding is correct) have removed the need for the GC to scan the map.
You could do this, for example, by replacing string keys with fixed-size byte
arrays. Curious if you experimented with this approach?

[0]
[https://github.com/golang/go/issues/9477](https://github.com/golang/go/issues/9477)

[1] [https://go-review.googlesource.com/c/go/+/3288](https://go-review.googlesource.com/c/go/+/3288)

~~~
gwbas1c
Everything I've read indicates that RAM caches work poorly in a GC
environment.

The problem is that garbage collectors are optimized for applications that
mostly have short-lived objects, and a small amount of long-lived objects.

Things like large in-RAM LRU are basically the slowest thing for a garbage
collector to do, because the mark-and-sweep phase always has to go through the
entire cache, and because you're constantly generating garbage that needs to
be cleaned.

~~~
pkolaczk
A high number of short-lived allocations is also a bad thing in a compacting
GC environment, because every allocation hands you a reference to a memory
region that was last touched a long time ago, which is likely a cache miss.
You'd like to use an object pool to avoid this, but then you run into the
pitfall of long-lived objects, so there is really no good way out.

~~~
mcguire
???

The allocation is going to be close to the last allocation, which was touched
recently, no? The first allocation after a compaction will be far from recent
allocations, but close to the compacted objects?

~~~
pkolaczk
Close to the last allocation doesn't matter. What matters is the memory
returned to the application - and this is memory that was touched long ago and
is unlikely to be in cache. If your young generation is larger than the L3
cache, every new 64-byte cache line you allocate into will have to be fetched
from main memory. I believe a smart CPU will notice the pattern and prefetch
to reduce cache-miss latency, but a high allocation rate still uses a lot of
memory bandwidth and thrashes the caches.

An extreme case of that problem happens when using GC in an app that gets
swapped out. Performance drops to virtually zero then.

------
carllerche
Tokio author here (mentioned in blog post). It is really great to see these
success stories.

I also think it is great that Discord is using the right tool for the job. It
isn't often that you _need_ the performance gains that Rust & Tokio offer, so
pick what works best to get the job done and iterate.

~~~
Polyisoprene
No offense to Tokio and Rust (I really like Rust), but someone rewriting their
app because of performance limitations in their previous language choice isn't
exactly picking the right tool for the job.

I'm not so sure they would have done the rewrite if the Go GC had been
performing better, and the choice of Rust seems based primarily on prior
experience at the company writing performance-sensitive code rather than on
delivering business value.

~~~
qaq
Too much focus on "business value" often ends up with a codebase in a state
that makes delivery of that business value pretty much impossible. Boeing was
delivering a lot of business value with the MAX ...

~~~
sneezetheory
THIS !!! this is so underrated

------
RcouF1uZ4gsC
> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize
> memory usage.

Collections are one of the big areas where Go's lack of generics really hurts
it. In Go, if one of the built-in collections does not meet your needs, you
are going to take a safety and ergonomics hit going to a custom collection. In
Rust, if one of the standard collections does not meet your needs, you (or
someone else) can create a pretty much drop-in replacement that has similar
ergonomic and safety profiles.

~~~
correct_horse
I'm not sure what you mean by standard collections, but BTreeMap is in Rust's
standard library.

~~~
pdpi
I think the point the GP is trying to make is that there’s no reason why
BTreeMap couldn’t be an external crate, while only the core Go collections are
allowed to be generic.

A corollary to this is that adding more generic collections to Go’s standard
library implies expanding the set of magical constructs.

~~~
The_rationalist
Rust has its share of weird hacks too. E.g. arrays can take trait impls only
if they have at most 32 elements...
[https://doc.rust-lang.org/std/array/trait.LengthAtMost32.htm...](https://doc.rust-lang.org/std/array/trait.LengthAtMost32.html)

~~~
masklinn
That's… not that at all. You can absolutely implement traits for arrays of
more than 32 elements[0].

It is rather that due to a lack of genericity (namely const generics) you
can't implement traits for [T;N], you have to implement them for each size
individually. So there has to be an upper bound somehow[1], and the stdlib
developers arbitrarily picked 32 for stdlib traits on arrays.

A not entirely dissimilar limit tends to be placed on _tuples_, and
implementing traits / typeclasses / interfaces on them. Again the stdlib has
picked an arbitrary limit, here 12[2]; the same issue can be seen in e.g.
Haskell (where Show is "only" instanced on tuples up to size 15).

These are not "weird hacks", they're logical consequences of memory and file
size not being infinite, so if you can't express something fully generically…
you have to stop at one point.

[0] here's 47: [https://play.rust-lang.org/?version=stable&mode=debug&editio...](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c523f6cf4ff4ff2e19adf79300d51567)

[1] even if you use macros to codegen your impl block

[2] [https://doc.rust-lang.org/src/core/fmt/mod.rs.html#2115](https://doc.rust-lang.org/src/core/fmt/mod.rs.html#2115)

~~~
kibwen
Also worth noting that Rust's const generics support has progressed to the
point that the stdlib is already using them to implement the standard traits
on arrays; the 32-element issue still technically exists, but only because the
stdlib is manually restricting the trait implementation so as to not
accidentally expose const generics to stable Rust before const generics is
officially stabilized.

------
_ph_
If you have a problem at hand which does not really benefit from the presence
of a garbage collector, switching to an implementation without one has real
potential to be at least somewhat faster. I remember running into this time
trigger for garbage collection long ago - though I don't remember why, and had
mostly forgotten about it until I read this article. As the article also
notes, even if there are no allocations going on, Go forces a GC every two
minutes; it is set here:
[https://golang.org/src/runtime/proc.go#L4268](https://golang.org/src/runtime/proc.go#L4268)

The idea behind this (if I remember correctly) is to be able to return unused
memory to the OS. As returning memory requires a GC run, one is forced at
regular intervals. I am a bit surprised that they didn't contact the
corresponding Go developers, as they seem interested in practical use cases
where the GC doesn't perform well. Aside from the fact that newer Go releases
have improved GC performance, I am also a bit surprised that they didn't just
increase this interval to an arbitrarily large number and check whether their
issues went away.

~~~
KMag
Not only is there good potential for a speed improvement, but languages built
around the assumption of pervasive garbage collection tend not to have good
language constructs to support manual memory management.

To be fair, most languages without GCs also don't have good language
constructs to support manual memory management. If you're going to make wide
use of manual memory management, you should think very carefully about how the
language and ecosystem you're using help or hinder your manual memory
management.

------
jrockway
This seems like a nice microservices success story. It's so easy to replace a
low-performing piece of infrastructure when it is just a component with a
well-defined API. Spin up the new version, mirror some requests to see how it
performs, and turn off the old one. No drama, no year-long rewrites. Just a
simple fix for the component that needed it the most.

~~~
thijsvandien
You don't need microservices for that, though. One might as well have moved
that piece into a library.

~~~
kccqzy
And then deal with cross-language FFI boundaries and cross-language builds.

~~~
SlowRobotAhead
This is what clicked for me about microservices years back: the language
wasn't important, and if I couldn't do it in Python or C, someone else could
in Go or Java or whatever.

Compared to if I wrote something in-house entirely in C... lolno

~~~
Tomis02
Landing in a shop that uses N programming languages for N microservices would
be a pretty miserable experience.

~~~
koffiezet
I've seen quite a few environments, and usually there's only a limited current
set of tech the devs are allowed to use (and if that's not the case, I try to
enforce one), but this set should evolve depending on the needs.

The main issue, however, is manpower. At my current client, one of the
technologies still actively used for this reason is PHP (which is a horrible
fit for microservices for a lot of reasons), because they have a ton of PHP
devs employed. Finding a ton of (good) people with knowledge of something more
fitting like Go or Rust is hard and risky, and training costs a lot of money
(and more importantly: time)...

~~~
qaq
I can buy this for Rust but if people have issues picking up Go quickly ...

~~~
koffiezet
Well, picking up the language itself is one thing (and I agree, that's quite
easy with Go), but getting familiar with the ecosystem, best practices and
avoiding habits from other languages? That's an entirely different thing.

And that's also how management usually sees it, and if they're smart they also
realise that the first project using an unfamiliar technology is usually one
to throw away.

------
flafla2
> After digging through the Go source code, we learned that Go will force a
> garbage collection run every 2 minutes at minimum. In other words, if
> garbage collection has not run for 2 minutes, regardless of heap growth, go
> will still force a garbage collection.

> We figured we could tune the garbage collector to happen more often in order
> to prevent large spikes, so we implemented an endpoint on the service to
> change the garbage collector GC Percent on the fly. Unfortunately, no matter
> how we configured the GC percent nothing changed. How could that be? It
> turns out, it was because we were not allocating memory quickly enough for
> it to force garbage collection to happen more often.

As someone not too familiar with GC design, this seems like an absurd hack.
That this 2-minute hardcoded limitation is not even configurable comes across
as amateurish even. I have no experience with Go -- do people simply live with
this and not talk about it?

~~~
_ph_
With recent Go releases, GC pauses have become negligible for most
applications, so this should not get in your way. However, it can easily be
tweaked if needed. There is runtime.ForceGCPeriod, which is a pointer to the
forcegcperiod variable. A Go program which _really_ needs to change this can
do so, but most programs shouldn't require it.

Also, it is almost trivial to edit the Go sources (they are included in the
distribution) and rebuild them, which usually takes just a minute. So Go is
well suited for your own experiments - especially as Go is implemented in Go.

~~~
calcifer
> especially, as Go is implemented in Go.

Well, parts of it. You can't implement "make" or "new" in Go yourself, for
example.

~~~
_ph_
You have to distinguish between the features available to a Go program as the
user writes it and the implementation of the language. The implementation is
completely written in Go (plus a bit of low-level assembly). Even if the
internals of e.g. the GC are not visible to a Go program, the GC itself is
implemented in Go and is thus easily readable and hackable for experienced Go
programmers. And you can quickly rebuild the whole Go stack.

~~~
calcifer
> You have to distinguish between the features available to a Go program as
> the user writes it and the implementation of the language.

I do, I'm just objecting to "Go is implemented in Go".

~~~
dodobirdlord
This reminds me of the ongoing saga of RUSTC_BOOTSTRAP[0][1]

The stable compiler is permitted to use unstable features in stable builds,
but only for compiling the compiler. In essence, there are some Rust features
that are supported by the compiler but only permitted to be used by the
compiler. Unsurprisingly, various non-compiler users of Rust have decided that
they want those features and begun setting the RUSTC_BOOTSTRAP envvar to build
things other than the compiler, prompting consternation from the compiler
team.

[0] [https://github.com/rust-lang/cargo/issues/6627](https://github.com/rust-lang/cargo/issues/6627)

[1] [https://github.com/rust-lang/cargo/issues/7088](https://github.com/rust-lang/cargo/issues/7088)

~~~
estebank
This is not entirely correct. These things that "can only be used by the
compiler" are nightly features that haven't been stabilized _yet_. _Some_ of
them _might_ never be stabilized, but you can always use them in a nightly
compiler; stability assurances just fly out the window then. This is also why
using that environment variable is _highly_ discouraged: it breaks the
stability guarantees of the language, and you're effectively using a pinned
nightly. This is reasonable in only a _very_ small handful of cases.

~~~
giornogiovanna
Yep. Beyond that, there is at least one place[0] where the standard library
uses undefined behavior "based on its privileged knowledge of rustc
internals".

[0]: [https://doc.rust-lang.org/src/std/io/mod.rs.html#379](https://doc.rust-lang.org/src/std/io/mod.rs.html#379)

------
tiffanyh
It should also be noted that Rust interoperates extremely well with Erlang,
which is the basis of Discord (via Rustler).

[https://github.com/rusterlium/rustler](https://github.com/rusterlium/rustler)

[https://blog.discordapp.com/scaling-elixir-f9b8e1e7c29b](https://blog.discordapp.com/scaling-elixir-f9b8e1e7c29b)

------
_bxg1
It's always good to see a case-study/anecdote, but nothing in here is
surprising. It also doesn't really invalidate Go in any way.

Rust is faster than Go. People use Go, like any other technology, when the
tradeoffs between developer iteration/throughput/latency/etc. make sense. When
those cease to make sense, a hot path gets converted down to something more
efficient. This is the natural way of things.

~~~
kerkeslager
> It's always good to see a case-study/anecdote, but nothing in here is
> surprising. It also doesn't really invalidate Go in any way.

Well, sure, because categorizing languages as "valid/invalid" doesn't make any
sense.

But it does show _yet another_ example of how designing a language to solve
Google's fairly-unique problems doesn't result in a general-purpose language
suitable for solving most people's problems.

~~~
kikimora
Long GC pauses caused by large collections/caches are a decades-long problem
with no real widespread solution so far. With Java and .NET you can resort to
off-heap data. Not sure if this is possible with Go.

~~~
dnautics
Erlang basically solved it (and, arguably, solved it decades ago); relevant
as Discord uses the Erlang VM in places.

~~~
bsder
Erlang "solved" the problem by having lots of little heaps, so a GC can blast
through any one heap extremely quickly.

Would that actually work in this instance? It seems like that LRU cache
they're talking about is kind of large.

~~~
kerkeslager
> Would that actually work in this instance? It seems like that LRU cache
> they're talking about is kind of large.

I can't say for sure without knowing what the contents of that heap is, but I
suspect that yes, it would work.

However, the reason the heaps are so small is that they're each a lightweight
thread, and in Erlang, spinning up new threads is a way of life. It would be
hard to overstate what a fundamentally different architecture this is.

------
kardianos
I'm glad they found a good solution (rust) to solve their problem!

Also note this was with Go 1.9. I know GC work was ongoing during that time; I
wonder if this type of situation would still happen?

~~~
faitswulff
From /u/DiscordJesse on reddit:

> We tried upgrading a few times. 1.8, 1.9, and 1.10. None of it helped. We
> made this change in May 2019. Just getting around to the blog post now since
> we've been busy.

[https://www.reddit.com/r/programming/comments/eyuebc/why_dis...](https://www.reddit.com/r/programming/comments/eyuebc/why_discord_is_switching_from_go_to_rust/fgjsjxd/)

~~~
cesarb
Another interesting comment in the same reddit thread, from /u/brian-discord
([https://old.reddit.com/r/programming/comments/eyuebc/why_dis...](https://old.reddit.com/r/programming/comments/eyuebc/why_discord_is_switching_from_go_to_rust/fgk56y4/)):

> Another Discord engineer chiming in here. I worked on trying to fix these
> spikes on the Go service for a couple weeks. We did indeed try moving up the
> latest Go at the time (1.10) but this had no effect.

> For a more detailed explanation, it helps to understand what is going on
> here. It is not the increased CPU utilization that causes the latency.
> Rather, it's because Go is pausing the entire world for the length of the
> latency spike. During this time, Go has completely suspended all goroutines
> which prevents them from doing any work, which appears as latency in
> requests.

> The specific cause of this seems to be because we used a large free-list
> like structure, a very long linked list. The head of the list is maintained
> as a variable, which means that Go's mark phase must start scanning from the
> head and then pointer chase its way through the list. For whatever reason,
> Go does (did?) this section in a single-threaded manner with a global lock
> held. As a result, everything must wait until this extremely long pointer
> chase occurs.

> It's possible that 1.12 does fix this, but we had tried upgrading a few
> times already on releases that promised GC fixes and never saw a fix to this
> issue. I feel the team made a pragmatic choice to divest from Go after
> giving the language a good attempt at salvaging the project.

~~~
earthboundkid
Ugh, linked lists fuck everything up. It’s never the right data structure. Use
a vector!

~~~
anarazel
That's only true if you actually need to access more than a few elements at
once. And if you never need to insert/delete to/from anywhere but the end.

~~~
earthboundkid
Even if you need middle inserts but not a B-tree (weird), it’s still better to
use a vector in most cases. Time to find the insertion point will dominate.

------
correct_horse
I've heard lots of hot takes on "what Go really is". Here's mine.

Go is what would have happened if Bell Labs wrote Java.

~~~
kick
Minor nitpick: That already happened, Limbo is what happened when Bell Labs
wrote Java.

~~~
correct_horse
Huh. I managed to hear about Inferno, but not remember the Limbo part.

In that case, Go is Bell Labs' second attempt at Java.

~~~
kick
Third, there was also a language I can't remember the name of that happened at
the same time as Alef.

~~~
steveklabnik
Newsqueak?

~~~
kick
That may have been what I was thinking of (or Squeak, for that matter, if my
sense of time was off), but I'm not sure!

------
yippir
I chose Rust over Go after weighing the pros and cons. It was an easy
decision. I wouldn't consider using a high level language that lacks generics.
The entire point of using a high level language is writing less code.

~~~
shdh
The syntax looks pedantic to me. Going to require some adjusting.

------
mperham
Better title: "One Discord microservice with extremely high traffic is moving
to Rust"

~~~
jhgg
This is one of multiple. We did not blog about this one, but switching a
purely CPU-bound Python HTTP service for analytics ingest to Rust resulted in
a 90% reduction in the compute required to power it. However, that's not too
interesting because it's known that Python is slow haha.

We have 2 Go services left; one of them has a rewrite in Rust in PR as of
last week (as a fun side project an engineer wanted to try out).

Additionally, as we move towards more of an SOA internally, we plan to write
more high-velocity data services, and Rust will be our language of choice for
that.

~~~
onebot
Think replacing Elixir with Rust would ever be a consideration? Rust isn't
there yet, but if you are NIF'ing a bunch of stuff, it seems like it could
make sense at some point?

~~~
meowface
I Googled around, but couldn't find the answer. What is "NIF"?

~~~
dnautics
[https://erlang.org/doc/man/erl_nif.html](https://erlang.org/doc/man/erl_nif.html)

Short: ffi for erlang.

------
unlinked_dll
It'd be cool to look at more signal statistics from the CPU plot.

It appears that Go has a lower CPU floor, but it's killed by the GC spikes,
presumably due to the large cache mentioned by the author.

This is interesting to me. It suggests that Rust is better at scale than Go; I
would have thought that Go's mature concurrency model and implementation would
have been optimized for such cases, while Rust would shine in smaller services
with CPU-bound problems.

Great post!

~~~
arnsholt
My first guess for the slightly higher CPU floor of the Rust version is that
the Rust code has to do slightly more work per request, since it frees memory
as values are dropped, whereas the Go code does no freeing per request but
then gets hit with the periodic spike every two minutes, when the entire heap
has to be traversed for GC.

~~~
jhgg
tokio 0.1 was definitely less efficient. Comparing Go to tokio 0.2, tokio
consistently uses less CPU, even when compared to a cluster of the same size
almost a year later, with our growth over the time since we switched over.

------
fmakunbound
Why /does/ it run a GC every 2 minutes? I went looking and didn't find a
reason in the code...

[https://github.com/golang/go/search?q=forcegcperiod&unscoped...](https://github.com/golang/go/search?q=forcegcperiod&unscoped_q=forcegcperiod)

Go's GC seems kind of primitive.

------
Thaxll
Really interesting post. However, they're using a 2+ year old runtime; Go
1.9.2 was released 2017/10/25. Why did they not even try Go 1.13?

For me the interesting part is that their new implementation in Rust with a
new data structure is less than 2x faster than an implementation in Go using a
2+ year old runtime.

It shows how fast Go is vs. a very optimized language + a new data structure
with no GC.

Overall I'm pretty sure there was a way to make the spikes go away.

Still, great post.

~~~
yazaddaruvala
The graphs were in different units. The final Rust version was over 100x
faster.

~~~
Thaxll
Which doesn't make any sense. Rust is not x100 faster than Go.

~~~
yazaddaruvala
Rust and Go likely compile similar code into similar enough assembly that
performance is close. However, bigger caches will always have more cache hits
than smaller caches, and therefore could easily be 100x faster.

The blog does a better job explaining everything than I can, but simply put,
the "granular" memory management Rust allows gave them the ability to create a
bigger cache. Go (at the time), while great, did not work well for that
particular use case and required smaller cache sizes.

------
eric-hu
I'm curious what the product-engineering landscape in the company looks like
to allow for a language rewrite to happen. I feel like this would be a hard
sell in all companies I've worked at. Was this framed as a big bug fix? Or was
faster performance framed as a feature?

~~~
modo_mario
I think they're at a scale now where the cost of running it starts to become
important as well. At least when we're talking about big performance increases
like this.

------
meirelles
The Twitch folks were facing a related situation with the GC. They developed a
workaround they called a ballast, reducing overall latency and making it more
predictable. Quite impressive results [0].

Go's GC is groundbreaking in several respects, but it probably needs to
provide ways to fine-tune it. Posts like this make me believe that one-size-
fits-all settings are yet to be seen.

[0]: [https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i...](https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2/)

------
johnmc408
Non-programmer here, but would it make sense to add a keyword (or flag) to Go
to manually allocate a piece of memory (i.e. not use GC)? That way, for some
use cases, you could avoid GC on the critical path. Then when GC happened, it
could be very fast, as there would be far less to pause and scan (in this use
case). Obviously this would have to be optional and discouraged... but there
seems to be no way to write an intensive real-time app in a GC-based language.
(Again, a non-programmer writing this to learn more ;-)

~~~
oconnor663
There are two things you'd have to do at the same time that make this
complicated:

- You'd have to ensure that your large data structure gets allocated entirely
within the special region. That's simple enough if all you have is a big
array, but it gets more complicated if you've got something like a map of
strings. Each map cell and each string would need to get allocated in the
special region, and all of the types involved would need new APIs to make that
happen.

- You'd have to ensure that data structures in your special region never hold
references to anything outside. Since the whole point of the region is that
the GC doesn't scan it, nothing in the region will be able to keep anything
outside the region alive. Any external references could easily become dangling
pointers to freed memory, which is the sort of security vulnerability that GC
itself was designed to prevent.

All of this is doable in theory, but it's sufficiently difficult, and it comes
with sufficiently many downsides, that it makes more sense for a project with
these performance needs to just use C or Rust or something.

~~~
zozbot234
> Since the whole point of the region is that the GC doesn't scan it, nothing
> in the region will be able to keep anything outside the region alive.

You can treat external references as GC roots.

~~~
int_19h
How do you know that they exist, if you're not scanning that memory?

~~~
zozbot234
The data structure code can take care of this by registering GC roots with the
garbage collector (and de-registering them if an external reference changes).
It's no different in principle than any other smart pointer.

------
kerkeslager
Go is not a general-purpose language. It's a Google language designed to solve
Google's problems. If you aren't Google, you probably have different problems,
which Go isn't intended to solve.

EDIT: Currently at -4 downvotes. Would downvoters care to discuss their votes?

~~~
cmrdporcupine
As a Googler, I don't consider this accurate. I've been here 8 years and have
yet to work on a Go code base. Yes, there are projects in Go. Certainly not a
majority, nor even a significant minority, honestly.

No, I wouldn't say Go is specific to Google's problems, though I'm sure some
of the engineers had them in mind. I see Go used far more outside of Google
than in.

~~~
takeda
Isn't that indication of a failure? It seems like Go aimed to replace Python
and Java code at Google.

~~~
cmrdporcupine
My impression (formed pre-Google; I haven't paid attention since I got here)
is that it was Rob Pike's and Ken Thompson's project, coming out of their long
experience with Plan 9 and Inferno/Limbo. That it happened to meet some
requirements for some Google projects - I'm sure that might have been an
intent. But that feels a bit like an explanation after the fact, since Go very
obviously shows the biases and philosophy of the projects the original authors
worked on previously.

------
justadudeama
> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize
> memory usage.

Can someone explain to me how BTreeMap is more memory efficient than a
HashMap?

~~~
afranchuk
A BTreeMap should typically have O(n) memory usage, whereas a HashMap
(depending on load factor) will usually have O(kn) memory usage, where k > 1.
This is because a HashMap allocates its table upfront (and again when the load
gets too great), so at any given moment it holds more slots than entries; it
can't anticipate how many values will be added or what collisions will occur.
Yes, collisions are typically stored in some allocate-per-item collection, but
a HashMap's whole aim is to avoid collisions, so most of that slack is empty
slots. A BTreeMap, by contrast, allocates as each new value is inserted.

Note that this explanation is a bit handwavy, as both data structures have
numerous optimizations in production scenarios.
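
You can see the slack directly with Rust's std HashMap (illustrative only; the exact capacity depends on the implementation's load factor and growth policy):

```rust
use std::collections::HashMap;

fn main() {
    let mut m: HashMap<u64, u64> = HashMap::new();
    for i in 0..1000 {
        m.insert(i, i);
    }
    // The table keeps spare slots so its load factor stays below the
    // growth threshold, so capacity() is noticeably larger than len().
    println!("len = {}, capacity = {}", m.len(), m.capacity());
    assert!(m.capacity() > m.len());
}
```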

~~~
nybble41
There is no difference between O(n) and O(kn), if k is a constant. The
notation deliberately ignores constant factors. (That's why you can say a
BTreeMap requires O(n) memory independent of the size or type of data being
stored, provided there is _some_ finite upper bound on the sizes of the keys
and values.)

~~~
afranchuk
Yeah I know, it was just the fastest way to indicate that the constant factor
was almost definitely larger for HashMaps. But thank you for clarifying!

------
reggieband
When I see this kind of GC performance, I wonder why you wouldn't change the
implementation to use some sort of pool allocator. I am guessing each Read
State object is identical in layout (e.g. some kind of struct), so why not
pre-allocate your memory budget of objects and just keep an unused list
outside of your HashMap? In a way this is even closer to a ring where, upon
ejection, you could write the object to disk (or Cassandra), re-initialise the
memory and then reuse the object for the new entry.

I suppose that won't stop the GC from scanning the memory though ... so maybe
they had something akin to that. I assume that a company associated with games
and with some former games programmers would have thought to use pool
allocators. Honestly, if that strategy didn't work then I would be a bit
frustrated with Go.

I have to say, out of all of the non-stop spamming of Rust I see on this site
- this is definitely the first time I've thought to myself that this is a very
appropriate use of the language. This kind of simple yet high-throughput
workhorse of a system is a great match for Rust.
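
A minimal sketch of that pool strategy in Rust (the `ReadState` fields are made up for illustration; a real service would pair this with the user-id index):

```rust
// Hypothetical Read State record, just for illustration.
#[derive(Clone, Default)]
struct ReadState {
    last_message_id: u64,
    mention_count: u32,
}

// A fixed-budget pool: all slots allocated up front, with a free list
// of indices so eviction and reuse never touch the allocator.
struct Pool {
    slots: Vec<ReadState>,
    free: Vec<usize>,
}

impl Pool {
    fn with_capacity(n: usize) -> Self {
        Pool {
            slots: vec![ReadState::default(); n],
            free: (0..n).rev().collect(),
        }
    }

    fn acquire(&mut self) -> Option<usize> {
        self.free.pop()
    }

    fn release(&mut self, idx: usize) {
        // Re-initialise the slot before returning it to the free list.
        self.slots[idx] = ReadState::default();
        self.free.push(idx);
    }
}

fn main() {
    let mut pool = Pool::with_capacity(2);
    let a = pool.acquire().unwrap();
    let b = pool.acquire().unwrap();
    assert!(pool.acquire().is_none()); // budget exhausted
    pool.release(a);
    assert!(pool.acquire().is_some()); // slot reused, no new allocation
    let _ = b;
}
```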

~~~
monocasa
Yeah, they already weren't allocating; it was a GC pause that just scanned
everything and came up with essentially no extra garbage every two minutes.

~~~
azakai
A pool allocator could have reduced the number of existing allocations (1 big
one instead of many small ones), making those spikes less significant. (But
that depends on how Go handles interior pointers and GC, so I'm not sure.)

~~~
runevault
Allocations weren't the problem. It was the fact that, every 2 minutes, the GC
would trigger because of an arbitrary decision by the Go team and scan their
entire heap, find little to nothing to deallocate, then go on its merry way.

------
geodel
Makes sense: write the most efficient stuff for in-house and give resource-hog
Electron apps to users.

~~~
jrockway
Discord pays for their servers, but not for their users' computers.

~~~
Hamuko
That's fine as long as you ignore the fact that the users are the customers.

------
archi42
Uhm, I'd suppose the service runs on one or more dedicated nodes - so there
should be no competition for RAM (or if a node runs multiple services, then
I'd expect a fixed memory amount to be available). In such an environment,
each fixed-size LRU cache could just allocate a huge chunk of RAM for data +
indices (index size is bounded by data size). That has nothing to do with the
ownership model; it's just manually managed memory.

Yes, reality is more complex since they probably have multi socket
servers/NUMA, which might add memory access latencies and atomic updates to
the LRU might require a locking scheme, which also isn't trivial (and where
async Rust might be useful).

------
stiray
And this brings me back to my years-old nag. "Ok, you got GC, fine. But DO
give me the option to hand-free specific memory when I want to. I don't
consider hand allocation and deallocation as much of a pain as GC going wild."

This doesn't only go for Go.
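
For contrast, in a non-GC language this is just scoped or explicit destruction; a trivial Rust sketch:

```rust
fn main() {
    // Allocate a large buffer...
    let big = vec![0u8; 64 * 1024 * 1024]; // 64 MiB
    assert_eq!(big.len(), 64 * 1024 * 1024);
    // ...and free it exactly here, deterministically, no collector involved.
    drop(big);
    // Using `big` after this point is a compile error, not a runtime bug.
}
```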

------
karma_daemon
I wish the article would show a graph of the golang heap usage. I'm reminded
of this cloudflare article [0] from a while back where they created an example
that seemed to exhibit similar performance issues when they created many small
objects to be garbage collected. They solved it by using a pooled allocator
instead of relying solely on the GC. Wonder if that would have been applicable
here to the go version.

[0] [https://blog.cloudflare.com/recycling-memory-buffers-in-go/](https://blog.cloudflare.com/recycling-memory-buffers-in-go/)

------
mangatmodi
Why would they switch to Rust rather than upgrading from a 3-year-old Go version?

~~~
jhgg
This blog post is perhaps a bit "after the fact": we had made the switch over
in mid 2019, and wanted to try out Rust as well for services like this, due to
adoption elsewhere in the company. Also, after upgrading between 4 golang
versions on this service and noticing it didn't materially change performance,
we decided to just spend our time on the rewrite (for fun, and latency) and to
get a head start into the asynchronous rust ecosystem.

This blog post kinda internally matches our upgrade to std::futures and tokio
0.2, away from futures 0.1.

~~~
The_rationalist
Out of curiosity, why didn't you choose Kotlin? It can reuse the Java
ecosystem, which allows you to save tons of money, and gives you advanced
features and scalability. It is a sexier and more ergonomic language too. And
with e.g. ZGC, you can have a GC that is finely tunable and has very low
latency.

By choosing Rust you will suffer a great deal from the limitations of its
poor, not-production-ready ecosystem. I'm not even talking about the
immaturity of the async/await support.

~~~
therockhead
> By choosing Rust you will suffer a great deal from the limitations of its
> poor, not-production-ready ecosystem.

Why do you think that? Seems like Rust is a great choice for this type of high
performance work.

~~~
ncmncm
Rust does best when the number of lines of code that must be parsed in an
edit-compile-test loop is small. When the sources that must be parsed get
large, coders suffer.

It is doubtful that this will improve, much, without breaking changes to the
language. The range of code over which type inference operates, or at least
programmers' reliance on it, would need to contract by quite a lot. There
would be Complaints.

~~~
steveklabnik
Type inference only operates within function bodies. It's also not the thing
that causes compilation to be slow.

------
StreamBright
Pretty amazing write up from Jesse. I really like how they maxed out Go first
before even thinking about a rewrite in Rust. It turns out no-GC has pretty
significant advantages in some cases.

~~~
Rapzid
Unsafe or it doesn't count ;)

------
yobert
Not to participate in the flaming-- but I'd love to hear some stats about
compile times for the two versions of the service. (Excellent write-up by the
way! Thanks!)

------
dfee
The one problem I’m curious how channel-based chat applications solve, and to
which my google-fu has never led me in the right direction: how do you
handle subscriptions?

I imagine a bunch of front-end servers managing open WebSocket connections,
and also providing filtering/routing of newly published messages. Alas, it's
probably best categorized as a multicast-to-server, multicast-to-user problem.

Anyways, if there’s an elegant solution to this problem, would love to learn
more.

~~~
Pfhreak
Not sure if this is exactly what you are looking for, but I'd do some digging
into consistent hash rings.

~~~
dfee
Oh, interesting:
[https://en.m.wikipedia.org/wiki/Consistent_hashing](https://en.m.wikipedia.org/wiki/Consistent_hashing)

> Consistent hashing maps objects to the same cache machine, as far as
> possible. It means when a cache machine is added, it takes its share of
> objects from all the other cache machines and when it is removed, its
> objects are shared among the remaining machines.

I guess the challenge here is that subscriptions are sparse: I.e. one ws
connection can carry multiple channel subscriptions, thus undermining the
consistent hash.

~~~
Pfhreak
There's a number of ways to tweak the algorithm, e.g. by generating multiple
hashes per endpoint and then distributing them around a unit circle.

I've seen this used to consistently allocate customers to a particular set of
servers, not just ensure you are hitting the right cache. It doesn't fully
solve the subscription issue where multiple people are in multiple channels,
but it could probably be used as a building block there.
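
A minimal sketch of such a ring in Rust (hypothetical names; `DefaultHasher` stands in for whatever hash you'd really use, and real routing would also handle node weights and removal):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

struct Ring {
    // virtual node hash -> server name
    points: BTreeMap<u64, String>,
}

impl Ring {
    fn new(servers: &[&str], vnodes: u32) -> Self {
        let mut points = BTreeMap::new();
        for s in servers {
            // Many virtual nodes per server smooth out the distribution.
            for i in 0..vnodes {
                points.insert(hash_of(&format!("{s}#{i}")), s.to_string());
            }
        }
        Ring { points }
    }

    // Walk clockwise to the first virtual node at or after the key's hash,
    // wrapping around to the start of the ring.
    fn server_for(&self, key: &str) -> &str {
        let h = hash_of(&key);
        self.points
            .range(h..)
            .next()
            .or_else(|| self.points.iter().next())
            .map(|(_, s)| s.as_str())
            .unwrap()
    }
}

fn main() {
    let ring = Ring::new(&["a", "b", "c"], 64);
    // The same key always maps to the same server.
    assert_eq!(ring.server_for("user:42"), ring.server_for("user:42"));
    println!("user:42 -> {}", ring.server_for("user:42"));
}
```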

------
willvarfar
This is a bit late to add, but from the description of the problem in the
article, the way to make the program faster, regardless of language, is to
use a big array rather than lists and trees. Carve the array up as necessary,
with a map from users to offsets in the array where their data lives.
Basically, be your own memory allocator, with all the loss of safety but the
order-of-magnitude improvement in efficiency that that brings.

------
thedance
These kinds of posts would be much more interesting if they discussed
alternatives considered and rejected. For example why did they choose Rust
over C++?

~~~
therockhead
The article mentioned that they have already used Rust successfully in house,
so when you consider that Rust is inherently safer than C++, it seems like
they picked the right language.

------
luord
Usually this kind of article is about "migrating from a massively popular
language to a more niche language that we like better".

This one is more niche to niche. I thought that was interesting, and yet the
discussion here wasn't all that different from the usual. Guess it's flamewars
always, regardless of popularity.

------
mister_hn
Why not C++, if performance was an issue?

~~~
loeg
Why would you pick C++ for a new codebase in 2019 or 2020 if Rust met your
needs?

~~~
nuclx
Compilation times.

~~~
Narishma
In my experience, C++ is slower to compile than Rust.

------
viraptor
The next step I expected after LRU tuning was simple sharding per user, so
that there are more services with smaller caches and correspondingly smaller
GC spikes, offset in time from each other. I'm curious if that was considered
and not done for some reason.
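
The simplest version of that sharding is just a hash mod N (a hypothetical sketch; real routing would also have to handle resharding when N changes):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Route each user id to one of N smaller cache shards, so each shard's
// heap (and GC scan, in a GC'd runtime) shrinks by a factor of N.
fn shard_for(user_id: u64, n_shards: u64) -> u64 {
    let mut h = DefaultHasher::new();
    user_id.hash(&mut h);
    h.finish() % n_shards
}

fn main() {
    let n = 16;
    // The same user always lands on the same shard.
    assert_eq!(shard_for(42, n), shard_for(42, n));
    // Different users spread across many shards.
    let shards: HashSet<_> = (0..1000).map(|u| shard_for(u, n)).collect();
    assert!(shards.len() > 1);
    println!("user 42 -> shard {}", shard_for(42, n));
}
```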

------
woah
Switching to Rust is a good idea, but I was wondering- would it be possible to
run two identical instances in parallel and return results from the fastest
one? This would almost completely eliminate GC pauses from the final output.

------
highfrequency
Curious about their definition of “response time” in the graph at the end.
They’re quoting ~20 microseconds so I assume this doesn’t involve network
hops? Is this just the CPU time it takes a Read State server to do one update?

~~~
jhgg
Correct. This is the internal time it takes to process the message. Once a
node is "warm", thanks to the large caches it's mostly in-memory operations
and queueing for persistence, which happens in the background.

~~~
Sikul
Also worth noting: most requests to the service have to update many Read
States. For instance, when you @everyone in the Minecraft server, we have to
update over 500,000 Read States.

------
mc3
More accurately "Why Discord is switching a service from Go to Rust"

------
dennisgorelik
> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize
> memory usage.

Why would BTreeMap be faster than HashMap? HashMap performance is O(1), while
BTreeMap performance is O(log N).

~~~
scott_s
This subthread explains why it's more _memory efficient_ to use a tree-based
structure:
[https://news.ycombinator.com/item?id=22239393](https://news.ycombinator.com/item?id=22239393).
Short version is that in order to get good performance out of a hashtable-
based structure, you want to have _more_ than _n_ slots.

Which brings me to my second point: hashtable based data structures are not
worst-case _O(1)_. They are worst-case _O(n)_, because in the worst case, you
will either have to scan every entry in your table (open addressing) or walk a
list of size _n_ (separate chaining). Of course, good hashtable
implementations will not allow a situation with so many collisions, but in
order to avoid that, they will need to allocate a new table and copy over the
contents of the old, which is also an _O(n)_ operation.

Given two kinds of data structures, one which is average-case _O(1)_ but
worst-case _O(n)_, versus best- and worst-case _O(log n)_, which one you
choose depends on what kinds of performance you're optimizing for, and how bad
the constants are that we've been ignoring. If you care more about throughput,
then you usually want average-case _O(1)_, as the occasional latency spikes
aren't important to you. But if you care more about latency, then you'll
probably want to choose worst-case _O(log n)_, assuming that its
implementation's constants aren't too bad.
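
The hidden resize cost is easy to observe with Rust's std HashMap (illustrative only; the number of resizes depends on the implementation's growth policy):

```rust
use std::collections::HashMap;

fn main() {
    let mut m = HashMap::new();
    let mut cap = m.capacity();
    let mut resizes = 0;
    for i in 0..100_000u64 {
        m.insert(i, i);
        if m.capacity() != cap {
            // The table grew: every existing entry was rehashed and moved.
            // This is the O(n) pause hiding behind "amortized O(1)" inserts.
            resizes += 1;
            cap = m.capacity();
        }
    }
    println!("{} resizes over 100k inserts", resizes);
    assert!(resizes > 0);
}
```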

~~~
Jweb_Guru
Cuckoo hashmaps are worst case O(1) when implemented correctly, up to resizing
(however, they do need more space and perform worse in virtually all real
benchmarks).

------
deepsun
Wait, didn't the Go devs say they had solved the GC latency problems [1]?

(from 2015): "Go is building a garbage collector (GC) not only for 2015 but
for 2025 and beyond: A GC that supports today’s software development and
scales along with new software and hardware throughout the next decade. Such a
future has no place for stop-the-world GC pauses, which have been an
impediment to broader uses of safe and secure languages such as Go." [2]

[1]
[https://www.youtube.com/watch?v=aiv1JOfMjm0](https://www.youtube.com/watch?v=aiv1JOfMjm0)

[2] [https://blog.golang.org/go15gc](https://blog.golang.org/go15gc)

~~~
terminaljunkid
That seems to have been written by some manager with only a slight clue about
the tech, tbh.

------
Fire-Dragon-DoL
I'll raise the same question here: couldn't a memory pool be used in this
case?

In the gaming industry there are similar problems with GC, and they were
solved with memory pools.

------
nemo1618
I wonder if it would be feasible to rewrite the LRU cache (either fully or in
part) in a way that does not require the GC to scan the entire cache.

~~~
kerkeslager
Yes, it's possible: that's generational garbage collection. But last I heard,
Google decided writing a modern GC was too complicated.

They're probably right, because Google doesn't need it. But for everyone else
who decided to use a language designed to solve Google's fairly-unique
problems as if it were a general-purpose language: that kind of sucks, doesn't
it?

~~~
terminaljunkid
The fact seems to be that the Go team is not as well funded as it appears. Go
is not Google's language in the sense that C# is MS's language or Java was
Sun's language.

------
musicale
1st Law of Garbage Collection: Consistent speed and efficiency usually
requires circumventing the garbage collector.

------
arjunbajaj
Question for the Discord team: Was implementing the same service in Elixir an
option? Did you try it/why not?

~~~
robocat
Discord also uses Elixir - there are comments elsewhere in this thread on why
Elixir might be a bad choice in this case.

~~~
arjunbajaj
Thanks!

------
donatj
I feel like from the definition of the service, the entire thing could easily
be replaced with a Redis cluster.

~~~
Sikul
We originally cached this data with a Redis cluster but we hit scaling issues.
The Read States service only exists because Redis had issues.

~~~
donatj
Hah, well now I feel like a dufus. Good info!

~~~
Sikul
No worries, we could have mentioned that in the post as part of the service
history :)

------
dis-sys
I believe the problem described in this blog has been at least partially
addressed in the Go 1.12 release.

------
sayusasugi
Great, can the client be ported to Rust while you're at it? Electron is such a
joke.

------
raverbashing
> but because the garbage collector needed to scan the entire LRU cache in
> order to determine if the memory was truly free from references

Yeah please tell me again how GC is a superior solution to reference counting
in cases when you know exactly when you don't need the object anymore.

(Hint: RC is not GC if the object is deallocating itself)

------
brylie
What are some recommended resources for a gentle introduction to Rust?

~~~
fatbird
I read the Rust Programming Language book over Christmas and it's a very good
introduction to it, probably one of the best I've seen for any language. It's
got a good voice, and it's very good about putting enough context around Rust
design decisions to understand the why as well as the how. But it's not so
long that it feels like a slog.

~~~
brylie
Link, for convenience:

[https://doc.rust-lang.org/book/](https://doc.rust-lang.org/book/)

------
crazypython
In D, you may explicitly delete memory while having a GC.

------
pmarreck
Is the Discord server-side still coded in Elixir?

------
joseluisq
That's why the {blazing-fast} term is becoming popular.

Rust won again.

------
pkolaczk
This is consistent with my observations of porting Java code to Rust. Much
simpler and nicer-to-read safe Rust code (no unsafe tricks) compiles to
programs that outperform carefully tuned Java code.

~~~
dang
We detached this subthread from
[https://news.ycombinator.com/item?id=22240978](https://news.ycombinator.com/item?id=22240978).

------
FisherGuy44
This is not a fair comparison. Go 1.9.2 was released over 2 years ago. In that
time they have fixed a lot of the GC stutter issues. Comparing rust nightly to
a 2 year old compiler is unfair.

------
moneywoes
Another blow for Google.

------
buboard
maybe next year : why discord is switching to C

------
adamnemecek
Rust is maturing. I legit don't think there are too many good reasons to use
Go over Rust. You can call Rust from Go but not vice versa.

~~~
steveklabnik
(You can call Go from Rust: [https://blog.arranfrance.com/post/cgo-sqip-
rust/](https://blog.arranfrance.com/post/cgo-sqip-rust/) )

------
LaserToy
I’m sorry, but isn’t this caching 101? Do not keep long-lived objects in
GC-managed memory. And there are ways to do that in both Go and even Java.

------
jaten
just use an off heap hash table. simple.
[https://github.com/glycerine/offheap](https://github.com/glycerine/offheap)

Also, as others have said, lots of big GC improvements were ignored by
insisting on go1.9.2 and not the latest.

~~~
favorited
The graphs are from 1.9.2, but the author said they tried 1.8, 1.9, and 1.10
and saw the same thing.

------
nottorp
Can someone wake me up when they switch from javascript to something native in
the _client_?

I just checked and, as usual, I have an entry labeled "Discord Helper (Not
Responding)" in my process list. I don't think i've ever seen it in a normal
state.

~~~
zlynx
That is kind of bad Windows programming but easy to do when writing an app
that doesn't need to handle Windows event messages. It probably sits in a loop
waiting on socket events and doesn't care if you sent it a WM_QUIT or not. It
would be easy to pump the message loop and ignore it all, but why bother?

~~~
nottorp
Lol it's a javascript thing that instantiates a copy of Chrome, not a Windows
program. I doubt they know what a WM_QUIT is...

------
blazespin
Confused, aren't they losing memory safety?

I get for certain core code situations, you want to manage all memory safety
yourself (or use built in static GC), but beyond that it seems to me at a
higher level you'd rather have the automatic GC. Why burden all of your
developers rather than just a core few?

I don't think GC issues is a compelling argument to move everything to Rust.
I'm not saying there aren't compelling arguments, but that just seems a bit
odd that that's their main argument.

~~~
buzzerbetrayed
I’ve never heard the argument that moving to rust reduces memory safety. Isn’t
memory safety what rust is known for?

~~~
Matthias247
It is! But in Rust you still have an escape hatch in the form of the `unsafe`
annotation which allows for mistakes which break memory safety. I don't think
Go has something like that, unless you use the FFI. So saying that Go is at
least as memory safe as Rust might not be too wrong of a statement.

However I think in total Rust is safer. E.g. Rust prevents a ton of race
conditions in multithreaded code, which Go can not do.
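
For instance, safe Rust simply won't compile unsynchronized shared mutation across threads; a sketch of the pattern the type system forces you into:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared state must be wrapped for the program to compile at all:
    // Arc for shared ownership across threads, Mutex for exclusive access.
    let counter = Arc::new(Mutex::new(0u64));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let c = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                *c.lock().unwrap() += 1;
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // No lost updates: the compiler ruled out the unsynchronized version.
    assert_eq!(*counter.lock().unwrap(), 4000);
}
```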

~~~
Jweb_Guru
Go has data races on multiple cores in safe code, without using any unsafe
intrinsics or C FFI.

------
blackrock
Would it have been better if they went with Elixir?

Write their code in a functional style. Get the benefits of the Erlang BEAM
platform.

Their system runs over the web, so time sensitivity isn’t as important, in
comparison to video games, VR, or AR.

Anyone ever done a performance comparison breakdown between something like
Elixir vs. Rust?

~~~
jerf
"Would it have been better if they went with Elixir?"

No. It would have been unshippably bad. BEAM is generally fairly slow. It was
fast at multitasking for a while, but that advantage has been claimed by
several other runtimes in 2020. As a language, it is much slower than Rust.
Plus, if you tried to implement a gigantic shared cache map in Erlang/Elixir,
you'd have two major problems: One is that you'd need huge chunks of the map
in single (BEAM) processes, and you'd get hit by the fact BEAM is not set up
to GC well in that case. It wants lots of little processes, not a small number
of processes holding tons of data. Second is that you'd be trading what in
Rust is "accept some bytes, do some hashing, look some stuff up in memory"
with generally efficient, low-copy operations, with "copy the network traffic
into an Erlang binary, do some hashing, compute the PID that actually has the
data, _send a message_ to that PID with the request, _wait for the reply
message_, and then send out the answer", with a whole lot of layers that
expect to have time to make copies of lots of things. Adding this sort of
coordination into these nominally fast lookups is going to slow this to a
crawl. It's like when people try to benchmark Erlang/Elixir/Go's threading by
creating processes/goroutines to receive two numbers and add them together "in
parallel"; the IPC completely overshadows the tiny amount of work being done.
(They mention tokio, but that's still going to add a lot less coordination
overhead than Erlang messages.)

Go is a significantly better language for this use case than
Elixir/Erlang/BEAM is, let alone Rust.

(This is not a "criticism" of Erlang/Elixir/BEAM. It's an engineering
analysis. Erlang/Elixir/BEAM are still suitable for many tasks, just as people
still use Python for many things despite the fact it would be a
catastrophically bad choice for this _particular_ task. This just isn't one of
the tasks it would be suitable for.)

~~~
hopia
Not to disagree with your analysis of the performance implications, but I
don't think having all that data under a single or a few processes would be
the right architectural pattern to handle this in Elixir.

The article says that the data is basically "per-user", indicating that the
active client connection process could be used to store the data. It already
hosts other data related to the client (connection) anyway. I think updating
and querying it globally would be the trouble in that case.

Another could be storing the data in mnesia, BEAM's internal mutable in-memory
DB. Probably better, but still not ideal to solve this.

Anyway, you're right in that no matter how you'd try to solve this problem on
pure Elixir you'd still be seeing some bottlenecks because BEAM just isn't
very well suitable for this kind of problems, hence Rust.

But can you elaborate on what you mean by other platforms catching up with
Elixir's inherent concurrency advantages? Which modern platforms give similar
features?

~~~
jerf
"The article says that the data is basically "per-user","

Given that this is a table of who is "online", I don't think that's per-user
in the sense that you are inferring. I infer that it's not a whole bunch of
little local data that doesn't interact, it's a big global table of who is
online and not online, constantly being heavily read from and written to in
real time. Consider from the perspective of Bob's Erlang process that he wants
to go offline and notify all of his currently-online friends that he is going
offline. Bob's Erlang process doesn't have that data. Bob's Erlang process is
going to get it from the Big Table of Who's Online. That table is the problem;
it can't be stored in Bob's Erlang process.

I was at least imagining that the table could be partitioned into pieces
pretty trivially (first X bits of the hash), but with Erlang's design, that
implies an IPC just to ask some server process to give me the PID of the chunk
I need to talk to, which itself is going to bottleneck. (In practice we'd
probably cheat and use a NIF to do that, but that amounts to an admission that
Erlang can't do this, so....)

At smaller scales you could try to live update Bob's local information as it
changes, but this breaks down in all sorts of ways at scales far smaller than
Discord, scales much closer to "a single mid-sized company".

"Another could be storing the data in mnesia, BEAM's internal mutable in-
memory DB."

I have used mnesia for loads _literally_ a ten-thousandth the size of this,
_if that_ (I could probably tack two more zeros on there), and it breaks down.
It is an absolutely ludicrous idea that mnesia could handle what Discord is
doing here. Last I knew the official Erlang community consensus was basically
that mnesia really shouldn't be used for anything serious; my experience
backed that up.

I think a non-trivial part of the reason why Erlang hasn't taken off is that
its community still seems to exist in 2003, where it's a really incredible
unique language that solves huge problems that nobody else does. In 2003, it
rather has a point. But a lot of things have learned from Erlang, and
incorporated its lessons into newer designs, and moved on.

See my other comment for what other runtimes have Erlang's advantages, but I'd
invite you just to consider what we seem to basically agree on here; Erlang
would be wildly slower and require a lot more hardware than Rust, the Rust
code probably wasn't that hard to write, ... and the Rust code is _way_ more
likely to be correct than the Erlang code, too. I mean, what more "catching up
to Elixir's inherent concurrency advantages" _in this context_ than "did a job
Elixir couldn't possibly do" do you want?

~~~
hopia
Yeah the scale is what makes this problem a problem here. I've done exactly
that "online" stuff per user process and it works fine on a small scale, even
when it needs to be globally inferred. But I suspect it'd quickly become the
bottleneck when scaling.

I had no idea mnesia was that fragile though, what gives? What kind of issues
did you encounter with it? What do you use now to solve those issues with
Erlang/Elixir?

Sure, we all know Erlang doesn't shine in computationally intensive workloads.
Obviously, Rust was the right call here. But stateful distributed soft real-
time concurrency, can you really say with a straight face that Rust comes with
all the same features as BEAM out-of-the-box? Or any other modern platform for
that matter. I've yet to see Erlang/Elixir beaten in that particular niche.

~~~
jerf
"I had no idea mnesia was that fragile though, what gives? What kind of issues
did you encounter with it? What do you use now to solve those issues with
Erlang/Elixir?"

I had ~10,000 devices in the field with unique identifiers creating long-term,
persistent connections to a central cluster. An mnesia table stored basically
$PERSISTENT_ID -> PID they are connected to. It needed to be updated when they
connected and disconnected, which let me emphasize was a relatively rare
occurrence; the ideal system would be connected for days at a time, not
connecting & disconnecting dozens of times a minute. At most, reconnection
flurries might occasionally occur where they'd all try to connect over the
course of a few minutes (they had backoff code built in) if the cluster was
down for some reason.

Mnesia fell over. A lot. All I could find online as an explanation was
basically "yeah, don't do that with mnesia". Bizarrely, it wasn't the
connection flurries that did it, either... it was the normal "maybe a few
dozen events per second" that tended to do it. Erlang itself was usually fine.
(Although for machines right next to each other in a rack, I did lose the
clustering more often than I'd like, and have to hit the REPL to re-associate
nodes together. Much less often than mnesia corrupted itself, though.)

"can you really say with a straight face that Rust comes with all the same
features as BEAM out-of-the-box?"

Well, that's another way of looking at what I was trying to say. That's the
_wrong question_. Rust doesn't need "all the same features as BEAM". Rust
needs "the features necessary to do the work". While the Erlang community is
looking for a language that has "all the same features as BEAM" and smugly
congratulating themselves that no other language seems to have cracked that
yet, a number of languages are passing them by by implementing _different_
features. Many of those languages, as I said, are informed by Erlang. Many of
these new languages are choosing their "not exactly like Erlang" features in
knowledge, not ignorance, as I think the Erlang community thinks.

Besides, Erlang builds in a lot of things that can be libraries in other
languages. I built the replacement in Go. Mostly because it was hard to get
people who wanted to work in Erlang, but despite the rage on HN anytime Go
comes up, getting people who are willing to work in Go was trivial even 5
years ago. (_Hiring_ someone who knows Go already is still a bit of a
challenge, but crosstraining someone into it is _easy_.) For the port, I wrote
[https://github.com/thejerf/reign](https://github.com/thejerf/reign) . You
will look at it and go "But Erlang has this and that and the other thing with
its clustering, and your thing doesn't have those things!" And my response is
twofold: First, that some of those things are supported in Go code in other
ways than what you are expecting, and that was not intended to be "Erlang in
Go" but "a library for helping port Erlang programs into Go without
rearchitecting", and second... the resulting cluster has been more reliable
and more performant (we actually cut the cluster from 4 to 2, because now even
a single machine can handle the entire load), and all the "features" reign is
missing, well, maybe they aren't so important out of the context of Erlang. I
suppose in my own way this is another sort of story like Discord's; _on the
metrics I care about_, my home-grown clustering library worked better for me
than Erlang's clustering code.

(In fact, Go's even got the edge on Erlang for GC _for my use case_, which is
one of the ways in which the new system is more performant. Now, it happens
that my system is architected on sending around messages that may frequently
be several megabytes in size, and Erlang was really designed for sending
around lots of messages in the kilobyte range. Even as I was using it, Erlang
got a lot better with handling that, but it still was never as good or fast as
Go, and Go's only gotten better since then, too. I was able to do things in Go
for performance to re-use my buffers that are impossible in Erlang.)

So, I mean, while I do deeply respect Erlang for its pioneering position, and
I am particularly grateful for the many years I spent with it back when it was
the only option of its sort (if I had to write the project in question in C++
or something, I just _wouldn 't_ have; do not think I "hate" Erlang or
something, I am very grateful for it), if I am a bit less starry-eyed about it
than some it's because I see it as... just code. It's just code. Erlang gets
no special access to CPU instructions or special Erlang-only hardware that
allows it to do things no other language can. It's just code. Code that can be
and has been written in other languages, in other environments.

I like Erlang in a lot of ways, and respect its place in history. But its
community is insular, maybe even a bit sick, and I don't really expect that to
change, because once an individual realizes it, they tend to just leave,
leaving behind only the True Believers, who still believe that Erlang is the
unique and special snowflake... that it _was_... 15 years ago.

~~~
hopia
Thanks for the comprehensive reply!

I guess I better experiment more with mnesia before really using it for
anything serious. Or find alternatives. We had Redis before but that
experience turned out just awful so we got rid of it.

As for the community, I think Elixir is where it's at nowadays. There is,
unsurprisingly, a very strong focus on webby stuff with Elixir, and a lot of
the things you would build with it are just _easy_. Like a multi-machine chat
server.

If I started to build a new distributed chat server today, Elixir would still
be the easiest way to go, despite eventually likely not being the most
performant solution out there. Discord likewise seems happy with their choice
for this particular use case, only supplementing it with the likes of Rust for
specific problems in their domain.

I mean, you yourself built a lot of Erlang's/BEAM's logic from scratch in
Go just to be able to use it there. I'm expecting I'd end up in a similar
alley with Rust/Haskell/take your pick if I was attacking the problems where
Elixir has all the facilities already set up and battle tested.

------
tonyferguson
Wow, Rust is amazing, so fast! It is like these people never learnt C. Why did
they spend all this time trying to optimise such a high level language? Surely
they can afford a more experienced engineer who will tell them that is a path
that isn't worth it? I jump straight to C when there is anything like this,
although I guess Rust is an option these days.

------
fxtentacle
Sounds like badly reinventing the wheel. If you need a large in-memory LRU
cache, use memcached. Problem solved, because then Go doesn't need to allocate
much memory anymore. And I'd wager that JSON serialization for sending a reply
back to the client will dominate CPU load anyway, so that the overhead for Go
to talk to Memcached will be barely noticeable.

------
shanev
When a company switches languages like this, it's usually because the
engineers want to learn something new on the VC's dime. They'll make any
excuse to do it. As many comments here show, there are other ways to solve
this problem.

------
dancemethis1
Well, none of it matters since Discord is hostile software. No language will
solve their privacy-trampling deeds.

------
harikb
> Discord has never been afraid of embracing new technologies that look
> promising.

> Embracing the new async features in Rust nightly is another example of our
> willingness to embrace new, promising technology. As an engineering team, we
> decided it was worth using nightly Rust and we committed to running on
> nightly until async was fully supported on stable.

> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize
> memory usage.

It is always an algorithm change

------
esjeon
I wonder if they actually did their homework. It doesn't matter whether they
like it, but they could have avoided the rewrite if they had wanted to.

The thing is, you can allocate memory outside of Go, and the GC will simply
ignore such regions, since it only scans regions known to it. (Mmap should work
like a charm here.) A drawback is that pointers in such regions will not be
counted, but that's easy to work around by copying whole values, which is
encouraged by the language itself.

TBH, Go sucks for storing a large amount of data. As you can see here, even
the simplest cache can be problematic. The language is biased towards large
datacenters, where the amount of available resources is less of a concern.
Say, this problem can be solved by having external cache servers and extra
nodes around them. Latency will not be ideal, but the service will
survive with minimal changes.

------
h2odragon
Excellent write up, and effective argument for Rust in this application and
others. My cynical side sums it up as:

"Go sucked for us because we refused to own our tooling and make a special
allocator for this service. Switching to Rust forced us to do that, and life
got better"

~~~
staticassertion
I'm confused. Build a special allocator for Go you mean? That feels like going
well beyond typical "own your tooling".

~~~
h2odragon
I'm outdated. I used to have 4 different python interpreter builds, for
different purposes, where the modern world would be using lua as a glue
language. I had nothing like the scale, staff, or budget of Discord; all I had
was need and tools that could bend to fill it.

I think this is a great write up of why they chose a different tool. I don't
say it was the wrong decision, they make that argument pretty well too. I'm
still surprised that either Go isn't malleable enough to have bent around the
need, or they didn't feel it worth more effort than parameter tweaking to bend
it so.

------
echopom
This was an extremely interesting read.

I'm quite disappointed, though, that they did not update their Go version to
1.13[0][1], which would normally have removed the spike issue, and thus the
latency, before they moved to Rust...

Rust seems more performant with proper usage (tokio + async), but I'm more
worried about the ecosystem, which doesn't seem as mature as Go's.

We could quote the recent[2] Drama with Actix...

[0][https://golang.org/doc/go1.13#runtime](https://golang.org/doc/go1.13#runtime)
[1][https://golang.org/doc/go1.12#runtime](https://golang.org/doc/go1.12#runtime)
[2][https://github.com/fafhrd91/actix-web-
postmortem](https://github.com/fafhrd91/actix-web-postmortem)

~~~
chc
Why would you want to bring up the Actix author's drama? That doesn't seem
like something that should reflect on a language one way or the other.

~~~
deweller
As an outsider to both the Go and Rust cultures, I read the Actix news and
walked away with the impression that the Rust ecosystem is less mature.

~~~
cies
Go's community is more pragmatic. Rust's is more purist, and that shows in the
language features (more functional, freer in allowing you to use it for any
purpose where Go is network-app specific, stricter in typing), the licensing,
and the attitude towards collaboration.

That collaboration thing is why Actix exploded, I think. While mostly an
isolated incident, it does show some clash between the author's values (and
possibly the author's employer's (MSFT) values) and the values of the general
Rust community. I would not say that reflects on the maturity of the language
or ecosystem.

In Go a lot of stuff is dictated by Google. In Rust it's a true open-governance
project (looking to become a non-profit). Since Go is a very specific language
--made for networked apps, with only one way to do concurrency-- and Rust is
very broad --a true general-purpose language-- it is easy to see how Go matured
so quickly (not much to mature) and also why it got a bit old so quickly as
well (it ignores most innovations in computer science of the last few decades).

~~~
nemothekid
The Go community has a very similar story, where someone released a web
framework, with an unorthodox set of features, and was flamed to the point
where he abandoned the project and quit OSS.

[https://github.com/go-martini/martini](https://github.com/go-martini/martini)

~~~
fjp
What was so unorthodox/upsetting to people there?

~~~
nemothekid
Martini used the service injection pattern and made use of reflection to do
so. It was a very popular framework and one of the first in Go (it currently
has ~10k stars), and the use of reflection became a very contentious viewpoint
in the community.

------
bradhe
Replatforming to solve this problem was a bit silly in my opinion. The
solution to the problem was "do fewer allocations" which can be done in any
language.

~~~
jhgg
Your reply misses the point. We were already doing so few allocations that the
GC only ran because it "had to" at every 2 minute mark. The issue was the
large heap of many long lived objects.

~~~
_ph_
Did you try to change that interval to a much larger time?

~~~
jhgg
When we investigated, there was no way to change that interval that we could
find - barring compiling Go from source (something we could have done, but
wanted to avoid.)

~~~
_ph_
Yes, you have to rebuild Go, but that is literally done in a minute. It would
also be interesting, if you happen to have some conclusive benchmarks, to see
how the latest Go runtime performs in this regard.

~~~
_ph_
I don't get why this is downvoted without comment. Compared to a rewrite,
this would have been a minuscule change. Furthermore, considering that you
wrote that long blog post (which I quite appreciate, as it contains
interesting information), it would have been important to know whether the
setting of this parameter was the real culprit - and if it was, a good reason
to shout out to the Go implementors to look closer at it.

~~~
Jweb_Guru
All I'm going to say is that if you think maintaining your own version of a
compiler is the reasonable option compared to a rewrite in another language,
you are probably _deeply_ invested in the former language. This also applies
to kernels and databases.

~~~
_ph_
Well, in this case, "maintaining your own version of the compiler" concerns a
single value change in the code base. At the least, it should have been tried,
to identify the root cause of the observed behavior. If this "fix"
significantly improves the behavior, it would have been a good data point for
reaching out to the Go developers to resolve this issue.

The problem with getting down to the core of these issues is test cases. It
seems that neither the Go developers nor many other people have run into this
as an issue - I only remember noticing the regular GC some years ago, but it
was not an issue for me. As they have a real-life test case exposing this
problem, they are possibly the only ones who could verify a potential fix for
the problem.

So, while it is great that they identified the problem and wrote a thorough
blog piece about it, the only thing we learn from this is that in Go 1.9
there was a latency issue every 2 minutes with their style of application/heap
usage. Unfortunately, we don't know whether this problem was already addressed
in later Go versions, and if not, whether there should be a way to control the
automatic GC interval to address this.

------
The_rationalist
Borrowed from a comment:

Garbage collection has gotten a lot of updates in the last 3 years. Why would
you not take the exceedingly trivial step of just upgrading to the latest Go
stable in order to at least _try_ for the free win? From the go 1.12 release
notes: “Go 1.12 significantly improves the performance of sweeping when a
large fraction of the heap remains live. This reduces allocation latency
immediately following a garbage collection.” ¯\\_(ツ)_/¯ This sounds like “we
just wanted to try Rust, ok?” Which is fine. But like, just say that.

------
romaniitedomum
You're switching to Rust because Go is too slow? Colour me sceptical, but this
seems more like an excuse to adopt a trendy language than a considered
technical decision. Rust is designed first and foremost for memory safety, and
it sacrifices a lot of developer time to achieve this, so if memory safety
isn't high in your list of concerns Rust is probably not going to bring many
benefits.

~~~
hajile
Did you read the article? The naive Rust version was better than the tuned
golang version in every metric. The most important one (latency) simply wasn't
fixable due to golang's GC (something that is a bit of a general GC issue I
might add).

~~~
romaniitedomum
Did you read my comment? I don't dispute that the Rust version is faster in
every way. I am disputing that rewriting in Rust was a sensible technical
decision, and in support of this I point you to where the author describes
having to use a nightly build of the compiler to get async support. Given that
they had to jump through a lot of hoops to make this work, I am saying they
could have achieved the same speed increase with less effort using a stable C
or C++ compiler. Hell, had they invested a fraction of the time spent
rewriting in Rust in the Go version, I'll bet they could have improved it to
the point where there was no need to rewrite it at all.

It's clear that Discord use Rust a lot, and that they are looking for any
excuse to replace existing code with Rust code.

------
_--___-___
"We want to make sure Discord feels super snappy all the time" is hilarious
coming from a program that is infamous for making you read 'quirky' loading
lines while a basic chat application takes several seconds to start up.

Don't really know about Go versus Rust for this purpose, but don't really care,
because read states (like nearly everything that makes Discord less like IRC)
are an anti-feature in any remotely busy server. Anything important enough that
it shouldn't be missed can be pinned, and the feature encourages people to
derail conversations by replying out of context to things posted hours or days
ago.

~~~
anchpop
I don't see why that's hilarious. Lots of programs take a second or two to
load and it only happens once on boot for me. "Read states" is just discord
telling you which channels and servers you have unread messages in

~~~
wvenable
Discord takes longer to start up than Microsoft Word.

Desktop development is a total wasteland these days -- there isn't nearly as
much effort put into optimization as server side. They're not paying for your
local compute, so they can waste as much of it as they want.

~~~
graphememes
Microsoft Word isn't patching the application on startup. That's the
difference.

Once it's loaded, how much slower than Word is it?

~~~
wvenable
You're telling me Discord is patching itself on every single launch and this
somehow a valid excuse for slow startup performance?

Almost every single app I run auto-updates itself in some form.

~~~
graphememes
In the case of Discord, yes. That's a valid argument, whether or not it's
truly important, I'm not sure. It certainly is a waste of time to invest
improving when their current system works perfectly fine.

~~~
wvenable
They're investing in server-side projects that are also perfectly fine. In this
case, re-writing an entire module in a different language to eke out a tiny
bit more performance!

But on the client side, it's arguably the slowest to launch application I have
installed even among other Electron apps. Perfectly _fine_.

This completely reinforces my original statement: "Desktop development is a
total wasteland these days -- there isn't nearly as much effort put into
optimization as server side." Desktop having horrible startup performance is
"fine", but a little GC jitter on the server requires a complete re-write from
the ground up.

~~~
jhgg
I think this is a statement that is ignorant of our development efforts, how
our team is staffed, and what our objectives are.

First and foremost, we do care deeply about desktop performance. This week we
shipped a complete rewrite of our messages components, which comes with a
boatload of performance optimizations in addition to a new design. We spent a
lot of time on that rewrite, in addition to applying new styles, because given
what we know now (and what's state of the art in the React world), we can
write the code better than we did 3+ years ago. In terms of total engineering
time spent, the rewrite of messages actually took much longer than the rewrite
of this service from Go to Rust.

That being said, the desktop app does load much slower than we'd like (and
honestly, than I'd like personally). I commented in another thread on why that
is. However, the person writing backend data services is not the one who's
going to be fixing the slow boot times (that's our native platform team).
These are efforts that happen independently.

As for our motivations for using Rust: "a little GC jitter on the server" was
just one of many reasons we wanted to give Rust a shot. We have some code that
we know works in Go. We wanted to investigate the viability of Rust and figure
out what it'd look like to write a data service in Rust. We have this service
that is rather trivial and has some GC jitters (that we've been fine with for a
year). So, an engineer (the author of this blog post) spent some time last
year seeing what it'd look like to write an equivalent service in Rust, how
it'd perform, how easy it'd be, and what the general state of the ecosystem is
like in practice.

I think it's easy to forget that a lot of the work we do as engineers isn't
all about what's 100% practical, but also about learning new things in order
to explore viable new technologies. In this case, this project had a very
clear scope and set of requirements (literally rewrite this thing that we know
works), and a very well defined set of success criteria (should perform as
good or better; see if a lack of GC improves latencies; get a working
understanding of the state of the ecosystem and of how difficult it would be
to write future data services in Rust vs. Go). Given the findings from our
rewrite of this service, running it in production, and now using features that
have stabilized in Rust, we're confident in saying that "in places where we
would have used Go, we consider Rust viable, and here's why, given our
exercise in rewriting something from Go to Rust."

