
Using DTrace to measure Erlang dirty scheduler overhead - bcantrill
https://medium.com/@jlouis666/erlang-dirty-scheduler-overhead-6e1219dcc7
======
arthurcolle
Here is an overview of how BEAM is implemented for people who want a quick
refresher:
[http://www.erlang.org/euc/08/euc_smp.pdf](http://www.erlang.org/euc/08/euc_smp.pdf)

And a comparison to the JVM here: [http://ds.cs.ut.ee/courses/course-
files/To303nis%20Pool%20.p...](http://ds.cs.ut.ee/courses/course-
files/To303nis%20Pool%20.pdf)

~~~
pron
I'll just note that the JVM part of that comparison is wrong. The main
difference between BEAM and the JVM is that BEAM implements a much larger part
of the language's functionality in the runtime, while -- at least for many JVM
languages -- that is not the case with the JVM. The JVM -- similarly to the
CPU+OS -- directly offers a rather general programming model: shared memory,
kernel threads etc., only with the addition of an optimizing JIT and a GC.
BEAM, OTOH, operates at a much higher level, much closer to the Erlang
language. It offers a very specific form of GC, a very specific form of shared
memory, and a very specific scheduler. All of these -- just as BEAM implements
them on top of the CPU+OS -- can be implemented on top of the JVM, which
offers a lower level of abstraction than BEAM.

The comparison, however, compares BEAM to a programming model (offered by the
Java _language_) that is very close to the JVM's native, low-level
abstraction. That is a lot like comparing Erlang and C, namely comparing two
things that are aimed at completely different levels of abstraction. And just
like Erlang can be (and is) implemented in C -- which is a lower-level
language -- so too can it be implemented in Java.

Its preemptive lightweight processes can be implemented in Java, its scheduler
can be implemented in Java (both have been, in fact), and even its per-process
GC can be implemented in Java (although that's probably unnecessary given new
Java GCs).

The reason BEAM is implemented that way is not that it results in a better
Erlang runtime, but that a very specific, high-level VM can yield good(ish --
BEAM is a _very_ slow VM compared to HotSpot or V8) results at relatively
little effort, because the high-level constraints imposed by the language are
used to restrict the scope of the runtime. The JVM, by contrast, has required
a much bigger investment to provide superb results across a wide variety of
languages (HotSpot with its next-gen JIT is comparable to V8 at running
JavaScript and PyPy at running Python, and not too far behind gcc at running
C). The price BEAM pays for that decision is that going beyond the very
narrow execution profile it supports well requires implementing the code in
C. Which is why most large Erlang applications are mixed Erlang/C applications
(Erlang for the control plane, C for the data plane), while JVM applications
and libraries require virtually no native code (aside from the runtime itself,
which is also moving more and more functionality to Java -- the next-gen JIT
is written entirely in Java).

The difference between the JVM (at least HotSpot; there are lots of JVMs) and
BEAM is that BEAM is a reasonable, Erlang-specific (or languages with similar
semantics to Erlang) VM, while HotSpot is a state-of-the-art, general
purpose(ish) VM, with many, many man-centuries behind it.

~~~
jlouis
The JVM could have had like 99% of all the VM market by now had Sun just opted
to fix two things back in the day:

* GC intrinsics, so you could implement functional languages easily.

* Tail calls, so you could implement functional languages easily.

I note LLVM made the same mistake :)
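
For illustration, the hoop-jumping in question: without tail calls, a
functional-language compiler targeting the JVM has to turn deep tail
recursion into something like a trampoline (a generic sketch, not any
particular compiler's scheme), trading constant stack depth for a thunk
allocation per call:

```java
import java.util.function.Supplier;

// Trampoline: tail calls rewritten as thunks driven by a loop, so deep
// "recursion" runs in constant stack space on a VM without tail calls.
public class Trampoline {
    static final class Step<T> {
        final T value;                    // set when the computation is done
        final Supplier<Step<T>> next;     // set while more work remains
        Step(T value, Supplier<Step<T>> next) { this.value = value; this.next = next; }
        static <T> Step<T> done(T v)                 { return new Step<>(v, null); }
        static <T> Step<T> more(Supplier<Step<T>> n) { return new Step<>(null, n); }
    }

    // Drive the steps iteratively: stack depth stays constant.
    static <T> T run(Step<T> step) {
        while (step.next != null) step = step.next.get();
        return step.value;
    }

    // A tail-recursive countdown sum, written in trampolined style.
    static Step<Long> sum(long n, long acc) {
        if (n == 0) return Step.done(acc);
        return Step.more(() -> sum(n - 1, acc + n));  // the "tail call" as a thunk
    }

    public static void main(String[] args) {
        // Ten million "frames": plain recursion would overflow the JVM stack.
        System.out.println(run(sum(10_000_000L, 0L)));  // 50000005000000
    }
}
```

With real tail calls in the VM, `sum` could just be a recursive method and
the whole `Step`/`run` machinery would disappear.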

~~~
pron
The JVM _has_ nearly 99% of the non-Windows server VM market (you can't beat
MS on Windows). And tail calls are coming once they matter enough to the
users.

What do you mean by GC intrinsics?

~~~
jlouis
I'm talking about getting academia on board back in 1996 here. If you look at
the industrial market, then sure, but I'm looking at what it takes to get
languages that don't look like the normal imperative piece of crap to run on
the JVM, and without tail calls that is just hoop-jumping.

As for the GC, the interplay is the ability to tell the runtime where your
pointers are. It is one of the places that is usually lacking, because many
VMs assume a particular calling convention, like the one in Java, say. The
JVM is a bit better here as it uses a stack-based engine, and as such is
somewhat simpler to handle.

Functional compilers rarely, if ever, use the standard conventions for
handling this, especially if they want to avoid boxing polymorphic parameters
and expand them for speed.
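
The boxing being referred to is easy to see (a toy example, not from any
real compiler): a polymorphic parameter erases to `Object` at the JVM level,
so a primitive argument must be heap-allocated, which is exactly what
specializing ("expanding") compilers try to avoid:

```java
public class BoxingDemo {
    // Polymorphic parameter: T erases to Object at the JVM level, so an
    // int argument is first boxed into a java.lang.Integer object.
    static <T> String runtimeClassOf(T x) { return x.getClass().getSimpleName(); }

    // Monomorphic, specialized version: the int stays an unboxed primitive
    // in a register or on the stack; no heap allocation happens.
    static int twice(int x) { return x + x; }

    public static void main(String[] args) {
        System.out.println(runtimeClassOf(42));  // Integer -- the 42 was boxed
        System.out.println(twice(42));           // 84 -- no boxing
    }
}
```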

We could have had far better client-side and academic penetration of the JVM
by now, had Sun played their cards differently. But alas, they didn't, and we
are stuck with a Server VM market only.

~~~
pron
When you say academia you mean academic PL research, which makes up a very
small portion of CS academia. Most CS people care about FP just as much as the
industry does, and the JVM is quite popular in the algorithmic fields (maybe
not as much as C, but more than any other managed runtime).

Aside from tail-calls (which _will_ come once people really ask for them), the
JVM is about to have pretty much everything functional languages can ever need
(value types and excellent box-elision), and the new JIT (Graal) is the
biggest breakthrough in compiler technology in the last decade or so. PL
academics are drooling over it. ECOOP had a full-day workshop dedicated just
to Graal.

And you _don't_ want to change calling conventions, because language
interoperability is one of the JVM's greatest strengths. Here it is running
JS, R and C, in the same REPL with Graal:
[https://dl.dropboxusercontent.com/u/292832/useR_multilang_de...](https://dl.dropboxusercontent.com/u/292832/useR_multilang_demo.mp4)

------
rdtsc
The 3-5 usec overhead, with VM optimization flag, for dirty schedulers is
pretty good.

What's the overhead of just passing the data through to a regular NIF?
Probably gets buried in the jitter caused by cache and memory access times...

(For others: if you don't know about the Erlang VM and didn't understand the
first couple of paragraphs, dirty schedulers are a new feature that solves
the problem of running user-created, long-running C extension code inside the
Erlang VM without blocking the rest of the VM.)

~~~
jlouis
With jitter, the call time for a constant, in a dynamically linked library, is
around 1200ns:

    enacl_nif:crypto_box_ZEROBYTES/0
             value  ------------- Distribution ------------- count
              1000 |                                         0
              1100 |@@@@                                     110903
              1200 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      879765
              1300 |                                         7131
              1400 |                                         689
              1500 |                                         218
              1600 |                                         45
              1700 |                                         29

~~~
rdtsc
Thanks. That is a bit higher than I would expect but not too bad.

