
Optimising for Concurrency: Comparing the BEAM and JVM virtual machines - francescoc
https://www.erlang-solutions.com/blog/optimising-for-concurrency-comparing-and-contrasting-the-beam-and-jvm-virtual-machines.html
======
eggsnbacon1
The author leaves out Kotlin, which adds support for coroutines at the language
level and still compiles to Java bytecode. These are not classic continuations
because they cannot be cancelled, but they're still very useful and true
fibers.

There's also the Quasar library that adds fiber support to existing Java
projects, but it's mostly unmaintained since the maintainers were pulled in to
work on Project Loom.

Then there's Project Loom, an active branch of OpenJDK with language
support for continuations and a fiber threading model. The prototype is done
and they're in the optimization phase. I expect fibers to land in the Java
spec somewhere around JDK 17.

I figure it's fair to mention these, as the author's criticisms are somewhat
valid but will not be for very long (a few years max?).

In summary: Java will have true fiber support "soon". This will invalidate the
arguments for the Erlang concurrency model. They are already outdated if you are
okay using mixed Java/Kotlin coroutines or the Quasar library.

The newer Java GCs, Shenandoah and ZGC, address the author's criticisms of pause
times. They already exist, are free, and are in stable releases. Dare I say
they are almost certainly better than Erlang's GC. They are truly state of the
art, arguably far superior to the GCs used in Go, .NET, etc. Pause times are
~10 milliseconds at 99.5%ile latency for multi terabyte heaps, with average
pause times well below 1 millisecond. No other GC'ed language comes close to
my knowledge. His points 1 and 2 no longer exist with these collectors. You
don't need 2X memory for the copy phase and the collectors quickly return
unused memory to the OS. This has been the case for several years.

Hot code reloading: the JVM supports this extensively and it's used all the time.
Look into ByteBuddy, CGLIB, ASM, Spring AOP if you want to know more. Java
also supports code generation at build time using annotation processors. This
is also extensively used/abused to get rid of language cruft.

~~~
dnautics
> This will invalidate the arguments for Erlang concurrency model.

What about failure domains? As far as I'm concerned, this is the strongest
reason for actor-based concurrency. I can design my architecture so that
groups of processes that need to die together die together. And it's usually
one or two lines of code, if any.

Here's a real life example. I have a process that maintains an SSH connection
to a host machine, and that SSH connection is used to query information about
running VMs on that host machine. If the SSH connection dies, it kills the
process that is tracking the host machine, which in turn kills the processes
tracking the associated VMs, without perturbing any of the other hosts'
processes or VMs. This triggers the host process to be restarted by a
supervisor, which then creates a new SSH connection to query for information
(possibly repopulating VM processes for tracking information). All of this I
wrote zero lines of code for (which, importantly, means I made no mistakes),
just one or two configuration options. More importantly, the system doesn't
get stuck in an undefined state where complex query failures can cause logjams
in the running system.

~~~
eggsnbacon1
You can tie the fates of threads together in Java using thread groups. If you
need more flexibility, or want it to be managed for you, the Akka framework offers
this. I believe Akka gives you a model very similar to Erlang.

In Java you would create a thread pool and configure it to restart the threads
if they die. Each thread would wake up every so often to query SSH and dump
their results into a queue. If the query threads die, the processes reading
the queue at the other end have nothing to do so they won't execute. It's easy
to make a consumer queue that executes some code on another thread whenever
data arrives.
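
A minimal sketch of the polling-plus-queue arrangement described above. The class `HostPoller`, the `queryHost` supplier (standing in for the SSH query), and the period are hypothetical names chosen for illustration. One assumption worth flagging: a task scheduled with `scheduleWithFixedDelay` is not rescheduled once it throws, so here the "restart" policy is approximated by catching the failure and letting the next tick retry.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class HostPoller {
    // Polls on a fixed delay and hands each result to whoever reads the queue.
    // queryHost is a hypothetical stand-in for the SSH query.
    static ScheduledExecutorService startPolling(Supplier<String> queryHost,
                                                 BlockingQueue<String> results,
                                                 long periodMillis) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        pool.scheduleWithFixedDelay(() -> {
            try {
                results.put(queryHost.get());
            } catch (InterruptedException e) {
                // Shutdown requested: restore the flag and let the task end.
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                // Assumed policy: drop this failed query and retry on the next tick.
                // Letting the exception escape would cancel all future runs.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return pool;
    }
}
```

A consumer would block on `results.take()` on its own thread. Note that nothing here ties the consumer's fate to the poller's; that linkage would have to be written by hand or delegated to a library.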

Java's exposure of the underlying OS threads and cheap transfer of data
between threads lets people build libraries on top that offer memory models
used by Erlang and others. It's not built in or quite as convenient, but you
can use actors and fibers in Java if you want to.

~~~
dnautics
Yeah that's exactly the problem. It's an afterthought in the system. How
certain can you be that the system you're using is composable with any other
code brought into your system, even from libraries outside? In Erlang,
failure domains are the raison d'etre of the language, so everything in the
ecosystem will play nice.

Ultimately, systems like Akka are extremely complicated to get right, even for
experts, because you have to think about all of the VM bits underneath. I can
teach (and have taught) a junior programmer basic OTP concepts with the confidence
that they can't mess things up. Now, they wouldn't be able to come up with the
architecture I designed as a good idea, but I could tell them to implement it
(with tests!) and expect them to get it right.

------
Traubenfuchs
> Programming with concurrency primitives is a difficult task because of the
> challenges created by its shared memory model.

I never understood this often-repeated point. As a junior / mid-level developer
I had the privilege to run self-written .jar files on government-scale systems
with more than 50 cores. I used Java thread pools and concurrent data
structures to do heavy cross thread caching.

It was all pretty simple and concurrency & parallelism were never an issue but
simply a necessity to make things run fast enough.

Am I a concurrent programming genius? Were the types of problems/challenges I
was solving too simple? When is concurrency in Java ever hard+?

+ I know about Java masterpieces like the LMAX Disruptor that are mostly
beyond my skill level, but those are low-level write-once libraries you
wouldn't write yourself.

~~~
mrkeen
> When is concurrency in Java ever hard?

Potentially racy stuff:

* Synchronized primitives don't compose. You can safely `synchronized get(...)` and safely `synchronized put(...)`. But their composition `put(get(...)+1)` isn't synchronized. And it's hard to mentally revisit it at the end of the day: if you have a class with some methods marked synchronized, nothing will tell you whether you've synchronized the right methods. You just have to think it through again and hope you reach the same conclusions as before.
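
A minimal sketch of that composition problem, using a hypothetical `Counters` class. Each of `get` and `put` holds the lock on its own, but `incrementRacy` releases it between the two calls, so concurrent increments can be lost. `incrementAtomic` holds the lock for the whole read-modify-write.

```java
import java.util.HashMap;
import java.util.Map;

class Counters {
    private final Map<String, Integer> counts = new HashMap<>();

    synchronized int get(String key) { return counts.getOrDefault(key, 0); }

    synchronized void put(String key, int value) { counts.put(key, value); }

    // Racy: both calls are synchronized, but another thread can run between them.
    void incrementRacy(String key) { put(key, get(key) + 1); }

    // Safe: the whole read-modify-write happens under one lock acquisition.
    synchronized void incrementAtomic(String key) { counts.merge(key, 1, Integer::sum); }

    // Runs `threads` workers that each increment the same key `perThread` times.
    static int hammer(boolean atomic, int threads, int perThread) throws InterruptedException {
        Counters c = new Counters();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    if (atomic) c.incrementAtomic("hits"); else c.incrementRacy("hits");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return c.get("hits");
    }
}
```

`Counters.hammer(true, 4, 10000)` always returns 40000. `Counters.hammer(false, 4, 10000)` may return less, and how much less varies from run to run, which is part of why this kind of bug is hard to catch by testing.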

Other (non-racy) stuff:

* Threads are heavy, CompletableFutures are light. But CFs lack the functionality of Threads. A CF can't decide to sleep for a while, nor can it be cancelled. (As an aside, BEAM threads are super light).
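
On the cancellation point, a small sketch with a hypothetical `CancelDemo` class. `CompletableFuture` does have a `cancel` method, but its `mayInterruptIfRunning` argument is documented as having no effect: the future is marked cancelled while the task body keeps running. The latches only make the ordering deterministic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class CancelDemo {
    // Returns true if the task body ran to completion even though
    // the future wrapping it had already been cancelled.
    static boolean taskSurvivesCancel() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);
        AtomicBoolean finished = new AtomicBoolean(false);

        CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
            started.countDown();
            try {
                release.await();      // would throw if the worker were interrupted
                finished.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                done.countDown();
            }
        });

        started.await();              // the task is now running
        future.cancel(true);          // marks the future cancelled; no interrupt is sent
        release.countDown();          // let the task continue
        done.await();
        return future.isCancelled() && finished.get();
    }
}
```

`CancelDemo.taskSurvivesCancel()` returns `true`: callers see a cancelled future, but the work itself was never stopped.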

~~~
discreteevent
Java has a large set of higher level abstractions for concurrency. You don't
have to use low level locks but you can. (And that's just Java, there's also
Scala, Clojure ...)

~~~
throwaway894345
I'm pretty firmly in the "shared memory parallelism is a Good Thing" camp, but
the counter argument to your point is that having a larger set of concurrency
abstractions is a Bad Thing in that any particular piece of code has to
consider all of the different permutations. In a shared-nothing world, there's
a lot less to worry about (except occasionally performance).

~~~
vvanders
Memory being world-writable across threads also has implications for how your GC
algorithm is designed.

Erlang for instance scopes gc pools per process so short lived processes just
drop the pool. Also GC of one worker doesn't stop any others. Can't remember
if it even needs to be generational because the heaps are already sliced by
process. It's the closest thing to heap arenas I've seen in a VM based
language.

Or take Lua which is single threaded and doesn't require VM safepoints since
everything is done via cooperative coroutines.

Java needs to assume worst case and as such has to be conservative in some of
its approaches.

------
jph
BEAM is amazing and IMHO there's one very sweet spot ready for optimization:
math functions. I know I can escape out to C/Rust/etc. yet the majority of
what I do is simple float math such as stddev and vector normalization.

The article states benchmark of 5000% speedup on floats when switching from
BEAM to the JVM. I would like to offer $100 as a gift incentive to anyone here
who wants to work on optimizing BEAM math.

~~~
chrisseaton
People say this isn't what BEAM is intended for and it excels elsewhere, which
yes I'm sure it does.

But why can't it be both? Why can't you do everything that BEAM does... and
then also have an optimising JIT for the straight line maths code? Couldn't
you leave all the other parts of the system the same and keep all the existing
benefits? Improving one doesn't damage the other does it?

~~~
olikas
Co-author here.

The problem with number crunching or maths is that it is very difficult to cut
the whole computation into smaller units and pre-emptively schedule it. If it
is possible for a specific use case, then it is moderately easy to replace
that part with NIFs. For effective maths you need to convert the internal
tagged number representation to the machine-native one, which is also expensive.
Solving these two things in the generic case is very difficult while
preserving all the good parts.

~~~
PopeDotNinja
Am I correct in saying that functions written in C do not get pre-empted like
Erlang functions? If that is true, you could write computationally intense
code in C within a BEAM app. But I think this misses the point. Pre-emption is
really cool for concurrency abstractions, and the trade off is being less good
at single threaded computation. Trying to turn Erlang into something like a
Bitcoin miner is kind of like combining a bunch of Roombas to make a Shop-Vac.

~~~
lostcolony
You are correct.

It's actually worse than that; as I recall, the internal numerical
representations of numbers do not necessarily map to the CPU's (for instance,
there is no byte sizing; you have integers and floats, and they can be
arbitrarily large). The work to perform that conversion, do the math, and
convert back, would almost assuredly make it so that a single calculation
takes more time than just doing it within the BEAM. The only way to save time
would be to convert once, do a bunch of math, and convert back. Which would,
yes, prevent pre-emption, AND require indication of intent (so brand new
language constructs, minimally).

That's a lot to expect of the user, and a lot to implement in the
language...all to avoid just writing a NIF.
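
A loose JVM analogy for "convert once, do a bunch of math, convert back", assuming boxed `Double` objects stand in for tagged terms (they are not the same representation, and this says nothing about how the BEAM or a NIF would actually do it). The hypothetical `BatchMath.stddev` unboxes the input in a single pass, stays in primitive `double` for all the arithmetic, and leaves only the final result to be converted back.

```java
import java.util.List;

class BatchMath {
    // Population standard deviation over boxed input.
    static double stddev(List<Double> boxed) {
        // Single conversion pass: boxed values to a primitive array.
        double[] xs = new double[boxed.size()];
        for (int i = 0; i < xs.length; i++) xs[i] = boxed.get(i);

        // Everything below works on primitive doubles only.
        double mean = 0.0;
        for (double x : xs) mean += x;
        mean /= xs.length;

        double sumSq = 0.0;
        for (double x : xs) sumSq += (x - mean) * (x - mean);

        // The caller re-boxes just this one value if it needs to.
        return Math.sqrt(sumSq / xs.length);
    }
}
```

For the input 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the squared deviations sum to 32, and the result is 2.0.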

~~~
chrisseaton
> Which would, yes, prevent pre-emption, AND require indication of intent

I don't understand why. If you have a maths-intensive operation like matrix
multiplication using untagged maths, why does that prevent pre-emption? Why
does it require indication of intent?

And there's already a basically zero-overhead way to implement pre-emption -
safepoints - that's what the JVM does when it wants to pre-empt in user-space.

~~~
lostcolony
Uh...no. A safepoint is when all threads in the JVM have blocked (which is
purely cooperative, and happens during thread transitions), and, importantly,
when OS threads running native code _still are running_ , but can't
return/respond to the JVM. The JVM doesn't pause those threads.

Which is the point. You can't preempt crunching those numbers if it's not
within the BEAM. Which might be fine. Or it might not. Making it invisible to
the user is not really a good idea when going for soft realtime properties; at
least with a NIF and dirty scheduler you're being explicit about it.

~~~
chrisseaton
> A safepoint is when all threads in the JVM have blocked

And having blocked them, you can then pre-empt them.

> You can't preempt crunching those numbers if it's not within the BEAM.

I still don't see why, sorry. If you had a JIT and you compiled maths-intensive
code to native code, it could run efficiently in BEAM and still be
pre-emptible by having a safepoint in the generated code.

How do you think Java is doing optimised numerical code that is pre-emptible
from user-space? Safepoints! BEAM could do the same thing.

~~~
lostcolony
You, uh, realize that's not actually preemptive right? Like, having to thread
in checks, that are cooperative, is by definition not preemptive?

~~~
chrisseaton
Yes I think that's a reasonable definition of pre-emption because they don't
interrupt the numerical pipeline. They cause zero data or control
dependencies. But even if you don't think it fits the definition, what do you
think the practical difference is when you argue about this terminology?

What did we want to achieve? We wanted to be able to run a tight loop of
highly optimised, untagged numerical code but still be able to interrupt it to
switch threads on demand from user-space if needed.

Safepoints let us do that.

What else did we need that this doesn't cover?
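
A user-level sketch of the idea being argued over, assuming a hypothetical `SlicedSum` class. This only emulates a poll in ordinary Java code; it is not how HotSpot implements safepoints, nor how the BEAM counts reductions. The inner loop runs on primitive doubles, and every `pollEvery` elements it checks whether someone has asked it to yield. Whether that counts as pre-emption or cooperation is exactly the terminology dispute in this thread.

```java
import java.util.function.BooleanSupplier;

class SlicedSum {
    private final double[] data;
    private int next = 0;        // index to resume from
    private double acc = 0.0;    // running sum of squares

    SlicedSum(double[] data) { this.data = data; }

    // Runs until the input is exhausted or a poll asks the loop to yield.
    // Returns true once the computation is complete.
    boolean run(BooleanSupplier yieldRequested, int pollEvery) {
        while (next < data.length) {
            int end = Math.min(next + pollEvery, data.length);
            for (; next < end; next++) {
                acc += data[next] * data[next];   // tight loop, no checks inside
            }
            // The poll: one cheap check per chunk, state is already saved in fields.
            if (next < data.length && yieldRequested.getAsBoolean()) return false;
        }
        return true;
    }

    double result() { return acc; }
}
```

A scheduler would call `run` again later to resume from where the loop stopped. The poll interval trades responsiveness against overhead in the numeric loop.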

------
hinkley
Can someone steer me to some good benchmarks, discussions of perf
characteristics and gotchas of the BEAM? My search-fu is weak and I'm not
finding the sort of content I'm after.

I'm trying to learn Elixir, and being a systems thinker, before I (can) get
too comfortable I'm gonna want to dive into origin stories to build up my
holistic map of why things are the way they are, what can be done and what
can't be done. Understanding bottlenecks in the BEAM seems like it's gonna
have to be part of that (the way I studied JVM tech documentation when I did
perf and architecture work in Java).

~~~
micmus
In general I would say there's no good single book or resource that describes
everything comprehensively. There are a lot of resources, though, but mostly
scattered in various places.

The BEAM Book [1] is a good, though unfinished resource talking in general
about the implementation - the memory model and the interpreter.

If you're interested in some very low-level details of the runtime, the
internal documentation [2] also holds a lot of interesting details.

There are also some additional details on internals at Spawned Shelter [3].

[1]: https://blog.stenmans.org/theBeamBook/
[2]: https://github.com/erlang/otp/tree/master/erts/emulator/internal_doc
[3]: http://spawnedshelter.com/#erlang-design-choices-and-beam-internals

~~~
hinkley
I want to know the constraints to, and evolution of, sequential computation on
the BEAM. I want to form opinions on how that landscape is likely to change
within the lifespan of a project I'm affiliated with.

I get mostly false positives trying to find those sorts of discussions or
metrics.

~~~
di4na
I am not sure I understand the problem you are trying to find information
about. Maybe explain it a little bit more? Or go ask for it in the Elixir
forum; people can try to be your librarians there.

~~~
hinkley
To an outsider, it seems like the BEAM documentation [and particularly,
videos] go out of their way to discuss how process management and IPC
work and how certain classes of data are managed. They talk
about what makes the BEAM the BEAM to the exclusion of all other concerns.

Prior to finding this document
(http://www.cs-lab.org/historical_beam_instruction_set.html) I had no idea
whether you could
actually do computation on the BEAM. I was starting to wonder if they had
misappropriated the term VM, and some sort of inline assembly trick was being
used for everything but control flow and IPC.

Interpreted code has very, very real computational constraints and you can't
assume people will know this, even now. Especially if your system is
noteworthy for how it is _not_ like other systems. Where does it stop being
'weird' and start being conventional? The boundaries describe both sides of a
distinction. Even if you're only interested in the exotic part, leave some
breadcrumbs for others.

~~~
toast0
Hmm, are you maybe after this page?

http://erlang.org/doc/efficiency_guide/advanced.html

------
shawnz
The JVM supports hot code loading, although this article seems to imply only
BEAM supports it.

~~~
brightball
Do you have a reference on that?

I want to make sure you're talking about the same thing.

~~~
orestis
Clojure does hot code reloading as a built in. You essentially send code to a
running system and you change it. It’s enabled by a dynamic class loader. I
wouldn’t say it’s common outside of Clojure though, the whole language and
ecosystem is built around this concept.

To be clear: JVM enables the feature, so “technically” JVM allows hot code
reload. Not sure how useful this is in practice for non-Clojure JVM users.

~~~
eggsnbacon1
Runtime code generation is a common optimization in Java frameworks. End-users
may never see it but the majority of popular frameworks use it under the
covers.

Debuggers also use the functionality to allow live code editing and expression
evaluation when paused on a breakpoint.

------
exabrial
They mentioned this for the BEAM virtual machine but not for the JVM. The JVM
can actually also do hot code loading, as long as call site signatures are not
changed or added. In some cases you can make major changes to the current
stack and restart the frame, which is a pretty handy feature for developers.
Some commercial extensions to the JVM get around all of these limitations.

------
alfanerd
Of interest might also be Erjang, Kresten Krab's port of BEAM to JVM.
https://github.com/trifork/erjang

As I understand it, it is feature complete and actually runs Erlang pretty
well. Could be interesting to see some benchmark testing.

------
ForHackernews
Apparently this is the Erlang BEAM, not Apache Beam
(https://beam.apache.org/)?

~~~
pdimitar
Yes. It's about Erlang's BEAM VM.

------
jeffrallen
TL;DR: The point seems to be, "shared nothing makes concurrency and GC easy".
Congrats. But also lots of big fast systems use shared memory, so just relax,
STFU, and understand that tradeoffs exist.

