
BEAM languages, Hindley–Milner type systems, and new technologies - pyotrgalois
https://medium.com/this-is-not-a-monad-tutorial/eric-merritt-erlang-and-distributed-systems-expert-gives-his-views-on-beam-languages-hindley-a09b15f53a2f
======
rubyn00bie
Nice article, and for an Elixir fan it provides a nice little snippet on
something I've been having issues with but hadn't really put my finger on
until I saw it:

"[...] and I really dislike that Elixir tries to hide immutability. That does
make it slightly easier for beginners, but it’s a leaky abstraction. The
immutability eventually bleeds through and then you have to think about it."

I don't think it necessarily tries to hide it (at all), but it does have some
instances where something feels like a mutable structure. Those can be, at
least for me, a bit confusing to reason about if you're expecting things to
both be and look immutable.

I suppose now that I know exactly what's weird, I should just go dig through
the code and figure it out. Problem solved?
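To make the leak concrete, here's a minimal Elixir sketch (the `user` map is just an invented example, not from the article): every "update" function returns a new value, and the original binding still holds the old one unless you rebind it.

```elixir
# Elixir data is immutable: Map.put returns a *new* map and leaves
# the original untouched. Nothing is modified in place.
user  = %{name: "Ada", age: 36}
older = Map.put(user, :age, 37)

user.age   # still 36 -- the "update" didn't touch it
older.age  # 37
```

The confusing part is that rebinding (`user = Map.put(user, :age, 37)`) _looks_ like mutation even though a fresh map is built each time.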

... One other thing, because I see this in the comments already, is that BEAM
isn't the tool for every job-- but for some jobs, it is the only tool to do it
well. Is the JVM faster at general tasks? Hell yes, but that's not the point,
it's not even why BEAM is around.

It's about:

* Small concurrent workloads. Really long-running, CPU-intensive tasks aren't going to be a good fit.

* Low latency. Not just low, but with a very, very small standard deviation. Your application's performance will be consistent.

* Fault tolerance.

The list goes on, and here's a nice summary of it (both bad and good):

[http://blog.troutwine.us/2013/07/10/choose_erlang.html](http://blog.troutwine.us/2013/07/10/choose_erlang.html)

There are times when I choose the JVM, there are times when I choose BEAM or
MRI. I just try to choose the right tool for the job, but some tools make some
jobs very difficult.

 _cough_ ruby _cough_ concurrency _cough_

Edit: One thing for people not familiar with BEAM: a "process" is not a Unix
process. From the Elixir documentation:

"Processes in Elixir are extremely lightweight in terms of memory and CPU
(unlike threads in many other programming languages). Because of this, it is
not uncommon to have tens or even hundreds of thousands of processes running
simultaneously."
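As a rough illustration of just how cheap they are (the counts and message shapes below are my own, not from the docs), spawning a hundred thousand processes is routine:

```elixir
# Each spawn creates a BEAM process with its own tiny heap, scheduled
# by the VM rather than the OS -- not a Unix process or an OS thread.
parent = self()

pids = for i <- 1..100_000, do: spawn(fn -> send(parent, {:done, i}) end)

# Wait for one message per process so we know they all ran.
for _ <- pids do
  receive do
    {:done, _} -> :ok
  end
end

length(pids)  # 100_000
```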

~~~
bfrog
Then again, there are NIF libs with threads for tasks that are long-running
and require computational performance. Last I checked, all the really fast
math libraries were written in C/Fortran/C++, not Java.

------
johlo
It's surprising that BEAM's support for operations and management is so
rarely mentioned. To me this is the key selling point for using BEAM over the
JVM or something else.

Being able to open a remote console and do system
introspection/tracing/profiling/debugging is a huge advantage when running in
production. And all languages running on top of BEAM ofc get this for free.

In my experience, running JVM in production with tools like
JProfiler/VisualVM/jconsole, etc. does not come close to the BEAM when trying
to understand what is happening in the system.

~~~
pron
> In my experience, running JVM in production with tools like
> JProfiler/VisualVM/jconsole, etc. does not come close to the BEAM when
> trying to understand what is happening in the system.

Then you haven't tried Java Flight Recorder/Mission Control or the new
javosize. BEAM doesn't come close... :)

~~~
johlo
I hadn't heard of javosize before, looks interesting, thanks. Being able to
update code and data on a live system is very useful and I haven't seen that
for the JVM before (BEAM of course handles that :) ).

~~~
pron
Well, there are a lot of JVM tools to inject code into a running application.
Take a look at Byteman (which is much more mature than javosize, but mostly
targets injecting traces for live debugging purposes).

------
abrgr
"It’s not going to be too much longer before we declaratively describe our
systems as well as our code. I am looking forward to that."

Amen! Been doing that to the extent possible for a while and it is terrific!

~~~
nickpsecurity
It's been done before to varying degrees. The original work in automatic
programming was a system that took input from CASE tools and autogenerated a
lot of COBOL. Sun's DASL language from the ACE project was a domain-specific
language for specifying a type of web application: around 9-10kloc of it
autogenerated 100+kloc of XML, client code, server code, etc. There are lots
of them in the 4GL category for database manipulation, with WINDEV/WEBDEV
being more general-purpose yet still requiring coding in a BASIC-like
language.

So, it's not far-fetched. It will likely be a series of DSLs like the above,
or iMatix's model-driven development approach. These would specify the system
at a high level with precise requirements and constraints. Then, planning
software with heuristics would produce the code, with similar systems for
integration. Several people's worth of work, or 10-20 tools, becomes one
person with one set of tools. I doubt we'll replace the person or the need
for some programming tools.

~~~
abrgr
I definitely agree that we'll always need people who think like programmers.
We can develop tools in the vein of those you mention to significantly enhance
the productivity of those people, though. I haven't seen many such tools that
help with distributed systems, or that let you easily visualize and
understand an entire system.

~~~
nickpsecurity
That's hard. Good news for you is that there's plenty of ongoing work on
toolkits and methods for doing that. All alpha quality for now. We'll just
have to wait.

------
jdimov9
Here's a tool that you can play with to see how well Elixir scales with an
embarrassingly parallel task (matrix multiplication) when throwing more CPU
cores at it:
[https://github.com/a115/exmatrix](https://github.com/a115/exmatrix)
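For a feel of that parallelism without pulling in exmatrix, here's a small sketch using only the standard library (the sizes and the `dot` helper are my own invention): `Task.async_stream/3` runs one lightweight process per element, bounded by the number of scheduler threads.

```elixir
# Dot product of each row with itself, fanned out across all cores.
rows = for _ <- 1..200, do: Enum.to_list(1..200)

dot = fn a, b ->
  Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
end

results =
  rows
  |> Task.async_stream(fn row -> dot.(row, row) end,
       max_concurrency: System.schedulers_online())
  |> Enum.map(fn {:ok, v} -> v end)

hd(results)  # sum of squares 1..200 = 2_686_700
```

Because the rows are independent, throughput scales with `System.schedulers_online()` until the work per row becomes too small to amortize the message passing.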

~~~
eggy
Yes, Elixir does well here, but I still prefer Lisp syntax. I would like to
see a comparison of LFE and Joxa; Joxa seems more like Clojure. This
presentation is a good one, but I'd like to see a nuts-and-bolts comparison
with side-by-side code:

[http://www.slideshare.net/BrianTroutwine1/erlang-lfe-elixir-...](http://www.slideshare.net/BrianTroutwine1/erlang-lfe-elixir-and-joxa-oscon-2014)

------
pron
BEAM is a very nice VM (albeit rather slow compared to HotSpot or V8), but I
don't understand why every mention of BEAM has to spread misconceptions about
the JVM:

> In many systems, Java included, the Garbage Collector (GC) must examine the
> entire heap in order to collect all the garbage. There are optimizations to
> this, like using Generations in a Generational GC, but those optimizations
> are still just optimizations for walking the entire heap. BEAM takes a
> different approach, leveraging the actor model on which it is based: If a
> process hasn’t been run, it doesn’t need to be collected. If a process has
> run, but ended before the next GC run, it doesn’t need to be collected

Well, how does BEAM know which process ran (so that its garbage should be
collected)? Bookkeeping, of course, and that is also "just an optimization".
Similarly, if a JVM object hasn't been touched since the last collection -- it
doesn't need to be examined.

> If, in the end, the process does need to be collected, only that single
> process needs to be stopped while collection occurs

And new HotSpot GCs rarely stop threads for more than a few milliseconds
(well, depending on the generation; it's complicated), collecting garbage
_concurrently_ with the running application, and other JVMs have GCs that
never stop any thread for more than 20us (that's microseconds).

While BEAM's design helps it achieve good(ish) results and stay simple, the
fact is that the effort that's gone into HotSpot gets it better results for
even more general programs (collecting concurrent, shared data structures
-- like ETS -- too).

I've said it before and I'll say it again: Erlang is a brilliant, top notch
language, which deserves a top-notch VM, and the resources Erlang/BEAM
currently have behind them are far too few for such a great language. Erlang's
place is on the JVM. JVMs are used for many, many more soft-realtime (and
hard-realtime) systems than BEAM, and yield much better performance.

An implementation of Erlang on the JVM (Erjang), done mostly by one person,
was able to beat Erlang on BEAM in quite a few benchmarks, and that was
without the new GCs, the new (or much improved) work-stealing scheduler, and
the new groundbreaking JIT (which works extremely well for dynamically-typed
languages[1]).

OpenJDK could free Erlang programmers from having to write
performance-sensitive code in C (many Erlang projects are actually mixed
Erlang-C projects).
While Erlang can be very proud of how much it's been able to achieve with so
little, instead of fighting the JVM (or, rather, JVMs), it should embrace it.
Everyone would benefit.

[1]:
[https://twitter.com/chrisgseaton/status/586527623163023362](https://twitter.com/chrisgseaton/status/586527623163023362)
,
[https://twitter.com/chrisgseaton/status/619885182104043520](https://twitter.com/chrisgseaton/status/619885182104043520)

~~~
chrisseaton
The thing I don't get about Erlang and BEAM is the idea that having lots of
little processes means that your program will scale brilliantly to run in
parallel.

Programming Erlang (authored by the creator of Erlang) says without any
qualification at all that "Concurrent programs are made from small independent
processes. Because of this, we can easily scale the system by increasing the
number of processes and adding more CPUs."

When I read that I was expecting it to be followed by "ha ha... not really,
because of algorithmic sequential dependencies and Amdahl's Law, of course!"
but it isn't!

You can have an infinite number of processes, but if the dataflow graph they
form doesn't have any parallelism, then Erlang and BEAM aren't likely to be
able to work any magic to create it. Even if the graph does have parallelism,
it is only going to have so much, and you certainly won't be able to
arbitrarily scale beyond that by increasing the number of processes.
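Amdahl's Law makes that ceiling easy to compute (the 0.95 figure below is an arbitrary example, not from the book):

```elixir
# Speedup with parallel fraction p of the work on n cores:
# 1 / ((1 - p) + p / n). With p = 0.95 the limit is 20x, no matter
# how many processes or cores you add.
speedup = fn p, n -> 1 / ((1 - p) + p / n) end

speedup.(0.95, 8)          # ~5.9
speedup.(0.95, 1_000_000)  # ~20.0, effectively the asymptote
```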

What's more, the typical advice about mutable shared state in Erlang is to
encapsulate it safely in a process, which seems to me a recipe for further
serialisation, and so a crazy thing to promote!

~~~
corysama
Everything you are saying is technically correct. The issue is that Erlang is
trying to solve a different problem than you are describing. It sounds like
you are hoping to perform some large but single task and are disappointed that
Erlang can't defeat the Amdahl limitations inherent in your task. That's not
Erlang's goal.

Erlang's goal is to take problems that are embarrassingly parallel in theory
and make them embarrassingly parallel in practice. Serving a billion
independent http requests in a distributed, parallel manner can technically be
done in Java or C or assembly. But, it's very hard to do well and very easy to
screw up in painful, confusing, life-wasting ways. Erlang makes it much easier
to do well and much harder to screw up.

~~~
pron
> Erlang makes it much easier to do well and much harder to screw up.

That's a feature of the language, not the VM (compare with Clojure, which
does a similar thing on the JVM). You could still do all that on a
higher-quality VM (simply because the effort put into it is orders of
magnitude more than into BEAM; not because OpenJDK's people are smarter or
anything).

~~~
corysama
If I could get the JVM's JIT & serial GC performance combined with the BEAM's
trivial-cost threads & thread-segregated GC, it would be sweet indeed.

~~~
pron
> serial GC performance

HotSpot hardly ever uses a serial GC anymore. It's now parallel or parallel
_and_ concurrent.

> thread-segregated GC

You don't really want that if a shared-heap GC can buy you better performance
because it's more mature and saves you all the copying.

> BEAM's trivial-cost threads

You can have that on the JVM.

