Hacker News
A Brief History of the BEAM Compiler (erlang.org)
326 points by jxub 6 months ago | 74 comments



I'm always surprised and delighted to see so many "first implementations" of languages in Prolog. It was described to me in college as "just a novelty", but in fact it's used extensively in the JVM internals [1] and apparently is the starting-point implementation for other languages.

But Prolog still seems so... awkward. I wonder why languages like Haskell or OCaml aren't more of a de facto choice for these purposes; they seem to have similar expressive power for parser/grammar work, with less of an inside-out paradigm (imho).

[1] https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf


"seem so..awkward"? Sure because you don't understand prolog. it's very easy to implement languages.

Look how simply I can compile down to asm in 35 lines of pure Prolog code.

https://github.com/segmond/PrologThingz/blob/master/clause_a...

  /*
      c := 1;
      r := 1;
      while c < n do
          (c := c + 1;
           r := r * c)
  */
  program(1, (
      assign(c,1);
      assign(r,1);
      while((c < n),
          (assign(c, c+1);
           assign(r, r*c))))).

  ?- compile(1).
    movc(1,r(1))
    stm(r(1),c)
    movc(1,r(1))
    stm(r(1),r)
  label(_G3638)
    movm(c,r(1))
    movm(n,r(2))
    cmp(1,2)
    bge(1)
    movm(c,r(1))
    movc(1,r(2))
    add(r(2),r(1))
    stm(r(1),c)
    movm(r,r(1))
    movm(c,r(2))
    mul(r(2),r(1))
    stm(r(1),r)
    br(_G3638)
  label(1)


This is great, thank you for sharing this!

Your Prolog compiler is also notable because it can be easily made so general that it not only compiles, but even generates programs.

For example, if I use pattern matching to generalize the first two clauses via:

    cg(i(I), R, [movc(I, r(R))|Z]-Z).
    cg(a(A), R, [movm(a(A), r(R))|Z]-Z).
then we can already use the same code to generate programs and compiled instructions:

    ?- cg(Prog, N, Asm-[]).
    Prog = i(_5354),
    Asm = [movc(_5354, r(N))] ;
    Prog = a(_5354),
    Asm = [movm(a(_5354), r(N))] .
The other clauses can also be generalized, using integer constraints to make them usable in all directions.

For example, if we write:

    cg(X+Y, R, C0-C2):-
            cg(X, R, C0-C1),
            R1 #= R + 1,
            cg(Y, R1, C1-[add(r(R1),r(R))|C2]).
then we can compile an addition whose operands are not even yet fully known:

    ?- cg(i(X)+i(Y), N, Asm-[]).
    Asm = [movc(X, r(N)), movc(Y, r(_8850)), add(r(_8850), r(N))],
    N+1#=_8850.
Still, we can inspect the general pattern that is generated as a result of the compilation, and therefore write for example test cases that subsume many possible concrete cases!

On a general note, one reason why Prolog is so popular for writing compilers is that the language has a built-in mechanism for writing parsers, and in fact for describing all kinds of sequences (for example, of tokens, of instructions, etc.) in a very convenient way.

For example, the above three clauses can be written equivalently as a so-called definite clause grammar (DCG):

    cg(i(I), R) --> [movc(I, r(R))].
    cg(a(A), R) --> [movm(a(A), r(R))].
    cg(X+Y, R)  -->
            { R1 #= R + 1 },
            cg(X, R),
            cg(Y, R1),
            [add(r(R1),r(R))].

This makes it very easy to see the key aspects of the code. At the same time, the description is so general that it can be used in all directions, to generate, parse and test instructions.

Sample query:

    ?- phrase(cg(i(X)+i(Y), N), Ls).
    Ls = [movc(X, r(N)), movc(Y, r(_2288)), add(r(_2288), r(N))],
    N+1#=_2288.
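In fact, the same grammar runs "backwards": given an instruction sequence, it recovers the expression that compiles to it. A self-contained sketch (restating the three clauses above; assumes a Prolog with library(clpfd) loaded for #=/2, e.g. SWI-Prolog):

```prolog
:- use_module(library(clpfd)).

cg(i(I), R) --> [movc(I, r(R))].
cg(a(A), R) --> [movm(a(A), r(R))].
cg(X+Y, R)  -->
        { R1 #= R + 1 },
        cg(X, R),
        cg(Y, R1),
        [add(r(R1), r(R))].

%% Decompiling: which expression, starting at register 0, yields this code?
%% ?- phrase(cg(Prog, R), [movc(1,r(0)), movc(2,r(1)), add(r(1),r(0))]).
%% Prog = i(1)+i(2),
%% R = 0.
```

The same three clauses thus serve as compiler, decompiler, and test oracle, which is exactly the "all directions" property described above.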



The "inside out" paradigm of Prolog is exactly what it's good for. Try to write complex procedural programs in it and you will be driven to insanity by the cuts, by using logical failure to express procedural success, etc.

The original sin of conventional programming languages is that we write a sequence of operations that happen one after another in time. Thus we find concurrency, dealing with asynchronous events, and reacting effectively to the environment difficult.

Prolog makes writing simple parsers simple. Certain kinds of search problems are easy to solve, and some kinds of operations (say, involving transitive closure) can be defined in two or three lines of Prolog where you might need 50 lines in a more conventional language after consulting an algorithms book.

There has been a huge amount of research and commercial implementation of things that are like Prolog but different, for instance with built-in constraint satisfaction solving, combinatorial optimization, etc. Prolog is backward-chaining (it finds rules to justify a conclusion), but other engines are forward-chaining, like Drools, the Jena rules engine, and IBM ILOG.

I would love to see this approach get more popular.


Thanks for the link!

There is a good reason for using a Prolog-like language in a compiler-related project. Prolog has been used for rapid prototyping of compilers since the end of the '70s, and it's still ahead of its time. If you take a look at the code from Warren's old article [1], for example, you'll see that even today almost nobody uses such a high-level description of a compiler. The main power of Prolog is not in parsing: Prolog is unbeatable at program analysis and transformation. See Datalog (you may find it even in the Dragon book nowadays!), see Stratego/XT; they are all inspired by good old Prolog.

By the way, Prolog is also good at describing the formal semantics of PLs. A good PL reference or standard should include a clean formal notation for the semantics of the language, not some bureaucratic English text... It's already 2018; we need this for compiler developers and for a deeper understanding of a language.

[1] http://sovietov.com/tmp/warren1980.pdf (this is an excellent article, but sadly an almost unknown one)


It has made the front page now https://news.ycombinator.com/item?id=17674859


> in fact it's used extensively in the JVM internals

I've not seen Prolog used in the JVM internals myself - it's just being used in this document as an inference notation to explain things as far as I know. Unless you know specifically where it's used, which would be interesting?


That's a good point - the doc uses it for notational clarity but it's not clear if it's actually used in the implementation(s).


Well, OCaml is indeed rather popular for implementing compilers for new languages (sometimes later bootstrapped), including ones which became rather popular. I can think of F#, Rust, Elm, Haxe, FlowType for instance.


Was Elm originally implemented in OCaml? I thought it was Haskell.

Another two relatively popular languages implemented in Haskell are Idris and Agda.


They might have meant ReasonML, both it and Elm interop w/ JS

https://github.com/facebook/reason


I could have added ReasonML to the list indeed, but ReasonML really is (a different syntax for) OCaml.


Elm is still implemented in Haskell. Self-hosting isn't a goal.


Yes sorry, Elm is indeed implemented in Haskell.


There's also a project implementing a Prolog-like interpreter for doing type checking and other logic for Rust. I think it works at the MIR level, but I could be wrong on that.


Having some experience with both languages (though not professionally), I personally much prefer Prolog to Haskell. There's an initial hurdle with Prolog because it uses such an unfamiliar paradigm, but once you get good with it, programming in Prolog is pure pleasure for me.

I like Haskell just fine, it's an excellent language, but my heart belongs to Prolog.


Do you know of Mercury? It's Prolog with types, basically.

https://mercurylang.org/

The language is fantastic, but I think it's fair to say that the implementation and ecosystem leave some things to be desired.



Probably because they predate Haskell. In the case of Java that's not true, but Haskell would have only been five years old at the time, and was largely academic.


Not even 5 years old. Java started development in 1991 as Oak, so it would have been really early:

https://en.wikipedia.org/wiki/Oak_(programming_language)


Haskell is pretty common for this. See PureScript, Elm, Agda, Idris. There are more here: https://wiki.haskell.org/Applications_and_libraries/Compiler...


Prolog and Erlang in the same thread - is it Christmas already? Two of my favourite languages, both hugely undervalued.

I remember the moment I first "got" Prolog. Having only programmed in C/C++ to that point, it took some mental contortion. But the "oh!" moment was amazing. Backward chaining & logical variables are just so incredibly elegant. I still use transitive closure as an example of just how powerful the language is:

path(A, B) :- link(A,B).

path(A, B) :- link(A,I), path(I,B).

That's it. Stunningly simple.

As for Erlang - as others have noted - it was conceived right from the start as a language for concurrency. It's had 20 years to mature and robustly solve problems that inherently single-threaded languages have barely even recognised yet (C/C++/Java/C#/Python/JavaScript/Rust/...). Supervisors/error handling and threading are two notable examples. It makes me sad to look at the spread of "async": an inelegant sticking plaster that adds yet another concurrency construct and turns it into a tactical decision - on every method - and at the call site too.

Erlang shows that a single, well-conceived and executed construct is both possible and preferable. Lovely.


Async is terrible as a primitive, but as sugar on top of real concurrency it's not terrible. I use Task.Supervisor.async() all the time in Elixir, with confidence that I have access to all the parts I might need for disaster recovery should I need it later.


You might be interested in the idea of Web Prolog, a language being written by Torbjorn Lager. It aims to provide concurrency features similar to Erlang's, while retaining the logic variables, backtracking, unification, etc. provided by standard Prolog.

It's at a point where he needs feedback. https://www.youtube.com/watch?v=mmNvxDEaLO4


Can you explain what transitive closure is and how your prolog snippet accomplishes it?


https://en.wikipedia.org/wiki/Transitive_closure#In_graph_th...

> In computer science, the concept of transitive closure can be thought of as constructing a data structure that makes it possible to answer reachability questions. That is, can one get from node a to node d in one or more hops? A binary relation tells you only that node a is connected to node b, and that node b is connected to node c, etc.

That snippet defines a relation path(A, B) by specifying two cases:

1. a link(A, B) fact exists

-or-

2. some intermediate element I exists, such that link(A, I) exists and path(I, B) holds.
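To make it concrete, here is a minimal, self-contained sketch with a small hypothetical graph of link/2 facts (the node names are just an example):

```prolog
% A small hypothetical graph: a -> b, b -> c, b -> d.
link(a, b).
link(b, c).
link(b, d).

% Transitive closure, exactly as in the snippet above.
path(A, B) :- link(A, B).
path(A, B) :- link(A, I), path(I, B).
```

Then path(a, c) succeeds via the intermediate node b, path(c, a) fails (there is no route back), and the query path(a, X) enumerates every node reachable from a: X = b, X = c, X = d on backtracking.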


If you'd like a high-level discussion of why the BEAM was designed as it was, check out my recent post: https://dockyard.com/blog/2018/07/18/all-for-reliability-ref...

The TL;DR is: reliability was the main goal, and using isolated processes which can react to each others' failures, even when running on separate machines, was the way they found to get reliability. This turned out to be nice for scalability also.


When I say this out loud, it sounds like a stupid question, but I'm still curious...

Outside of the fact that different programming languages run on BEAM and the JVM, what's the difference between the two?

I believe BEAM ships with supervisors, which is why Elixir can utilize them... what else is different?


The BEAM is a preemptive VM. That means no single long-running process can hog resources.

The BEAM was built on a principle of responsiveness, since it was originally designed for telephony.

Here's an in-depth article on preemption in the BEAM that I shared with a friend this morning: https://hamidreza-s.github.io/erlang/scheduling/real-time/pr...


> The BEAM is a preemptive VM.

The BEAM Book shared by armitron disagrees: https://happi.github.io/theBeamBook/#_scheduling_non_preempt...

Though an Erlang programmer never sees this in practice, because the compiler inserts yield points appropriately.


Maybe on the definition of "preemptive", but it should be treated as a preemptive VM in practice.


Nah. You can screw with the BEAM’s reduction-scheduling even in pure Erlang code: just write a really long function body consisting only of arithmetic expressions (picture an unrolled string hash function.) Since it contains no CALL or RET ops, the scheduler will never hit a yield point, even after going far “into the red” with the current process’s reduction budget.

You just never see this in real Erlang code, because who would code that way? If you want to be idiomatic, use recursion. If you want to be fast, use a NIF. Why ever use an unrolled loop body?

But it can happen, and therefore, the BEAM does not guarantee preemption, even in Erlang. Reduction scheduling isn’t a “kind of” preemptive scheduling. It’s its own thing.


I mean, like I said, it should be treated as a preemptive VM in practice. In reality it's not, but for most practical cases it's easier to understand in terms of preemption.


As a developer, yes. As an ops person trying to figure out why your deployed release isn’t making its deadlines, no.


I want that.

I hate looking at spinners on my computer because some software is hung up here or there.

I am prototyping a "desktop" search application which hits a number of sources and interacts with them by their protocols (maybe it gets full hits from Bing but just gets identifiers from PubMed and has to look up the identifiers).

This can never show a spinner but instead keep the UI live all the time, filling in results as soon as they become available.

I have been looking at asyncio in Python and it is an unholy mess. Just as Java was the first environment to be really thread-safe (e.g. they found the memory model of C underlying it was flawed and implemented the first sound memory model), BEAM seems to be the first environment designed with responsiveness in mind.


The WinRT environment enforces non-blocking async.

Any operation in the library that can take time is async. Async is everywhere.

I've always worked on embedded UIs where we had a physical watchdog timer (an independent chip power cycles your main device if a keep alive function isn't called periodically) that was set to ~30ms.

In test, anything running longer than 30ms had to be broken up.

The entire system was written in C and C++.

The lesson is you can write responsive apps in any language, but the entire underlying stack needs to cooperate. IIRC WinRT's requirement for "async" is operations that take more than 1 second. I disagree with that, I'd set the limit at 100ms, but I guess that'd annoy programmers even more. :)


I think WinRT is a non-starter for two audiences.

(1) People who know Windows will find it hard to switch to the async model. (2) People who don't know Windows will have access to a disorientingly large API.


To be fair, Python asyncio is a really ugly interface for something that other languages (e.g. JS and others) have done a lot better. Maybe look at other languages than Python for async programming.


I spend a quarter of the day looking at laggy JS apps so I would rule that out.


A language providing good support for asynchronous programming does not mean that developers use those features well. ;)

And even if they do, that does not mean that the application as a whole performs well. You can write beautiful code using async/await that performs awfully if you await the wrong things.


Nice, I'll check it out!


The big features for me are:

Threads are cheap enough that it makes sense to spawn a thread for a task and write the task in a straightforward way -- you don't have to interact with the event loop, you don't have to do awkward continuations, you can often just do one thing at a time, and it's all straightforward. If you need to parallelize sub-requests, that's not too bad either.

Asynchronous message passing is a super useful primitive.

"No shared state" means you need to think about problems in terms of getting requests to the right worker that has the state, instead of locking the right thing; that turns locking problems into queueing problems, and queuing problems are easier. There are ways to share state (ets), and if you're not careful, you can get into the same locking problems that are easy to get into with traditional shared memory threading environments.

Hot code loading is lovely. It's theoretically possible to do something similar in Java, but nobody (to my knowledge) actually does. It's so much more convenient to update the code than to drain servers and restart them. There's certainly an abundance of ways to ruin your day with hot loading, but it's magic when it works.


Supervisors in Erlang are mostly, if not entirely, built on top of BEAM, not something in BEAM itself. BEAM was written to provide the primitives the supervisors need, but they are just primitives, not "supervisors".

BEAM is distinguished by being a bytecode interpreter that implements preemption at that level; by the lightweight processes it implements and the communication mechanisms it enables between them, particularly the "linking" mechanism that lets one process easily watch another in various ways; and by the services it provides to those processes, which can be seen by querying the metadata for the processes.


One of the best references on BEAM is here: https://happi.github.io/theBeamBook/

You can skim it in order to see the major differences with the JVM. For me, the fact that BEAM has out of the box m:n threading [lightweight processes] and that each process has its own heap makes it stand out compared to the JVM.


Aside from preemption, the other notable difference is process management. BEAM processes are incredibly lightweight in terms of RAM and CPU usage. You can create millions of them because they do not map directly to kernel threads, and thus do not incur the overhead associated with them.

By contrast, the JVM uses kernel threads (except for obsolete versions of some JVM implementations, which supported green threads running in userspace), which are much more expensive. Because of this, the JVM offloads thread scheduling to the OS. BEAM handles process scheduling internally.


Lightweight meaning 0.5kb for a BEAM process vs 1024kb for a JVM thread. A goroutine in Go is 2kb, for comparison's sake.


> lightweight meaning 0.5kb for a BEAM process

It's quite a bit more than that. According to the official doc[0] it's 309 words without SMP or HiPE; on my machine (with both) it's 336 words.

That's 1.2~1.3k on a 32-bit system and 2.4~2.6k on a 64-bit one.

[0] http://erlang.org/doc/efficiency_guide/processes.html


Production Erlang systems tend to take advantage of hibernating many of their “millions of processes”, though, since it’s a rare architecture where those million processes are all hot. If you look at the memory usage of such systems, it’s a lot less than that figure would imply, mainly because of process hibernation.


That is true; in the best-case scenario (a process storing no data whatsoever), hibernating can remove the 233 words of preallocated minimal heap (and stack), yielding 76 words without HiPE or SMP, or 103 with them, for 304 / 608 / 412 / 824 bytes (neither/32b, neither/64b, HiPE+SMP/32b, HiPE+SMP/64b).


I came across that number a couple of years ago when I was doing research for a presentation. It may have been from a book but I'll see if I can dig up the source.


As derefr notes, it might have been the size of a hibernating (rather than live) process. It's 0.3~0.4k on 32b systems, though 0.6~0.8 on 64b ones.


Except the JVM is going to get fibers, coroutines and tail calls in a future release.

"Project Loom with Ron Pressler and Alan Bateman" at this week's JVMLS 2018

https://www.youtube.com/watch?v=J31o0ZMQEnI


Interesting. It seemed to be an independent library before. Maybe moving it inside the JVM will finally lead to more usage.


I have been out of Java programming for a while. Why/when were green threads deprecated?


They were considered a workaround for architectures that didn't have native multi-threading. I don't remember when they were dropped, but it was well before the Oracle acquisition.


The JVM ran all its green threads on a single CPU, even if multiple CPUs were present.

When multicore systems became more common, Sun faced the choice of overhauling the green threads implementation or going to OS threads. I don't know why they made the choice they did, but it was likely a combination of time pressure (writing and tuning a good m:n green threads implementation is hard) and doubts about the usefulness of green threads on multicore systems.


Preemptive is pretty nice. You can have a for loop that runs forever and it won't take over the machine.

It does processes nicely; they're green threads, IIRC, and very fast to create.

Clustering is built into it with security.

It's built for soft real time, so reliability is pretty nice; it's arguably much more battle-tested than the JVM.

That it's not owned by Oracle is a big deal for me.


>Clustering is built into it with security

Just to note, distributed Erlang is NOT secure AT ALL. You absolutely need your own source of network isolation to keep it secure.

By default the EPMD daemon (that coordinates nodes in a cluster) listens on an open port and accepts node join requests if they contain a correct secret cookie (short authentication string). EPMD has zero protection against brute force attempts to guess a cookie. If your EPMD port is open to the internet it is trivial to gain access to your entire cluster and execute arbitrary code.

See: https://insinuator.net/2017/10/erlang-distribution-rce-and-a...


What's "soft" real time mean, vs .... "hard" real time? Real real time?


"Real time" is defined in terms of deadlines: a system is real-time if operations have specific deadlines, bounds on the time they may take to execute.

The level of real-timeness is what happens when an operation misses its deadline:

* In a hard real-time system, missing a deadline is total system failure. Deadlines tend to be tight, and system correctness or integrity may not be maintained in the face of a missed deadline. Rocket guidance systems, for instance: if a process misses its deadline, the rocket can veer off-course, blow up, abort, or…; other examples of hard real-time systems are engine controls and medical devices.

* In a firm real-time system, a missed deadline leads to QoS degradation, and the result of the operation is ignored/discarded after the deadline. Manufacturing systems tend to be like this: missing deadlines will usually ruin parts (possibly for some time afterwards), but production can usually recover and continue.

* Soft real-time is a relaxation of firm: the result of the operation is still valid after a deadline miss, but its value usually diminishes as the deadline recedes into the past. Real-time audio transmission (phones and the like) usually falls here: you want your audio delivered "immediately", but you can still use it if there's a small delay (a missed deadline); however, as the delay increases, delivering the data becomes less and less useful, e.g. a few hundred milliseconds of delay on a phone line is annoying, while a few minutes makes it useless.


This is a fantastic explanation of the differences with really good analogies. Thanks.


Hard vs. soft real time is a matter of latency tolerance. Human-facing systems (like telecom) need to be low latency from a human perspective, which is a higher threshold than what you could tolerate in a mechanical context, where 100ms could mean damaging some expensive hardware.


The difference is if your system requirements tolerate missing a timing deadline.

Hard real-time: antilock brakes

Soft real-time: video game frame rate



The JVM is extremely good at dynamic compilation. I think BEAM only has very basic dynamic optimisation capabilities and I don't think BEAMJIT or HiPE were ever as successful as was hoped. But I'm not an expert in BEAM.


BEAM is phenomenal, but one area that's still preventing people from adopting it is pure raw performance.

People have to write NIFs far too often to obtain the raw perf they need, and doing so negates many of the tremendous benefits of the BEAM.

I hope more people push BEAMJIT development forward.

http://www.erlang-factory.com/sfbay2017/lukas-larson.html

Maybe Facebook would fund this effort given their huge BEAM investment from WhatsApp use.


I think it depends on the project domain. Perf is fine for most web apps, for example.


>> "perf. is fine for most web apps for example."

That same argument is made for Ruby, and look at how many people leave due to slowness.


The thing is that MRI Ruby is effectively single-threaded (unless you use things like concurrent-ruby, which isn't that performant anyway), and threads can be blocked waiting on I/O (in fact they are most of the time), and it is in I/O where Erlang really shines, thanks to BEAM's nature of lightweight preemptive processes.


CRuby is multi-threaded with a GIL. CRuby also supports lightweight cooperative threads called Fibers, which are used to build things like: https://github.com/postrank-labs/goliath/

JRuby has no GIL, JIT, and is significantly faster than BEAM.


Many of them leave to a BEAM language actually, for speed reasons.


CRuby/MRI/YARV is almost as fast as BEAM at web stuff in terms of total throughput and JRuby is significantly faster. Sinatra + Sequel is faster at some things and slower at others. The default Rails setup makes huge tradeoffs away from performance.


Always happy to see BEAM get faster, but it won't match a NIF, where you can control memory layout, mutate anything, and use raw vector instructions. It's a killer combo though: networking, memory management, and high-level application control in Elixir; hot spots in NIFs. It kind of gets you the best of both worlds in terms of performance vs maintainability.



