Hacker News
Enigma: Erlang VM Implementation in Rust (github.com/archseer)
352 points by adamnemecek on April 30, 2020 | 94 comments

Thanks for sharing, author here! (AMA?)

I had a lot of fun working on this project, having implemented enough of the VM to run both Elixir and IEx before I stopped.

Ultimately development stalled since I couldn't get any community interest (I was hoping to give an ElixirConf talk but wasn't accepted either). I was hoping to raise some interest and find some contributors, in a similar vein to https://github.com/RustPython/RustPython

Nowadays I write a lot less Elixir and a lot more Rust.

I love seeing new implementations of popular languages.

Curious: Did implementing this in Rust expose any bad or interesting behavior when replicating the Erlang language spec (https://github.com/erlang/spec) or whatever reference implementation you were targeting?

If I remember correctly I found a few edge cases, but they weren't ever hit by OTP, just by partially implemented VMs like mine :)

It was kind of interesting exploring the OTP internals, especially some of the parts that haven't changed in a long time. One example is the PAM: I think it stood for "Patrick's Abstract Machine", and it would compile Erlang terms into bytecode for pattern matches (intended for fast ETS lookups). It's all there in one file, and it took a fair bit of digging to figure out how it works, since it's been static for a long while and nothing on the internet really documented it.
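To make the "pattern to bytecode to interpreter" idea concrete, here is a minimal Rust sketch. The `Term`, `Pattern`, and `Instr` types and the four-instruction set are hypothetical simplifications, not the PAM's actual instruction set or data structures; the point is only that a pattern can be flattened once into a linear program and then run cheaply against many terms, as a table scan would need.

```rust
use std::collections::HashMap;

// Hypothetical sketch: NOT the real PAM instruction set or term layout.

/// A tiny subset of Erlang terms.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Atom(String),
    Int(i64),
    Tuple(Vec<Term>),
}

/// Source-level pattern: `_`, a numbered variable (like '$1'), a literal, or a tuple.
#[derive(Debug, Clone)]
pub enum Pattern {
    Wildcard,
    Var(u32),
    Lit(Term),
    Tuple(Vec<Pattern>),
}

/// Flat instructions a pattern compiles to.
#[derive(Debug, Clone)]
enum Instr {
    /// Current term must be a tuple of this arity; its elements are matched next.
    Tuple(usize),
    /// Current term must equal this literal.
    Eq(Term),
    /// Bind the variable, or check equality if it is already bound.
    Bind(u32),
    /// Accept anything.
    Skip,
}

/// Flatten a pattern into a pre-order instruction list.
fn compile(p: &Pattern, out: &mut Vec<Instr>) {
    match p {
        Pattern::Wildcard => out.push(Instr::Skip),
        Pattern::Var(v) => out.push(Instr::Bind(*v)),
        Pattern::Lit(t) => out.push(Instr::Eq(t.clone())),
        Pattern::Tuple(ps) => {
            out.push(Instr::Tuple(ps.len()));
            for sub in ps {
                compile(sub, out);
            }
        }
    }
}

/// Execute compiled instructions against a term; returns bindings on success.
fn run(prog: &[Instr], term: &Term) -> Option<HashMap<u32, Term>> {
    let mut bindings = HashMap::new();
    // Terms still waiting to be matched; the top is the next one.
    let mut stack = vec![term];
    for instr in prog {
        let t = stack.pop()?;
        match instr {
            Instr::Skip => {}
            Instr::Eq(lit) => {
                if t != lit {
                    return None;
                }
            }
            Instr::Bind(v) => {
                // First occurrence binds; later occurrences must be equal.
                let slot = bindings.entry(*v).or_insert_with(|| t.clone());
                if *slot != *t {
                    return None;
                }
            }
            Instr::Tuple(arity) => match t {
                Term::Tuple(elems) if elems.len() == *arity => {
                    // Push in reverse so the first element is popped first.
                    stack.extend(elems.iter().rev());
                }
                _ => return None,
            },
        }
    }
    Some(bindings)
}

/// Compile, then run (a real engine would cache the compiled program).
pub fn match_pattern(p: &Pattern, t: &Term) -> Option<HashMap<u32, Term>> {
    let mut prog = Vec::new();
    compile(p, &mut prog);
    run(&prog, t)
}
```

For example, the pattern `{user, '$1', '$1'}` compiles to `[Tuple(3), Eq(user), Bind(1), Bind(1)]`, which matches `{user, 42, 42}` with variable 1 bound to 42 and rejects `{user, 1, 2}`.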

The pattern matching algorithm was originally based on the one described in "The Implementation of Functional Programming Languages", the 1987 edition (there are two versions; one is more basic).

Edit: this book is available for free here: https://www.microsoft.com/en-us/research/publication/the-imp...

I was really hoping this comment was going to be by Patrick, in that weird synchronicity that HN sometimes demonstrates :-)

Hah! Great find!

If this is still the case, you should definitely consider contributing documentation for those files. Odds are it will be used by the next person to try something similar. :)

>"It was kind of interesting exploring the OTP internals, especially some of the parts that haven't changed in a long time."

I have heard something similar before at an Erlang meetup. Are there other elements of the VM you encountered that were also static and lacking sufficient documentation? I'm guessing much of this is just tribal knowledge deep inside Ericsson, then? It would be great if there were a public repository for these things.

There are really two major pieces of the BEAM: the parts implemented in Erlang (e.g. the compiler, OTP), and the parts implemented in C (the VM, or emulator as it is called in the codebase). Virtually all of the C code is undocumented in any meaningful way, short of some internal documentation on a handful of topics, as well as those parts of the code which have thorough comments explaining some tricky aspect of the implementation.

From my own experience, the parts that are commented or documented tend to clarify some specific design constraints (for example, why processes have multiple locks on different parts, and why they are locked in a specific order, or the rationale for the carrier design); but you never really get a clear picture of why things are architected the way they are overall, what designs were considered and discarded due to some deficiency, what tradeoffs were made, etc. I think much of that kind of content, if it exists, is either buried in the minds of the original engineers or in some internal documentation at Ericsson that has never been released. My suspicion is that you'd need to dig through mountains of emails and such to piece together a more complete picture of how things were put together over time.

It's also the case that the BEAM just has a lot of really complex pieces built into it after all this time: everything from binary pattern matching and construction, to garbage collection and memory management, ETS, Mnesia, etc. Each of those things is not only non-trivial, but has evolved significantly over time, through the hands of many engineers. It also doesn't help that large portions of the C implementation are written in an extremely macro-heavy style, which makes it quite hard to read without knowing what all the macros do and how they play together.

Projects like Enigma, or Lumen, have a lot to give back to the community in the form of documenting how these pieces are built. Unfortunately, the lack of a specification for the Erlang language and its runtime means it is very much a grind to work out how things are currently implemented, and why.

Thanks for the comprehensive insight. This does invite the question, though, of how they themselves maintain these codebases if they're steeped in such arcana. Isn't there a danger that this becomes similar to the situation with mainframes and COBOL for them?

I suspect there exists some internal documentation at Ericsson that just hasn't been cleaned up and added to the source repository - but it's entirely possible that the core Erlang/OTP team solely relies on passing the knowledge on from engineer to engineer.

I'm also not saying that the C code is unmaintainable. It's definitely a bear to dive into, but if you spend enough time with it, it starts to unfold in front of you. The main issue I have is that none of the specification/design documentation exists as part of the source repository. Maybe it doesn't exist at all, but in that case, I'd really hope that some of those core engineers would take the time to write some of that stuff down. In any case, none of it is readily available AFAIK.

A new Erlang VM that merely replicates existing functionality is a fairly hard sell to the Erlang/Elixir community, who tout the industrial track record of the BEAM.

I believe you'd get much more interest if there was some ambitious new promise for this new VM, such as 10x sequential performance etc.

Someday this is going to need to happen though. IMO, "the right way" to do this is via the strangler pattern:


Probably the language that is most poised to achieve this is Zig; it would be feasible to start by wrapping the entire BEAM in a Zig compilation unit, which at the very least potentially offers an easier path to maintaining the codebase across multiple platforms. That could be followed by piecemeal rewrites of individual bits in Zig, which could be done via straightforward transliteration at first.

The very different mindset of Rust lends itself to total rewrites, which I don't think will sit well with the BEAM community. On the other hand, Erlang has tons of strange rewrites happening within its own internal ecosystem all the time (gen_fsm -> gen_statem, pg -> pg2 -> pg, etc.).

Do you know of any examples of this being done with Zig? I can think of a couple with Rust:

- https://gitlab.gnome.org/GNOME/librsvg completed a migration to Rust.

- https://github.com/RazrFalcon/rustybuzz and https://github.com/immunant/rexpat are making decent progress.

No, because Zig is still at 0.6.0, and the BDFL says "don't use this in prod yet"? Yeesh.

Perhaps then, for the sake of setting those expectations, the wording "the language that may be most poised to achieve this in the future is Zig" would be preferred?

Last I checked, "poised to achieve this" implies the future. Moreover, I have specific reasons to say that, namely that Zig now ships with C compilation with transparent cross-compilation support, so you can use it as a direct replacement for your C compiler (and, I believe, make). So the statement is based on current Zig features whose importance I alluded to in the comment.

FWIW, this is also a real problem; I often hear complaints about Windows ABI support in Elixir. Moreover, having written NIFs, I have to say the C header for NIFs is basically unintelligible without manually parsing the #defines, because of Windows cross-support needs.

Please let me know if I'm using "poised" incorrectly.

To my mind, "poised" in this context is roughly synonymous with "ready". If the official position of the project is "don't use it yet" then it's not ready yet.

The future implied is the completion of the action one is ready for, not further work of preparation for that action.

Your take may, of course, differ.

Yeah, I definitely didn't see it being production-ready any time soon, but I thought it was an interesting project for people who wanted to learn BEAM internals. That's how I got started with it, at least; I had problems trying to contribute to the BEAM just because of the sheer size of the codebase and my lack of domain-specific knowledge.

I do think that having alternative implementations is good for experimentation, though, similar to how Ruby was improved with ideas from JRuby and Rubinius, even if most users never used those two directly.

> That's how I got started with it, at least; I had problems trying to contribute to the BEAM just because of the sheer size of the codebase and my lack of domain-specific knowledge.

And the fact that a LOT of BEAM was terribly undocumented. I'm impressed you got anywhere, personally. The last time I looked at BEAM to understand some weird behavior (that eventually did turn out to be a problem in Erlang/BEAM), it was completely impenetrable.

Your implementation is probably useful as documentation, if nothing else.

100%, when I started building Lumen, I spent an enormous amount of time working out how various parts of the BEAM runtime were implemented, and it was (and still is sometimes) a grind. That macro-heavy C code is just such a bear to read. I understand why it was written that way, but working with it is just unpleasant.

Having Enigma, and other implementations like it, provides a huge value in terms of understanding how it all fits together - understanding the Enigma implementation and then going and trying to make sense of the BEAM would probably be a way better path than trying to dive into the BEAM straight away.

We do a lot of Erlang work here. The BEAM is so reliable that I wouldn't spend 1 minute looking at an experimental alternative.

Yeah, this. I'm just getting started with Erlang and I already feel like an idiot for not using it sooner. When I think of some of the things I've done to try to achieve what the BEAM does out-of-the-box...

Can you give some examples? I've been getting more and more interested in Erlang.

RPC is basically built in, so you'll probably never use REST internally. There's an in-memory database built in (ETS) that will replace Redis for most key-value storage cases. There's easy recovery from crashes via supervision trees and associated features. You can do hot upgrades while your system is fully operational.
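To give a feel for what "recovery via supervision" means, here is a minimal Rust sketch of a one-for-one style restart loop using OS threads: a worker that panics is restarted, up to a fixed limit. `supervise` and `max_restarts` are hypothetical names, not an OTP or Rust library API, and real OTP supervisors also handle restart intensity over a time period, multiple children, and shutdown ordering, none of which is modeled here.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical one-for-one style restart loop (not OTP's supervisor API).
/// Runs `worker` on a fresh thread and restarts it whenever it panics.
/// Returns Ok(restarts) when the worker finally exits normally, or
/// Err(restarts) once the restart limit has been used up.
pub fn supervise<F>(worker: F, max_restarts: usize) -> Result<usize, usize>
where
    F: Fn() + Send + Sync + 'static,
{
    let worker = Arc::new(worker);
    let mut restarts = 0;
    loop {
        let w = Arc::clone(&worker);
        // `join` returns Err if the thread panicked, i.e. the worker "crashed".
        match thread::spawn(move || (*w)()).join() {
            Ok(()) => return Ok(restarts),
            Err(_) if restarts < max_restarts => restarts += 1,
            Err(_) => return Err(restarts),
        }
    }
}
```

A worker that crashes on its first two runs and then succeeds yields `Ok(2)` with a limit of 5; a worker that always panics yields `Err(3)` with a limit of 3. On the BEAM you get this, plus links and monitors between lightweight processes, without writing any of it.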

A few protips: outside of a few corner cases, hot reloads are generally unnecessary these days. Be careful with Erlang RPC if you expose it over the network in any fashion, and follow the EEF Security Working Group guidelines regarding the Erlang term format. In Elixir, I use Plug.Crypto.non_executable_binary_to_term/2.

Thanks, I knew both of those bits of info, but I didn't know about the guidelines and how to deal with the RPC issue, very helpful.

To be fair, the guidelines are hot off the presses. I literally went to Code BEAM, ran into Bram Verburg (whose fantastic X509 library I had just found a week prior), and he told me about those guidelines, which had been published just hours before we spoke. Bram is great; he's very modest, and was amused that I thought he was some sort of superstar.

Well, I'm "hot off the press" when it comes to Elixir, so it's a good time for me to learn this stuff. Thanks for the link below, too.

You wouldn't happen to have a link handy would you? It would be much appreciated.

Bless you! "You're a scholar and a gentleman." :-D

Actually I'm a bit of an asshole if you really get to know me, but I'm usually nice on the internet. Thank you!

When I used Erlang in 2005, there was a problem with message queues growing and OOM-ing the process. The standard solution to that is RPC flow control and backpressure. Does this problem exist with modern Erlang?

Google's gRPC/Stubby has the same problem. It's one reason why gRPC servers are never deployed on the public Internet and always use hefty reverse proxies.
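For readers unfamiliar with the term: backpressure means the producer is slowed down when the consumer falls behind, instead of letting the consumer's queue grow until memory runs out. BEAM mailboxes are unbounded, so Erlang code has to build this in explicitly (for example with synchronous calls such as `gen_server:call`). The Rust sketch below shows the bounded-queue version using the standard library's `std::sync::mpsc::sync_channel`, whose `send` blocks once the buffer is full; `run_bounded`, the buffer size, and the message counts are arbitrary choices for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::sync_channel;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Pushes `n` messages through a bounded channel of size `capacity` to a
/// deliberately slow consumer. Returns (messages received, largest observed
/// gap between messages sent and messages received).
pub fn run_bounded(n: usize, capacity: usize) -> (usize, usize) {
    let (tx, rx) = sync_channel::<usize>(capacity);
    let sent = Arc::new(AtomicUsize::new(0));
    let sent_p = Arc::clone(&sent);

    let producer = thread::spawn(move || {
        for i in 0..n {
            // `send` blocks while the buffer is full: this is the backpressure.
            tx.send(i).expect("consumer hung up");
            sent_p.fetch_add(1, Ordering::SeqCst);
        }
        // `tx` is dropped here, which ends the consumer's loop below.
    });

    let mut received = 0;
    let mut max_gap = 0;
    for _msg in rx {
        received += 1;
        let gap = sent.load(Ordering::SeqCst).saturating_sub(received);
        max_gap = max_gap.max(gap);
        // Simulate a consumer that is slower than the producer.
        thread::sleep(Duration::from_millis(1));
    }
    producer.join().unwrap();
    (received, max_gap)
}
```

Because at most `capacity` messages can sit in the buffer, the observed backlog never exceeds `capacity`. With an unbounded `std::sync::mpsc::channel()` instead, the producer could run far ahead and the backlog could grow toward `n`, which is the mailbox-growth problem described above.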

I always feel like Erlang and more generally BEAM is the language for a lot of today’s problems but will never see any kind of traction because of “worse is better”.

Can you share something on this reliability? Is it more reliable than the JVM? Or is it more predictable in terms of performance? I’ve found the jvm itself to be rock solid as well.

The Erlang VM is consistent and predictable in terms of latency; it's engineered that way. For pure computational performance, you won't find it optimal.

Erlang/OTP and the BEAM have been around a long time and are probably just as mature as the JVM and Java, but they have slightly different goals: very low tail latency, easy error recovery, and easy distributed computing for Erlang vs. better customization and median performance for the JVM. I'm sure there are more differences and more nuance than that, but I'd be very happy to use either language/VM in production.

It's also harder to write abjectly wrong code on the BEAM. I found out that my code was effectively fork-bombing itself: I had hooked up an error reporter (which I adapted myself, because the version I wanted had bitrotted) to Rollbar, and very rarely, when the service was having an outage, it would cause a crash, which would try to report, crash again, and grow exponentially. Amazingly, this caused no service degradation for our customers.

There are some Go projects like [1] that can connect to Erlang nodes and claim to be faster.

[1] https://github.com/halturin/ergo

If the VM is in Rust could it be compiled to WASM?

You wouldn't want to, for various reasons. See this blog post about Lumen and the design decisions:


You wouldn't want to _right now_. However for almost all points there is a solution underway/in planning. In a year or two it might be feasible.

There would however be other limitations, like filesystem APIs etc. not being available in the browser that a lot of frameworks in BEAM languages expect, that would severely limit the usefulness, though I guess that applies to either implementation strategy.

I can't imagine a situation in which it would be acceptable to download an entire (WASM compiled) copy of the BEAM from every new website I visit.

Full AOT compilation with dead code elimination seems like a much better fit for minimising that bundle size.

WASM isn't just for the web, though. It's beginning to look a lot like the next "portable ISA" (successor to Pascal's pcode, applet-era JVM bytecode, Google's PNaCl, etc.)

Add a progressive web-app to your mobile's home screen—get a cached WASM bundle, for near-native performance of the resulting "app." Download an Electron app—get a WASM bundle wrapped in a projector.

Even then, there are still browser use-cases where "full fat" apps are no problem. The "apps" in the Chrome Web Store—Adobe has a version of Photoshop in there, I think—make a lot of sense to be WASM. You'll launch them once and keep them open for hours.

I think the point, though, is that architecturally there are performance hits if you don't respect the fact that WASM has a different architecture than what the BEAM expects (Harvard vs. von Neumann, IIRC), so you may NEVER want to if you get it right in the first place.

The Harvard vs. von Neumann explanation is wrong. C also works perfectly fine on Harvard architectures; it's why function pointers in C are special, aren't guaranteed to be convertible to a void pointer, and why uintptr_t is optional. POSIX adds these additional guarantees to support dlsym, which returns function addresses as void pointers.

The problems with compiling other languages to WebAssembly are primarily 1) the lack of goto and 2) the inability to instantiate and jump between different stacks. These limitations are especially problematic for languages like Erlang/BEAM and Go, because WebAssembly-based VM implementations require an extra level of indirection to implement some of their core language semantics, resulting in quite slow performance compared to even a pure, strictly compliant C implementation (and that's presuming the WASM VM itself adds no overhead, which is not actually the case).

WASM excluded goto support because it was argued that the relooper algorithm required to translate goto constructs to structured WASM statements was sufficiently capable to cover the vast majority of existing code. And they provided evidence to back up that claim. The flaw in that reasoning is that language implementations and similar niche cases have special needs that application code rarely requires, and in that space constructs like goto are crucial to both simplicity of implementation and performance; the inadequacies of relooper become the norm rather than the exception.
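A small Rust sketch of what that constraint looks like in practice: without goto (or the computed-goto "labels as values" extension that native C interpreters commonly use for threaded dispatch), an interpreter compiled to WebAssembly ends up as a single `loop` around a `match`, and every jump becomes an assignment to a program counter. The `Op` instruction set is a hypothetical toy, not BEAM bytecode.

```rust
/// A toy instruction set (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy)]
pub enum Op {
    Push(i64),
    Add,
    /// Decrement the top of the stack by one.
    Dec,
    /// Jump to `target` if the top of the stack is non-zero (does not pop).
    JumpIfNonZero(usize),
    Halt,
}

/// Structured dispatch: every instruction returns to the top of one loop,
/// and "jumps" are just assignments to `pc`. Programs must end with `Halt`;
/// running off the end panics on the out-of-bounds index.
pub fn run(code: &[Op]) -> Vec<i64> {
    let mut stack = Vec::new();
    let mut pc = 0;
    loop {
        let op = code[pc];
        pc += 1;
        match op {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a + b);
            }
            Op::Dec => *stack.last_mut().expect("stack underflow") -= 1,
            Op::JumpIfNonZero(target) => {
                if *stack.last().expect("stack underflow") != 0 {
                    pc = target;
                }
            }
            Op::Halt => return stack,
        }
    }
}
```

Running `[Push(3), Dec, JumpIfNonZero(1), Halt]` counts the top of the stack down to zero and returns `[0]`. Every instruction, including each "jump", goes back through the one central `match`; this only illustrates the kind of structure goto-free targets impose, it is not a measurement of its cost.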

thanks for the clarification!

Just being able to amend job requirements with "or Rust experience" is most probably worth it.

Do you think it's reasonable to make a Rust library that allows you to do Erlang style binary matching?

That's what I was originally looking for when I found this.

If you're okay with macros (probably nightly-only), then it seems reasonable.

There is also slice matching on stable, which lets you match on parts of slices (https://github.com/rust-lang/rust/pull/67712/). It went out in 1.42. It makes some binary work easier, but not by much. But perhaps someday you'll get native binary matching in the language that's closer to what Erlang offers.

It made it into the 1.42 release.

^ Agree with this; slice_patterns fulfilled my needs at the byte level, at least. At the bit level you could extract some parts of https://github.com/archseer/enigma/blob/master/enigma/src/bi... and wrap them in a macro, but it's a bit clunky. I tend to just do it by hand (match a { a if a & 0b1100_0000 == 0b1100_0000 => ... } etc.)
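A small illustration of both approaches on stable Rust: slice patterns (stabilized in 1.42, as noted above) to destructure a byte-level header, and hand-written mask tests for bit-level fields. The two-byte header layout and the function names are hypothetical; the bit-level example uses the standard UTF-8 lead-byte prefixes.

```rust
/// Hypothetical packet layout: [version (1 or 2), kind, payload...].
pub fn parse(bytes: &[u8]) -> Option<(u8, u8, &[u8])> {
    match bytes {
        // Byte-level matching with a range pattern and a subslice binding.
        [version @ 1..=2, kind, payload @ ..] => Some((*version, *kind, payload)),
        _ => None,
    }
}

/// Bit-level matching by hand: classify a UTF-8 lead byte by its high bits.
/// Note: `&` binds tighter than `==` in Rust, so no extra parentheses needed.
pub fn utf8_len(lead: u8) -> Option<usize> {
    match lead {
        b if b & 0b1000_0000 == 0 => Some(1),           // 0xxxxxxx
        b if b & 0b1110_0000 == 0b1100_0000 => Some(2), // 110xxxxx
        b if b & 0b1111_0000 == 0b1110_0000 => Some(3), // 1110xxxx
        b if b & 0b1111_1000 == 0b1111_0000 => Some(4), // 11110xxx
        _ => None, // 10xxxxxx continuation byte, or 11111xxx
    }
}
```

The byte-level case reads much like an Erlang binary match; the bit-level case is where Erlang's `<<A:3, B:5>>` syntax is still far more convenient.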

Nice work! If I wanted to learn how to build VMs, how would I start? My experience is in backend development/distributed systems in Java and Go (so assume I know nothing outside of an introductory OS course)

The "Crafting Interpreters" book by Bob Nystrom is probably a good way to dig in. It has a whole chunk on implementing a VM: http://craftinginterpreters.com/a-bytecode-virtual-machine.h...

Thank you for posting this! Thanks to you I'm digging into this book right now and having a blast. The author is an engaging writer and it's been tremendous fun thus far.

His book, "Game Programming Patterns," is great as well: http://gameprogrammingpatterns.com

This is pretty engaging, thank you!

Have you considered using C2Rust[0] where applicable instead?

[0] https://c2rust.com/

How did you implement the GC? Is it possible to implement an allocator + GC in Rust without hitting UB?

You can simply use unsafe as an escape hatch

Sure, but would that eliminate UB? C++ is all unsafe, and still full of UB gotchas.
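One common way to sidestep much of the UB risk, offered here as a swapped-in technique rather than how Enigma actually does it, is to keep the heap in a `Vec` and refer to objects by index instead of raw pointers, so a mark-and-sweep collector can be written entirely in safe Rust. The `Heap`, `Obj`, and `Ref` types are hypothetical; a BEAM-style collector (per-process, generational, copying) would look quite different.

```rust
/// Hypothetical sketch: an index into the heap stands in for a pointer.
pub type Ref = usize;

#[derive(Debug)]
pub enum Obj {
    Int(i64),
    /// A cons cell holding references to two other objects.
    Pair(Ref, Ref),
}

#[derive(Default)]
pub struct Heap {
    slots: Vec<Option<Obj>>,
    free: Vec<Ref>,
}

impl Heap {
    /// Allocate, reusing a freed slot if one is available.
    pub fn alloc(&mut self, obj: Obj) -> Ref {
        if let Some(i) = self.free.pop() {
            self.slots[i] = Some(obj);
            i
        } else {
            self.slots.push(Some(obj));
            self.slots.len() - 1
        }
    }

    pub fn live_count(&self) -> usize {
        self.slots.iter().filter(|s| s.is_some()).count()
    }

    /// Mark everything reachable from `roots`, then free the rest.
    pub fn collect(&mut self, roots: &[Ref]) {
        let mut marked = vec![false; self.slots.len()];
        let mut worklist: Vec<Ref> = roots.to_vec();
        while let Some(r) = worklist.pop() {
            if marked[r] {
                continue; // already visited; also handles cycles
            }
            marked[r] = true;
            if let Some(Obj::Pair(a, b)) = &self.slots[r] {
                worklist.push(*a);
                worklist.push(*b);
            }
        }
        for (i, slot) in self.slots.iter_mut().enumerate() {
            if !marked[i] && slot.is_some() {
                *slot = None;
                self.free.push(i);
            }
        }
    }
}
```

The trade-off is that a stale index is a logic bug (it may observe a reused slot) rather than undefined behavior, and every access pays a bounds check.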

A VM is more than enough of a project, but are there any thoughts of a Phoenix port?

Why would you want to port Phoenix? As long as the underlying VM follows the spec, Phoenix should be oblivious to which VM it is running on.

Why start if you have no intention to see it through? Talks and adoption by the community are a chicken and egg problem, if you don't believe enough in the project to give it staying power then the community is right to not adopt it: they already have a VM for BEAM and it works well. Without an additional selling point 'now in Rust' doesn't cut it.

> Why start if you have no intention to see it through

The author had a threshold of good feedback they needed from the community in a certain amount of time. They got the feedback they needed - people aren't interested in it, probably because of the latter part of your comment.

I don't think that's a valid reason to ask why someone started a thing, people start things for a variety of reasons. As far as I am concerned, they saw the development of a reimplementation of solid tech through and learned a lot from it.

> they already have a VM for BEAM and it works well. Without an additional selling point 'now in Rust' doesn't cut it.

This is spot on though.

I find this attitude puzzling. If they wanted to start, why not? According to the author's own comment, they had a lot of fun, and it seemed like an interesting project, so I don't see any reason to back away from it. On the other hand, for precisely the same reason, I find it perfectly reasonable that they didn't continue with it. If the fun is gone, why continue?

I see this tendency a lot from the Rust community where there's a lot of "now in Rust" being built, expectations had and then hurt feelings when they're either ignored or shown the door. The community seems to think that "now in Rust" _is_ the selling point. Tools are just tools.

What they don't realize is that they're often building solutions that are looking for problems, rather than solutions to solve problems. It's also vaguely cultish in the approach.

It's a terrific language and there's a lot to learn from it, but I'd like to see it solve real-world problems on its own rather than try to screw itself into everyone else's.

Projects can sometimes be just for fun. I started with two goals: 1) Learn how Erlang internals work 2) Learn Rust (this was the first time I was writing Rust)

After I achieved both of those, I decided I'd keep going if there was any interest in the project. There wasn't, so I moved on.

That's a shame. I have a very keen interest in the project and would contribute if I wasn't on the job hunt currently.

Before even checking the GitHub repo, how are your docs? Can an outsider make a deep dive into your code without needing to pester you with questions?

While it's undeniable the majority of things to come out of Rust are mostly superfluous rewrites of already solid projects (to your point about "now in Rust!" being the selling point) I think it's clear the author in this particular case was just looking into BEAM internals and started a fun project, so I don't think it applies here.

In general, though, I think people ought to consider that if they are putting the language they wrote their project in in the marketing blurb for it (given a more serious project), maybe that indicates that the project itself is of little value to other people. "* Written in Rust" isn't a value proposition, it's just an implementation detail. Make real claims about zero crashes, zero leaks, something actually concrete and it can be scrutinized for real.

I think you're possibly making a faulty assumption here. You're right that "written in Rust" has no particular value to people who just want to use the thing, but that's not the only perspective people bring to open-source software. When somebody markets a project as "X written in Y," I generally assume they're marketing to people who might want to hack on it, and it is relevant from that perspective.

Even for users it can be valuable. For example, an Electron application gets an automatic pass from me.

Written in X can definitely be a value proposition.

If you are evaluating a tool/lib/etc that moves at a fast pace and your whole shop is extremely fluent in language X there's huge value add to being able to dive in without a context switch to understand how it works, especially when debugging harder problems.

I don't think it applies to _this_ case, where we're getting a VM that is _extremely_ battle-tested. Am I going to use a new OS instead of Linux in production because someone tried to write an OS in Zig? No. Will I congratulate the author for writing an OS in Zig? Yes.

If I am looking for a key-value store and two are equivalent in their purported features and stability, I will choose the one that is written in X that my shop is most fluent in.

A similar project is Lumen [1], which is targeted primarily at WebAssembly.

[1]: https://github.com/lumen/lumen

Yep! They appeared shortly after I started and were backed by DockYard. I think they didn't have much success finding external contributors either, though :/

The project is still under heavy development to get to a point where external contributors make sense. We hope to be there soon!

Best of luck!

Have you considered joining efforts with the Lumen team and merging your work there?

I think we're coming at things from sufficiently different angles that it's not really necessary to merge our projects, other than perhaps some common parts that could benefit from that kind of sharing. Enigma is really a faithful reimplementation of the VM in Rust; while Lumen is an ahead-of-time compiler, which requires a very different approach.

I think the goal of Enigma, making the BEAM architecture easier to understand and providing a great learning platform for getting involved in working on the BEAM itself (or just on Enigma as an alternative), is a great idea, and something I know I wish I'd had on hand when I was trying to understand the deep inner workings of the BEAM implementation. If Enigma were only ever that, it would still be worth the effort spent on it.

Lumen does have parts that could likely be shared with projects like Enigma - namely the high-level IR we use in the frontend (EIR, Erlang Intermediate Representation). That IR could be used in any Rust-based VM/compiler targeting Erlang, or an Erlang-derived language, and work there would directly benefit downstream consumers of the IR in terms of better optimization, etc. We've also put work into our term representation, and various parts of the runtime, like reproducing the core parts of the BEAM garbage collector, etc. While some of those things are intertwined with the compiler, much of it could be easily extracted and used elsewhere.
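For readers wondering what "term representation" involves: VMs like the BEAM pack small values directly into a machine word and use a few low bits as a type tag, so small integers and atoms need no heap allocation. The sketch below uses a made-up 2-bit tag scheme purely for illustration; the BEAM's real layout is more elaborate (primary tags plus secondary tags for immediates), and Lumen's may differ again.

```rust
/// Hypothetical tagged 64-bit word: the low 2 bits say what the rest means.
/// This is NOT the BEAM's or Lumen's actual tag layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Word(u64);

const TAG_BITS: u32 = 2;
const TAG_MASK: u64 = 0b11;
const TAG_SMALL_INT: u64 = 0b01;
const TAG_ATOM: u64 = 0b10;

#[derive(Debug, PartialEq, Eq)]
pub enum Decoded {
    SmallInt(i64),
    Atom(u64), // index into a (hypothetical) atom table
    Other(u64),
}

impl Word {
    /// Returns None if `n` does not fit in the 62 available bits
    /// (a real VM would fall back to a heap-allocated bignum).
    pub fn small_int(n: i64) -> Option<Word> {
        let (min, max) = (i64::MIN >> TAG_BITS, i64::MAX >> TAG_BITS);
        if n < min || n > max {
            return None;
        }
        Some(Word(((n << TAG_BITS) as u64) | TAG_SMALL_INT))
    }

    /// Assumes `index` < 2^62.
    pub fn atom(index: u64) -> Word {
        Word((index << TAG_BITS) | TAG_ATOM)
    }

    pub fn decode(self) -> Decoded {
        match self.0 & TAG_MASK {
            // Arithmetic shift on i64 restores the sign of negative integers.
            TAG_SMALL_INT => Decoded::SmallInt((self.0 as i64) >> TAG_BITS),
            TAG_ATOM => Decoded::Atom(self.0 >> TAG_BITS),
            _ => Decoded::Other(self.0),
        }
    }
}
```

Decoding is a mask and a shift, which is why immediate terms are cheap to copy between registers and into messages.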

Agreed! I think both projects can coexist, but there could be some components that we could share. I was in talks with Hans last year about potentially supporting EIR opcodes in Enigma.

I do think that Lumen is potentially more interesting for real-world use cases, though, and I wish you the best of luck. An alternative Erlang compiler is a lot more complex, but it lets us explore new techniques more easily, which could potentially be ported back to OTP if nothing else.

Absolutely! I strongly believe we should be building alternative implementations of the Erlang compiler, the runtime, the virtual machine - all of the major components. It provides a lot of useful data that, if nothing else, can be used to improve the original implementations in the BEAM. Enigma is a great example of how reimplementing such a core piece of that infrastructure can provide better learning opportunities for the community, as well as a way to clean the slate and start with a fresh look at the problem with modern tools.

If nothing else I'd really like to see projects like ours demonstrate the value to the core Erlang/OTP team in addressing the lack of documentation in some areas - ideally in the form of one or more specifications. Erlang deserves a specification at this point - it is very stable, and a spec would at the very least provide additional structure for future evolution. Core Erlang had a specification, but it is very much out of date at this point - considering how widely it is used as an IR for BEAM languages in general, it's disappointing it hasn't been kept up to date.

Another one from a different author (with slightly different goals): https://github.com/kvakvs/ErlangRT

Also check out kvakvs' BEAM Wisdoms site: http://beam-wisdoms.clau.se/en/latest/

It was a big help initially to grok the bytecode format.

> sans the distributed bits for now

I thought the very point of Erlang was the distributed nature of BEAM? Failure is normal etc?

I wanted to tackle distribution eventually, but it was also by far the most complex part of the implementation.

I would love an Erlang implementation where there can be many versions of the code in memory, where you could re-order the message pattern matching at runtime, and where you can specify arguments to functions in terms of a map, specify the args and the types of that map specification, and have it compile into numbered arguments, so that you don't have to update many, many functions to add another argument.

> many versions of the code in memory,

You can model that today if you script compilation/loading. When loading a module, you could first load it as {?MODULE, ?VERSION} or ?MODULE_?VERSION if we can't stuff a tuple there, and then also load it as ?MODULE, to use when you don't specify a version.

The hard part is deciding what version to call when, and passing that through to the call sites. And also, to figure out how to signal a process that you want it to update its version.

> where you could re-order the message pattern matching at runtime,

Pattern matching order is part of your code, and hot loading is the way to make changes to your code.

> where you can specify arguments to functions in terms of a map, specify the args and the types of that map specification, and have it compile into numbered arguments, so that you don't have to update many, many functions to add another argument

You could do this yourself today as well; a function could check if its argument is a Map and demapify the arguments, or you could make a utility call_function(Module, Function, Arity, Map) that demapifies and calls erlang:apply on the function. Or, you could have your rapidly changing functions all just take Maps; I did that in the past with Proplists.
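The "demapify" idea can be sketched outside Erlang too. This Rust version takes a parameter spec (an ordered list of names with optional defaults) and turns a map of named arguments into a positional argument list, failing on missing or unknown names. `ParamSpec`, `demapify`, and the use of plain `i64` values are hypothetical simplifications; in Erlang, the resulting list would be passed to `erlang:apply/3`.

```rust
use std::collections::HashMap;

/// One positional parameter: its name and an optional default (hypothetical type).
pub struct ParamSpec {
    pub name: &'static str,
    pub default: Option<i64>,
}

/// Converts named arguments into positional order according to `spec`.
pub fn demapify(spec: &[ParamSpec], mut args: HashMap<&str, i64>) -> Result<Vec<i64>, String> {
    let mut positional = Vec::with_capacity(spec.len());
    for p in spec {
        // Take the named value if given, else fall back to the default.
        match args.remove(p.name).or(p.default) {
            Some(v) => positional.push(v),
            None => return Err(format!("missing argument `{}`", p.name)),
        }
    }
    // Anything left over was not in the spec.
    if let Some(extra) = args.keys().next() {
        return Err(format!("unknown argument `{}`", extra));
    }
    Ok(positional)
}
```

Adding a parameter then means adding one spec entry with a default, rather than touching every call site; the cost is a map lookup per argument on every call.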

Thanks for these suggestions!

Shouldn't there be a way for the compiler to know enough to convert the map to positional arguments, so long as the map params could be put into a spec? Something like that would be super nice, because I think tail calls in Erlang cost nothing so long as you keep the same arg positions. Having a map is always a convenient way of starting a function and evolving it, but having it with a few more constraints and with equal performance would be better. If I'm not mistaken, Dart made a similar optimization.

If the functions are exported, the compiler wouldn't be able to optimize calls. Dynamic modules and hotloading means there's no expectation that what you had when compiling is what you're running against in production.

If they're in the same compilation unit, the newer SSA-based optimizer might do some cool stuff for you; it's always interesting to look at the optimized code that comes out of compiling/loading.

In general though, I would tend not to worry about efficiency of passing a map instead of traditional arguments. Most likely there's other, bigger, things to worry about; usually finding the right communications patterns and algorithms makes a bigger difference.

You could do that already. You can compile code at runtime from a binary (self-modifying code) if you need to, or you could extract that into several dynamically compiled modules and call them based on the argument.


Different versions of code in memory? That sounds like a nightmare to debug. Erlang already keeps two versions of newly compiled code: the old one for processes actively using that code, and the new one for everything else. Once all processes have moved to the new code (by exiting the old function, or by making a fully qualified module:function call into the new version), the old code can be purged.
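The two-version scheme can be modeled as a small data structure: each module has at most a `current` and an `old` version, loading a new version demotes `current` to `old`, and `old` can only be dropped once no process still runs it. This Rust sketch is a simplified, hypothetical model (processes are just usage counts, and the `CodeTable` API is made up). In Erlang, loading another version while old code still exists is refused until the old code is purged, and `code:purge/1` kills any processes still running it; this sketch returns an error or `false` instead of killing anything.

```rust
use std::collections::HashMap;

/// One loaded version of a module plus how many processes are executing it.
struct Version {
    vsn: u32,
    users: usize,
}

#[derive(Default)]
struct ModuleCode {
    current: Option<Version>,
    old: Option<Version>,
}

/// Hypothetical model of a two-slot code table (not a real BEAM API).
#[derive(Default)]
pub struct CodeTable {
    modules: HashMap<String, ModuleCode>,
}

impl CodeTable {
    /// Load `vsn` as the new current version; the previous current becomes old.
    pub fn load(&mut self, module: &str, vsn: u32) -> Result<(), String> {
        let m = self.modules.entry(module.to_string()).or_default();
        if m.old.is_some() {
            return Err(format!("{}: old code not purged", module));
        }
        m.old = m.current.take();
        m.current = Some(Version { vsn, users: 0 });
        Ok(())
    }

    /// A process starts running the current version (e.g. via a fully qualified call).
    pub fn enter(&mut self, module: &str) -> Option<u32> {
        let v = self.modules.get_mut(module)?.current.as_mut()?;
        v.users += 1;
        Some(v.vsn)
    }

    /// A process stops running version `vsn` (wherever it now lives).
    pub fn leave(&mut self, module: &str, vsn: u32) {
        if let Some(m) = self.modules.get_mut(module) {
            for v in m.current.iter_mut().chain(m.old.iter_mut()) {
                if v.vsn == vsn && v.users > 0 {
                    v.users -= 1;
                }
            }
        }
    }

    /// Drop the old version unless a process still runs it; true if no old code remains.
    pub fn purge(&mut self, module: &str) -> bool {
        if let Some(m) = self.modules.get_mut(module) {
            if m.old.as_ref().map_or(false, |v| v.users > 0) {
                return false;
            }
            m.old = None;
        }
        true
    }
}
```

Supporting "many versions in memory", as wished for above, would mean replacing the two slots with a list and deciding at every call site which version to use, which is exactly the hard part mentioned earlier in the thread.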

I love it; I don't think everything has to be useful to be worthwhile.

What are some good books on the topic of VM implementation? If no books, then papers also welcome.

(Not necessarily related to Erlang)

Crafting Interpreters is a good book, free to read online. For papers, you can read "The Implementation of Lua 5.0".

This is cool, but it looks like development has stalled. The last commit was in Sept 2019.
