
Rustler – Safe Elixir and Erlang NIFs in Rust - hansihe
http://hansihe.com/2017/02/05/rustler-safe-erlang-elixir-nifs-in-rust.html
======
moosingin3space
This, IMO, is quite possibly the best instance of "best tool for the job" I've
ever seen. Rust is great for writing high-performance code, and Erlang is
fantastic for distributed systems.

~~~
jbhatab
code like this gives me polyglot chills. Rust + Go = love :DDDD.

------
kibwen
I've been very excited at Rust's prospects as an alternative to C for
extending all sorts of higher-level languages, but I'm most excited at the
prospect of combining Rust with Erlang given the latter's focus on
reliability. I'd love to figure out where Erlang programmers hang out to ask
them what they think about using Rust for NIFs.

~~~
im_down_w_otp
Mostly the Erlang mailing list.
[http://erlang.org/mailman/listinfo](http://erlang.org/mailman/listinfo)

As an Erlanger, and maker of a NIF -> Rust library I wasn't allowed to open
source (sadly), I can tell you I think this is excellent. My current science
project is trying to rebuild some small components of BEAM and the Erlang
Runtime System in Rust that are currently in C.

Mostly the above is an exercise in understanding the internals of BEAM better,
because my ideal end state is to figure out what it would take to make an
Erlang/OTP-like library for Rust (with task/thread/process/whatever
preemption)... which is actually what originally drew me to Rust 3-4 years ago
back when it was shaped more like a systems-Erlang (task supervision
hierarchies, etc) and less like a renovated-safe-C/C++.

~~~
tormeh
>Rust 3-4 years ago back when it was shaped more like a systems-Erlang (task
supervision hierarchies, etc) and less like a renovated-safe-C/C++

Wow, any chance Erlang-like supervision and actor model in general might make
a comeback in Rust? I'm sure the Erlang guys have thought about this, but no
static type checking makes me a bit nervous. Erlang can be surprisingly strict
(a good thing) in some ways, but as far as I can tell, you only discover any
failures at runtime.

~~~
moosingin3space
> any chance Erlang-like supervision and actor model in general might make a
> comeback in Rust?

As a library, sure, but like green threads, actors aren't ideal for a systems
language.

~~~
pjmlp
The guys at ETHZ thought otherwise with Active Oberon.

~~~
moosingin3space
Interesting. Were actors available at the kernel level? If so, how?

~~~
pjmlp
Not actors. I should have been more explicit, as I was referring to green
threads.

------
andy_ppp
Really nice to see this linked here. I was chatting with hansihe about getting
the html5ever parser from Servo into a NIF and he magically created it here:

[https://github.com/hansihe/html5ever_elixir](https://github.com/hansihe/html5ever_elixir)

It shows how to create a thread pool and avoid the 1ms maximum NIF execution
time too.

------
fabian2k
NIFs are the fastest method to call external code from Erlang/Elixir, as far
as I understand. But I wonder how high the actual overhead is.

My understanding is that if I still want to ensure that the Beam VM can
continue to schedule all processes efficiently, these NIFs shouldn't calculate
forever, but return quickly to avoid blocking all other processes. So the
straightforward way to ensure this for longer calculations would be to split
the calculation in smaller, parallelizable jobs, if that is possible. But then
the overhead of calling NIFs might actually matter, if you split them into
chunks that are too small.

I've no idea how big the overhead actually is, I'd be interested in a rough
estimate. Is it small enough that I can just ignore it entirely?

~~~
jerf
If you look at what NIFs exist up to this point and where Erlang itself tends
to use that mechanism, they are all for very tiny uses, like type checks and
such. But that's probably precisely because they're too scary to use for
anything very large. The existence of a provably-safe and practical mechanism
for writing them may cause and/or require some changes in the ecosystem to
account for the fact it's now possible to write NIFs that may take a long time
to run.

I wouldn't be surprised if at least one of this library or the Erlang VM
itself develops an official way to easily use a NIF to run a longer-running
process safely. There hasn't been a need up to this point because there hasn't
been such a thing as a long-running NIF.

At the moment, if you're looking at something that may take several seconds
you're probably still better off coordinating something over a port to an
external process or something. Or working with this project to make it
feasible to run a long-running native process.

Trying to write a NIF that could somehow yield back and then be "continued"
later would be a royal pain. Cooperative scheduling was enough of a pain when
we were in the hundreds of processes mostly doing nothing on OSes; trying to
use it on an Erlang server is probably just infeasible.

~~~
dilatedmind
it is almost trivial to write nifs which yield and continue, enif_schedule_nif
exists for this purpose.

nifs are ideal for operations which mutate binaries, eg hashing or unmasking a
websocket frame
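
A minimal sketch of the yield-and-continue shape described above, in plain Rust. It does not call `enif_schedule_nif` or any Rustler API; `SumState`, `Step`, `sum_slice`, and `run_to_completion` are hypothetical names invented for illustration, and the `budget` parameter is an assumed stand-in for "however much work fits in one scheduler-friendly slice".

```rust
/// Progress carried from one slice of work to the next (hypothetical type;
/// a real NIF would hand this back to the VM when it reschedules itself).
pub struct SumState {
    pub next_index: usize,
    pub partial_sum: u64,
}

/// Result of running one slice: either more work remains, or we are done.
pub enum Step {
    Yield(SumState),
    Done(u64),
}

/// Sum at most `budget` bytes of `data`, starting where `state` left off.
pub fn sum_slice(data: &[u8], state: SumState, budget: usize) -> Step {
    // A zero budget would never make progress, so treat it as one byte.
    let budget = budget.max(1);
    let end = data.len().min(state.next_index.saturating_add(budget));
    let chunk_sum: u64 = data[state.next_index..end]
        .iter()
        .map(|&byte| u64::from(byte))
        .sum();
    let partial_sum = state.partial_sum + chunk_sum;
    if end == data.len() {
        Step::Done(partial_sum)
    } else {
        Step::Yield(SumState { next_index: end, partial_sum })
    }
}

/// Drive the slices in a loop, standing in for the VM rescheduling the NIF.
/// Returns the total and the number of slices that were needed.
pub fn run_to_completion(data: &[u8], budget: usize) -> (u64, usize) {
    let mut state = SumState { next_index: 0, partial_sum: 0 };
    let mut slices = 0;
    loop {
        slices += 1;
        match sum_slice(data, state, budget) {
            Step::Done(total) => return (total, slices),
            Step::Yield(next) => state = next,
        }
    }
}
```

The budget is the knob fabian2k asked about: too small and the per-call overhead dominates, too large and a single slice holds a scheduler for too long.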

~~~
jerf
I didn't just mean the mere ability to yield, which isn't that complicated; I
meant code supporting the common use cases around having a long-running NIF,
such as the comment from the person I replied to about having some sort of
built-in support for being able to answer back to a PID or something. I could
easily imagine a NIF library that provided an easy ability to set up an
independent thread pool and work sharing mechanism specific to that NIF, which
sends answers back to given PIDs, has timeout support or the ability to query
it for progress, etc. All fairly straightforward stuff to expect to develop
over time, but in a world where long-running NIFs are too dangerous to even
contemplate, they haven't developed yet.

There are some parameters in there that are going to be tricky to set up
correctly (number of workers in the pool for a given NIF has a lot of
implications at scale), but in the end it's not significantly different than
communicating to a separate process on the same machine; it's all the same
resources being used.
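
A rough sketch of the kind of per-NIF worker pool imagined above, using only the Rust standard library. Nothing here is Rustler's or BEAM's API: `Job`, `start_pool`, `submit`, and `expensive` are hypothetical names, and the reply `Sender` is an assumed stand-in for "send the answer back to a given PID".

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// A unit of work plus the channel to answer on. In this sketch the reply
/// `Sender` plays the role an Erlang PID would play in a real NIF.
pub struct Job {
    pub input: u64,
    pub reply_to: Sender<u64>,
}

/// Placeholder for a long-running native computation.
fn expensive(input: u64) -> u64 {
    (1..=input).sum()
}

/// Start `workers` threads that share one job queue; returns the queue handle.
pub fn start_pool(workers: usize) -> Sender<Job> {
    let (job_tx, job_rx) = channel::<Job>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    for _ in 0..workers {
        let job_rx = Arc::clone(&job_rx);
        thread::spawn(move || loop {
            // Take the next job; the lock is released before the work starts.
            let job = match job_rx.lock().unwrap().recv() {
                Ok(job) => job,
                Err(_) => break, // every queue handle was dropped: shut down
            };
            // Ignore send errors: the caller may have stopped waiting.
            let _ = job.reply_to.send(expensive(job.input));
        });
    }
    job_tx
}

/// Queue a job and return the receiving end on which the answer will arrive.
pub fn submit(pool: &Sender<Job>, input: u64) -> Receiver<u64> {
    let (reply_to, reply_rx) = channel();
    pool.send(Job { input, reply_to }).expect("pool has shut down");
    reply_rx
}
```

A caller can get timeout behaviour by waiting with `recv_timeout` on the returned receiver. The `workers` argument is the tricky parameter mentioned above, since these threads compete with the VM's schedulers for the same cores.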

------
Anderkent
> This means you can write and run totally safe code in Rust, no worrying
> about segfaults.

This is not quite accurate; you can still segfault Rust while writing 100%
safe code, for example with a large stack overflow, ever since __morestack
was killed.

Though these cases are fortunately rare.

~~~
hansihe
That's very true.

Do you think I should have written it differently? I might want to add a
clause in the caveats section at the very least.

~~~
Anderkent
Nah, just nitpicking. Though I guess the point is that AFAIK there's no
documentation for all the ways that safe rust code is allowed to segfault; the
difference from an in-rust panic is that a segfault would probably crash
beam, whereas an in-rust panic might be recoverable?

~~~
kibwen
_> there's no documentation for all the ways that safe rust code is allowed to
segfault_

I wouldn't say "allowed" to segfault. :P The behavior you're referring to is
currently due to a deficiency in LLVM for non-Windows platforms. Here's the
bug on the Rust repo tracking it:
[https://github.com/rust-lang/rust/issues/16012](https://github.com/rust-lang/rust/issues/16012)
Its resolution is long-awaited, to say the least, but it requires someone
proficient with LLVM to do the legwork...

 _> an in-rust panic might be recoverable?_

Anytime you're writing an interface where code is going to be calling into
Rust via C FFI, you ought to be using
[https://doc.rust-lang.org/std/panic/fn.catch_unwind.html](https://doc.rust-lang.org/std/panic/fn.catch_unwind.html),
which is specifically intended to prevent panics from crossing FFI boundaries.
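
As an illustration of that advice, here is a small sketch of wrapping a fallible Rust function in `std::panic::catch_unwind` at an `extern "C"` boundary. The function names and the status codes are hypothetical, chosen only for this example; it is not how Rustler itself is implemented.

```rust
use std::panic;

/// Hypothetical status codes for this sketch (not part of any real API).
pub const STATUS_OK: i32 = 0;
pub const STATUS_PANICKED: i32 = -1;

/// Ordinary safe Rust that can panic: integer division by zero panics.
fn ratio(numerator: i32, denominator: i32) -> i32 {
    numerator / denominator
}

/// Entry point a C caller (or a VM such as BEAM) might invoke.
/// Any unwinding panic is caught here and turned into a status code, so it
/// never unwinds into the foreign caller's stack frames.
pub extern "C" fn ratio_ffi(numerator: i32, denominator: i32, out: &mut i32) -> i32 {
    match panic::catch_unwind(|| ratio(numerator, denominator)) {
        Ok(value) => {
            *out = value;
            STATUS_OK
        }
        // `out` is left untouched when the computation panicked.
        Err(_) => STATUS_PANICKED,
    }
}
```

This only helps with unwinding panics. If the crate is built with `panic = "abort"`, or the failure is an abort such as a detected stack overflow, the process still goes down.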

~~~
iopq
> This function only catches unwinding panics, not those that abort the
> process.

How do you expect a stack overflow to unwind? I have difficulty imagining the
implementation

~~~
kibwen
Anytime our guard pages detect a stack overflow, an abort is issued with no
chance to unwind. (The segfault bug mentioned above is caused by crafting data
on the stack such that you bypass the guard page, which would go from a
segfault into an abort in the presence of stack probes.) I did not intend to
imply that there was any chance that a stack overflow wouldn't bring down your
process, only to clarify Rust's stance on the theoretical memory safety
implications. :)

