Hacker News
Erlang Scheduler Details and Why They Matter (hamidreza-s.github.io)
188 points by hamidreza-s 414 days ago | 30 comments

I am enjoying the steam that both Erlang and Elixir are gaining here on Hacker News. They're both interesting, well-designed languages built atop a rock-solid VM with some great abstractions for concurrency.

I'm hoping that this trend continues because out of all the noise of new languages and frameworks I feel like Erlang/Elixir are very deserving of becoming the new way forward for developing anything that runs on a network.

Absolutely this. I've enjoyed JavaScript the past few years. (Dare I admit that.) And the only light I see is something like Elixir that has a fundamentally superior foundation for scale and concurrency. I'm intrigued and am hoping Elixir / Erlang / Phoenix keep growing.

I stumbled onto an Elixir article a few months back (on HN) and was enchanted by it. Ever since then I have been banging away in my free time at Elixir books and tutorials; I finished intro to Elixir last night and am halfway through Programming Elixir.

It is a super exciting language to learn, and those moments where the ball finally drops on concepts like pattern matching are glorious. It has reignited that magical feeling of being some sort of tech wizard that I first got when I started programming as a child!

Anyway, enough of the gospel of Elixir, I too look forward to seeing how it/Erlang develop and feel like it is already getting some noteworthy traction.

P.S. I also scan HN daily for anything Erlang/Elixir, and the frequency of posts seems to be increasing.

The Erlang scheduler is very interesting. I attempted to reimplement it in JavaScript with some success. It works like the first version of the scheduler but I could see web workers being used to make it more like the current version. https://github.com/bryanjos/processes

Lots of Erlang articles making it to the front page... I like.

Is there a resurgence of interest in Erlang because of Elixir? Or has the Actor model really taken hold of developers because of its direct and easy-to-understand model of concurrency?

The short version:

* 'event driven' things like Node.js got popular because they use fewer resources to serve the same amount of data (generally). But for a lot of people, JS is not really their idea of a good time. Erlang (and Go) fill this niche pretty well.

* Elixir finally made Erlang more palatable to more people. Jose has done a superb job with it, because it's not just a nicer syntax, there is a bunch of nice stuff he's built into it. It doesn't hurt that as a former(?) member of the Rails core team, he has a deep understanding of web programming and its requirements.

* It is solid, solid tech. There's lots of new development happening on top of, say, Elixir, like Phoenix, but the underlying system is pretty hardened.

There is a lot of demand for making things concurrent, fault-tolerant and distributed nowadays, not only in big companies but also among startups. Erlang already has those features, as well as standard tools (OTP) that let anyone build such systems simply.

However, among all the languages that run on top of the Erlang virtual machine (BEAM), such as LFE, Elixir, Efene, Luerl and Erlog, Elixir has the biggest and most active community, more interesting frameworks and wider adoption. So it could be claimed that it helped introduce Erlang's values to a wider range of developers.

I think there's a growing appreciation for functional style more generally (e.g. pattern matching) and Erlang's just caught up in it. Compare OCaml.

A lot of the Rails community is migrating to Phoenix, so I would imagine it has something to do with it.

    in case of high level platforms, languages or libraries
    it can be claimed that Erlang virtual machine is almost
    unique because JVM threads depend on operating system
    schedulers, CAF which is a C++ actor library uses
    cooperative scheduling, Golang is not fully preemptive
    and it also applies to Python’s Twisted, Ruby’s Event
    Machine and Nodejs.

Most languages/runtimes that are taking concurrency seriously do so using libraries or frameworks (e.g. Akka on the JVM) that look a lot like Erlang's under the hood. Erlang is not quite as unique as it was, but obviously baking it into the language is very useful.

Akka is not preemptive. That's the problem with the library solutions, it's still a squarish peg in a round hole.

As far as I understand, you can make an actor reentrant which frees the thread. It's definitely a leaky concern compared to Erlang's process implementation.

My understanding is that this isn't really preemption; rather, it's cooperative scheduling. As soon as the scheduler hands execution to the actor, everything else in the system has to hope there isn't a bug in the actor that permanently ties up that thread.

Good point. I guess I find it like a hybrid where you don't benefit from the cooperative side and don't achieve true preemption. I think either end would be better than keeping the middle ground.

To be fair, the original article is fundamentally incorrect and Erlang is not truly preemptive either. It uses a cooperative multitasking system based on reduction counting, which yields at function calls, but a badly programmed NIF can still ruin your day if you don't put it on a dirty scheduler.

You could say that no scheduler is truly pre-emptive, because it can't interrupt a process in the middle of a machine instruction.

Implementing NIFs should be done with the same care you would use adding a new machine instruction to your processor ;-)

There are N ways that badly coded C code loaded into a Unix process can send things to hell in a handbasket. "All bets are off" as they say.

In this case just taking more than a millisecond can cause scheduler collapse. So it's a pretty easy mistake to make.

Although writing C code for NIFs is not a regular task for Erlang developers, it must be done with extreme care: not only can a long-running NIF degrade the responsiveness of the VM, but when a NIF crashes, the whole VM crashes with it.

However, when there is no option other than writing a NIF, there are ways to protect yourself:

1. Your NIF should return in less than a millisecond.

2. If item 1 is not possible, split the work into shorter NIF calls.

3. If items 1 and 2 are not possible, you have a dirty NIF: a NIF that cannot be split and cannot execute in a millisecond or less. There is an experimental feature in the Erlang virtual machine called the "dirty scheduler". When it is enabled, some additional schedulers are available to execute dirty NIFs, so they won't interfere with the normal operation of the regular schedulers.

4. If items 1 and 2 are not possible and you don't want to use dirty schedulers, the +sfwi emulator flag is available to force normal schedulers to wake up again from the collapse situation.

These items are ways to stay in a normal scheduling state even when writing native functions in C (NIFs), but the article is only talking about Erlang code, which is run by the schedulers and preempted with no trouble as soon as it reaches the reduction limit.
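Item 2 above (splitting long native work into shorter calls) can be illustrated with a rough sketch. This is Python rather than C, and it does not use the real NIF API: `sum_chunk`, `sum_all`, and `check_every` are hypothetical names, and the one-millisecond budget is simply the figure quoted in this thread. Each call does a bounded slice of work and hands back a continuation point, so the caller (standing in for the scheduler) regains control between slices.

```python
import time

BUDGET_SECONDS = 0.001  # the one-millisecond figure mentioned in the thread


def sum_chunk(data, start, acc, budget=BUDGET_SECONDS, check_every=64):
    """Sum data[start:] until the time budget runs out.

    Returns (done, next_index, acc) so the caller can resume later.
    The clock is only checked every `check_every` items to keep overhead low.
    """
    deadline = time.perf_counter() + budget
    i = start
    n = len(data)
    while i < n:
        stop = min(i + check_every, n)
        acc += sum(data[i:stop])
        i = stop
        if time.perf_counter() >= deadline:
            break  # budget used up: return control to the caller
    return i >= n, i, acc


def sum_all(data):
    """Drive sum_chunk to completion, counting how many short calls it took."""
    done, i, acc, calls = False, 0, 0, 0
    while not done:
        done, i, acc = sum_chunk(data, i, acc)
        calls += 1
    return acc, calls
```

Between two calls to `sum_chunk` the caller is free to run something else, which is the property a single long-running call would take away.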

This. Behavior is much better these days but dirty schedulers and ports are still the first places to consider putting C code. Only when you know you've got a solid implementation should you upgrade it to a NIF.

A short note on how hard this is: until as recently as 18.0, there were many BIFs (built-in functions that are part of the VM) that could cause the same scheduler collapses. If the VM developers don't always get it right, the chance that arbitrary C code will is very small. Tools like QuickCheck can help in testing the inputs and outputs, but it's hard to set up complex VM stress states and thus very hard to make guarantees about NIFs.

While there is definitely overhead, I'd say regular external "ports" (OS-level subprocessing) are quite underrated from what I see in more recent Erlang code. There's a lot that can be done this way if port communication is carefully designed.

I like to say that in NIFs (and port drivers) all the bets are off anyway, most notably process isolation and fault-tolerance. The guarantees such as "preemption" and fault-tolerance happen in the layer above (i.e. pure Erlang), and can only be satisfied if the underlying layer behaves properly.

To my knowledge, Erlang uses a cooperative scheduler. The process itself counts down its reductions and yields when it is time. The programmer can call erlang:yield() to do this early if he wants. Calling something like receive will also yield. I guess it comes down to definitions, but since Erlang is said to have only soft real-time (as opposed to hard real-time) properties, this makes more sense. The schedulers don't give any guarantees as to when a process gets to run, and the scheduler never really preempts a process. Again, to my knowledge.

Great article though!

That's a good point! I can also say that it is a matter of perspective. From an Erlang developer's point of view, the processes execute in a preemptive manner, without the developer telling them when to yield. The important factors are the reduction limit of the actors and their being reentrant. Together these guarantee that each actor will surely yield even if it is still running and has something to do, and that it will resume whenever the scheduler selects it again for execution.

I was also confused by this phrasing. The only thing I could think of was that from the programmer's perspective, it's not cooperative (you don't have to call yield), but that's also true of threads.

I see people down thread repeating this claim. Can anyone explain how the scheduler is preemptive?

You are absolutely correct: the schedulers never preempt a process in the classical sense, but from an Erlang code perspective, the system behaves as if it were preemptive. The schedulers are actually running a byte code interpreter, and the interpreter will switch processes once a certain number of reductions is reached (or the code yields). This breaks down when running native code in the form of NIFs: it is entirely possible to block a scheduler indefinitely while running native code.
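The reduction-counting behaviour described above can be modelled with a small toy scheduler. This is only a sketch in Python, not how BEAM is implemented: the names `counter`, `spinner`, and `run`, and the tiny limit of 4, are made up for illustration. Each `yield` stands in for one reduction that the interpreter would count on the process's behalf; real Erlang code contains no such marker.

```python
from collections import deque

REDUCTION_LIMIT = 4  # illustrative value only, chosen small for the demo


def counter(name, n, log):
    """A 'process' doing n units of work; each yield marks one reduction."""
    for i in range(n):
        log.append((name, i))
        yield


def spinner(log):
    """A CPU-bound infinite loop that never finishes on its own."""
    while True:
        log.append(("spin", None))
        yield


def run(processes, reduction_limit=REDUCTION_LIMIT, max_slices=100):
    """Round-robin over processes, switching after reduction_limit steps.

    max_slices only exists so the demo terminates despite the spinner.
    Returns the names of finished processes in completion order.
    """
    queue = deque(processes.items())
    finished = []
    slices = 0
    while queue and slices < max_slices:
        name, proc = queue.popleft()
        slices += 1
        for _ in range(reduction_limit):
            try:
                next(proc)  # execute one reduction
            except StopIteration:
                finished.append(name)
                break
        else:
            # Budget exhausted but the process still has work: requeue it.
            queue.append((name, proc))
    return finished
```

Running `run({"spin": spinner(log), "a": counter("a", 6, log), "b": counter("b", 3, log)})` with an empty list `log` lets both counters finish even though the spinner never does, because no process gets more than its reduction budget per turn.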

I like to think of it as preemptive because (putting NIFs aside) it's not possible for a single process to completely block the entire scheduler, even if it ends up in an infinite CPU bound loop.

So in other words, a task (i.e. process) doesn't have to cooperate with others. It can do whatever it wants, be it I/O or CPU bound, and doesn't have to yield explicitly. Due to aggressive "preemption" it won't block other pending tasks significantly.

In contrast, if this is not guaranteed (and it is not in most other Erlang "alternatives" I've looked at), then a task might accidentally paralyse a large portion of the system.

I've been into Erlang for about 5 years now and once I found it, I never looked back. They really nailed it and many competitive technologies like Go don't have a preemptive scheduler yet. I wish Elixir had a Pythonic syntax, but I'm all for anything BEAM gaining steam and have enjoyed using Elixir.

I personally feel that if someone made a Pythonic version of Elixir, say "Hydra", it would take over the world. If anyone starts such a project, look me up!

I second that. Elixir is great, but I was not a Ruby programmer, and prefer other syntax. I've tried LFE, and I love Lisp, but it has not gained traction. Perhaps I'll look into Luerl. Both LFE and Luerl were created by Robert Virding, one of the original workers on Erlang with Joe Armstrong. I just might go with straight Erlang. I don't think the syntax is that bad.

FYI, Racket and the Gambit VM use preemptive green threads, but they have nothing to do with SMP.
