
Yes, that's rather the point: we're talking about highly-multicore server machines (e.g. 16/32 cores, or perhaps far more) entirely dedicated to running your extremely-concurrent application. You want all but one or two of those cores just running the app and nothing else. You leave one or two cores for the "control plane" or "supervisor"—the OS—to schedule all the rest of its tasks on.

It's a lot like a machine running a hypervisor with a single VM on it set to consume 100% of available resources, but actually slightly more efficient than that, since a guest can intelligently allocate cores to itself, pin its scheduler-threads to them, and then just stop thinking about the pinning, while a hypervisor-host is stuck constantly reconsidering whether its vCPU-to-pCPU mapping is currently optimal, with what's basically a black box consuming those vCPUs.
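To make that concrete, here's roughly the launch I have in mind (a sketch only: the core count, the "db" bind type, and the my_app module name are all illustrative):

    # hypothetical 16-core box: give BEAM 14 schedulers and ask it to
    # bind them, leaving a couple of cores' worth of slack for the OS
    erl +S 14 +sbt db -s my_app

Note that binding only places BEAM's own scheduler threads; actually keeping other processes off those cores still takes OS-level help (cpusets, taskset, isolcpus, etc.).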




Is this something Erlang supports, or is it something that you get merely because you've pinned threads to a specific core?

I know with cgroups in Linux you can pin processes to cores pretty easily. Just curious how Erlang makes this easier.


If you mean that in terms of "can you ask Erlang itself to do this for you", then yes: http://erlang.org/doc/man/erl.html#+sbt

If you mean that in terms of "does the Erlang runtime intelligently take advantage of the fact that its schedulers are pinned to cores to do things you don't get from plain OS-level pinning", I'm not sure.

I think it might, though. This is my impression from reading, a year or so back, the same docs I just linked; you can read them for yourself and form your own opinion if you think this sounds crazy:

It seems like ERTS (the Erlang runtime: BEAM VM + associated processes like epmd and heart) has a pool of "async IO" threads, separate from the regular scheduler threads, that just get blocking syscalls scheduled onto them. Erlang will, if-and-only-if it knows it has pinned schedulers, attempt to "pair" async IO threads with scheduler threads, so that Erlang processes that cause syscalls schedule those syscalls onto "their" async IO threads, and the completion events can go directly back to the scheduler-thread that should contain the Erlang process that wants to unblock in response to them.†
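(If you want to poke at that pool yourself: the +A flag on the same man page sizes the async-thread pool, and a running node will tell you what it got. A sketch, with an arbitrary pool size:)

    erl +A 64 +sbt db    # 64 async IO threads alongside bound schedulers

    1> erlang:system_info(thread_pool_size).
    64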

In the default case, if you don't tell ERTS any different, it'll assume you've got one (UMA) CPU with N cores, and will try to pin async IO threads to the same cores as their paired scheduler-threads. This has context-switching overhead, but not much, since 1. the async IO thread is mostly doing kernel select() polling and racing to sleep, and 2. the two threads are in a producer-consumer relationship, like a Unix pipeline, where both can progress independently without needing to synchronize.
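(A running node will also show you what it auto-detected and where things ended up bound; a sketch, and the output obviously varies per machine:)

    1> erlang:system_info(cpu_topology).        % topology ERTS detected (or was fed)
    2> erlang:system_info(scheduler_bind_type). % the binding strategy in effect
    3> erlang:system_info(scheduler_bindings).  % which logical processor each scheduler got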

If you want, though, you can further optimize by feeding ERTS a CPU map describing how the cores in your machine are grouped into CPU packages, and how the CPU packages are further grouped into NUMA memory-access groups. ERTS will then attempt to schedule each async IO thread onto a separate core of the same CPU package as its scheduler-thread, or failing that, onto a core in the same NUMA group, to decrease IPC memory-barrier flushing overhead. (The IPC message is still forced to dump from a given core's cache-lines into the package's shared cache or into NUMA-local memory, but it doesn't have to go all the way to main memory.)
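(The flag for that map is +sct, on the same man page as +sbt. A sketch of the shape it takes, for a hypothetical box with two NUMA nodes, each holding one 4-core package:)

    erl +sct L0-3c0-3p0N0:L4-7c0-3p1N1 +sbt db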

ERTS will also, when fed a CPU map, penalize in its scheduling algorithm the choice to migrate an Erlang process to a different CPU package or NUMA group. (It will still do it, but only if it has no other choice.)

---

† This is in contrast to a runtime without "native" green-threads, like the JVM, where even if you've got an async IO pool, it just sees an opaque pool of runtime threads and sends its completion events to one at random; it's then the job of a framework like Quasar to steal time on each of its JVM runtime threads to catch those messages and route them to a scheduler running on one of those threads.

The same is true of an HVM hypervisor: without both guest OS support (paravirtualization) and core pinning inside each VM, a hardware interrupt will just "arrive at" whichever pCPU asked to be interrupted, even if the vCPU that was scheduled on that pCPU when it made the hypercall has since been moved somewhere else. This is why SR-IOV is so important: it effectively gives VMs their own named channels for hardware to address messages to, so they don't get delayed by misdelivery.



