
The Rust standard library no longer has any scheduling baked into it - steveklabnik
https://mail.mozilla.org/pipermail/rust-dev/2013-December/007565.html
======
haberman
> libstd is that much closer to being used in a "bare metal" context. It's
> still aways off, but we're getting closer every day!

This is so cool. Most programming language implementations come with a lot of
baggage. They assume that the VM can stop the world, or that the VM will be in
control of all threading, or that the VM will use global variables, or that
the VM can move objects around in memory whenever it feels like it. VMs like
this are like demanding houseguests that insist that everything conform to
their expectations. VMs like this are not easy to embed and do not "play
nicely" with other software.

Doing the work to factor these assumptions out of the core
language/VM/implementation and into libraries is not easy work, but you end up
with something so much more versatile. Lua is this way, and of course C. It
sounds like Rust is going this route too, and that's so cool to see.

~~~
jerf
On the flip side, though, those assumptions are also what lets a runtime
actually _do things_ for you. In the limit, a runtime that assumes nothing is
simply machine language. Even C has a runtime, whose assumptions can be
violated quite thoroughly by other languages.

The less the language actually specifies, the more likely you are to encounter
the situation where two Rust libraries can't work together because they
actually work under Rust-Alpha (some twiddling of the knobs) and Rust-Beta
(some _other_ incompatible twiddling of the knobs).

I don't know that this is a problem for Rust yet, but it is the first
announcement from the Rust team which has actually dropped my level of
interest in the language. Is there recognition that this is a tradeoff and not
something you can think of as purely a "feature"? Do they realize that this
was a huge step towards potential fragmentation of the language? (Not just in
this particular decision, either, but the general mindset that leads towards
this decision leads to fragmentation.)

~~~
haberman
> Even C has a runtime, whose assumptions can be violated quite thoroughly by
> other languages.

C has a little wrapper around main(), and it has a standard library, but it
has no runtime to speak of.

~~~
qznc
The C runtime consists of stuff like malloc and free. Newer versions of C also
require support for thread-local storage, for example. It is not a lot, but
there definitely is a small runtime, called libc.

Of course, the C standard also defines a freestanding variant, where not even
main() is required.

~~~
haberman
A standard library is not the same as a runtime. A library is something you
explicitly call out to, and if you do not call it, you can avoid linking it at
all. A runtime inserts calls to itself in a way that cannot reasonably be
avoided, and must therefore be linked/loaded by every program.

------
wmf
For context, there's been some discussion over the last two months about
whether Rust should use 1:1 or M:N threading or both.
[https://mail.mozilla.org/pipermail/rust-dev/2013-November/00...](https://mail.mozilla.org/pipermail/rust-dev/2013-November/006550.html)

~~~
bsdetector
Half a year ago: "I wouldn't be surprised to see Rust at least abandon M:N
soon though once they start really optimizing performance."

With Rust I'm sure they'll drop M:N, if not now then eventually, like
essentially _everybody else in CS history_. It seems great on a superficial
level, but when you really start caring about performance and latency and
fairness it's the pits.

My question is when Go will drop it. Imagine how easy it would be to have C
call Go code if Go didn't have its own threading and segmented stacks; simply
push the arguments and call/jmp. Maybe reference a few objects. None of that
allocating a stack, locking a goroutine to a thread, stealing work to another
thread, etc. You could even embed Go into a C program instead of having to do
it the other way around, and you wouldn't have the problem of adapting
existing programs or rewriting them in Go.
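
To illustrate what the no-runtime route buys Rust here, a minimal sketch (the
function name and signature are made up for illustration): a Rust function
exported with the C ABI needs no runtime or scheduler set up before a C caller
can call it.

```rust
// Hypothetical example: exporting a Rust function with the C ABI.
// With no runtime to initialize, a C caller really can just push
// the arguments and `call` it.
#[no_mangle]
pub extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // Called directly here for demonstration; from C you would declare
    // `int add(int, int);` and link against the compiled Rust library.
    println!("{}", add(2, 3));
}
```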

But I bet Go will continue with M:N and segmented stacks for a decade because
it seems like they actually _want_ it to be difficult to interface with
anything else. Like Pure Java, they want you to rewrite everything in Go
instead of just using some existing library.

~~~
qznc
I consider 64-bit machines the final nail in the coffin of N:M for the next
decade. Since you can spawn millions of kernel threads, the easier-than-
callbacks argument in favor of N:M disappears.
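
As a rough sketch of that point (thread count and stack size picked
arbitrarily for a quick demo): plain kernel threads with explicitly small
stacks are cheap to spawn in bulk.

```rust
use std::thread;

// Spawn `n` ordinary kernel threads, each with a small 64 KiB stack
// instead of the multi-megabyte default, and sum the values they return.
fn spawn_and_sum(n: u64) -> u64 {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            thread::Builder::new()
                .stack_size(64 * 1024) // far below the default stack size
                .spawn(move || i)
                .expect("spawn failed")
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    // 1000 threads here to keep the demo quick; on 64-bit the same
    // approach scales much further, limited mostly by kernel memory
    // consumed per thread.
    println!("{}", spawn_and_sum(1000));
}
```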

~~~
jlouis
The problem with kernel threads is that they require kernel context. That is
usually at least 4-8 kilobytes of memory, which is about 8-16 times as much as
an Erlang process...

------
Pacabel
I really want to be able to use Rust, but then I keep seeing changes that seem
to go around and around in circles. I do appreciate that it's very very hard
to get things right the first time around and I understand that some
experimentation is necessary. But when it comes to Rust it seems like one
approach is deemed the correct way, then it's built and used, and like most
things it has flaws. Then the complete opposite approach is tried but it runs
into a different set of flaws, usually the ones that caused the other approach
to be chosen in the first place. The Rust home page describes it as a
"practical language" but with all of these back and forth changes it is really
really difficult to actually use it. I don't know if I can keep waiting for
it. When I've got new code to write I'm just going to have to use a language
that I know I can depend on for more than a month or two, like Go or Haskell
or Ruby.

\- Pacabel

~~~
Daishiman
Rust is pre-alpha; it's as green as it can be, and it is first and foremost
(for the moment) a PL research project. You should _not_ be using it in
production, or expecting things to not break between releases.

That said, breaking changes are becoming less and less common. The language
syntax has mostly stabilized, most changes are going into corner cases and
modularity, and much of the remaining experimentation has been fenced off
until after the 1.0 release.

------
mtanski
I've also come full circle on N:M / green threads. Now I'm back to thinking
they aren't worth the effort / pain. My experience comes from C / C++ land
rather than Go or Rust, but I think the same lessons apply. Over time I ended
up using a few different frameworks, from libcoro to Mordor to raw
swapcontext().

I was initially sold on the N:M model as a means of getting event-driven
programming without the callback hell. You write code that looks like plain
old procedural code, but underneath there's magic that does user-space task
switching whenever something would block. Sounds great. The problem is that we
end up solving complexity with more complexity. swapcontext() and family are
fairly straightforward; the complexity comes from other, unintended places.

All of a sudden you're forced to write a user-space scheduler, and guess what:
it's really hard to write a scheduler that does a better job than Linux's
scheduler, which has man-years of effort put into it. Now you want your
scheduler to map N green threads onto M physical threads, so you have to worry
about synchronization. Synchronization brings performance problems, so now
you're down a new lockless rabbit hole. Building a correct, highly concurrent
scheduler is no easy task.

A lot of third-party code doesn't work well with user-space threads. You end
up with very subtle bugs in your code that are hard to track down. In many
cases this is due to assumptions about TLS (though that isn't the only
reason). To make such code work you can no longer do work stealing between
your native threads, and then you end up with performance and starvation
problems.

Next thing you realize is that you're still spending lots of memory creating
stacks for your green threads. Then you realize pthreads lets you create small
stacks for your native threads, and that 8 MB default stacks don't matter much
anyway thanks to delayed allocation. I think both Rust and Go have backtracked
on segmented ("spaghetti") stacks, since they are a lot of work, require the
compiler to generate extra code, and have some bad worst-case behavior (you
can get into a cycle of repeatedly growing and shrinking a stack due to a
function call in a loop).

The final nail in the coffin for me was disk IO. The fact is that non-network
IO is generally blocking, and no OS has great non-blocking disk IO interfaces
(Windows is best, but it's still not great). First, it's pretty low-level,
i.e. difficult to use: you have to do IO on block boundaries. Second, it
bypasses the page cache (at least on Linux), which in most cases kills
performance right there. And in many cases the non-blocking interface will end
up blocking anyway (even on Windows) if the filesystem needs to do certain
things under the covers (like extending the file or loading metadata). Also,
the way these operations are implemented requires a lot of syscalls, and thus
context switches, which further negates any perceived performance benefit. The
bottom line is that regular blocking IO (better yet, mmapped IO) outperforms
what most people are capable of achieving with the non-blocking disk IO
facilities.
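
For contrast, here is how boring the blocking baseline is; a minimal sketch in
Rust (the temp-file name is made up for the demo): an ordinary write-then-read
round trip through the page cache, with no alignment constraints and no
special syscall dance.

```rust
use std::fs::File;
use std::io::{Read, Write};
use std::path::PathBuf;

// Plain blocking file IO: write a buffer out, then read it back.
// Reads are served from the page cache when the data is warm.
fn round_trip(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let path: PathBuf = std::env::temp_dir().join("blocking_io_demo.bin");
    File::create(&path)?.write_all(data)?;

    let mut buf = Vec::new();
    File::open(&path)?.read_to_end(&mut buf)?;
    std::fs::remove_file(&path)?;
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    let back = round_trip(b"hello, page cache")?;
    println!("{}", String::from_utf8_lossy(&back));
    Ok(())
}
```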

This is clearly based on my own experiences, and it looks like the Rust folks
had similar ones. So my hope is that anybody thinks long and hard before going
down the N:M rabbit hole. When I studied my mistakes, I found that history is
chock full of people abandoning the N:M model. You can read about the history
of NPTL (the threading model in Linux 2.6+ / glibc) versus NGPT, the N:M
threading model that was proposed. The 1:1 NPTL model was simpler and
performed better. FreeBSD and Solaris likewise moved from their N:M threading
models to 1:1 models.

I think the N:M model is going to keep rearing its head in academic papers
about the performance of highly scalable systems, but in the real world its
benefits will keep being elusive. The only counterpoint is Go, which seems to
be making a run at it with goroutines.

I should have titled this comment "How I learned to stop worrying and love
plain old threads."

~~~
pron
Here's my experience developing a JVM lightweight thread library[1]:

1\. For the scheduler, we use the JDK's superb and battle-tested ForkJoinPool
(developed by Doug Lea), which is an excellent work-stealing scheduler, and
continues to improve with every release.

2\. For synchronization, we've adapted java.util.concurrent's constructs (we
use the same interfaces, so no change to user code) to respect fibers, but
users are expected to mostly use the Go-like channels or Erlang-like actors
that are both included.

3\. As for disk IO, Java does provide an asynchronous interface on all
platforms, so integrating that wasn't a problem.

4\. Integrating with existing libraries is easy if they provide an
asynchronous (callback-based) API, which is easily turned into fiber-blocking
calls. If not, ForkJoinPool handles infrequent blocking of OS threads
gracefully.

All in all, the experience has been very pleasant: callbacks are gone and
performance/scalability is great. Things will get even better if Linux will
adopt Google's proposal for user-scheduled OS threads, so that all code will
be completely oblivious to whether the threads are scheduled by the kernel or
in user space.

Regarding performance, Linux does have a very good scheduler (unlike, say, OS
X), but while there's little latency involved if the kernel directly wakes up
a blocked thread (say, after a sleep or as a response to an IO interrupt), it
still adds _very_ significant latency when one thread wakes up another. This
is very common in code that uses message passing (CSP/actors), and we've been
able to reduce scheduling overhead by at least an order of magnitude over OS
threads.

I would summarize this as follows: if your code only blocks on IO, or blocks
infrequently on synchronization, then OS threads are quite good; but if you
structure your program with CSP/actors, then user-space threads are the only
sensible way to go for the time being.
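
The thread-wakes-thread pattern described above can be sketched with Rust's
standard channels (Quasar itself is Java; this is just the analogous shape, in
the thread's subject language): two OS threads handing a counter back and
forth, where every send wakes the other thread and kernel wake-up latency is
paid on each hop.

```rust
use std::sync::mpsc;
use std::thread;

// Two OS threads ping-ponging a counter over channels. Each round trip
// involves one thread waking the other twice; this is the pattern where
// user-space scheduling can hand off far more cheaply than the kernel.
fn ping_pong(rounds: u64) -> u64 {
    let (to_worker, from_main) = mpsc::channel::<u64>();
    let (to_main, from_worker) = mpsc::channel::<u64>();

    let worker = thread::spawn(move || {
        // Echo each message back, incremented, until the channel closes.
        for n in from_main {
            to_main.send(n + 1).unwrap();
        }
    });

    let mut n = 0;
    for _ in 0..rounds {
        to_worker.send(n).unwrap();
        n = from_worker.recv().unwrap();
    }
    drop(to_worker); // close the channel so the worker thread exits
    worker.join().unwrap();
    n
}

fn main() {
    println!("{}", ping_pong(1000));
}
```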

[1]:
[https://github.com/puniverse/quasar](https://github.com/puniverse/quasar)

~~~
reginaldjcooper
That is really interesting. If it's not too much trouble to write out, could
you explain what causes the latency difference between kernel wake-up and
other thread wake-up?

~~~
russell_h
Paul Turner explained this really well at this year's Linux Plumbers
Conference. The whole talk is fantastic, but the explanation of what pron is
describing in particular (and how it could be improved) starts around 8:39:
[https://www.youtube.com/watch?v=KXuZi9aeGTw#t=519](https://www.youtube.com/watch?v=KXuZi9aeGTw#t=519)

~~~
reginaldjcooper
thank you very much I love this stuff

------
blt
I think this is a good decision. A language that aims to compete with C/C++
needs to expose the low-level fundamental building blocks that the hardware
and OS provide. Anything high level like a mandatory GC or scheduler will
scare away systems programmers.

