
Go-style concurrency in C - Jerry2
http://libmill.org/index.html
======
dfritz
This is interesting, but I don't really understand the differences with the
existing coroutine libraries like libtask, libconcurrency or libcoro.

Of course, it uses the longjmp/setjmp trick, as suggested by Dmitry Vyukov.

[http://www.1024cores.net/home/lock-free-
algorithms/tricks/fi...](http://www.1024cores.net/home/lock-free-
algorithms/tricks/fibers)

However, I'm not sure of the benefit of this approach, compared to writing a
few lines of stack switching assembly code per architecture. Especially, how
portable are the implementations of longjmp and setjmp across the multitude of
C libraries and operating systems?

But more importantly, it doesn't seem to solve the two main issues of
coroutines libraries written in C that were already solved in Go:

\- How does the stack can grow or shrink? (specifying the stack size in the
worse case scenario doesn't scale well)

\- How can a coroutine be moved to another operating system thread and back?
(useful when blocking or when parallelism is needed)

------
aji
This is really dang cool. I started a similar project[0] but have tabled it
for now, since I'm not sold on the usefulness of making C easier to use this
way.

The way I see it, there are two ways to use a library like this that are worth
considering: adapting existing projects, and starting new ones. In my opinion,
the gains to be had from rearchitecting an existing project to use such a
library are minimal, and for the vast majority new projects, C is not the best
choice these days. If you want the performance of C with the programming style
of Go, you can probably get away with just using Go!

That said, it's nice to see a polished, working implementation of this idea.
Definitely something worth being aware of.

[0] [https://github.com/aji/libco](https://github.com/aji/libco)

------
gluczywo
> "Libmill is intended for writing single-threaded applications."

You can do concurrency on a single thread. This is the message I find it hard
to get across when explaining CSP concurrency libraries.

Some people argue about usefulness of such libraries as being not native and
leaky abstraction. I developed Go-style concurrency "port" to Tcl:

[https://github.com/securitykiss-com/csp](https://github.com/securitykiss-
com/csp)

and it is one of few developed-in-house things that became the productivity
booster in my toolbox.

The interesting thing about it is that thanks to the expressive power of Tcl I
was able to port not only the semantics but also almost completely mimic Go
syntax.

~~~
vanderZwan
I don't use Tcl but that looks nice.

> _You can do concurrency on a single thread. This is the message I find it
> hard to get across when explaining CSP concurrency libraries._

Heck, there's an entire language built around single-threaded concurrency!
Compiles to C, coincidentally - it targets embedded platforms. It's not quite
CSP style though, it takes after Esterel and uses synchronous reactive
concurrency:

[http://ceu-lang.org/](http://ceu-lang.org/)

However, as soon as you have multiple Céu programs communicating (for example,
multiple Arduino's) the program-to-program communication is fairly CSP-like.

~~~
geon
Don't forget Javascript.

------
Dowwie
There's more to report than simply an awesome library. Two days ago, someone
allegedly unfamiliar to the project (Paul Banks) appeared from the ether,
applied a few "simple" patches, and improved the performance of libmill _7
fold_ (for OSX at least). That just doesn't happen, ever, in open source.

[https://twitter.com/sustrik/status/666360179802402816](https://twitter.com/sustrik/status/666360179802402816)

~~~
thechao
The twitter message you linked to says this:

    
    
        The speedup was caused by replacing setjmp by _setjmp though.
    

I know this may sound mysterious, but for BSD/OSX... "yes". It is "well known"
(if you're the sort of person who writes coroutine runtimes [1]) that _setjmp
is far faster than setjmp. It's morally equivalent to 'sigsetjmp', but more
portable, if less standard.

[1]
[https://github.com/jaroslov/coro/blob/master/coro.hpp](https://github.com/jaroslov/coro/blob/master/coro.hpp)

------
intull
Martin Sustrik, the author of libmill was also interviewed in FLOSS Weekly
podcast a month ago - [https://twit.tv/shows/floss-
weekly/episodes/358](https://twit.tv/shows/floss-weekly/episodes/358)

------
eps
From an earlier discussion
[https://news.ycombinator.com/item?id=9947749](https://news.ycombinator.com/item?id=9947749):

> _If I understand correctly, this is a framework for non-preemptive
> cooperative multitasking whereby provided API functions serve as yield
> points. Pretty much what Windows API was before NT /2000\. Not exactly a co-
> routine library, at least not in a conventional definition of co-routines._

> _Yes, that 's exactly what it is. Yes, that's exactly what it is. But it
> also matches wikipedia's difinition of coroutine: "Coroutines are computer
> program components that generalize subroutines for nonpreemptive
> multitasking, by allowing multiple entry points for suspending and resuming
> execution at certain locations."_

~~~
tomp
_All_ coroutine libraries are non-preemptive and cooperative. E.g. in Haskell,
threads will switch only on GC allocations. In Go, they only switch on I/O
(and maybe function returns). In theory, the compiler could automatically add
`yield` statements into non-allocating, non-I/O loops. But nevertheless, _all_
user-space threading is cooperative.

~~~
vidarh
> But nevertheless, all user-space threading is cooperative.

Not necessarily. You can do user-space pre-emptive threading on any system
that support pre-emptive signal handlers and that lets you mess around with
the stack, particularly if you can generate signals from timers and/or IO
operations (both of which you can with POSIX timers)

~~~
tomp
Then it's not really user-space, is it? You're still paying for a kernel
context-switch or two.

~~~
gpderetta
It is user-space threading if the scheduler is in user-space.

~~~
tomp
I disagree. If it's the kernel who's interupting threads, it's not user-space
threading. The scheduler just decides which thread gets run next, but the
kernel decides _when_ it gets run.

~~~
gpderetta
Technically it is not the kernel, it is the hardware clock.

At some point any user space scheduler will switch to the kernel in an event
demultiplexing syscall and potentially block there if no user space thread is
runnable. By your definition any threading system which doesn't handle all IO
purely in user-space can't claim to be user-space threads.

~~~
tomp
But that is when the userspace _must_ (and _wants_ ) to communicate with the
kernel. Ultimately, that's the programmer's choice. You could easily write a
program in Haskell or Go with two lightweight threads that would keep sending
ping/pong messages to each other, and all the waiting (on locks, channels,
etc.) and switching would be implemented in userspace.

I don't know enough about userspace/kernelspace communication, maybe I/O
(refilling buffers or calling select) and timers are much more lightweight
than full thread context switches, but I'm guessing it's still much slower
than pure userspace scheduling.

~~~
gpderetta
How is a clock signal different? It is just another external event userspace
wants to be notified about.

Note that on Unix you can use signals to get IO readiness notification and in
fact for sometime on Linux (before epoll) realtime signals were the preferred
method to do get notification over a large amount of fds.

Anyway, as a general guideline, user/kernel transition is cheaper than a
(kernel) thread/thread transition which is in turn cheaper than a
process/process transition.

------
userbinator
I haven't looked at the source yet so I don't know what the implementation is
like, but this makes me wonder if the author has read this article:
[http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html](http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html)

Of course, doing it in portable C is going to be much slower than raw stack-
switching style.

~~~
rumcajz
Sure I did. It's a classic.

------
jkarneges
Someone needs to make a tiny C lib that contains just the stack switching
stuff (basically whatever is used as a substitute for ucontext), that the
various coroutine libs can share. I like all these coroutine libs (libtask,
lthread, libmill, etc), but it makes me uneasy to have non-portable assembly
code lying about that might go unmaintained.

~~~
versteegen
There are oh so many libraries for that:

[https://github.com/baruch/libwire/wiki/Other-coroutine-
libra...](https://github.com/baruch/libwire/wiki/Other-coroutine-libraries)

And if you don't like any of those, my next choice would be to use the
implementation from Luajit 1.x and 2.x, called Coco but not packaged into a
separate library:

[http://coco.luajit.org/portability.html](http://coco.luajit.org/portability.html)

~~~
jkarneges
Cool! I wasn't aware of those low level libs. Now I should submit a patch to
lthread to use fcontext. >:)

------
noselasd
Is there any big difference from this vs
[https://swtch.com/libtask/](https://swtch.com/libtask/) , which is a port of
much of the same concept from Plan 9 - which again ties back to Go.

~~~
rumcajz
It's basically the same thing, but simpler and more familiar (if you dealt
with Go, that is) API.

------
aosmith

        It can execute up to 20 million coroutines 
        and 50 million context switches per second.
    

Is entirely hardware dependent. Also seems to disregard the fact that certain
other things are much easier in go than C.

~~~
RyanZAG
Not sure what your point is. That there's some argument about performance?
There is no way that Golang can keep compete with performance of straight C
code.

~~~
pjmlp
6 year old compiler vs 40 years of research in optimizing compilers.

------
aggieben
So....what is that `choose` thing? Curious how that's implemented...

~~~
rumcajz
I wrote a blog post about it:
[http://250bpm.com/blog:56](http://250bpm.com/blog:56) Also, a blog post about
how coroutine spawning is done:
[http://250bpm.com/blog:48](http://250bpm.com/blog:48)

------
Dowwie
author is Martin Sústrik, who is also the creator of ZeroMQ, nanomsg, and
ribosome

that's quite a resume

------
sunnyps
Can anyone familiar with this say if it runs one coroutine per thread? How
does a coroutine get it's own stack? If coroutines share threads, is it
possible to switch in between a coroutine say when it's blocked on IO like it
is in go?

I'm not an experienced go developer but my understanding is that it's possible
in go to have millions of goroutines each blocked on IO. I don't see how a C
function could mimic that. I thought stack moving was an essential requirement
of any runtime that wants to mimic go in this way.

I've looked at some of the special macros like coroutine and go and it looks
like libmill runs the entire function fn when you call go(fn) so if fn is
blocked on IO then your entire thread is blocked on IO and you can soon grind
to a halt if you have even a few blocked coroutines.

~~~
rpedela
select(), poll(), etc are available in C which enables non-blocking I/O. I
dont know how libmill is implemented, but I assume it is using one of those
functions for non-blocking I/O.

~~~
sunnyps
So it looks like there are facilities for doing some IO, listening on a
socket, etc. But I don't see any facilities for waiting on locks, semaphores,
etc. Maybe they could build that in a future release or maybe they consider it
antithetical to the idea of "share via communicating as opposed to communicate
via sharing".

~~~
exDM69
> But I don't see any facilities for waiting on locks, semaphores, etc

There's a very good reason for this. First of all, you can use file
descriptors as locks, semaphores, condition variables, etc using pipe(2) and
other functions. On Windows you can use WaitForMultipleObjects, etc on Mutex
objects.

But select/epoll/kqueue/WaitForMultipleObjects is a system call that goes in
to the kernel. That is very inefficient if you want e.g. mutual exclusion
using a mutex. In a modern OS, a mutex (or CriticalSection in Windows) is
implemented using a futex ("fast userspace mutex"), which is just a spinlock
in userspace that only calls to kernel if the mutex is contended. An
uncontended futex can be locked and unlocked in less than 20 nanoseconds. A
system call can be 100x slower than that.

Some languages, e.g. Haskell have a green threading system that implements
mutexes and i/o in a consistent way with i/o using a userspace scheduler. But
this depends on language and runtime implementation.

------
tsavola
Can it take advantage of GCC's split stack support?
[https://gcc.gnu.org/wiki/SplitStacks](https://gcc.gnu.org/wiki/SplitStacks)

------
leonatan
Wouldn't libdispatch be a much better option if you don't mind Apple's blocks
extension to the C language?

~~~
sunnyps
My understanding is that libdispatch is not a replacement for go's
concurrency. With libdispatch each block (closure) that you dispatch runs on a
given thread, if that block stalls for any reason then your thread is blocked.
So even with a few blocked blocks (heh) your system could be stalled which is
not the case with go where you can have millions of goroutines that are
blocked.

~~~
jws
I've built systems with libdispatch and blocks. I found it much nicer than
traditional C style non-blocking IO or typical pthreads, both of which I have
also done.

• You don't need to use blocks, there are function callback versions of the
calls available, but blocks are nicer.

• There are asynchronous IO routines, that covers most of your stalls.

• Go probably stalls threads on page faults, as does libdispatch.

You can read about the Linux port of libdispatch at
[https://github.com/nickhutchinson/libdispatch](https://github.com/nickhutchinson/libdispatch)

~~~
stefantalpalaru
It looks like the Linux port of libdispatch uses OS threads through pthreads,
so it's not a candidate for the user space light-weight threads needed for
efficient CSP.

~~~
jws
I'm fairly certain that is just to get the worker pool. The blocks on the
queue don't each get a thread.

------
xvilka
Sadly it doesn't have support for Windows MSVC yet:
[https://github.com/sustrik/libmill/issues/16](https://github.com/sustrik/libmill/issues/16)

------
rumcajz
One feature worth pointing out is debugging support:
[http://libmill.org/documentation.html#debug](http://libmill.org/documentation.html#debug)

------
samuell
Sorry for the ignorance of a newbie, but I couldn't find any info on whether
libmill does anything similar to the M:N multiplexing of light-weight threads
upon OS threads?

~~~
awdomnooo
M:1.

------
Animats
What happens when something blocks for synchronous I/O?

~~~
rumcajz
A different coroutine gets a go.

~~~
Animats
How do they do that? Their own I/O library with asynchronous everything?

~~~
rumcajz
Simple stack switching. See here for a toy example:
[http://250bpm.com/blog:48](http://250bpm.com/blog:48)

------
f2f

        "finally! it's so ridiculously easy now that everyone will start using it and 
        the world will be roses and meatballs from now on!"
    

said no one ever.

even the plan9 guys who were the first to do it with alef and libthread didn't
feel like they'd solved concurrency in C.

------
_ak
If you want Go-style concurrency, why not use Go in the first place?

~~~
zobzu
I you've written advanced-enough go, you probably have a bunch of file that
start with some C code before the Go code starts.

Go simply can't interface everything the way C does. Go doesn't give you the
same level of flexibility either.

tldr: Go's great, but it's just not as powerful as C (unlike Rust or some
others, which can do anything C does).

~~~
_ak
What specifically are you talking about? Go can do syscalls perfectly fine,
you can mess in your memory perfectly fine (even if you have to resort to the
unsafe package), so what exactly is it where you think Go lacks flexibility
compared to C? Because so far, your posting sounds rather hand-wavy.

~~~
wishing_for_ffi
One problem is that it's very hard to call go libraries from other languages
(it requires a runtime). C libraries can be called from almost any language.

There are many go libraries I would love to use, but can't because of this. I
expect many others are in a similar boat.

