Coroutines in C with Arbitrary Arguments (250bpm.com)
62 points by rumcajz on April 19, 2015 | 22 comments

That is majestically vile. It took me a while before I figured out how it was assigning the new coroutine's stack pointer, and then I was filled with admiration and a little nausea.

I'm not sure if this will work in real life; it looks like it's relying heavily on C compiler magic to ensure that the stack's laid out the way it wants. (Plus, allocating stacks with malloc() doesn't work on some architectures, and I have encountered pthreads implementations that get very unhappy if you change stacks on them).

But it's still a deeply impressive piece of lateral thinking.

The code in question strikes me as being a proof of concept. Obviously, a more complete implementation would be more complex.

In the real world you'd create the stack with mmap() and the MAP_STACK flag. You would make sure the stack is properly aligned and that there's a guard page at the end it grows toward.
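A minimal sketch of that allocation, assuming Linux with glibc (MAP_ANONYMOUS and MAP_STACK are extensions, not POSIX) and a downward-growing stack, so the guard page sits at the low end. The `costack` type and the `costack_alloc`/`costack_free` names are hypothetical.

```c
#define _GNU_SOURCE             /* MAP_ANONYMOUS and MAP_STACK are extensions */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical coroutine stack: one guard page followed by the usable area. */
typedef struct {
    void  *base;   /* start of the mapping, i.e. the guard page */
    size_t total;  /* guard page + usable stack, in bytes */
    void  *top;    /* initial stack pointer: one past the highest usable byte */
} costack;

/* Returns 0 on success, -1 on failure; 'size' is rounded up to whole pages. */
static int costack_alloc(costack *s, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    assert(size > 0);
    size = (size + page - 1) / page * page;
    s->total = size + page;
    s->base = mmap(NULL, s->total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (s->base == MAP_FAILED)
        return -1;

    /* The stack grows down, so the guard is the lowest page: an overflow
       faults there instead of silently overwriting adjacent memory. */
    if (mprotect(s->base, page, PROT_NONE) != 0) {
        munmap(s->base, s->total);
        return -1;
    }

    /* mmap() returns page-aligned memory, so the top is 16-byte aligned too. */
    s->top = (char *)s->base + s->total;
    assert(((uintptr_t)s->top & 15) == 0);
    return 0;
}

static void costack_free(costack *s)
{
    munmap(s->base, s->total);
}
```

On Linux, MAP_STACK has historically been little more than a hint; the mprotect()ed guard page is what actually turns an overflow into a fault.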

It should be possible to do this without malloc(), by allocating the co-stack as an array within the current stack. Something like this...

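    char stack[STACK_SIZE];       /* co-stack inside the current frame instead of malloc() */
    int anchor_[unoptimisable_];  /* marks the current stack position (unoptimisable_ is defined in the article) */
    char filler_[(char*)&anchor_ - (char*)(stack + STACK_SIZE)];  /* VLA that shifts the stack pointer */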
(Haven't tested it, but it's basically the same principle. The same caveats about stack layout and VLAs apply.)

Except that if the parent function returned before the coroutine it had launched finished, you would get memory overwrites.

You would also need to align the stack per target requirements (a gcc keyword, or offsetting the address). The alignment guarantees for a char[] placed on the stack are probably insufficient for many machines.
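The two options mentioned, a compiler keyword or offsetting the address, look roughly like this. A minimal sketch, assuming a 16-byte requirement (what the common x86-64 and AArch64 ABIs expect of the stack pointer); `STACK_ALIGN`, `align_down`, `costack_buf` and `costack_top` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_ALIGN 16   /* assumed target requirement; must be a power of two */

/* "Offset the address": round p down to a multiple of align. For a stack
   that grows down, rounding the top down keeps it inside the buffer. */
static void *align_down(void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)((uintptr_t)p & ~(align - 1));
}

/* "Compiler keyword": C11 _Alignas (GCC also accepts
   __attribute__((aligned(16)))). It works on automatic arrays as well;
   a static one is used here so its address can be returned. */
static _Alignas(STACK_ALIGN) char costack_buf[64 * 1024];

/* Initial stack pointer for a coroutine running on costack_buf. The
   align_down() call is a no-op here because the buffer is already aligned;
   it matters for buffers whose alignment you do not control. */
static void *costack_top(void)
{
    return align_down(costack_buf + sizeof costack_buf, STACK_ALIGN);
}
```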

If none of the variables on the stack are smaller than your desired alignment size, and the stack starts out aligned (which it should), then it will remain aligned.

Can you elaborate on that?

It doesn't seem to work correctly on my machine: I get writes beyond heap bounds and uninitialized reads.

It uses a dynamically sized array (char filler_[...], a C99 VLA) to shift the stack pointer to an arbitrary memory address. If that doesn't work with your compiler, you may try alloca() instead.

Here's my implementation in C++, using alloca(): https://github.com/jaroslov/coro. Without debugger support, coroutines are pretty difficult to work with.

Oh I see, it moves the stack pointer so it coincides with the heap allocated memory?

I like articles that combine hacking and C, but this one really needs more explanation.

This article is really about the arguments to the coroutine. The original idea for shifting the stack pointer comes from here: http://fanf.livejournal.com/105413.html

Would it be possible to make a coroutine implementation that allows the stacks to grow dynamically? (If there are N coroutines, you need N stacks, so for large N it is not practical to statically preallocate a conservatively sized stack for each coroutine.)

Also, how is Rust's support for coroutines?

I have recently found this: https://github.com/rustcc/coroutine-rs

Interesting, but the documentation does not seem to describe how stacks and memory are managed.

From a quick look at the code, it seems to use segmented stacks with memory mapping. https://github.com/rustcc/coroutine-rs/blob/master/src/stack...

There's a -fsplit-stack option in gcc, IIRC.
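On the question above about growing stacks: besides segmented stacks and -fsplit-stack, a common workaround on Linux is to reserve a generous stack per coroutine and let demand paging supply physical memory only for the pages actually touched, so N mostly shallow coroutines cost N reservations of address space rather than N full stacks of RAM. This bounds the stack rather than letting it grow without limit. A minimal sketch, assuming Linux with glibc (MAP_NORESERVE, MADV_NOHUGEPAGE and mincore() are not portable); `reserve_stack` and `resident_pages` are hypothetical helpers, the second only there to make the effect visible.

```c
#define _GNU_SOURCE             /* MAP_NORESERVE, MADV_NOHUGEPAGE, mincore() */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve 'len' bytes of stack address space. Physical pages are only
   allocated when first touched, so a shallow coroutine costs little. */
static void *reserve_stack(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_STACK,
                   -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* Best effort: keep transparent huge pages from backing a single touch
       with a 2 MiB page. Fails harmlessly where THP is unavailable. */
    (void)madvise(p, len, MADV_NOHUGEPAGE);
    return p;
}

/* Hypothetical helper: how many pages of [addr, addr + len) are resident? */
static size_t resident_pages(void *addr, size_t len, size_t page)
{
    size_t n = (len + page - 1) / page, count = 0;
    unsigned char *vec = malloc(n);

    assert(vec != NULL);
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return (size_t)-1;
    }
    for (size_t i = 0; i < n; i++)
        count += vec[i] & 1;    /* low bit set means the page is resident */
    free(vec);
    return count;
}
```

On a typical Linux kernel, mincore() reports no resident pages right after reserve_stack() and a single page once the top byte has been written.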

Semi off topic, but are there examples of coroutine usage in any relatively popular open source C or C++ code bases? The only thing that comes to mind is the MAME and MESS code bases (I could be wrong here). We see coroutine articles on HN from time to time, but I've always wondered who, or what software, is using these techniques.

It's hardly well known, but I implemented an SMTP greylisting proxy called Spey which used a coroutine for each connection with a cooperative scheduler which switched between them as data arrived. (Cooperative scheduling has huge conceptual advantages, because now you don't have to think about synchronisation and concurrency issues.)

It worked really well --- on my machine.

Unfortunately, on other machines there were weird bugs and instability, and really-hard-to-diagnose crashes; because it all worked fine on my machine, debugging was painful. Eventually I figured out that pthreads, which was being linked in by a library I depended on, when combined with a particular glibc and a particular Linux kernel, would store the TLS pointer at the top of the C stack --- it used alignment tricks to be able to figure out where the TLS pointer was from the current stack frame. (I assume this was to work around a kernel with no native TLS support.) Of course, my coroutine implementation was allocating its own stacks with mmap(). This was causing pthreads to pick up either a garbage TLS pointer or, even worse, the wrong TLS pointer.

That was when I gave up on manual coroutines in C. Lovely idea, works really well on paper, so much simpler than threading (if you can live without multicore support), doesn't actually work in practice.

They're still worth checking out in languages like Lua, though. I'm still bitter that ES6 doesn't have proper coroutines, opting for the much less useful generator concept instead. Apparently they were too complicated to implement...

QEMU is another one that I know of: http://blog.vmsplice.net/2014/01/coroutines-in-qemu-basics.h...

Coroutines seem like the type of concept that relatively few programmers understand and use, but when they are used well, they can simplify the code flow greatly.

Speaking as a QEMU developer, I really don't like the coroutine use. They're a portability mess, they tend to expose bugs in dusty corners of the compiler, and they can be painful to debug around. I would much rather take a view that C is simply not a language with coroutines in it, and not try to retrofit them without explicit support from the compiler and runtime...

There is a branch of JOE which uses co-routines:


The main release doesn't use them yet, but the Windows port does.

My co-routine library also allows arbitrary arguments, but it just uses va_list for this.

Also, I use setjmp/longjmp and stack allocations (if ucontext is not available), but do not depend on C99 dynamic arrays to do it. BTW, the array trick doesn't work on some architectures: IA64 (there are two stacks) or Cray (the stack is a linked list).
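For comparison, where ucontext is available the stack switch needs no VLA or setjmp tricks at all. The sketch below is plain ucontext, not the technique from the article; it assumes glibc on Linux (getcontext/makecontext/swapcontext were removed from POSIX.1-2008, but glibc still ships them), and `counter`, `run_counter` and the `co_*` variables are hypothetical names. Because makecontext() only portably forwards int arguments, the coroutine's argument travels through a static pointer, which is one reason libraries look for other ways to pass arbitrary arguments.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static const int *co_arg;   /* argument slot: makecontext() only passes ints portably */
static int co_value;        /* value handed back on each yield */
static int co_finished;     /* set when the coroutine body returns */

/* Coroutine body: yields start, start + 1, start + 2, then finishes. */
static void counter(void)
{
    int start = *co_arg;

    for (int i = 0; i < 3; i++) {
        co_value = start + i;
        swapcontext(&co_ctx, &main_ctx);    /* yield to the caller */
    }
    co_finished = 1;    /* returning resumes co_ctx.uc_link, i.e. main_ctx */
}

/* Runs counter() on its own heap-allocated stack, storing up to max yielded
   values in out[]. Returns how many values were collected. */
static int run_counter(int start, int *out, int max)
{
    enum { STACK_SIZE = 64 * 1024 };
    char *stack = malloc(STACK_SIZE);
    int n = 0;

    assert(stack != NULL);
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;          /* the coroutine runs on this block */
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &main_ctx;             /* where to go when counter() returns */
    makecontext(&co_ctx, counter, 0);

    co_arg = &start;
    co_finished = 0;
    while (n < max) {
        swapcontext(&main_ctx, &co_ctx);    /* resume the coroutine */
        if (co_finished)
            break;
        out[n++] = co_value;
    }
    free(stack);    /* the coroutine is finished or will never be resumed */
    return n;
}
```

Here run_counter(10, out, 8) collects 10, 11 and 12 and returns 3. The TLS caveat from upthread still applies: anything that locates per-thread state from the stack pointer can be confused by a switched stack, whichever mechanism does the switching.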
