I'm not sure if this will work in real life; it looks like it's relying heavily on C compiler magic to ensure that the stack is laid out the way it wants. (Plus, allocating stacks with malloc() doesn't work on some architectures, and I have encountered pthreads implementations that get very unhappy if you change stacks on them.)
But it's still a deeply impressive piece of lateral thinking.
In the real world you'd create a stack with mmap with the MAP_STACK flag. You would ensure the stack is properly aligned and make sure there's a guard page at the end of it.
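For what it's worth, a minimal sketch of that approach might look like the following. The names coro_stack, coro_stack_alloc and coro_stack_free are hypothetical and not from the article. It assumes a POSIX system with anonymous mappings and a stack that grows downward, so the guard page goes at the lowest address of the mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_STACK
#define MAP_STACK 0  /* not defined on every platform; it is only a hint */
#endif

/* Hypothetical descriptor for one coroutine stack. */
typedef struct {
    void  *base;   /* start of the mapping, guard page included */
    size_t total;  /* length of the whole mapping in bytes */
    void  *top;    /* initial stack pointer (stack assumed to grow down) */
} coro_stack;

/* Map `usable` bytes (rounded up to whole pages) plus one guard page. */
static int coro_stack_alloc(coro_stack *s, size_t usable)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page = (size_t)ps;

    usable = (usable + page - 1) / page * page;
    size_t total = usable + page;

    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Guard page at the low end: an overflow faults instead of
       silently scribbling over a neighbouring mapping. */
    if (mprotect(base, page, PROT_NONE) != 0) {
        munmap(base, total);
        return -1;
    }

    s->base  = base;
    s->total = total;
    s->top   = (char *)base + total;

    /* mmap returns page-aligned memory, so the top is at least
       16-byte aligned, which common ABIs such as x86-64 expect. */
    assert(((uintptr_t)s->top & 15) == 0);
    return 0;
}

static void coro_stack_free(coro_stack *s)
{
    munmap(s->base, s->total);
    s->base = NULL;
    s->top  = NULL;
}
```

The caller would hand s.top (or s.base plus a size, depending on the API) to whatever performs the actual stack switch. None of this addresses the pthreads problems mentioned elsewhere in the thread.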
char filler_[(char*)&anchor_ - (char*)(stack + STACK_SIZE)];
It doesn't seem to work correctly on my machine: I see writes beyond heap bounds and uninitialized reads.
I like articles that combine hacking and C, but this one really needs more explanation.
Also, how is Rust's support for coroutines?
It worked really well --- on my machine.
Unfortunately on other machines there were weird bugs and instability, and really-hard-to-diagnose crashes; because it all worked fine on my machine, debugging was painful. Eventually I figured out that pthreads, which was being linked in by a library I depended on, when combined with a particular glibc and a particular Linux kernel, would store the TLS pointer at the top of the C stack --- it used alignment tricks to be able to figure out where the TLS pointer was from the current stack frame. (I assume this was to work around a kernel with no native TLS support.) Of course, my coroutine implementation was allocating its own stacks with mmap(). This was causing pthreads to pick up either a garbage TLS pointer or, even worse, the wrong TLS pointer.
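To illustrate the kind of alignment trick described above, here is a rough sketch. It is not the actual pthreads code; REGION_SIZE, descriptor and descriptor_from_addr are made-up names, and the layout (fixed power-of-two stack regions with a descriptor at the very top) is an assumption chosen to show the general idea.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: every thread stack is a REGION_SIZE-aligned block of
   exactly REGION_SIZE bytes (a power of two). */
#define REGION_SIZE ((uintptr_t)1 << 16)

/* Stand-in for the per-thread data that the TLS pointer would reach. */
typedef struct {
    int thread_id;
} descriptor;

/* Given any address inside a stack region (for example the address of
   a local variable), round up to the end of the region and step back
   to the descriptor stored there. */
static descriptor *descriptor_from_addr(const void *addr)
{
    uintptr_t end = ((uintptr_t)addr | (REGION_SIZE - 1)) + 1;
    return (descriptor *)(end - sizeof(descriptor));
}
```

If the code is running on a stack that was not laid out this way, such as one a coroutine library got from mmap(), the same arithmetic still produces a pointer, but it points at whatever happens to sit at the end of that aligned region. That would be consistent with the garbage or wrong TLS pointer described above.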
That was when I gave up on manual coroutines in C. Lovely idea, works really well on paper, so much simpler than threading (if you can live without multicore support), doesn't actually work in practice.
They're still worth checking out in languages like Lua, though. I'm still bitter that ES6 doesn't have proper coroutines, opting for the much less useful generator concept instead. Apparently they were too complicated to implement...
Coroutines seem like the type of concept that relatively few programmers understand and use, but when they are used well, they can simplify the code flow greatly.
The main release doesn't use them yet, but the Windows port does.
My co-routine library also allows arbitrary arguments, but it just uses va_list for this.
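The library itself isn't shown here, so the following is only a guess at the general shape of passing arguments through va_list. coro_entry, coro_call and sum_entry are hypothetical names, and the sketch calls the entry function directly instead of switching stacks.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical entry-point type: the coroutine body pulls its own
   arguments out of the va_list it is handed. */
typedef int (*coro_entry)(va_list ap);

/* Hypothetical launcher. A real library would switch to the
   coroutine's stack before calling fn; here it is a plain call. */
static int coro_call(coro_entry fn, ...)
{
    va_list ap;
    va_start(ap, fn);
    int result = fn(ap);
    va_end(ap);
    return result;
}

/* Example body: the first argument is a count, followed by that
   many ints to add up. */
static int sum_entry(va_list ap)
{
    int n = va_arg(ap, int);
    assert(n >= 0);

    int total = 0;
    for (int i = 0; i < n; i++)
        total += va_arg(ap, int);
    return total;
}
```

With these definitions, coro_call(sum_entry, 3, 10, 20, 12) adds up the three trailing ints. One thing to watch: a va_list is only valid while the variadic caller's frame is alive, so a launcher that starts the coroutine lazily would have to copy the arguments out first.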
Also I use setjmp/longjmp and stack allocations (if ucontext is not available), but do not depend on C99 variable-length arrays to do it. BTW, the array trick doesn't work on some architectures: IA64 (there are two stacks) or Cray (the stack is a linked-list).
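For readers who haven't used ucontext, here is a minimal sketch of the path where it is available. This is not the library mentioned above; every name in it (CORO_STACK_SIZE, coro_init, coro_resume, yield_value, counter, coro_destroy) is invented for illustration, and the stack comes from malloc() purely for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define CORO_STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, coro_ctx;
static void *coro_stack_mem;
static int yielded_value;
static int coro_done;

/* Called from inside the coroutine: publish a value and switch
   back to whoever resumed us. */
static void yield_value(int v)
{
    yielded_value = v;
    swapcontext(&coro_ctx, &main_ctx);
}

/* Coroutine body: yields 1, 2, 3 and then finishes. */
static void counter(void)
{
    for (int i = 1; i <= 3; i++)
        yield_value(i);
    coro_done = 1;
    /* Returning resumes the context named in uc_link. */
}

static int coro_init(void)
{
    coro_stack_mem = malloc(CORO_STACK_SIZE);
    if (coro_stack_mem == NULL)
        return -1;
    if (getcontext(&coro_ctx) != 0) {
        free(coro_stack_mem);
        coro_stack_mem = NULL;
        return -1;
    }
    coro_ctx.uc_stack.ss_sp   = coro_stack_mem;
    coro_ctx.uc_stack.ss_size = CORO_STACK_SIZE;
    coro_ctx.uc_link          = &main_ctx;
    makecontext(&coro_ctx, counter, 0);
    return 0;
}

/* Run the coroutine until its next yield (or until it returns). */
static int coro_resume(void)
{
    swapcontext(&main_ctx, &coro_ctx);
    return yielded_value;
}

static void coro_destroy(void)
{
    free(coro_stack_mem);
    coro_stack_mem = NULL;
}
```

After coro_init(), each call to coro_resume() runs counter() up to its next yield_value(); a fourth call lets it return, which sets coro_done and switches back via uc_link. Because makecontext() is told where the stack is, nothing here depends on how the compiler lays out local arrays.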