If you are excited by Go-style concurrency in C, then Mr Sustrik's libmill is for you: http://libmill.org/tutorial.html
Libdill is a thin experiment on top of libmill. The main problem it tries to solve is cancellation. For the uninitiated: in most programming languages there is no way to cancel/interrupt a thread/coroutine from outside. Look at the mess that is pthread_cancel(3). In golang, likewise, there is no way to cancel a goroutine.
Why is this important? Well, read Mr Sustrik's notes. But basically: imagine you want to define a time limit for the completion of some task. Or imagine what should happen if an HTTP request comes in, initiates some actions (SQL queries?) and then disconnects. Most programming languages have NO way of expressing "the client went away, stop processing its SQL queries".
Libdill is an attempt to solve this. I must admit I'm not fully on board with the "structured concurrency" train of thought, but the whole prospect of defining the semantics of killing/cancelling coroutines is absolutely amazing.
Also, review the golang "context" mess, which AFAIU tries to work around the same problem. 
I am not 100% sure about the structured concurrency concept myself, but comparing it to structured programming feels reassuring. Basically, a tree structure (whether a syntax tree or the tree of running coroutines) is probably the most complex way of structuring stuff that is still tractable by a human being. Once you go beyond that, you'll eventually get lost. Thus, tools enforcing a tree-like structure are likely to help keep the program maintainable.
About your API though.
Why is there a load of inet/ip/socket functions? Shouldn't they be in a separate lib? And why is the opposite of int go(expr) the function int hclose(int) rather than int stop(int) or something like it? open/close, go/stop, and so on…
My recommendation would be to just not use the socket functions and link with the library statically. That way the unused code will be discarded.
As for hclose(), that was just mimicking the POSIX close() function.
When you can't, defer them to a thread that can be ignored.
There's been talk of adding context to io.Reader and io.Writer, which would of course affect pretty much every single Go codebase. For backwards compatibility/convenience you therefore see a lot of APIs with complementary functions Foo and FooCtx, one which takes a context and one that doesn't. That's awkward, and even less elegant when interfaces are involved.
And the network packages (sockets, etc.), of course, have a parallel, non-complementary mechanism to provide timeouts and cancelation that doesn't use contexts. (The key/value pair context stuff is also unfortunate, in my opinion. It's not typesafe, and results in awkward APIs.)
All of this happened because goroutines, as designed, cannot be forcibly terminated. I don't know what the better solution is. I'm not sure why the runtime cannot provoke a panic at the next context switch, but I bet there's a good technical reason. It's unfortunate.
But it's not unreasonable that a large part of any codebase needs "contexts", and so if everyone wants it, why not make it mandatory? Make it an implicit argument threaded through every function call, a bit like "this" in languages such as C++ and Java. The compiler could optimize it away from those codepaths that are guaranteed not to need it. Go has a lot of built-in generic magic; adding some global helpers to set a scope's context wouldn't be too bad.
The other route is to imitate Erlang's processes, which have their own heap, and so killing a process doesn't leave cruft behind. Given how goroutines are allowed to share memory, that's not likely to happen.
Boy do I have good news for whoever wrote this:
Anyone else like that stuff? Or have any insight why it isn't more widely used? Maybe it's because it was a commercial thing?
It turns out that a coroutine is just a function that the compiler should not inline: https://github.com/sustrik/libmill/blob/master/libmill.h#L24...
Does someone have a mental model of how to take advantage of multiple cpus without creating a mess? If so, can you explain it like I'm 5 (or more like I'm 55 and tired)?
Now different tools work differently for this. If you use multiprocessing, then you have no implicit shared state (you can use the filesystem or shm &c. to share state, but not much is shared by default). Clojure, for example, defaults to no mutability, so you can share state safely by default.
That covers concurrency. For parallelism, there are some useful structures too. In Common Lisp, I usually write a program single-threaded, then profile, and use lparallel to parallelize the slow part. 99% of the time the slow part is a loop, so making it parallel is fairly straightforward. If it's a series of possibly interdependent calculations, then the calculations can usually be represented as a tree, and lparallel has a tool for that as well.
There is absolutely nothing lisp-specific in lparallel, any language with first-class functions and some form of thread-local variables could implement all that it does. It's not a silver bullet but it does solve the "My cores stopped getting faster, but I have more cores now" problem about 80% of the time.
For 99% of applications that means you take your favourite threadsafe queue implementation for your specific language and launch threads and only communicate this way. The messages should preferably be immutable, copies or at the very least marked readonly via const if copying is too expensive.
I'd recommend you to take a look at Erlang.
Also remember that there is no magic pixie dust. Functional programming languages are not inherently better than non-functional programming languages at concurrency and parallelism. They merely leverage the "no multiple writers" principle by default. Immutable data is threadsafe primarily because it's guaranteed that nothing writes to it after construction.
You typically have a cluster of machines, so you have to be able to run multiple instances of your application simultaneously to use all the processing power. And once you can run multiple instances of the application in parallel, you can just as well run 16 instances of it on a 16-core box.
The upside is that you don't have two levels of granularity (threads, machines) but only a single one (processes) which makes both the coding and the ops as well as stuff like capacity planning much easier.
Don't start a thread per work item. Start worker threads and distribute work among them. Never ever block a GUI thread.
Always know which thread works on which data structures, and don't make more variables accessible in your worker threads than necessary. Keep the variables multiple threads can access to an absolute minimum, and make sure access is always protected by a mutex.
If you respect these rules, threading is safe and fun.
At first I read 'goroutine' and thought something is wrong with my mind and that I have coded too much Go. But then I saw this:
Now I am pretty sure someone got inspired by Golang ;-) Nice to see some backporting of Go features to C.
Note: that doesn't mean Go is the only language that supports easy concurrency patterns. But the creators of Go wanted to improve on C, and while not everybody agrees that everything they have done is an improvement, making concurrency easier most certainly is one.
libdill was a development of libmill in two main ways: it is more idiomatic C, and it supports structured concurrency. The latter is, in my opinion, very interesting. Martin has a blog post on it.
BTW I've got all this from following the nanomsg threads and reading Martin's blog. I'm not an active participant, so I may have some details wrong, but I think it's fairly accurate.
I'll also add a shameless plug to my recent 'port' of this essay for a VM with first-class continuation support: http://akkartik.name/coroutines-in-mu
> libmill was a project that aimed to copy Go's
> concurrency model to C 1:1 [...]
> libdill is a follow-up project that experiments with
> structured concurrency and diverges from the Go model.
C11 has threads built in, but they are not widely used yet.
libuv has (iirc) the following genealogy:
libevent -> libev -> libuv
given this context, i still don't quite understand how/where libdill might play a role here?
fwiw, almost all of these event-processing libraries support multiple event loops; an event loop is thus a first-class citizen within the library, with functions for creating/destroying/starting/stopping loops. multiple event loops find their use specifically in the context of multi-threaded servers, for example.
Later I discovered that it is possible to do efficient epoll emulation on windows so then I wrote wepoll. With it you can just stick to the good ol' epoll/kqueue model and still support windows.
Call this overlapped-poll function on every monitored socket individually, so you don't inherit poll()'s scalability problems.
This is as much of an explanation as I can type on my phone; I'll add more detail to the wepoll readme later.