Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: Chan – pure C implementation of Go channels (github.com/tylertreat)
150 points by tylertreat on Aug 29, 2014 | hide | past | favorite | 44 comments



Good stuff, thanks for sharing. Random comments based on a quick glance through the code if I may.

  chan_t* chan_init(int capacity);
should be "size_t capacity", since negative capacity values are invalid. This also eliminated the need for the first check in the function.

  chan->buffered
is duplicate in meaning to chan->queue being non-NULL.

Why chan's mutexes and pcond are malloc'ed and not included in chan's struct? They are also not freed when channel is disposed, unless I'm missing something. Mutexes are also not destroyed when buffered chan is disposed, but they are created for it.

(Edit)

Also, my biggest nitpick would be that chan_t structure should really not be in chan.h header since it's meant to be transparent to the app and the API operates exclusively with pointers to chan_t. Then, if it is made private (read - internal to the implementation), then you can divorce buffered and unbuffered channels into two separate structs, allocate required one and then multiplex between them in calls that treat buffered and unbuffered channels differently. I.e.

  struct chan_t
  { 
      // Shared properties
      pthread_mutex_t m_mu;
      pthread_cond_t  m_cond;
      int             closed;

      // Type
      int             buffered;
  };

  struct chan_unbuffered_t
  {
      struct chan_t    base;

      pthread_mutex_t  r_mu;
      pthread_mutex_t  w_mu;
      int              readers;
      blocking_pipe_t* pipe;
  };

  ...
or better yet, just include a handful of function pointers into chan_t for each of the API methods that need more than the shared part of chan_t struct, initialize them on allocation and put type-specific code in these functions. It will make for a more compact, concise and localized code.

(Edit 2)

Here's what I mean with function pointers - https://gist.github.com/anonymous/5f97d8db71776b188820 - basically poor man's inheritance and virtual methods :)


Thanks, I'm not much of a C programmer, so I appreciate the feedback. I'm not sure what you're referring to with the mutexes and pcond not being on the struct because they are. Can you clarify? You are right with the disposal though, I need to clean that stuff up :)


What eps means is that, currently you have a pointer to a mutex in your struct:

   struct chan_t { pthread_mutex_t* m_mu;... }
And initialize it like so:

    pthread_mutex_t* mu =
        (pthread_mutex_t*) malloc(sizeof(pthread_mutex_t));
    pthread_mutex_init(mu, NULL);
    chan->m_mu = mu;
eps's suggestion is to put the mutex directly in the struct:

    struct chan_t { pthread_mutex_t m_mu;... }
and initialize it by passing its address to pthread_mutex_init:

    pthread_mutex_init(&chan->m_mu, NULL);
    chan->m_mu = mu;
that saves some mallocs and dereferences. If you are new to C, the notion of taking the address of fields in a struct may be unnerving, but you get used to it :)


The struct is currently storing a pointer to the mutex (which was allocated with malloc). The call to pthread_mutex_destroy does not free that memory.

Valgrind is your friend :)

http://valgrind.org/docs/manual/quick-start.html#quick-start...


You don't need to allocate pthread_mutex_t instances on heap, you can include them into the struct directly -

  struct foo
  {
     pthread_mutex_t bar;
     ...
  };


Got it, that makes more sense.


You shouldn't allocate heap space for a pointer in blocking_pipe_read. Also, it looks like you're leaking it in the else branch.

Just allocate a pointer on the stack, read into it, and then assign the value into the provided space:

    void *msg_ptr;
    read(..., &msg_ptr, sizeof msg_ptr);
    *data = msg_ptr;


I'm more of a c++ programmer, but I think i can chime in on the mutex/pcond being "part" of the struct. Essentially, OP means that why arent they on chan's stack? In the struct you have pointers to mutex and pcond which means you have to malloc them. OP is suggesting they not be pointers.


If he's gonna do it that way (Which I agree with and also recommend) he could probably use the container_of macro (You can implement it without gcc extensions - it has slightly less type checks however). Using the container_of he can safely turn the chan_t into a chan_unbuffered_t or chan_buffered_t or etc. assuming he does the proper checks beforehand to make sure it's actually inside of that type.

Also worth noting, he really shouldn't be using '_t' since it's reserved by the standard.


The container_of macro is intended for cases where contained objects are at some (possibly compile-time variable) offset into a structure. Eliding type-checking, it reduces to the simple cast in this particular case.

Not to say that it wouldn't work, just that it isn't really necessary - casting a pointer to a structure to a pointer to its first member (and back) is perfectly portable. Perhaps even idiomatic. (Though the casts in that gist seem to be backwards?)


That's exactly what we have here though, the 'base' chan_t structure is contained inside of the chan_unbuffered_t structure. And I disagree with your second point, I consider casting structures directly like that to be bad practice. It's ugly, error prone, and not guaranteed to work by the standard since struct ordering isn't guaranteed (IE. The standard doesn't say that 'base' will be at the beginning of the struct.).

That said, it is 'portable' since in general no compilers reorder the struct members, but container_of is guaranteed to work, works for multiple members in a struct exactly the same way, and will reduce to a simple cast when your offsetof is zero so there isn't any performance penalty in situations like this. IMO there's no reason not to use container_of in this case.


The standard permits padding structures, but not reordering their members. That's why container_of is necessary (to avoid constantly-updated manual offset calculations or writing redundant code). Any implementation that reorders structure members is non-compliant (and likely incompatible with much existing code).

To quote the relevant section (C99 6.7.2.1.13, same wording present in C89):

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

I'll repeat that I consider direct casts to be somewhat idiomatic. Macros are more traditional for offset members.


Using container_of is better because it doesn't lead to a broken code when someone (accidentally) splices another field at the top of the struct.

Or put differently, direct casting comes with an assumption, while container_of doesn't. The fewer obscure assumptions are there in the code, the better. I think we can all agree on that :)


I can agree with that almost without reservation :)


I agree re container_of and I put it in initially, but then saw tyletreat's "not quite C programmer" and took it out. I remember being majorly confused by this struct-by-field recovery way back when I was just starting up with C.


Libtask[1] by Russ Cox (one of the creators of Go) may provide some inspiration. It implements channels but uses coroutines instead of threading. As a consequence it has some cool stuff like networking and it does all of the hard parts of saving/restoring registers for task switching.

[1]: http://swtch.com/libtask/


Just a forewarning: that library, like most C cooperative threading libraries, is using ucontext.

ucontext is unbearably slow, to the point where it's just as fast to use real threads and mutexes, even on a single-core system. There is nothing lightweight about it. (The technical reason is because they call into kernel functions to perform their magic.)

The actual logic of saving/restoring registers and the stack frame requires about 5-15 instructions per platform, and is very easy to write in assembly. And for the platforms you don't do this for, there's a non-standard trick to modifying jmpbuf which works on x86/amd64, ppc32/64, arm, mips, sparc, etc. These techniques are literally hundreds of times faster.

Try libco instead. It's the bare minimum four functions needed for cooperative threading, which lets you easily build up all the other stuff these libraries provide, if and only if you want them. And if you benchmark libco against all of the other stack-backed coroutine libraries, I'm sure you will be stunned at just how bad ucontext really is. That people keep using ucontext in their libraries tells me that they have never actually used cooperative threading for any serious workloads.

http://byuu.org/programming/libco/


libtask isn't using the ucontext syscall (normally), just the ucontext_t struct. If you look at the source code, it provides local assembly versions of the context swapping code for the most common architectures.

That said, your library is quite nice in that it just provides the co-routine wrappers and nothing else, which I appreciate. Also, your switch code has less instructions (are there cases it doesn't handle?), but if you look at libtask's code, it probably not going to be 100x faster. :-)


Oh ... I looked at libtask/task.c and found:

    static void
    contextswitch(Context *from, Context *to)
    {
    	if(swapcontext(&from->uc, &to->uc) < 0){
    		fprint(2, "swapcontext failed: %r\n");
    		assert(0);
    	}
    }
But it looks like you're right, context.c is actually defining its own version of swapcontext to replace the system function that can use other backends. I'm sorry for the mistake, thanks for correcting me.

> are there cases it doesn't handle?

No, it conforms to the platform ABIs. I even back up the xmm6-15 registers that Microsoft made non-volatile on their amd64 ABI (compare to how much lighter the SystemV amd64 ABI's switch routine is; the pre-assembled versions are in the doc/ folder.)

Their own fibers library even has a switch to choose whether to back those up or not, which I think is quite dangerous.

Still, nothing can top SPARC's register windows for being outright hostile to the idea of context switching.

...

Oh, and it also supports thread-local storage for safe use with multiple real threads.

> it probably not going to be 100x faster.

No, definitely not compared to his ASM versions. Even with a less efficient swap, once you get past the syscall, most of the overhead is simply in the cache impact of swapping out the stack pointer, so his will likely be very close. In fact, even I had to sacrifice a tiny bit of speed to wrap the fastcall calling convention and execute non-inline assembly code (to avoid dependencies on specific compilers/assemblers.)

I do still strongly favor jmpbuf over ucontext for the final fallback, but with x86, ppc, arm, mips and sparc on Windows, OS X, Linux and BSD, you've pretty much covered 99.999% of hardware you'd ever use this on. That and libtask lacks Windows and OS X support.


I created another such implementation which is coroutines and channels but with the aim to be able to build an almost malloc-less applications.

https://github.com/baruch/libwire


I came here to link to libyaml too but I'm glad I was beaten to it. I'm not much of a C programmer but I love reading and studying concurrency. Reading Russ Cox's code is one of my favorite things to do.


*libtask.


Quick question: isn't channel's raison d'etre the fact that they need not be backed by system threads, that they are considerably cheaper and one can fire tens of thousands of them without worrying too much about performance. In other words aren't they primarily intended as a tool for solving a concurrency problem rather than a parallelism problem.

Although, as far as I recall Go does map a bundle of them to a thread and can thus handle parallelism as well.

Just to clarify, this is not a complaint, I am trying to get a better idea of these abstractions.

@tylertreat

> I believe you're confusing goroutines with channels. Goroutines are lightweight threads of execution. Channels are the pipes that connect goroutines.

That is indeed correct. I was thinking more in the line of fibres that meld the concepts of a coroutine and a channel into a single abstraction. So the idea is Chan connects threads rather than coroutines.


Channels are useful because they allow for easy synchronous communication between threads of execution. Whether the threads of execution are dedicated POSIX threads or drawn from a thread pool or simulated by an event-driven state machine (as in ClojureScript) is a different consideration.


I believe you're confusing goroutines with channels. Goroutines are lightweight threads of execution. Channels are the pipes that connect goroutines.


One of our students[0] at Monkey Project Organization[1] through Google Summer of Code[2], wrote a complete co-routine implementation for the Duda I/O[3] Web Services stack.

The feature is called "dthread" or "duda-thread" which aims to expose Channels and generic co-routines functionalities, you can check the following relevant links:

a. The co-routine implementation:

https://github.com/monkey/duda/blob/master/src/duda_dthread....

it does not need mutex or anything similar, short code but a lot of thinking behind it.

b. Example: web service to calculate Fibonacci using channels:

https://github.com/monkey/duda-examples/blob/master/080_dthr...

the whole work will be available on our next stable release, if someone want to give it a try, just drop us a line.

[0] http://blog-swpd.rhcloud.com

[1] http://monkey-project.com

[2] http://www.google-melange.com

[3] http://duda.io


That's very interesting work. I did a similar thing for Kore for its background task implementation so that page handlers could talk to its tasks it fired off and read its responses accordingly.

See example on https://github.com/jorisvink/kore/tree/master/examples/tasks


nice!, its good to see many people working on that area. We got a lot of support from the author of LWan (http://lwan.ws) project during the development process.


The circle is now complete... Plan 9 from user space implementation of channels. http://swtch.com/usr/local/plan9/include/thread.h


Here is some interesting reading: http://golang.org/src/pkg/runtime/chan.goc Go's implementation https://docs.google.com/document/d/1yIAYmbvL3JxOKOjuCyon7JhW... Dmitry's "Go channels on steroids" post


Thanks for sharing, that's a really interesting read. Also would have been very helpful while writing my own implementation, although maybe it's better I didn't beforehand. It's more fun to work on a problem without any preconceptions.



I think you need to add

    pthread_cond_destroy(&chan->m_cond);
After https://github.com/tylertreat/chan/blob/master/src/chan.c#L1...


> #include <pthread.h>

That is not "pure C". "pthread.h" (POSIX threads) is not part of the C language standard, it is a system-specific header commonly found on UNIX systems. Windows has a different threading system for example.


I agree, OP could have used the C11 feature, <threads.h>


To be fair, you'll probably have better luck finding people who can compile it if you use pthreads.h vs. threads.h. IIRC glibc doesn't have threads.h implemented, and I don't know of any C standard libraries that do. pthreads is fairly universal even if it's not standard C, and at the very least it's much more universal then threads.h is at the moment.


Musl libc will have one in the next release it seems. It is the only one I know of.


Correct, I was just going to chime in about this. Musl is very impressive with its compact size, correctness, Drepper-less-ness =P, and speed of implementing new standards.

http://thread.gmane.org/gmane.linux.lib.musl.general/5952


Pelles IDE has full C11 support.


There are pthread implementations for Win32 though.


Pretty nifty. I like the names of the functions chan_can_recv() and chan_can_send(). Made me think think of the old cooking show "Yan Can Cook."


Very nice. Any chance of Boost licensing it?


This is pretty awesome. I do a lot of Go and C and this just might come in handy. Thanks for sharing!


How does this compare to using ZeroMQ inproc sockets?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: