
Lthread - C coroutine lib with multicore support - halayli
https://github.com/halayli/lthread
======
Figs
This looks kind of neat. Too bad about the GPL license though -- I'll probably
never get a chance to use it.

~~~
forrestthewoods
My excitement level went from very high to instantly zero because of GPL. A
real shame. I wonder if they'd consider switching to a more open license.

~~~
halayli
I am not married to the license. I noticed several complaints, so I am going
to reconsider it. :)

~~~
naner
_I am not married to the license. I noticed several complaints, so I am going
to reconsider it. :)_

So did any of them actually make specific complaints (e.g. "I can't link this
with my already-developed proprietary project because your code is GPL. LGPL
would work great."), or is it just the usual whiners who complain about
anything GPL?

~~~
forrestthewoods
I do iOS development where GPL source code can't be used so it's an automatic
non-starter.

~~~
beagle3
Then you'd be even more bummed to discover that even if halayli changes the
license to BSD (which I hope he does not), you won't be able to use it because
it is x86/AMD64 only.

~~~
daeken
There's no reason that a contributor can't develop ARM support. A contributor
can't change the license.

------
ralph
How does it compare with Russ Cox's libtask, which includes channels?
<http://swtch.com/libtask/>

~~~
dchest
"Libtask gives the programmer the illusion of threads, but the operating
system sees only a single kernel thread."

------
tcas
Looks really cool.

If I understand it correctly, the first call to lthread_create() in the main
thread will create a new pthread with a local scheduler. Each call to
lthread_create() in that lthread will create local lthreads in that scheduler,
so in essence each lthread pool is actually single-threaded unless there is a
non-exclusive or blocking operation you can run
lthread_compute_begin()/lthread_compute_end() on. This is as opposed to
something like Grand Central Dispatch, where you can assign "tasks" to the
scheduler, which will schedule them on an available thread pool.

~~~
halayli
It will not create a new pthread; instead, a local lthread scheduler gets
created in the calling thread's context. So if you want to create more than
one lthread scheduler, you just have to create a pthread first, and the new
lthreads created in that pthread will be bound to it.

lthread_compute_begin()/end() moves the lthread into a separate pthread and
resumes it there. That pthread is called an lthread compute scheduler; its job
is to resume lthreads that will take a relatively long time to finish a task.
lthread compute schedulers are created as needed, and they stay alive for 60
seconds, after which they die of inactivity. If lthread fails to create a new
pthread to resume the lthread (max pthreads reached, for example), the lthread
gets queued in the least busy compute scheduler. When a few lthread compute
schedulers have been created, they act as a pool accepting new lthreads and
resuming them; when they cannot handle the load, the pool grows until the
pthread limit is reached, and jobs get queued up.

I believe this is close to what GCD does but probably not exactly the same.

~~~
willvarfar
Rather than queuing in the least busy scheduler, could they perhaps be queued
centrally, waiting for a scheduler?

When a long-running task will finish is not easy to guess.

~~~
halayli
I have a requirement that I couldn't get rid of yet, which requires me to know
which scheduler it is going to run on before I let go of it. Once I manage to
find a way around it I'll move to a global queue model.

------
jedbrown
The autoconf script checked into the repository assumes that I have
aclocal-1.10, but I have aclocal-1.11, so I ran autoreconf -fi to fix it. (I
recommend not checking this stuff into the repository.) Then the build fails
because of -Werror=unused-but-set-variable (it's better to cast the return
value to void than to try to trick the compiler by assigning to a variable
that you don't look at). You have two versions of the README that are both in
markdown but differ in a few lines. There doesn't seem to be any automated
support for building and running the tests.

------
alpb
I believe we developers would love to see benchmarks or a few small code
snippets rather than pure documentation. By the way, I have noticed that Ryan
Dahl of node.js has mentioned you on Twitter (@ryah):

> Cute but only a fool would introduce this complexity and overhead for easing
> their C programming experience.

Anyway, that's cool. Keep up the good work, congratulations :)

~~~
zedshaw
Ryan has a vested interest in not having people adopt coroutines, since they
show that his stupid "events with callbacks are faster and easier than
threads" line is bullshit. The truth is that if you have coroutines (and these
are really easy in unix with C), then you don't need callbacks, and you can
make an event-based system look and work exactly like a thread-based system
without the shared-resource drawbacks. With coroutines you can also do
callbacks, so you can get the best of all worlds, which you can't get with a
pure callbacks-only system like Node.js.

~~~
strags
Coroutines do impose some overhead - each coroutine still requires its own
stack. You either have to allocate a stack big enough for the maximum depth
that a coroutine will need (and when you're making complex library calls, that
can be pretty deep) - OR - you save memory by assuming that the "suspend" call
will only occur when there's relatively little stack space used, in which case
you manually save/restore the stack by copying - and hope that the overhead of
copying a stack for each context switch isn't a killer.

I'm also guessing that it's not entirely true that there are no shared-
resource drawbacks. Whenever lthread moves a heavy computation into a pthread,
all synchronization bets are presumably off. If you've got two "CPU intensive"
workers that reference the same data structures, then you're still going to
need mutexes, right?

~~~
halayli
Yes, a coroutine user has to be aware that allocating on the stack has a
penalty, similar to being aware that you cannot make a blocking call in an IO
loop, for example.

On average, yielding ~10 calls deep results in copying ~75 to 100 bytes, but
it all depends on what has been on the stack. One advantage of lthread is that
it's easy to take advantage of cores, which isn't very natural in IO loops.

Yes you'll need a synchronization mechanism when accessing shared data
structures from multiple CPU intensive workers.

~~~
beagle3
Wait a sec .. I just realized you're copying the entire stack. If I understand
correctly, that means that when you move stuff to a compute_lthread, the
addresses of local variables change, don't they?

I often take addresses of local variables -- if I understood correctly, this
deserves a huge warning in the documentation.

~~~
halayli
Correct. The addresses of local variables change, but you can still access
them and pass them to functions. What you cannot do is save a pointer to a
variable and access it inside begin()/end().

I thought I added a warning in the lthread_compute_begin() section but
apparently not. I'll go ahead and add it.

~~~
beagle3
It might also be possible to have a "debug mode" that scans the stack while
copying it to the lthread_compute_begin() thread, and warns you if any of it
looks like pointers that point into the copied stack. It will probably be
negligible compared to a long-running thread (compare 60 pointers against a
lower and upper bound), and it might have false positives occasionally -- but
could save a lot of debugging time...

------
axylone
Very cool. I'm trying to understand the stack handling in lthread_compute.c -
can you explain briefly how this works? What is the memcpy for in
_lthread_compute_save_exec_state?

------
fzzzy
Looks extremely nice.

~~~
halayli
Thanks!

~~~
willvarfar
Very nice :)

<http://williamedwardscoder.tumblr.com/post/17112393354/a-solution-to-cpu-intensive-tasks-in-io-loops>
might be an interesting distraction ;)

~~~
halayli
I am against IO event loops :).

Scroll down to the bottom of the README and look for fibonacci(35) to see how
I solve the problem mentioned in the link. The example is a naive HTTP server
that computes fibonacci on every request and replies back.

~~~
willvarfar
Yes I saw before I posted :)

You move it explicitly to its own thread. What would happen if you didn't do
that?

My article is applicable to `lthread` too. It's a discussion about what
happens to fib(35), or any blocking task, running in a
multiplexing-tasks-in-a-thread setup.

~~~
halayli
What happens is that it will block the other lthreads, bringing the RPS to its
knees. lthreads are simply coroutines, and cooperation/trust are required to
maintain fairness.

------
Alind
Looks not bad. It would be good if you could provide more benchmarks.

------
qwe123_troll
Lemme see if I understood this right: you rewrote a large part of the kernel
because you think your code will be better than code that's been tested by
(literally!) billions of people over the course of 15 years?

Not that there is anything wrong with this approach (I rewrote glibc's memory
allocator, which is atrociously broken for modern systems), but you'd better
have a _very good_ explanation for why you did this.

P.S. I won't touch on the fact that a good scheduler will necessarily need to
run in kernel-mode, not user-mode. But that's a subject for another
discussion.

~~~
halayli
This has nothing to do with the kernel scheduler. lthread is a coroutine
library, you can think of it as a micro task scheduler inside the process
(userland). It's ideal for socket programming because it avoids using
callbacks and minimizes complexity.

~~~
qwe123_troll
It has _everything_ to do with the kernel. You've taken a piece of code that
belongs in kernel-space and put it into a user-space process.

There _might_ be valid reasons to do so, but since I haven't seen any good
explanation I'd just assume this project was coded for lulz or out of
technical ignorance.

P.S. "avoids callbacks" and "minimizes complexity" is a red herring; all
threaded code has these properties, regardless of how the threading engine is
implemented behind the scenes.

~~~
beagle3
It has _little_ to do with the kernel. There, are we friends yet?

Here's a valid explanation, which wasn't explicitly given, although it was
hinted at: kernel threads take resources that become significant when you want
a very large number of threads (say, one million). The per-thread overhead,
including user stack, kernel stack and control structures, is at a minimum
~8K. That's 8GB for a million threads before you actually get to do anything
useful.

However, lthreads can realistically take as little as 100 bytes per thread,
which puts us at a 100MB footprint for the same case. That's a huge
difference.

There are tradeoffs, but that's a potentially useful use case which kernel
threads do not support (and which, in general, requires async programming,
which lthread implements while emulating a thread API).

------
ExpiredLink
It's GPL, not LGPL and therefore uninteresting for most potential users.

~~~
zedshaw
Even though it's GPL you should still read the code. It's very well written
and educational. If GPL means you refuse to even read the code then I'm sad
for you.

~~~
eps
Technically, if you read GPL'd code, you can't reproduce it in whole or in
part under anything but the GPL (as this creates a derivative work). That's
one of the reasons why some people take extreme care when dealing with GPL'd
sources - one person reads the code, then describes it to others, and they
act on that.

~~~
gillianseed
Wrong. The GPL is protected by copyright just like all the other licences, so
as long as the reproduction is not verbatim, but instead your own
implementation based on the information you gathered while examining the code,
there's no copyright violation.

~~~
Jach
Copyright violation isn't restricted to verbatim reproductions. I sympathize
with eps. There are also a good number of people, believe it or not, who think
that GPLv3 is vague enough in its "distribution" criteria that a court could
determine it's effectively the same as AGPLv3. So if there's a policy not to
use AGPL, which is fairly common because SaaS is an easy way to create the
"secret money-making sauce" of a business otherwise built on open source,
GPLv3's adoption will also suffer.

eps' general concern is that this is entirely a court matter; the spirit of
the GPL and sharing is meaningless to the law because at the end of the day
there are only so many ways you can type "LIST_INIT(&new_sched->new);" and if
a shitty programmer/programmer's company is suing you and has evidence you
looked at that line of code that may be enough for a shitty judge to side with
them. For similar reasons I don't have a habit of looking at patents (though
that's usually because patents, even non-software-patents, are shit and
obvious to the layman let alone someone in the field), but I do read (and have
contributed to) GPL code.

