~/development/go/dev$ time ./8.out
~/install/stackless-2.6.4$ time ./python 100k.py
// Scheduling helpers. Sched must be locked.
static void gput(G*); // put/get on ghead/gtail
in the uncontended case,
* as fast as spin locks (just a few user-level instructions),
* but on the contention path they sleep in the kernel.
* a zeroed Lock is unlocked (no need to initialize each lock).
This is just a guess from someone that knows nothing about either (stackless does run in a vm, right?), so take many large grains of salt.
In short, you are comparing apples with bazookas. (Also, for a test that runs for such a short time and is run only once, most of the measured time is likely taken up by things unrelated to what the test is meant to measure.)
Thing is - it doesn't seem to have been missed from Stackless, which makes me wonder when it's needed.
On concurrency: you compared a parallelized CSP implementation, with a single-threaded one, where the test was dominated by communication costs. Single-threaded communication is much faster than potentially contended parallel communication.
On benchmark hygiene: there are a plethora of different machines, different performance profiles and different numbers flying around here. The kloc/sec numbers don't mean anything unless they're on the same machine; similarly, the performance numbers are dependent on degrees of hardware parallelism and the kind of workload (proportion of per-task vs communication vs task startup, all the different variables in cost). Several different benchmarks would need to be run to actually tease out these different variables, to really figure out which one is better.
On concurrency, I thought Go right now was configured by default for only a single kernel thread running goroutines, which would make both of them single-threaded implementations.
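Whether goroutines are spread over more than one kernel thread can be checked rather than guessed. The sketch below queries and temporarily pins `GOMAXPROCS` through the standard `runtime` package; the helper names `currentProcs` and `withProcs` are made up for illustration. The default was 1 in the early releases being discussed here and changed to the number of available CPUs in Go 1.5, so the output depends on the toolchain version.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs reports the GOMAXPROCS setting without changing it:
// calling runtime.GOMAXPROCS with an argument below 1 only queries the value.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

// withProcs runs f with GOMAXPROCS set to n, then restores the previous value.
// (Hypothetical helper for illustration.)
func withProcs(n int, f func()) {
	prev := runtime.GOMAXPROCS(n)
	defer runtime.GOMAXPROCS(prev)
	f()
}

func main() {
	fmt.Println("GOMAXPROCS:", currentProcs(), "CPUs:", runtime.NumCPU())
	withProcs(1, func() {
		// Inside this call at most one thread executes Go code at a time,
		// which is the closest match to a single-threaded scheduler.
		fmt.Println("inside withProcs(1):", currentProcs())
	})
}
```

Wrapping a benchmark body in `withProcs(1, ...)` and then in `withProcs(runtime.NumCPU(), ...)` would be one way to compare the uncontended and potentially contended cases on the same machine.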
A few people have now reported benchmark numbers on their machines, with both Stackless and Go. In every case so far, Go has been slower than Stackless for the 100,000 tasklet/goroutine case. Of course, if Pike had a MacBook Air with an SSD then the compile times are absolutely not comparable.
Agreed also about how to tease out the different variables. Still, recall that I mostly want to know why Pike, for example, stressed that there were no tricks going on underneath the covers to make the performance fast, with the implication that people wouldn't quite believe how fast it was, when the performance does not seem exceptional compared to other similar languages.
"Go is stackless where Stackless is not. Its goroutines use allocated
stacks (starting at 4k in size) and can continue running on different
threads where Stackless tasklets cannot. In fact, when a goroutine
blocks on a system call, the other goroutines in its scheduler are
migrated to another thread."
So Stackless is not really concurrent at all. Given that, it is no wonder the performance is different: the functionality provided by Stackless 'microthreads' is in no way comparable to goroutines.