
Wouldn't any Linux/NPTL thread require at least the register state of the entire x86 (or ARM) CPU?

I don't think goroutines need all of that. At a switch point, the Go compiler knows exactly where each live value is: it knows that "int foobar" is currently in "rbx" and that it has also been spilled to the stack. Since the value is already on the stack, rbx doesn't need to be saved at all.
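
To make the "switches happen at known points" part concrete, here's a toy Go snippet (just an illustration; runtime.Gosched() is an explicit yield, and channel operations, blocking calls, etc. are the implicit ones):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        done := make(chan struct{})
        go func() {
            for i := 0; i < 3; i++ {
                // Gosched() is a cooperative switch point: the compiler has
                // already spilled whatever it needs across the call, so the
                // runtime only saves SP, PC and a handful of registers.
                runtime.Gosched()
                fmt.Println("goroutine iteration", i)
            }
            close(done)
        }()
        for i := 0; i < 3; i++ {
            runtime.Gosched()
            fmt.Println("main iteration", i)
        }
        <-done // interleaving isn't deterministic, but both loops make progress
    }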


Linux/NPTL threads don't know when they will be interrupted, so all register state (including AVX-512 state if those registers are in use) has to be saved. The 32 AVX-512 registers at 64 bytes each are 2 kB alone.

Even if a thread isn't using AVX-512 (Linux can detect that the AVX-512 registers are still in their all-zero init state and skip them), RAX through R15 is 128 bytes and the SSE registers are another 256 bytes, so roughly 384 bytes of state that goroutines don't need to save. Plus whatever other per-thread bookkeeping has to be saved off: CPU time and the other thread details Linux needs in order to decide which thread to run next.
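
For concreteness, the arithmetic behind those numbers (architectural register counts only; this ignores mask registers, MXCSR and the rest of the XSAVE area):

    package main

    import "fmt"

    func main() {
        // Register-file sizes on x86-64:
        const (
            gprBytes = 16 * 8  // RAX..R15: 16 general-purpose registers x 8 bytes
            xmmBytes = 16 * 16 // XMM0..XMM15 (SSE): 16 registers x 16 bytes
            zmmBytes = 32 * 64 // ZMM0..ZMM31 (AVX-512): 32 registers x 64 bytes
        )
        fmt.Println("GPRs:         ", gprBytes, "bytes") // 128
        fmt.Println("SSE (XMM):    ", xmmBytes, "bytes") // 256
        fmt.Println("AVX-512 (ZMM):", zmmBytes, "bytes") // 2048
    }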



I don't think the question is dominated by machine state; I think it's more a question of stack size. Native thread stacks are demand-paged and start at around 4 kB of resident memory; goroutine stacks start at 2 kB, but they live on a GC'd heap that defaults to 100% overhead, so it sounds like a wash to me.
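
If anyone wants to eyeball the goroutine side of that, a rough sketch is to spawn a pile of parked goroutines and divide the memory delta by the count (numbers vary by Go version and GC settings, so treat it as an illustration, not a benchmark):

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        const n = 100000

        var before, after runtime.MemStats
        runtime.GC()
        runtime.ReadMemStats(&before)

        var wg sync.WaitGroup
        stop := make(chan struct{})
        wg.Add(n)
        for i := 0; i < n; i++ {
            go func() {
                defer wg.Done()
                <-stop // park until released, so all n goroutines exist at once
            }()
        }

        runtime.GC()
        runtime.ReadMemStats(&after)
        // Sys is total memory obtained from the OS, so the delta includes
        // the goroutine stacks as well as scheduler bookkeeping.
        fmt.Printf("approx bytes per goroutine: %d\n", (after.Sys-before.Sys)/n)

        close(stop)
        wg.Wait()
    }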


Hmmm.

It seems like you're taking this from a perspective of "Pthreads in C++ vs Coroutines in Go", which is correct in some respects, but different from how I was taking the discussion.

I guess I was taking it from the perspective of "pthreads in C++ vs Go-like coroutines reimplemented in C++", which would be pthreads vs C++20 coroutines. (Or really: this "Loom" discussion is more of a Java thing, but it's probably a close analog of pthreads in C++ vs C++20 coroutines.)

I agree with you that the garbage collector overhead is a big deal in practice, but it's an aspect of the discussion I was purposefully avoiding. (I'm also not the person you responded to.)


Right, I admit there are better ways to do it, but I don't think it's obviously true that goroutines specifically are either more compact or faster to switch between. The benefits might be imaginary. The Go runtime's goroutine scheduler kinda sucks, actually (it scales badly as the number of runnable goroutines increases), and there are also ways of making native threads faster, like the proposed SwitchTo API: https://lkml.org/lkml/2020/7/22/1202
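
For what it's worth, a crude way to put a number on goroutine switch cost is a channel ping-pong; something like this (not a rigorous benchmark, just a sketch):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        const iters = 1000000
        ping, pong := make(chan struct{}), make(chan struct{})

        go func() {
            for i := 0; i < iters; i++ {
                <-ping
                pong <- struct{}{}
            }
        }()

        start := time.Now()
        for i := 0; i < iters; i++ {
            // Each unbuffered send/receive pair forces a switch to the
            // other goroutine and back.
            ping <- struct{}{}
            <-pong
        }
        fmt.Printf("%v per round trip (two switches)\n", time.Since(start)/iters)
    }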


Have you tried context switching between 100k native threads? Good luck with that; meanwhile, Go has no problem doing it with 1M goroutines.
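
The goroutine half of that is easy to try; something like this (expect a few GB of memory while a million goroutines are parked at once):

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        const n = 1000000
        var wg sync.WaitGroup
        wg.Add(n)

        start := time.Now()
        for i := 0; i < n; i++ {
            go func() {
                defer wg.Done()
                time.Sleep(10 * time.Millisecond) // keep them all alive briefly
            }()
        }
        wg.Wait()
        fmt.Printf("spawned and joined %d goroutines in %v\n", n, time.Since(start))
    }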



