
Wouldn't any Linux/NPTL thread require at least the register state of the entire x86 (or ARM) CPU?

I don't think goroutines need all of that. At a switch point, the Go compiler knows exactly where each live value is: it knows that "int foobar" is currently in "rbx" and that it has also been spilled to the stack. Since the value is already on the stack, rbx doesn't need to be saved at all.
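
To make the "switches happen at known points" part concrete, here's a toy Go snippet (just an illustration; runtime.Gosched() is an explicit yield, and channel operations, blocking calls, etc. are the implicit ones):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        done := make(chan struct{})
        go func() {
            for i := 0; i < 3; i++ {
                // Gosched() is a cooperative switch point: the compiler has
                // already spilled whatever it needs across the call, so the
                // runtime only saves SP, PC and a handful of registers.
                runtime.Gosched()
                fmt.Println("goroutine iteration", i)
            }
            close(done)
        }()
        for i := 0; i < 3; i++ {
            runtime.Gosched()
            fmt.Println("main iteration", i)
        }
        <-done // interleaving isn't deterministic, but both loops make progress
    }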


Linux/NPTL threads don't know when they will be interrupted, so all register state (including AVX-512 state if those registers are in use) has to be saved. The 32 AVX-512 registers at 64 bytes each are 2 kB alone.

Even if a thread isn't using AVX-512 (Linux can detect that the AVX-512 registers are still in their all-zero init state and skip them), RAX through R15 is 128 bytes and the SSE registers are another 256 bytes, so roughly 384 bytes of state that goroutines don't need to save. Plus whatever other per-thread bookkeeping has to be saved off: CPU time and the other thread details Linux needs in order to decide which thread to run next.
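
For concreteness, the arithmetic behind those numbers (architectural register counts only; this ignores mask registers, MXCSR and the rest of the XSAVE area):

    package main

    import "fmt"

    func main() {
        // Register-file sizes on x86-64:
        const (
            gprBytes = 16 * 8  // RAX..R15: 16 general-purpose registers x 8 bytes
            xmmBytes = 16 * 16 // XMM0..XMM15 (SSE): 16 registers x 16 bytes
            zmmBytes = 32 * 64 // ZMM0..ZMM31 (AVX-512): 32 registers x 64 bytes
        )
        fmt.Println("GPRs:         ", gprBytes, "bytes") // 128
        fmt.Println("SSE (XMM):    ", xmmBytes, "bytes") // 256
        fmt.Println("AVX-512 (ZMM):", zmmBytes, "bytes") // 2048
    }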



I don't think the question is dominated by machine state; I think it's more a question of stack size. Native thread stacks are demand-paged and start at around 4 kB of resident memory; goroutine stacks start at 2 kB, but they live on a GC'd heap that defaults to 100% overhead, so it sounds like a wash to me.
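
If anyone wants to eyeball the goroutine side of that, a rough sketch is to spawn a pile of parked goroutines and divide the memory delta by the count (numbers vary by Go version and GC settings, so treat it as an illustration, not a benchmark):

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        const n = 100000

        var before, after runtime.MemStats
        runtime.GC()
        runtime.ReadMemStats(&before)

        var wg sync.WaitGroup
        stop := make(chan struct{})
        wg.Add(n)
        for i := 0; i < n; i++ {
            go func() {
                defer wg.Done()
                <-stop // park until released, so all n goroutines exist at once
            }()
        }

        runtime.GC()
        runtime.ReadMemStats(&after)
        // Sys is total memory obtained from the OS, so the delta includes
        // the goroutine stacks as well as scheduler bookkeeping.
        fmt.Printf("approx bytes per goroutine: %d\n", (after.Sys-before.Sys)/n)

        close(stop)
        wg.Wait()
    }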


Hmmm.

It seems like you're taking this from a perspective of "Pthreads in C++ vs Coroutines in Go", which is correct in some respects, but different from how I was taking the discussion.

I guess I was taking it from the perspective of "pthreads in C++ vs Go-like coroutines reimplemented in C++", which would be pthreads vs C++20 coroutines. (Or really: this "Loom" discussion is more of a Java thing, but it's probably a close analog of pthreads in C++ vs C++20 coroutines.)

I agree with you that the garbage collector overhead is a big deal in practice, but it's an aspect of the discussion I was purposefully avoiding. (I'm also not the person you responded to.)


Right, I admit there are better ways to do it, but I don't think it's obviously true that goroutines specifically are either more compact or faster to switch between. The benefits might be imaginary. The Go runtime's goroutine scheduler kinda sucks, actually (it scales badly as the number of runnable goroutines increases), and there are also ways of making native threads faster, like the proposed SwitchTo API: https://lkml.org/lkml/2020/7/22/1202
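
For what it's worth, a crude way to put a number on goroutine switch cost is a channel ping-pong; something like this (not a rigorous benchmark, just a sketch):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        const iters = 1000000
        ping, pong := make(chan struct{}), make(chan struct{})

        go func() {
            for i := 0; i < iters; i++ {
                <-ping
                pong <- struct{}{}
            }
        }()

        start := time.Now()
        for i := 0; i < iters; i++ {
            // Each unbuffered send/receive pair forces a switch to the
            // other goroutine and back.
            ping <- struct{}{}
            <-pong
        }
        fmt.Printf("%v per round trip (two switches)\n", time.Since(start)/iters)
    }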


Have you tried context switching between 100k native threads? Good luck with that; meanwhile, Go has no problem doing it with 1M goroutines.
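
The goroutine half of that is easy to try; something like this (expect a few GB of memory while a million goroutines are parked at once):

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        const n = 1000000
        var wg sync.WaitGroup
        wg.Add(n)

        start := time.Now()
        for i := 0; i < n; i++ {
            go func() {
                defer wg.Done()
                time.Sleep(10 * time.Millisecond) // keep them all alive briefly
            }()
        }
        wg.Wait()
        fmt.Printf("spawned and joined %d goroutines in %v\n", n, time.Since(start))
    }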



