I agree, and some of this overhead is even inherent in legitimate, foundational choices, such as VM-interpreted/"managed" code (as in Java and .NET, though Python does this as well) or the use of a tracing GC (which requires strict discipline about what the heap may contain, so that the GC can reliably "trace" it and identify which values are pointers to live objects). So I'm definitely not saying that the choices made in Go's design are consistently wrong here!
Indeed, Haskell is in a very similar place overall: the "fibers" (lightweight threads) that GHC's runtime uses are multiplexed onto OS threads and rely on asynchronous I/O underneath, much like Go's goroutines, and Haskell's performance is also well within an order of magnitude of pure C-like code. So the issues you describe are quite well understood in that context, and they are definitely not regarded as a "reason" to avoid the C FFI when performance requirements call for it. But describing Cgo itself as high-overhead, and as something that should not be used for that reason, is even more misleading, and that's what I was objecting to in the grandparent comment!