Go isn't unique in this; many languages have runtimes that require similarly intensive copying and context conversion before C code can run on the data. However, most of those, like CPython, are themselves slow enough that the penalty isn't as noticeable against the general background noise. (In general CPython requires a lot more copying too; Go is closer to C struct and array semantics and can more often get by with some form of memcpy, whereas the internals of Python look nothing like that.) Go is fast enough that it's much easier to get into scenarios where, in a tight loop, you're spending 90% of your time on CGo overhead if you're not careful with data flow. Languages that have to copy a lot out of their runtime and are also fairly fast will face the exact same issues. It's not really a "Go" issue per se, but a challenge faced by any language that wants a runtime significantly different from C's.
(One of the miracles of Rust is building an environment and runtime that isn't stuck on C's limitations but at the same time can still speak to C really, really cheaply. Plenty of languages have one or the other of those, but there aren't very many that have both. I'm not sure there's any other language that has threaded that particular needle so cleanly.)
I agree, and some of this overhead is even inherent in legitimate, foundational choices such as VM-interpreted/"managed" code (as in Java/.NET, though Python does this as well) or the use of a tracing GC (which requires strict discipline on heap contents, so that the GC can reliably "trace" the heap and find the references it cares about). So I'm definitely not saying that the choices made as part of Go's design are wrong across the board here!
Indeed, Haskell is in a very similar place overall; the "fibers" (lightweight threads) that GHC uses in its compiled code are implemented via async code underneath, much like Go's goroutines, and Haskell's performance is also well within an order of magnitude of pure C-like code. So the issues you describe are quite well understood in that context, and they are definitely not regarded as a "reason" to avoid the C FFI when performance requirements call for it. Describing Cgo itself as high-overhead and therefore something to avoid is all the more misleading, and that's what I was objecting to in the grandparent comment!