There's no cgo involved when compiling to Wasm. The sometimes-slow performance comes from the hoops the compiled code has to jump through to support the Go runtime and goroutine preemption on a single thread.
I understand there's no cgo specifically, but I'm wondering whether the Go runtime, when running under Wasm, still has to swap goroutine stacks out for "Wasm stacks" whenever it calls out through the Wasm VM.
Edit 2: from this comment it sounds like it is, as you say, just the general overhead of managing goroutine stacks. I wonder if TinyGo is more performant.
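For anyone who wants to measure this rather than guess, here is a minimal sketch of a micro-benchmark that stresses goroutine switching specifically. The function name `pingPong` and the round count are my own choices for illustration, not anything from the Go or TinyGo projects. It bounces a token between two goroutines over unbuffered channels, so nearly all of the time goes to the scheduler handing control back and forth rather than to real work.

```go
package main

import (
	"fmt"
	"time"
)

// pingPong (hypothetical helper) bounces a token between two goroutines
// over unbuffered channels. Each send blocks until the other side receives,
// so every round trip forces the scheduler to switch goroutines.
func pingPong(rounds int) int {
	ping := make(chan int)
	pong := make(chan int)

	// Echo goroutine: increment the token and send it straight back.
	go func() {
		for v := range ping {
			pong <- v + 1
		}
		close(pong)
	}()

	token := 0
	for i := 0; i < rounds; i++ {
		ping <- token
		token = <-pong
	}

	close(ping)
	<-pong // wait for the echo goroutine to finish
	return token
}

func main() {
	const rounds = 100000 // arbitrary; tune for your machine

	start := time.Now()
	got := pingPong(rounds)
	fmt.Printf("%d round trips in %v\n", got, time.Since(start))
}
```

Build it once natively with `go build` and once with `GOOS=js GOARCH=wasm go build -o main.wasm`, then compare the timings. It should also be possible to build the same file with TinyGo's `wasm` target, though I haven't verified how its scheduler handles this pattern, so treat any comparison as a rough data point rather than a verdict.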