Thankfully the Go language designers were more focused on practicality than purity. Shared-nothing concurrency is nice in theory, but it is slow compared with just sharing memory between threads. Then again, I write high-performance servers and concurrent data structures using lock-free algorithms (including in Go), where even a mutex is a luxury, so maybe the problems you work on are very different from what I work on. But I'm very happy Go can accommodate the evil things I do in the name of performance.
Fear of copying overhead can introduce more performance problems than copying overhead itself. Modern CPUs are really good at copying, which, after all, is completely parallelizable. If you just created some data and then pass it to something else by copying it, and it's used immediately there, it will probably still be in the fastest level of cache. At least if the message passing and CPU dispatching are properly connected.
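As a rough illustration (a minimal sketch, not a serious measurement harness): pass a freshly built struct by value through an unbuffered channel and touch it immediately on the other side. The `Msg` type and its 256-byte payload are arbitrary choices for this sketch; drop it in a `_test.go` file and run `go test -bench=.`.

```go
package msg_test

import "testing"

// A mid-sized message; 256 bytes is a handful of cache lines.
type Msg struct {
	Payload [256]byte
}

// Build a Msg, send it by value, and use it immediately on the
// receiving side. The bytes being copied were just written, so the
// copy reads from the fastest cache level rather than from memory.
func BenchmarkSendByValue(b *testing.B) {
	ch := make(chan Msg)
	done := make(chan struct{})
	go func() {
		var sink byte
		for m := range ch {
			sink += m.Payload[0] // immediate use while still cache-hot
		}
		_ = sink
		close(done)
	}()
	for i := 0; i < b.N; i++ {
		var m Msg
		m.Payload[0] = byte(i) // freshly created data
		ch <- m                // copies all 256 bytes
	}
	close(ch)
	<-done
}
```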
QNX gets this. Almost nobody else does. The reason for having subroutine-like IPC, rather than "send on channel A, then wait on channel B for reply", is that the scheduler can immediately transfer control from sender to receiver. If two unidirectional channels are used, you have the sender and receiver threads both in ready-to-run state, which means a pass through the scheduler for somebody, and possibly a handoff to another CPU, with all the attendant cache misses.
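For concreteness, the two-unidirectional-channel shape in Go usually looks like the request-carries-its-reply-channel idiom sketched below (the `request` and `server` names are invented for the example). Go's runtime may hand the processor straight to the receiver on an unbuffered send, but nothing in the language guarantees the QNX-style immediate sender-to-receiver control transfer described above.

```go
package main

import "fmt"

// Each request carries its own reply channel: "send on channel A,
// then wait on channel B for the reply."
type request struct {
	arg   int
	reply chan int // unbuffered: the caller blocks until served
}

func server(calls <-chan request) {
	for req := range calls {
		req.reply <- req.arg * 2 // the "return" to the caller
	}
}

func main() {
	calls := make(chan request)
	go server(calls)

	req := request{arg: 21, reply: make(chan int)}
	calls <- req             // send on channel A
	fmt.Println(<-req.reply) // wait on channel B: prints 42
}
```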
A good test of an IPC system is to have one thread calling another as a service, with control going back and forth rapidly, while other threads are compute-bound.
If the presence of compute-bound threads kills IPC performance, it was done wrong.
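A rough Go sketch of that test, assuming channel ping-pong stands in for the IPC: two goroutines bounce a value back and forth while spinner goroutines keep the remaining cores busy. If the reported round-trip time balloons once the spinners start, the scheduler is paying exactly the cost described above. Results will vary with GOMAXPROCS and the runtime version.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

func main() {
	// Saturate most cores with compute-bound work.
	var stop atomic.Bool
	for i := 0; i < runtime.NumCPU()-2; i++ {
		go func() {
			x := 0
			for !stop.Load() {
				x++
			}
			_ = x
		}()
	}

	// One goroutine calls another as a service, back and forth.
	ping, pong := make(chan int), make(chan int)
	go func() {
		for v := range ping {
			pong <- v + 1 // trivial "service" work
		}
	}()

	const rounds = 100_000
	start := time.Now()
	for i := 0; i < rounds; i++ {
		ping <- i
		<-pong
	}
	elapsed := time.Since(start)
	stop.Store(true)
	close(ping)
	fmt.Printf("%.0f ns per round trip\n",
		float64(elapsed.Nanoseconds())/rounds)
}
```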