Thankfully the Go language designers were more focused on practicality than purity. Shared-nothing concurrency is nice in theory, but it is slow compared with just sharing memory between threads. Then again, I write high-performance servers and concurrent data structures using lock-free algorithms (including in Go), where even a mutex is a luxury, so maybe the problems you work on are very different from what I work on. But I'm very happy Go can accommodate the evil things I do in the name of performance.
Fear of copying overhead can introduce more performance problems than copying overhead itself. Modern CPUs are really good at copying, which, after all, is completely parallelizable. If you just created some data and then pass it to something else by copying it, and it's used immediately there, it will probably still be in the fastest level of cache. At least if the message passing and CPU dispatching are properly connected.
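As a rough illustration (a minimal sketch, not a serious measurement harness): pass a freshly built struct by value through an unbuffered channel and touch it immediately on the other side. The `Msg` type and its 256-byte payload are arbitrary choices for this sketch; drop it in a `_test.go` file and run `go test -bench=.`.

```go
package msg_test

import "testing"

// A mid-sized message; 256 bytes is a handful of cache lines.
type Msg struct {
	Payload [256]byte
}

// Build a Msg, send it by value, and use it immediately on the
// receiving side. The bytes being copied were just written, so the
// copy reads from the fastest cache level rather than from memory.
func BenchmarkSendByValue(b *testing.B) {
	ch := make(chan Msg)
	done := make(chan struct{})
	go func() {
		var sink byte
		for m := range ch {
			sink += m.Payload[0] // immediate use while still cache-hot
		}
		_ = sink
		close(done)
	}()
	for i := 0; i < b.N; i++ {
		var m Msg
		m.Payload[0] = byte(i) // freshly created data
		ch <- m                // copies all 256 bytes
	}
	close(ch)
	<-done
}
```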
QNX gets this. Almost nobody else does. The reason for having subroutine-like IPC, rather than "send on channel A, then wait on channel B for reply", is that the scheduler can immediately transfer control from sender to receiver. If two unidirectional channels are used, you have the sender and receiver threads both in ready-to-run state, which means a pass through the scheduler for somebody, and possibly a handoff to another CPU, with all the attendant cache misses.
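For concreteness, the two-unidirectional-channel shape in Go usually looks like the request-carries-its-reply-channel idiom sketched below (the `request` and `server` names are invented for the example). Go's runtime may hand the processor straight to the receiver on an unbuffered send, but nothing in the language guarantees the QNX-style immediate sender-to-receiver control transfer described above.

```go
package main

import "fmt"

// Each request carries its own reply channel: "send on channel A,
// then wait on channel B for the reply."
type request struct {
	arg   int
	reply chan int // unbuffered: the caller blocks until served
}

func server(calls <-chan request) {
	for req := range calls {
		req.reply <- req.arg * 2 // the "return" to the caller
	}
}

func main() {
	calls := make(chan request)
	go server(calls)

	req := request{arg: 21, reply: make(chan int)}
	calls <- req             // send on channel A
	fmt.Println(<-req.reply) // wait on channel B: prints 42
}
```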
A good test of an IPC system is to have one thread calling another as a service, with control going back and forth rapidly, while other threads are compute-bound.
If the presence of compute-bound threads kills IPC performance, it was done wrong.
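A rough Go sketch of that test, assuming channel ping-pong stands in for the IPC: two goroutines bounce a value back and forth while spinner goroutines keep the remaining cores busy. If the reported round-trip time balloons once the spinners start, the scheduler is paying exactly the cost described above. Results will vary with GOMAXPROCS and the runtime version.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

func main() {
	// Saturate most cores with compute-bound work.
	var stop atomic.Bool
	for i := 0; i < runtime.NumCPU()-2; i++ {
		go func() {
			x := 0
			for !stop.Load() {
				x++
			}
			_ = x
		}()
	}

	// One goroutine calls another as a service, back and forth.
	ping, pong := make(chan int), make(chan int)
	go func() {
		for v := range ping {
			pong <- v + 1 // trivial "service" work
		}
	}()

	const rounds = 100_000
	start := time.Now()
	for i := 0; i < rounds; i++ {
		ping <- i
		<-pong
	}
	elapsed := time.Since(start)
	stop.Store(true)
	close(ping)
	fmt.Printf("%.0f ns per round trip\n",
		float64(elapsed.Nanoseconds())/rounds)
}
```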