The OS IPC mechanism quite likely does not use a lock-free queue -- or at least, not in the same sense that I think the grandparent post means.
A well-implemented ring buffer [+] can get an enqueue operation down to a few instructions and something like two memory fences.
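To make "a few instructions and something like two memory fences" concrete, here is a minimal sketch of a single-producer / single-consumer ring buffer. It is not the Disruptor's implementation; the class name `SpscRingBuffer` and its methods are made up for illustration, and it assumes Java 9+ (for `getAcquire` / `setRelease`), exactly one producer thread, and exactly one consumer thread.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical single-producer / single-consumer bounded ring buffer (illustrative sketch). */
final class SpscRingBuffer<T> {
    private final Object[] slots;
    private final int mask;
    // Next sequence to read; only the consumer thread writes it.
    private final AtomicLong head = new AtomicLong();
    // Next sequence to write; only the producer thread writes it.
    private final AtomicLong tail = new AtomicLong();

    SpscRingBuffer(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Producer side: returns false instead of blocking when the buffer is full. */
    boolean offer(T item) {
        long t = tail.getPlain();                 // we are the only writer of tail
        if (t - head.getAcquire() == slots.length) {
            return false;                         // full (acquire: see the consumer's slot clears)
        }
        slots[(int) (t & mask)] = item;           // plain array store
        tail.setRelease(t + 1);                   // release: publish the slot to the consumer
        return true;
    }

    /** Consumer side: returns null when the buffer is empty. */
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.getPlain();                 // we are the only writer of head
        if (h == tail.getAcquire()) {
            return null;                          // empty (acquire: see the producer's slot store)
        }
        int index = (int) (h & mask);
        T item = (T) slots[index];
        slots[index] = null;                      // drop the reference so it can be collected
        head.setRelease(h + 1);                   // release: hand the slot back to the producer
        return item;
    }
}
```

The enqueue path is one acquire load, one array store, and one release store, with no locks, no CAS loop, and no system call. A production-grade version would also pad `head` and `tail` onto separate cache lines to avoid false sharing, which this sketch leaves out.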
The overhead of IPC, which wakes up the kernel scheduler, switches the processor back and forth between privilege modes a few times on the way, and knocks apart all the CPU cache and register state to swap in another process, while the MMU is flipping all your pages around because these two processes don't trust each other to write directly into each other's memory... is not going to have quite the same performance characteristics.
A notable moment in the history of logging is Java's log4j framework, which, within a single process, used exclusive synchronization. When this was replaced by a (relatively) lock-free queue implementation, throughput increased by orders of magnitude. (Their notes and graphs on this can be found at https://logging.apache.org/log4j/2.x/manual/async.html .) This isn't an exact analogy for the difference between a good lock-free ring buffer and IPC either, but it certainly has some similarities, and indeed their writeup ends with a specific shout-out to the power of avoiding "locks requiring kernel arbitration".
--
[+] The "mechanical sympathy" / Disruptor folks have some great and accessible writeups on how they addressed the finer points of high-performance shared-memory message passing. http://mechanitis.blogspot.com/2011/06/dissecting-disruptor-... is one of my favorite reads.
I think you're thinking of a synchronous RPC mechanism. I was talking about IPC mechanisms like Unix domain sockets, where sending the message doesn't interact with the receiving process at all; it literally just sticks the message into a buffer, where the other process can come and get it later.
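A rough sketch of that "stick it in a buffer, pick it up later" behavior, assuming Java 16+ on a platform with Unix domain socket support. The class `UnixSocketBufferDemo` and its method are hypothetical names, and both ends live in one thread of one process purely to keep the example self-contained; in real use they would be separate processes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical demo: the sender finishes writing before the receiver does anything. */
final class UnixSocketBufferDemo {
    static String sendThenReceive(String message) {
        try {
            Path dir = Files.createTempDirectory("uds-demo");
            Path socketPath = dir.resolve("demo.sock");
            UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketPath);
            try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                server.bind(address); // bind + listen; nothing has been accepted yet
                try (SocketChannel sender = SocketChannel.open(address)) {
                    byte[] payload = message.getBytes(StandardCharsets.UTF_8);
                    ByteBuffer out = ByteBuffer.wrap(payload);
                    while (out.hasRemaining()) {
                        sender.write(out); // returns once the bytes sit in the kernel's socket buffer
                    }
                    // Only now does the "receiver" show up and drain what was buffered.
                    try (SocketChannel receiver = server.accept()) {
                        ByteBuffer in = ByteBuffer.allocate(payload.length);
                        while (in.hasRemaining() && receiver.read(in) >= 0) {
                            // keep reading until the whole payload has arrived
                        }
                        return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
                    }
                }
            } finally {
                Files.deleteIfExists(socketPath);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The write completes before `accept()` or `read()` is ever called on the receiving side, which is the point being made: the sender is not waiting on the receiver. Each `write` is still a system call into the kernel, so this says nothing about how the cost compares with an in-process ring buffer.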