I've heard rumors that on modern processors, "lock free" doesn't even make a ton...

alright2565 · on May 15, 2022

Modern processors use something like https://en.m.wikipedia.org/wiki/MOESI_protocol to share data between processors.

So basically, writing a shared line is expensive. Atomic reads are free, as are atomic writes to a cache line that no one else is using.

tialaramex · on May 15, 2022

It's never truly free but on some CPUs (e.g. x86 and x86-64) you're always paying for it anyway and so you might as well use it.

Asooka · on May 15, 2022

You can think of a lock-free data structure as a data structure with an embedded wait-free lock implemented with atomics. This has several advantages:

- Less memory - a pthread mutex is around 40 bytes, compared to 4 bytes for this queue.

- No waiting - in some cases you don't want to wait, just do a try-push and if you can't, do some other work with the thread and then try again. Eventually you will have to either wait or discard the data, depending on your situation.

- No syscalls - mutexes weren't always as light as they are now, older implementations would always do a syscall and context switch to take or release a mutex.

tialaramex · on May 15, 2022

> Less memory - a pthread mutex is around 40 bytes, compared to 4 bytes for this queue.

This sort of thing is pretty ugly, and is one place where "portable" C and C++ really betrays the value of the language. Chances are many (often all) your uses are Linux, and so a futex is all you actually needed, which costs 4 bytes like this structure. But to be portable you can't ask for a futex, and it's deliberately not made easy to try to directly call into Linux system calls for futex since you'll probably get it wrong and then blame everybody else.

So you ask for pthread_mutex and now it's ten times bigger and you're regretting your life choices.

arinlen · on May 15, 2022

> So you ask for pthread_mutex and now it's ten times bigger and you're regretting your life choices.

It's 36 bytes bigger.

Unless you're for some reason doing embedded development with a resource constrained system running a UNIX-like OS, that is not a concern that registers anywhere.

And you get standard code that runs anywhere.

Arguing about 36 bytes per critical section makes no sense. You waste far more than that by handling a string.

gpderetta · on May 15, 2022

It is very much a concern as you can't as easily fit your mutex protected data on the same cacheline as the mutex.

gpderetta · on May 15, 2022

> wait-free lock

That's an oxymoron.

grogers · on May 15, 2022

Lock/wait freedom is not about performance but about progress guarantees. Often lock free data structures are not faster than locked ones. Reducing contention applies to both locked and lock-free structures.