

Distributed Read-Write Mutex in Go - Jonhoo
https://gist.github.com/jonhoo/05774c1e47dbe4d57169

======
jaytaylor
I thought "distributed" would mean distributed over the network, which I could
definitely make use of! (Or perhaps I should just use zookeeper or equivalent
for that ;)

Anyway, in this case the term "distributed" is being used to describe a
mechanism that reduces memory contention when Go is utilizing multiple cores
on one machine.

I'd love to see more exhaustive analysis of the performance implications for
this technique across a wide variety of usage scenarios.

~~~
epberry
N00b question here, but when people refer to mutexes, they are usually talking
about multiple cores on a single machine, right? Isn't there different
terminology for locks taken across the network in distributed systems?

~~~
Matt3o12_
No. They are usually talking about two threads accessing the same data.

Let's suppose we have an int and want to add 2; we would do:

1. int i = sharedInt

2. i = i + 2

3. sharedInt = i

(that's what the compiler actually does when you write sharedInt += 2).

If the runtime decides to stop the current thread on line 2 and lets a
second thread run all three lines, we have a problem unless we use a
mutex: one of the two updates gets lost.
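
In Go, that looks something like this (a minimal sketch; `sharedInt` and
`addTwo` are illustrative names):

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu        sync.Mutex
	sharedInt int
)

// addTwo performs the three steps above. Without the mutex, two
// goroutines can both read the same old value in step 1, and one
// of the two += 2 updates is silently lost.
func addTwo() {
	mu.Lock()
	i := sharedInt // 1. read the shared value
	i = i + 2      // 2. modify the local copy
	sharedInt = i  // 3. write it back
	mu.Unlock()
}

func main() {
	var wg sync.WaitGroup
	for n := 0; n < 1000; n++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			addTwo()
		}()
	}
	wg.Wait()
	fmt.Println(sharedInt) // always 2000 with the mutex; racy without it
}
```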

~~~
hurin
You can compile this to an atomic increment on x86, can't you? (Go seems to
support it: [https://golang.org/pkg/sync/atomic/](https://golang.org/pkg/sync/atomic/))
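
For example, the same counter with sync/atomic needs no mutex at all (a
sketch; on amd64 the add compiles to a LOCK-prefixed XADD):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var sharedInt int64
	var wg sync.WaitGroup
	for n := 0; n < 1000; n++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// One indivisible read-modify-write; no interleaving possible.
			atomic.AddInt64(&sharedInt, 2)
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt64(&sharedInt)) // always 2000
}
```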

~~~
jzelinskie
This is available only on certain architectures (x86 is one of them, as
you've noted), but I've also seen scenarios where the atomic increment
instruction is actually slower than using a mutex. Don't consider this
feature a magic bullet, and always test your use cases!
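
One way to test is a pair of parallel benchmarks along these lines (a
sketch; `BenchmarkAtomicAdd` and `BenchmarkMutexAdd` are illustrative
names, and the relative numbers depend heavily on core count and contention):

```go
package counter

import (
	"sync"
	"sync/atomic"
	"testing"
)

// All goroutines hammer one counter, so this measures the contended case.
func BenchmarkAtomicAdd(b *testing.B) {
	var n int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			atomic.AddInt64(&n, 1)
		}
	})
}

func BenchmarkMutexAdd(b *testing.B) {
	var mu sync.Mutex
	var n int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.Lock()
			n++
			mu.Unlock()
		}
	})
}
```

Running it with `go test -bench . -cpu 1,4,8` shows how the two approaches
scale as parallelism grows.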

As for the parent discussion: usually "mutex" refers to the threading
construct, while a "lock service" is how you'd refer to something like etcd
or ZooKeeper.

~~~
YZF
Mutexes are typically implemented on top of atomic instructions. So you'd do
something like an atomic compare-and-exchange to acquire the mutex, and if
there's no contention, you've got it. If there is contention, you fall back
to the OS's synchronization constructs, which are typically much slower...
An atomic increment _should_ always be faster than acquiring a mutex and
incrementing...
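
A simplified sketch of that fast path in Go (a real mutex parks the thread
in the kernel under contention rather than spinning):

```go
package lock

import (
	"runtime"
	"sync/atomic"
)

// spinLock models only the uncontended fast path of a mutex:
// a single compare-and-swap acquires it.
type spinLock int32

func (l *spinLock) Lock() {
	for !atomic.CompareAndSwapInt32((*int32)(l), 0, 1) {
		// Contended: a real mutex would block in the OS here,
		// which is the much slower path; we just yield and retry.
		runtime.Gosched()
	}
}

func (l *spinLock) Unlock() {
	atomic.StoreInt32((*int32)(l), 0)
}
```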

------
Jonhoo
FWIW, this has now moved to
[https://github.com/jonhoo/drwmutex](https://github.com/jonhoo/drwmutex) so
you can use it directly in your projects.

------
1amzave
Possible unintended side-effect: `taskset -p` might have "interesting" effects
on your process.

~~~
Jonhoo
I don't think taskset should affect this at all. It modifies the CPU
affinity, which my code respects, so everything should behave as expected.

~~~
1amzave
So what happens when you find yourself running on a CPU that wasn't in your
initial affinity mask?

Also, the "sleep for 1 ms" approach used in `init()` looks wrong -- if
`sched_setaffinity()` doesn't guarantee that the calling task has been
migrated to one of the target CPUs on return (which I suspect it does), I
don't think sleeping for a millisecond is going to change anything.

~~~
Jonhoo
You'll acquire CPU 0's lock (since a map lookup with a missing key yields the
zero value). I agree this isn't optimal when you _change_ the affinity of a
process after it has started. You could imagine a scheme where, if this
happens, you create a new lock, but that would significantly complicate the
scheme, as you would now potentially need to take a read lock on the map in
case it changes under you. It's annoying that CPUID values aren't guaranteed
to be contiguous, but that's what we're stuck with.
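
Roughly, the scheme looks like this (a sketch, not the exact code in the
repo; `cpu()` stands in for the assembly helper that reads the CPUID):

```go
package drwmutex

import "sync"

// cpu is assumed to return the current core's CPUID value; the real
// package implements this in assembly.
func cpu() uint64

// DRWMutex gives each core its own RWMutex so that readers on
// different cores don't bounce a shared cache line around.
type DRWMutex []sync.RWMutex

// cpus maps CPUID values to slice indices; it is built once at
// startup by pinning to each CPU in turn.
var cpus map[uint64]int

// RLock takes the current core's read lock. A CPUID that isn't in
// the map (e.g. after taskset changed our affinity) yields the zero
// value, so such readers all fall back to CPU 0's lock.
func (mx DRWMutex) RLock() *sync.RWMutex {
	l := &mx[cpus[cpu()]]
	l.RLock()
	return l
}

// Lock takes the write lock on every core's mutex, so a writer
// still excludes all readers.
func (mx DRWMutex) Lock() {
	for i := range mx {
		mx[i].Lock()
	}
}
```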

Yeah, the sleep is a leftover from an earlier version of the code that didn't
use sched_setaffinity. I've removed it now.
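
For reference, the pinning itself can be done with sched_setaffinity and no
sleep; the kernel migrates the thread as part of the call, as you suspect. A
sketch using golang.org/x/sys/unix (`pinTo` is an illustrative name):

```go
package drwmutex

import (
	"runtime"

	"golang.org/x/sys/unix"
)

// pinTo moves the calling OS thread onto CPU i so that a subsequent
// CPUID query reports that core. sched_setaffinity has migrated the
// thread by the time it returns, so no sleep is needed afterwards.
func pinTo(i int) error {
	runtime.LockOSThread() // keep the goroutine on this OS thread
	var set unix.CPUSet
	set.Zero()
	set.Set(i)
	return unix.SchedSetaffinity(0, &set) // pid 0 == calling thread
}
```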

