
Reference Counting: Harder Than It Sounds - sanjoy_das
http://www.playingwithpointers.com/refcounting-harder-than-it-sounds.html
======
akkartik
This is probably simplistic, but in my safe, toy assembly language for
teaching programming I simply avoid ever sharing (always refcounted) pointers
between threads. Instead it pervasively uses channels for communication, and
while channels are generic, anything you put in them gets deep-copied on
write. The deep copy is smart enough to preserve cycles, so, say, if you send
in a linked list that contains a cycle, the reader will see a linked list
with an identical topology. It will just be _utterly_ disjoint from the
original linked list.

These design decisions allow me to provide safe pointer access and avoid all
race conditions while teaching programming and concurrency, but they probably
incur significant performance loss on certain programs. My hope is that the
design constraints they impose on the programmer aren't insurmountable. We'll
see.

(More info on the project:
[https://github.com/akkartik/mu#readme](https://github.com/akkartik/mu#readme).
On its memory model:
[https://news.ycombinator.com/item?id=11855470](https://news.ycombinator.com/item?id=11855470).
On the deep-copy implementation:
[https://github.com/akkartik/mu/blob/07ab3e3f35/073deep_copy....](https://github.com/akkartik/mu/blob/07ab3e3f35/073deep_copy.cc#L243))

~~~
sz4kerto
The tricky part with channels and deep copies is that they don't solve the
problem completely, they just shift it to a higher level. You will still have
'identity', and concurrent modification of the state of an identity has to be
solved somehow.

~~~
akkartik
I'm not sure any problem needs a solution that requires concurrent
modifications to the state of an identity, though. Can you think of any such?

(Rust seems to be making the same bet.)

------
zzzcpan
I'm thinking: for shared refcounted pointers, would it be better to just move
synchronization off the critical path completely? I mean, operate on local
pointers in each thread, as if it were a single-threaded app, but every
hundred decrements merge the counters. And release memory only if the
counters were zero and had been synchronized for some time, i.e. for at least
a couple of synchronizations on every thread or so. It should be possible to
get an order of magnitude better performance than with any kind of
synchronized refcounters.

~~~
Qantourisc
I suppose you could make a queue for each thread, and put the pending
increments/decrements on there. But I'm no expert.

~~~
zzzcpan
Yeah, I already glanced through a bunch of papers on refcounting. It seems
like people have tried similar ideas, quite successfully.

------
nwalfield
One way to reduce the cost of cross CPU synchronization is to use sloppy
reference counters:

"An Analysis of Linux Scalability to Many Cores"
([https://pdos.csail.mit.edu/papers/linux:osdi10.pdf](https://pdos.csail.mit.edu/papers/linux:osdi10.pdf))

------
radiospiel
I don't see how, in the example, thread B's refcount could be zero, since
there would already exist a reference on thread A (or else A could not create
another reference). So how can that happen?

~~~
Sharlin

    if (--old_val->refcount == 0)

It's pre-decrement; the refcount is decremented _before_ comparison.

~~~
sanjoy_das
In (other) words, the situation is that you've just decremented the reference
count of an object, because you've nulled out the only location in the heap
that reached it. The reference count becomes zero after decrementing, so you
know that _now_ there are no slots in the heap that point to it; but how do
you know that there isn't a thread that fetched the object out of the heap
before you started, got stalled before it could increment the reference
count, and has been stalled ever since?

(I'll try to edit the post to make this clearer ^).

I'd also like to stress that there are many ways around the problem; the only
point of the post is that you'll have to solve some non-obvious problems if
you try to generalize reference counting to a heap shared across threads.

~~~
gpderetta
How did the other thread get the reference to the object? The only possible
existing reference is the one we are using to decrement the shared counter.

Decrementing is done when the reference itself is being dropped (set to null
or to point to a different object), which is a logically mutating operation
(remember that only the reference count updates are atomic, the operations on
the references themselves are not), thus no other acquire operation can be
happening concurrently or it would be a data race.

Any concurrent operations on the reference itself must be synchronized via
external means, usually a mutex. Of course concurrently mutating distinct
references which refer to the same object/ref count is fine.

edit: rewording

~~~
sanjoy_das
> How did the other thread get the reference to the object? The only possible
> existing reference is the one we are using to decrement the shared counter.

I don't think this affects the point you're trying to make, but I suppose you
could have three threads, where the logical operations are:

    
    
    ThreadA:
      obj = x->field_a;

    ThreadB:
      x->field_a = null;

    ThreadC:
      x->field_b = null;

with both field_a and field_b pointing to the same object initially. It does
not affect your point, since ThreadA and ThreadB are now racing.

> Decrementing is done when the reference itself is being dropped (set to null
> or to point to a different object), which is a logically mutating operation
> (remember that only the reference count updates are atomic, the operations
> on the references themselves are not), thus no other acquire operation can
> be happening concurrently or it would be a data race.

It depends on your programming language. In Java racing on field updates (at
the Java level) is well defined (but is allowed to return counter-intuitive
results to some degree). That is:

    
    
    ThreadA:
      int k = obj.field.hashCode();

    ThreadB:
      obj.field = someOtherValue;

is defined and is not allowed to have arbitrarily bad effects like crashing
the VM. This is different from C++ (where these kinds of accesses are UB, as
you seem to imply). Generally, I think for high-level languages it is better
to have Java-like semantics, where even racy accesses have some guarantees.

For C++, I can get the same Java-like guarantees by using
`memory_order_relaxed` loads, but I suppose it is defensible for an atomic
`shared_ptr` to have a complex refcounting protocol even for
`memory_order_relaxed` loads and stores.

> Any concurrent operations on the reference itself must be synchronized via
> external means, usually a mutex.

Not sure how you're using a reference here, but if by "reference" you mean "a
location in the heap" then that does not apply for Java. I personally tend to
use "reference" in the same way as "pointer".

> Of course concurrently mutating distinct references which refer to the same
> object/ref count is fine.

edit: formatting

~~~
gpderetta
I was of course discussing the semantics of a C-like language, especially in
light of the C++11 memory model.

Being memory safe at all costs, Java (and any other shared-memory memory-safe
language) must guarantee a minimum of safety for concurrent reference
updates. A JVM that used reference counting instead of a proper GC would be
interesting to implement...

You can't currently put a shared_ptr in an atomic<> variable, as it does not
meet the requirements (the type must be trivially copyable), but, IIRC, you
will be able to in C++17, and, yes, even relaxed loads will be expensive (in
practice, implementations are expected to take a spinlock around every
operation).

