Reference Counting: Harder Than It Sounds (playingwithpointers.com)
53 points by sanjoy_das on July 24, 2016 | 24 comments



This is probably simplistic, but in my safe, toy assembly language for teaching programming I simply avoid ever sharing (always refcounted) pointers between threads. Instead it pervasively uses channels for communication, and while channels are generic, anything you put in them gets deep-copied on write. The deep copy is smart enough to preserve cycles, so if, say, you send in a linked list that contains a cycle, the reader will see a linked list with an identical topology. It will just be utterly disjoint from the original linked list.

These design decisions allow me to provide safe pointer access and avoid all race conditions while teaching programming and concurrency, but they probably incur significant performance loss on certain programs. My hope is that the design constraints they impose on the programmer aren't insurmountable. We'll see.

(More info on the project: https://github.com/akkartik/mu#readme. On its memory model: https://news.ycombinator.com/item?id=11855470. On the deep-copy implementation: https://github.com/akkartik/mu/blob/07ab3e3f35/073deep_copy....)
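Roughly, the cycle-preserving copy boils down to keeping a map from original nodes to their copies. Here's a C++ sketch of the idea (not Mu's actual implementation), specialized to a singly linked list:

    // Sketch of a cycle-preserving deep copy (not Mu's actual code):
    // a map from original nodes to their copies lets us reproduce the
    // topology without looping forever.
    #include <unordered_map>

    struct Node {
        int value;
        Node* next;
    };

    Node* deep_copy(Node* head) {
        std::unordered_map<Node*, Node*> copies;  // original -> copy
        Node* head_copy = nullptr;
        Node* prev_copy = nullptr;
        for (Node* cur = head; cur != nullptr; cur = cur->next) {
            auto it = copies.find(cur);
            if (it != copies.end()) {
                // Seen this node before: close the cycle in the copy
                // exactly where it was closed in the original.
                prev_copy->next = it->second;
                return head_copy;
            }
            Node* c = new Node{cur->value, nullptr};
            copies[cur] = c;
            if (prev_copy) prev_copy->next = c;
            else head_copy = c;
            prev_copy = c;
        }
        return head_copy;
    }

The copy the reader sees shares nothing with the original, which is what makes it safe to hand across a channel.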


The article is about making ref-counting thread safe in an uncooperative environment (C++ shared_ptrs). If you're willing to relax one of those conditions, there are solutions that don't require giving up shared data. See Yossi Levanoni and Erez Petrank, "An On-the-Fly Reference Counting Garbage Collector for Java". If you buffer all reference count updates in thread-local buffers, then process the increments and decrements in batches with threads paused, you can use a write barrier that has no synchronization operations. Though IIRC the Levanoni-Petrank write barrier assumes stronger ordering of stores than is true on some architectures.

EDIT: http://www.cs.technion.ac.il/~erez/Papers/refcount.pdf. The algorithm requires cores to see other cores' stores in program order, which is true on x86 (and ARMv8 with the correct instructions?).
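
To give the flavor of the buffering idea (this is only a sketch, not the paper's actual algorithm; the handshake that pauses mutators is hand-waved):

    // Sketch of deferred, buffered reference counting. Mutators log
    // deltas into thread-local buffers with no atomics; a collector
    // applies them in a batch while the mutators are paused.
    #include <vector>

    struct Object {
        long refcount = 0;   // only the collector touches this
    };

    struct RefDelta {
        Object* obj;
        int delta;           // +1 or -1
    };

    // One per mutator thread; appends need no synchronization.
    thread_local std::vector<RefDelta> local_log;

    void log_increment(Object* o) { local_log.push_back({o, +1}); }
    void log_decrement(Object* o) { local_log.push_back({o, -1}); }

    // Run by the collector while every mutator is stopped at a safe
    // point, so reading the logs and the counts is race-free.
    void apply_logs(std::vector<std::vector<RefDelta>*>& all_logs,
                    std::vector<Object*>& maybe_dead) {
        for (auto* log : all_logs) {
            for (const RefDelta& d : *log) {
                d.obj->refcount += d.delta;
                if (d.obj->refcount == 0)
                    maybe_dead.push_back(d.obj);  // recheck before freeing
            }
            log->clear();
        }
    }

The point is that the write barrier itself is just a couple of plain stores.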


I remember the horror I felt when first reading about threads in Novell and OS/2, back around 1990 or so.

Threads provide a lot of research opportunities. Should they really be exposed at the application programming level? Java was a nice try, but I think any language that is going to tightly integrate threads should emulate processes and interprocess communication, isolating casual variable references between threads.

Cue somebody more knowledgeable than me discussing Erlang here...


Yeah, CSP is awesome.

The traditional thing to do here is to use immutable data structures; because they can never change, you don't need locks to access them, which means you can pass pointers between threads willy-nilly. And if you're sending a message to a process that doesn't share memory, you can fall back to serialisation.

Bear in mind that you can still get race conditions and deadlocks with CSP --- consider a process which provides `get` and `set` messages, and then two other processes try to do `c.set(c.get() + 1)`.
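
To make that race concrete, here's a C++ sketch with a lock-guarded counter standing in for the get/set process (a real channel-backed server has the same problem, since each message is atomic but the get-then-set pair is not):

    // Sketch: per-message atomicity does not make get-then-set atomic.
    #include <iostream>
    #include <mutex>
    #include <thread>

    struct Counter {
        std::mutex m;
        int value = 0;
        int  get()       { std::lock_guard<std::mutex> l(m); return value; }
        void set(int v)  { std::lock_guard<std::mutex> l(m); value = v; }
    };

    int main() {
        Counter c;
        auto bump_many = [&] {
            for (int i = 0; i < 100000; ++i)
                c.set(c.get() + 1);   // two atomic steps, one racy update
        };
        std::thread a(bump_many), b(bump_many);
        a.join(); b.join();
        // Usually prints well short of 200000: increments were lost.
        std::cout << c.value << "\n";
    }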


Saying "use immutable data structures" here assumes garbage collection though.

If you didn't have garbage collection, when would you release the memory for these structures? That would bring you back to refcounting.


Is it a problem in practice to use reference counting for an "immutable data structure"? Obviously it's not 100% immutable, as the counters are updated. But if it's done with a thread-safe reference counter, the data structure is usable like a pure immutable data structure and doesn't need an additional garbage collector.
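
Something like this is what I have in mind (a C++ sketch; the retain/release names and the memory orderings are just the usual recipe, nothing from the article):

    // Sketch: an immutable node whose only mutable state is an atomic
    // refcount. The payload never changes, so readers need no locks.
    #include <atomic>
    #include <string>

    struct ImmutableNode {
        const std::string payload;           // fixed at construction
        const ImmutableNode* const next;     // structural sharing is fine
        mutable std::atomic<long> refcount{1};

        ImmutableNode(std::string p, const ImmutableNode* n)
            : payload(std::move(p)), next(n) {}
    };

    void retain(const ImmutableNode* n) {
        // Relaxed is enough here: the caller already holds a reference,
        // so the object can't go away underneath this increment.
        n->refcount.fetch_add(1, std::memory_order_relaxed);
    }

    void release(const ImmutableNode* n) {
        // acq_rel so everything done with the node happens-before the
        // delete, whichever thread ends up running it.
        if (n->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            if (n->next) release(n->next);   // recursion kept for brevity
            delete n;
        }
    }

The caveat from the article still applies if the pointer you call retain on is itself shared and mutated concurrently; this only covers the case where you already hold a reference.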


Oh, yeah, I missed that bit of context. Oops. OP's comment makes much more sense now...


The tricky part with channels and deep copies is that they don't solve the problem completely, just shift it to a higher level. You will still have 'identity', and concurrent modification of the state of an identity has to be solved somehow.


I'm not sure any problem needs a solution that requires concurrent modifications to the state of an identity, though. Can you think of any such?

(Rust seems to be making the same bet.)


The solution I like best, which someone in #proglangdesign mentioned the other day, is to only pass serializable data through. That's how I plan to do it.


For an actor-based language that avoids deep-copying, check out Pony (http://www.ponylang.org/). It uses a small set of reference capabilities to ensure safety even if you pass a mutable structure.


Interesting! Could you point me at a talk or paper specifically about that aspect?


Check out "Deny Capabilities for Safe, Fast Actors", "fast-cheap.pdf" at

https://github.com/ponylang/ponylang.github.io/tree/master/m...


I'm thinking: for shared refcounted pointers, would it be better to just move synchronization off the critical path completely? I mean, operate on local pointers in each thread, like it's a single-threaded app, but merge their counters every hundred decrements or so. And release memory only if the counters were zero and stayed synchronized for some time, i.e. for at least a couple of synchronization rounds on every thread or something. It should be possible to get an order of magnitude better performance than with any kind of synchronized refcounters.
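
Concretely, I mean something like this for the counting half (just a sketch; deciding when it's actually safe to free, i.e. waiting out threads that are between a load and an increment, is the part I'm hand-waving):

    // Sketch: per-thread refcount deltas, folded into the shared
    // counter only every FLUSH_EVERY operations. Reclamation is
    // deliberately left out; it needs the delayed-free logic above.
    #include <atomic>
    #include <unordered_map>

    struct Shared {
        std::atomic<long> refcount{1};
    };

    constexpr int FLUSH_EVERY = 100;

    thread_local std::unordered_map<Shared*, long> local_delta;
    thread_local int ops_since_flush = 0;

    void flush_if_needed() {
        if (++ops_since_flush < FLUSH_EVERY) return;
        for (auto& kv : local_delta)
            if (kv.second != 0)
                kv.first->refcount.fetch_add(kv.second,
                                             std::memory_order_acq_rel);
        local_delta.clear();
        ops_since_flush = 0;
    }

    void local_retain(Shared* o)  { ++local_delta[o]; flush_if_needed(); }
    void local_release(Shared* o) { --local_delta[o]; flush_if_needed(); }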


It turns out that the whole notion of a 'reference count' that tracks the exact number of handles is unnecessarily powerful. If you weaken the guarantee to two states - 0 or greater than 0, you can do even better. See SNZI (scalable non-zero indicators): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83....


I suppose you could make a queue for each thread and stick the pending increments/decrements on there. But I'm no expert.


Yeah, I've already glanced through a bunch of papers on refcounting. It seems people have tried similar ideas, and quite successfully.


One way to reduce the cost of cross-CPU synchronization is to use sloppy reference counters:

"An Analysis of Linux Scalability to Many Cores" (https://pdos.csail.mit.edu/papers/linux:osdi10.pdf)


I don't see how, in the example, the refcount thread B sees could be zero, since there would already be a reference held by thread A (or else A could not create another reference). So how can that happen?


     if (--old_val->refcount == 0)
It's pre-decrement; the refcount is decremented before comparison.


In other words, the situation is that you've just decremented the reference count of an object, because you've nulled out the only location in the heap that reached it. The reference count becomes zero after decrementing, so you know that _now_ there are no slots in the heap that point to it; but how do you know that there isn't a thread that fetched the object out of the heap before you started, got stalled before it could increment the reference count, and has been stalled ever since?
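
In code, the shape of the problem is roughly this (a sketch, not the exact snippet from the post):

    // Broken by design: the window between loading the pointer and
    // bumping the count is exactly where another thread's decrement
    // can drop the count to zero and free the object.
    #include <atomic>

    struct Object { std::atomic<long> refcount{1}; };

    std::atomic<Object*> g_f{nullptr};   // a shared heap slot

    Object* acquire() {
        Object* tmp = g_f.load();            // step 1: grab raw pointer
        // <-- a release_slot() on another thread can run to completion
        //     right here, deleting *tmp before we reach step 2
        if (tmp) tmp->refcount.fetch_add(1); // step 2: bump the count
        return tmp;
    }

    void release_slot() {
        Object* old_val = g_f.exchange(nullptr);
        if (old_val && old_val->refcount.fetch_sub(1) == 1)
            delete old_val;   // may still be held as acquire()'s tmp
    }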

(I'll try to edit the post to make this clearer ^).

I'd also like to stress that there are many ways around the problem; the only point of the post is that you'll have to solve some non-obvious problems if you try to generalize reference counting to a heap shared across threads.


How did the other thread get the reference to the object? The only possible existing reference is the one we are using to decrement the shared counter.

Decrementing is done when the reference itself is being dropped (set to null or to point to a different object), which is a logically mutating operation (remember that only the reference count updates are atomic, the operations on the references themselves are not), thus no other acquire operation can be happening concurrently or it would be a data race.

Any concurrent operations on the reference itself must be synchronized via external means, usually a mutex. Of course concurrently mutating distinct references which refer to the same object/ref count is fine.

edit: rewording


> How did the other thread get the reference to the object? The only possible existing reference is the one we are using to decrement the shared counter.

I don't think this affects the point you're trying to make, but I suppose you could have three threads, where the logical operations are:

    ThreadA:
      obj = x->field_a;
    
    ThreadB:
      x->field_a = null;
    
    ThreadC:
      x->field_b = null;
with both field_a and field_b pointing to the same object initially. It does not affect your point, since ThreadA and ThreadB are now racing.

> Decrementing is done when the reference itself is being dropped (set to null or to point to a different object), which is a logically mutating operation (remember that only the reference count updates are atomic, the operations on the references themselves are not), thus no other acquire operation can be happening concurrently or it would be a data race.

It depends on your programming language. In Java, racing on field updates (at the Java level) is well-defined (but is allowed to return counter-intuitive results to some degree). That is:

    ThreadA
      int k = obj.field.hashCode()

    ThreadB
      obj.field = someOtherValue
is defined and is not allowed to have arbitrarily bad effects like crashing the VM. This is different from C++ (where these kinds of accesses are UB, as you seem to imply). Generally, I think for high-level languages it is better to have Java-like semantics where even racy accesses have some guarantees.

For C++, I can get the same Java-like guarantees by using `memory_order_relaxed` loads, but I suppose it is defensible for an atomic `shared_ptr` to have a complex refcounting protocol even for `memory_order_relaxed` loads and stores.
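
Concretely, by that I mean something like (sketch):

    // Sketch: a racy but defined pointer field in C++, roughly the
    // moral equivalent of a plain (non-volatile) Java field read,
    // modulo Java's extra final-field/initialization guarantees.
    #include <atomic>

    struct Object { int payload; };

    struct Holder {
        std::atomic<Object*> field{nullptr};
    };

    Object* racy_read(Holder& h) {
        // No ordering, but no UB either: you see some stored value.
        return h.field.load(std::memory_order_relaxed);
    }

    void racy_write(Holder& h, Object* v) {
        h.field.store(v, std::memory_order_relaxed);
    }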

> Any concurrent operations on the reference itself must be synchronized via external means, usually a mutex.

Not sure how you're using a reference here, but if by "reference" you mean "a location in the heap" then that does not apply for Java. I personally tend to use "reference" in the same way as "pointer".

> Of course concurrently mutating distinct references which refer to the same object/ref count is fine.

edit: formatting


I was of course discussing the semantics of a C-like language, especially in light of the C++11 memory model.

Being memory safe at all costs, Java (and any other shared-memory memory-safe language) must guarantee a minimum of safety for concurrent reference updates. A JVM which used reference counting instead of a proper GC would be interesting to implement...

You can't currently put a shared_ptr in an atomic<> variable, as it does not meet the requirements (it isn't trivially copyable), but, IIRC, you will be able to in C++17, and yes, even relaxed loads will be expensive (in practice, implementations are expected to take a spinlock around every operation).
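
For completeness, what you can already do today (C++11) is go through the free-function atomic overloads for shared_ptr in <memory>, which have roughly the cost model you describe (a sketch):

    // Sketch: atomic access to a shared_ptr via the C++11 free
    // functions. Implementations typically hash the address to one of
    // a small pool of internal locks.
    #include <memory>

    std::shared_ptr<int> g_ptr;   // shared between threads

    void publisher() {
        std::atomic_store(&g_ptr, std::make_shared<int>(42));
    }

    int reader() {
        std::shared_ptr<int> local = std::atomic_load(&g_ptr);
        return local ? *local : -1;
    }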



