Hacker News

> Reference counts automatically collect garbage.

Reference counts are used to manage resources, one of which is memory (and I'm assuming you are referring to no-longer-referenced memory as "garbage").

But that's not the only use for reference counts! You can, and should, use reference counts for external resources, such as files, or memory buffers on a GPU, or….
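CPython, which relies on refcounting, shows what this looks like in practice: drop the last reference and the underlying resource is released on the spot. A minimal sketch (the `TrackedFile` wrapper is hypothetical, just for illustration; behavior as described is CPython-specific):

```python
import os
import tempfile

class TrackedFile:
    """Hypothetical wrapper: the OS file descriptor is released the
    instant the last reference to the object goes away, which is a
    deterministic point under CPython's reference counting."""
    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)
    def __del__(self):
        os.close(self.fd)

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

f = TrackedFile(path)
alias = f        # refcount: 2
del f            # refcount: 1 -- the descriptor stays open
fd = alias.fd
del alias        # refcount: 0 -- __del__ runs HERE, descriptor closed
os.unlink(path)
```

The point is that "last reference dropped" is a well-defined program point, so the external resource is held exactly as long as it is reachable and not a moment longer.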

Garbage collection is COMPLETELY USELESS for managing non-memory resources, because those resources (and oftentimes memory resources too, e.g. on the GPU) need to be released deterministically. And since all a GC can do is examine memory, well, it doesn't work.

Go ahead: point me to a 3D game engine using garbage collection for GPU resources. I'll wait.

So yeah: "reference counting is a subset of garbage collection" is absolutely not true, and even the most basic knowledge of how reference counting is used in practice should be enough to convince anyone of that.

Again: this meme needs to die. Garbage collection is not "super sophisticated reference counting", and in fact, can't even be used in most places where reference counting is used. It's not a substitution, or a superset, of reference counting.

GC is a compelling alternative to reference counting only for resources in the main-memory heap, and only in situations where performance (especially total memory usage) isn't an important concern relative to, say, programmer productivity.




> this meme needs to die. Garbage collection is not "super sophisticated reference counting", and in fact, can't even be used in most places where reference counting is used. It's not a substitution, or a superset, of reference counting.

I never said that garbage collection is sophisticated reference counting, so if you've seen that "meme," it wasn't here. But reference counting and tracing are the two classical approaches to garbage collection, the garbage collection literature does treat reference counting as a garbage collection algorithm, and refcounting is used for garbage collection in languages such as Rust. Refcounting has other uses, too, but this discussion is about garbage collection, not about 101 uses for refcounting.

Anyway, reference counting GC is not intrinsically more primitive or worse-performing than tracing, but in practice, tracing GCs tend to be sophisticated, while languages that rely on refcounting usually employ a rather simple algorithm. In fact, this is one of the attractions of refcounting for GC: it is easier to get OK performance with primitive refcounting than with primitive tracing.
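That "rather simple algorithm" fits in a few lines. A toy sketch (names are mine, not from any particular runtime): an integer count bumped on acquire and decremented on release, with the payload freed the moment the count hits zero.

```python
class RefCounted:
    """Naive refcounting: an integer count, bumped on acquire,
    decremented on release; the destructor runs the instant the
    count reaches zero."""
    def __init__(self, payload, destructor):
        self.payload = payload
        self.destructor = destructor
        self.count = 1
    def incref(self):
        self.count += 1
        return self
    def decref(self):
        self.count -= 1
        if self.count == 0:
            self.destructor(self.payload)

freed = []
obj = RefCounted("buffer", freed.append)
alias = obj.incref()   # count = 2
obj.decref()           # count = 1, nothing freed
alias.decref()         # count = 0, destructor runs immediately
# freed == ["buffer"]
```

Getting this simple scheme to perform acceptably takes little effort, whereas a tracing collector this naive (stop-the-world, full-heap scans) performs noticeably worse.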


You can close a non-memory resource when an object that owns it becomes garbage (Java calls this a "finalizer"); it's just not common. Those resources tend to be too expensive to leave unused indefinitely, and explicit closing risks at worst an IOException, not disastrous undefined behavior.

Refcounting detects new garbage more quickly than periodic tracing, which is why it costs so much more in bus cycles. But both are dynamically detecting live objects and garbage. I've also seen systems where the only reference count values are "one" and "more than one", with the latter going uncollected until tracing.
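That one-bit scheme can be sketched like so (a toy model, not any particular collector): once an object has ever been shared, the counter can no longer free it, and it waits for the tracing pass.

```python
freed, deferred = [], []

class OneBit:
    """One-bit refcounting: the only states are 'exactly one
    reference' and 'more than one'."""
    def __init__(self, name):
        self.name = name
        self.shared = False
    def share(self):
        self.shared = True        # a second reference sticks forever
    def release(self):
        if self.shared:
            deferred.append(self.name)   # left for the tracing pass
        else:
            freed.append(self.name)      # unique owner: free it now

a, b = OneBit("a"), OneBit("b")
b.share()
a.release()   # never shared: freed immediately
b.release()   # was shared: uncollected until tracing
```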


> Those resources tend to be too expensive to leave unused indefinitely

The real issue is not that these resources are expensive; it's that they're limited. The usual example is the file descriptor table: once all 1024 slots (or whatever your ulimit is) are in use, trying to open any new file or file-like resource will fail. The same can happen with disk space (I once had a system which could only release a disk file when the corresponding in-memory object was closed; someone forgot to close the objects after using them, so all the disk space allocated to that system was used up and it stopped working), with database connections, and with many other resources.
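The file descriptor case is easy to reproduce. A POSIX-only sketch (it lowers the process's own soft limit so the failure shows up after a few dozen leaked files rather than 1024):

```python
import os
import resource
import tempfile

# Lower the soft fd limit for this process only; the hard limit is
# untouched, and we restore the soft limit afterwards.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

held = []            # simulated leak: files we never close
exhausted = False
try:
    while len(held) < 1000:
        held.append(open(path))   # each leaked file burns one slot
except OSError:
    exhausted = True              # EMFILE: the descriptor table is full
finally:
    for f in held:
        f.close()                 # dropping the files frees the slots
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    os.unlink(path)
```

Closing the leaked files immediately makes `open` succeed again, which is exactly the deterministic release the thread is arguing about.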

In fact, the only reason this isn't an issue with memory (which is also limited) is that the garbage collector notices when it's getting full, and starts trying to collect garbage. It could in theory do so for other resources like file descriptors, but not all (how could it know that a remote database, shared with other users, is nearing its connection limit?), and it would probably be very inefficient (like having to do a full scan of the memory just to have a chance to find an open file handle which isn't referenced by any live object and therefore could be closed).

The main advantage of reference counting is that it's deterministic: it doesn't just detect new garbage "more quickly", it detects new garbage immediately after it becomes garbage, which keeps the resource usage at a minimum.


> it's deterministic

Sort of. It's only deterministic in the sense that as soon as an object is not referenced it is freed, but it's not entirely deterministic because you don't know exactly when (or even in what thread) that "as soon as" happens.

> it detects new garbage immediately after it becomes garbage, which keeps the resource usage at a minimum.

Yes, but that comes at a cost to performance. One of the observations that tracing GCs make is that you may not need to do any work to deallocate memory at all. For example, a simple copying algorithm used to manage the young generation in many generational collectors is based on the fact that more objects die young than survive, and it only does work to keep the (fewer) surviving objects alive and does not do any work to free (the many) dead objects. As a result allocation itself becomes much faster. Tracing indeed requires more memory, but it automatically uses that memory to reduce the cost of memory management. In fact the (amortized) cost of memory management can be reduced arbitrarily by enlarging the heap and using a tracing collector, so much so that it could, in principle, be made cheaper than even stack allocation.
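That key property is easy to see in a toy model (a dict standing in for the heap, nothing like a real collector's memory layout): the copying pass does work proportional to the live objects only, and the dead ones, cycles included, are reclaimed without ever being visited.

```python
# Toy semispace copy: the "heap" maps each object to the objects it
# references. Collection copies only what's reachable from the roots;
# garbage is reclaimed by discarding from-space wholesale.
heap = {
    "root": ["a"],
    "a": ["b"],
    "b": [],
    "dead1": ["dead2"],   # unreachable cycle: never visited below
    "dead2": ["dead1"],
}

def copy_collect(heap, roots):
    to_space, worklist = {}, list(roots)
    while worklist:                    # work proportional to LIVE data
        obj = worklist.pop()
        if obj not in to_space:
            to_space[obj] = heap[obj]
            worklist.extend(heap[obj])
    return to_space                    # from-space is simply dropped

heap = copy_collect(heap, ["root"])
# heap == {"root": ["a"], "a": ["b"], "b": []}
```

In a real collector, from-space then becomes one big free region, so allocation degenerates to bumping a pointer.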


> but it's not entirely deterministic because you don't know exactly when (or even in what thread) that "as soon as" happens.

But you do know when and where: it's at the exact moment the last reference to the resource was released, and in the exact thread that did this last release. In many cases, this will be a single place and thread; in other cases, it will be a well defined set of places and their corresponding threads, with the one coming temporally last being the one where the resource is freed (that is, the resource is held for as long as it's needed, and no longer).
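Under CPython's refcounting you can watch exactly this happen: the destructor runs in whichever thread drops the last reference (the `Resource` class below is a toy, and the behavior shown is CPython-specific):

```python
import threading

class Resource:
    released_in = None
    def __del__(self):
        # Record which thread dropped the last reference.
        Resource.released_in = threading.current_thread().name

r = Resource()

def worker(obj):
    del obj   # drop the worker's reference

t = threading.Thread(target=worker, args=(r,), name="worker")
del r         # main's reference is gone; the thread still holds one
t.start()
t.join()
# Resource.released_in == "worker": the destructor ran in the thread
# that released last, not in the thread that created the object.
```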

> > it detects new garbage immediately after it becomes garbage, which keeps the resource usage at a minimum.

> Yes, but that comes at a cost to performance.

I was talking not only about memory, but also about other (more limited) resources like file descriptors. But yes, for memory there is a time/space tradeoff: reference counting minimizes memory usage, but for that it has to do more work by constantly freeing and allocating memory.

> Tracing indeed requires more memory, but it automatically uses that memory to reduce the cost of memory management.

Which occupies memory which could be used instead to reduce other costs. For instance, this extra memory could have been used by the operating system's disk cache.


> in other cases, it will be a well defined set of places and their corresponding threads

But that's also true for GC, only the set of places and threads is larger :) So I agree that refcounting is a more deterministic garbage collection than tracing, but it's still not deterministic. Deterministic is when the "set of places and threads" is of size 1.

> Which occupies memory which could be used instead to reduce other costs. For instance, this extra memory could have been used by the operating system's disk cache.

This is not quite true. A given machine, with a given set of processors P, running application X, has a certain processing capacity beyond which it cannot go. To reach that maximum capacity it has a certain working set size, M(P, X), that it can put in RAM. It can cache it from disk or from the network, but increasing that cache beyond that point won't help you with performance any more (especially for distributed caching, where the distributed workload would just cause lots of cache invalidation). Caching anything beyond M would not help (and perhaps could even hurt). But you can increase RAM by any amount beyond M to reduce the cost of memory management almost arbitrarily low with a tracing GC (the increased throughput may increase M, too, so that more memory could be put to other uses, not less).

This is why so many people want a low-latency GC. You can get machines with 6TB RAM (probably even more now). Very few applications can benefit from an effective working set of that size (unless they have 500 cores or something).

Of course, in environments where RAM is constrained tracing may not be the best choice.


> it's just not common

It's not common because finalizers don't actually solve deterministic resource disposal, which is what is actually required, and which is a problem reference counting trivially solves.



