
Python does reference counting.

The downside is that you can leak memory if you have circular references. The programmer needs to take care not to let this happen. (This is essentially the cause of the JavaScript memory leaks in IE, although there the reference counting only occurred at the boundary between JavaScript objects and browser objects.)

I believe Python handles this by occasionally running a garbage collector that looks for leaked objects.
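A minimal sketch of that cycle collector at work, assuming CPython and its standard `gc` module. The `Node` class is made up purely for illustration:

```python
import gc


class Node:
    """Hypothetical class, used only to build a cycle."""

    def __init__(self):
        self.other = None


gc.disable()   # stop the cyclic collector from running on its own schedule
gc.collect()   # clear out anything already pending

a, b = Node(), Node()
a.other, b.other = b, a   # a and b now reference each other
del a, b                  # the names are gone, but the refcounts never hit zero

found = gc.collect()      # run the cycle detector by hand
gc.enable()

print(found)              # count of unreachable objects found (at least the two Nodes)
```

The exact number printed varies between CPython versions, so treat it as "at least 2" rather than a fixed value.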

Objective-C offers weak pointers (pointers that "don't count" and zero themselves out when all the other pointers to the object disappear), which can aid a programmer in avoiding circular references.

Finally, I believe it has been shown that reference counting can be slower than a well-implemented garbage collector, but I suspect this depends on your specific workload.




Reference counting is also the reason why Python requires a GIL. Incrementing a reference count is an opportunity for a race condition (another thread could read the refcount in between your read and write), which means that every reference count needs to be protected by a lock. Either you do fine-grained locking for each object, or you add a GIL for the whole interpreter. The former will absolutely kill performance (not only do you need to increment a refcount with every assignment or function call, you need to take a lock). The latter makes the whole interpreter thread-hostile.
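To make the race concrete, here is a rough sketch in plain Python. It does not touch CPython's real refcounts: `counter` is a stand-in, and the first half replays one unlucky interleaving by hand instead of relying on real thread timing.

```python
import threading

# Stand-in for an object's reference count (not CPython's real field).
counter = {"refs": 1}

# Two threads both want to do "refs += 1". Replay a bad interleaving by hand:
a_read = counter["refs"]        # thread A reads 1
b_read = counter["refs"]        # thread B reads 1 before A has written back
counter["refs"] = a_read + 1    # A writes 2
counter["refs"] = b_read + 1    # B also writes 2, so A's increment is lost
lost_update_result = counter["refs"]

# With a lock around the read-modify-write, the two steps cannot be split.
counter["refs"] = 1
lock = threading.Lock()


def incref():
    with lock:
        counter["refs"] += 1


threads = [threading.Thread(target=incref) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
locked_result = counter["refs"]

print(lost_update_result, locked_result)
```

The unlocked replay ends one short of where it should be, which is the whole problem: a refcount that is too low means an object gets freed while something still points at it.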


You're off in the weeds. A refcount can be incremented/decremented with a single atomic operation (an atomic add, or a compare-and-swap loop), and that's exactly what most refcounting systems do.

It's not free, but it's damn close to it.


If that is the case then why doesn't Python do it? I mean this as a real question; I genuinely don't know.


Because it was designed and written poorly, and its thread-safety issues extend far beyond refcounting.


Python also has weakrefs to avoid creating cycles:

http://docs.python.org/3.4/library/weakref.html
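For anyone who hasn't used it, a small sketch of the usual pattern: the child keeps only a weak back-reference to its parent, so no cycle forms. `Parent` and `Child` are hypothetical names, and the "freed immediately" part assumes CPython's refcounting.

```python
import weakref


class Parent:
    def __init__(self):
        self.children = []          # strong references, parent -> child


class Child:
    def __init__(self, parent):
        self.parent = weakref.ref(parent)   # weak reference, child -> parent
        parent.children.append(self)


p = Parent()
c = Child(p)
alive_before = c.parent() is p      # calling the weakref returns the object

del p                               # last strong reference to the parent is gone
dead_after = c.parent() is None     # in CPython the weakref is cleared right away

print(alive_before, dead_after)
```

On other implementations (PyPy, Jython) the weakref may not clear until a collection runs, so don't rely on the timing.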


> The downside is that you can leak memory if you have circular references. The programmer needs to take care to not let this happen.

Not entirely true[1]:

> CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.

The "most objects" and "not guaranteed" refer to[2]:

> Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily in the cycle but reachable only from it. Python doesn’t collect such cycles automatically because, in general, it isn’t possible for Python to guess a safe order in which to run the __del__() methods. If you know a safe order, you can force the issue by examining the garbage list, and explicitly breaking cycles due to your objects within the list.

Note that this is a Python 2 thing; in Python 3 (since 3.4, via PEP 442), the behavior has changed, and cycles with __del__ methods get collected. I include it because many people still use Python 2. Even in Python 2, however, I believe this case is rare: it requires an object both to be involved in a cycle (a bit rare) and to have a __del__ method (I've never written such a method in many years of Python).
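A quick way to check this yourself, assuming CPython 3.4 or later. `Noisy` is a made-up class whose `__del__` just records that it ran:

```python
import gc

finalized = []


class Noisy:
    """Hypothetical class with a finalizer, used only for this test."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        finalized.append(self.name)


x, y = Noisy("x"), Noisy("y")
x.partner, y.partner = y, x   # cycle where both members define __del__
del x, y
gc.collect()

# On 3.4+ both finalizers should have run and nothing should be stuck in
# gc.garbage; on Python 2 the two objects would land in gc.garbage instead.
stuck = [o for o in gc.garbage if isinstance(o, Noisy)]
print(sorted(finalized), stuck)
```

The order in which the two `__del__` methods run is not something I'd depend on, hence the `sorted`.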

> Objective C offers weak pointers

Python also offers weak pointers.

> Finally, I believe it has been shown that reference counting can be slower than a well-implemented garbage collector

My understanding is that this is true, but workload dependent. I think the speed mostly comes from fast allocation in a generational, copying collector: allocating is just moving a pointer forward through the heap by the required number of bytes. (Allocation is just an addition, mostly.) When a collection happens, the surviving objects are moved to a different heap (the next generation), and the heap used for allocation is emptied out. The amount of work and moving depends on the number of surviving objects, which tends to be low, since most objects are short-lived.
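A toy model of that scheme, to show why allocation is so cheap. This is only a sketch: `Nursery` is a hypothetical class, sizes are abstract "bytes", nothing is actually stored, and the `live_names` argument stands in for the tracing a real collector would do to find survivors.

```python
class Nursery:
    """Toy bump allocator for the young generation (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.top = 0         # the bump pointer
        self.objects = []    # (offset, size, name) of everything allocated

    def alloc(self, name, size):
        if self.top + size > self.capacity:
            raise MemoryError("nursery full, collect first")
        offset = self.top
        self.top += size     # allocation is just an addition
        self.objects.append((offset, size, name))
        return offset

    def collect(self, live_names, old_generation):
        # Copy survivors into the next generation, then empty the nursery.
        for _offset, size, name in self.objects:
            if name in live_names:
                old_generation.append((name, size))
        self.objects.clear()
        self.top = 0         # the whole nursery is free again in one step


nursery = Nursery(capacity=64)
old = []
offsets = [nursery.alloc(n, 16) for n in ("a", "b", "c")]
nursery.collect(live_names={"b"}, old_generation=old)

print(offsets, old, nursery.top)
```

Note that the cost of `collect` scales with the survivors that get copied, not with the garbage, which is the point being made above.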

The advantage of pure refcounting, in languages such as C++, is that collection is deterministic. In the rare case that you have a cycle, you do have to handle it manually (that's the tradeoff), but I find these cases are extremely rare. Memory management in modern C++ is a non-issue most of the time. Deterministic cleanup also prevents bugs like:

  data = open(filename, 'rb').read()

I see this bug far too often. One could argue that in Python, it'll get collected via refcount, but this isn't guaranteed by the language. I've seen the same bug in Java/C# code, and there are no refcounts to save you there. (Of course, these languages have a with/using/etc. block, and it's an easy fix, yet nonetheless, these bugs are frequent.)
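For completeness, the `with` version of that line. The temp file is only scaffolding so the snippet runs on its own; `filename` would be whatever path you already have.

```python
import os
import tempfile

# Scratch file created just for this example.
fd, filename = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

# The file is closed when the block exits, even if read() raises,
# without depending on refcounting or on the garbage collector.
with open(filename, 'rb') as f:
    data = f.read()

closed = f.closed

os.remove(filename)
print(data, closed)
```

This behaves the same on any Python implementation, which is the difference from relying on the refcount dropping to zero.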

  [1]: http://docs.python.org/2.7/reference/datamodel.html#objects-values-and-types
  [2]: http://docs.python.org/2/library/gc.html



