
Deterministic destruction is an excellent language feature. I really wish there were a garbage-collected language that separated destruction from deallocation, because the two are completely orthogonal.

When an object goes out of scope, the destructor should run immediately; then, when the garbage collector wants to actually deallocate destroyed objects, it can go right ahead. No idiocy like in C# with “using” blocks.
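
For comparison, this is roughly the scope-exit behaviour C++ already gives via RAII (just an illustrative sketch; the names are made up):

    #include <cstdio>

    struct FileGuard {
        std::FILE* f;
        explicit FileGuard(const char* path) : f(std::fopen(path, "r")) {}
        ~FileGuard() { if (f) std::fclose(f); }   // destructor runs the moment scope ends
    };

    void read_config() {
        FileGuard g("config.txt");
        // ... use g.f ...
    }   // g is destroyed right here, deterministically; in the hypothetical GC'd
        // language, reclaiming the memory itself could happen whenever the GC likes

    int main() { read_config(); }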




If it were possible to efficiently detect "when an object goes out of scope" (really, when the object becomes unreachable) in the general case, then every language would do it. They'd immediately run the destructor and immediately collect the memory, because why not? But it's not possible to efficiently detect when an object becomes unreachable in a general way, or at the very least we don't know how. (I'm not certain whether it's actually been proven impossible.)


There's automatic reference counting, which kind of works in some cases but fails spectacularly in others: a simple doubly-linked list will defeat ARC. In essence, any reasonably complex program that relies on it will probably leak memory, unless the programmer has taken the time to find the spots where ARC fails and to take care of them manually.
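
The same failure mode is easy to reproduce in C++ with std::shared_ptr, which is refcounted in the same spirit as ARC (illustrative sketch only):

    #include <memory>

    struct Node {
        std::shared_ptr<Node> next;
        std::shared_ptr<Node> prev;   // an owning back-pointer is what creates the cycle
    };

    int main() {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->prev = a;
    }   // a and b go out of scope, but each node still holds a count on the other:
        // neither refcount reaches zero, so both leak. The manual fix is to make one
        // of the links a std::weak_ptr, i.e. find the spot and handle it by hand.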

If a reference cycle happens then the only good way to detect that the objects involved have been orphaned is to walk the object graph. Doing that every time an object is deallocated would be horribly inefficient. Also, if you're doing it then there's really no need for the reference counting in the first place, since the object graph walk is perfectly capable of finding all the orphan objects on its own. Much better to just do it periodically, so you can get a whole swath of objects each time you walk the graph.

At which point we've come to garbage collection, and discovered why that's what all the kids are doing these days.


Reference counting is also not free by itself (ignoring its cycle problems). There's overhead to all the incrementing and decrementing. It can add up in modern systems that toss around objects willy-nilly. Reference counting also adds a ton of additional synchronization points in a multi-threaded scenario.
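
For a concrete picture of where that traffic comes from, here's an illustrative C++ sketch: std::shared_ptr typically bumps an atomic count on every copy and drops it on every destruction.

    #include <memory>

    void observe(const std::shared_ptr<int>&) {}  // by reference: no refcount traffic
    void take(std::shared_ptr<int>) {}            // by value: atomic ++ on the copy,
                                                  // atomic -- when the copy is destroyed

    int main() {
        auto data = std::make_shared<int>(42);
        for (int i = 0; i < 1000000; ++i)
            take(data);   // two atomic operations per iteration, and a synchronization
                          // point between any threads that share 'data'
    }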


Yup. I don't think it's any coincidence that the only really prominent place where ARC is being used is iOS.

It's a platform where reference counting was already in place, where you can have a reasonable hope that most of the programs are too small and simple to take much of a hit on the overhead, and where memory leaks can't get too far out of hand because the OS reserves the right to kill processes whenever it wants.


> where you can have a reasonable hope that most the programs are too small and simple to take much of a hit on the overhead

What overhead? There's no runtime overhead to ARC compared to manual reference counting.


If I understand the distinction you draw between deallocation and destruction correctly, then in .NET a well-designed object that does not maintain unmanaged resources is effectively destroyed the moment the last reference to it goes out of scope. After all, it's disappeared from the world, never to appear again.

All the garbage collector does is come through later and deallocate the memory it consumed.


Normally, the deallocation of the destroyed object has no visible effect, and you're not expected to do anything with the object after it's destroyed.

So, in the case of C++, an object is destroyed when it goes out of scope, but the physical act of deallocation may be deferred for as long as the standard library or whatever 'smart memory manager' wishes.
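
A sketch of that split in plain C++, where new/delete normally bundle the steps together:

    #include <new>
    #include <cstdio>

    struct Connection {
        ~Connection() { std::puts("connection closed"); }
    };

    int main() {
        void* raw = ::operator new(sizeof(Connection));  // allocation
        Connection* c = new (raw) Connection;            // construction
        c->~Connection();                                 // destruction: the side effect happens now
        ::operator delete(raw);                           // deallocation: could just as well happen later
    }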


To support this you'd need refcounting of each object on top of the gc, which would be a substantial overhead.


Technically you only need refcounting of objects with a destructor. For everything else, destruction is a no-op and you can just let the GC re-use the object's memory. And if the destructor is used pretty much just for side effects (closing files & sockets etc.), these will typically be rare & pose little overhead.

Would suck in a dynamically-typed language, though, because you have no idea whether a reference points to an object with or without a destructor, and so always have to pay the overhead of checking and possibly decrementing the refcount. I don't really see this working without a static type system.

BTW, you're often better off with explicitly-scoped destruction (eg. the Ruby & Python way) than refcounting, for the same reason that boost::shared_ptr can be tricky and Java programs can leak memory. If you don't have clear object ownership & lifetimes, it's very easy to stash a reference to an object in some longer-lived object (a cache, for example) and accidentally hold onto it way longer than anticipated.


> Would suck in a dynamically-typed language, though, because you have no idea whether a reference points to an object with or without a destructor, and so always have to pay the overhead of checking and possibly decrementing the refcount. I don't really see this working without a static type system.

It's worse than that. Any object that contains references to other destructible objects (including transitive references), must have an implicit destructor to decrement the refcounts of the destructible objects it points to. In fact, any object that holds a reference, directly or indirectly, to a generic "Object" class or equivalent must have an implicit destructor. This makes a huge part of the object graph implicitly destructible.
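
C++ shows the same propagation with containment instead of GC references (illustrative sketch): anything holding a destructible member becomes destructible itself.

    #include <cstdio>

    struct FileHandle {
        ~FileHandle() { std::puts("closed"); }  // a "real" destructor
    };

    struct Wrapper  { FileHandle fh; };  // gets an implicit destructor that destroys fh
    struct Registry { Wrapper w; };      // and so on, transitively, up the object graph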


This is exactly what Python does, and while Python isn't the fastest language around, it's not that slow either, when compared to other dynamically typed, interpreted languages.


OTOH, refcounting is one of the main reasons why it's virtually impossible to eliminate the GIL, which is a showstopper for many multi-core usages of Python.


My understanding is that circular references wreak havoc with the concept of 'scope of a single object', and make implicit deterministic destruction unsafe if not impossible:

a = new A(); b = new B(); a.b = b; b.a = a;

Whichever you destroy first, the other's destructor will be left holding a reference to an already-destroyed object.


To get around this you can distinguish between references that imply ownership and those that don't. In C++ pointers and references do not imply ownership, but values do.
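
A sketch of that distinction (names made up for illustration):

    #include <memory>

    struct Logger {};

    struct Service {
        Logger log;                    // held by value: owned, destroyed with Service
        std::unique_ptr<Logger> own;   // smart pointer: ownership made explicit
        Logger* borrowed = nullptr;    // raw pointer (or a reference): no ownership
                                       // implied, never destroyed by Service
    };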

It isn't ideal, but I think it's an impossible situation - I think predictable destruction is fundamentally at odds with automatic, transparent memory management when you push the ideas to their limits. The idea of automatic resource cleanup is tied closely to the idea of tight (recursive) resource ownership. Automatic memory management exists mostly to take care of the cases when object ownership and lifetime isn't clear.

Reference-counting smart pointers (used properly) can technically give you deterministic destruction, but it's kinda beside the point. We want our file handles and our connections closed as soon as we're done with them, not some time in the future when someone else's code cleans up its references to our old objects.


Which is where .NET hits a pretty good compromise, IMO. For things where deterministic cleanup really is important (file handles, connections, COM objects, etc.) there's IDisposable to provide for deterministic destruction of that stuff.

Folks do still complain that they can't deterministically destroy managed resources such as instances of String. I suspect those folks have missed the plot. That desire is indeed at odds with the memory manager; what is being requested could not (for the most part - the CLR's large object heap does complicate the story a bit) be provided without wholesale downgrading .NET to a less efficient collection algorithm.

It'd be much wiser to instead learn how to implement performance-critical routines that really would benefit from manual memory management in a language that's better suited to that kind of task, such as C. Whining that a garbage collected run-time uses garbage collection is just being silly, not entirely unlike complaining that your horse isn't very good at swimming.


Objects aren't deallocated in garbage collected languages. The memory they had allocated just doesn't get copied to the new heap -- which is a big difference.


You could argue that objects aren't deallocated in non-GC languages either, the memory they had allocated is just available for other objects to use. Which sounds kinda like "deallocated" to me, and yet it's the exact same thing as the allocation pointer in a stop & copy GC being free to overwrite them.


Generally in other languages the memory is immediately freed and available to be reallocated. This is not true in most garbage collection strategies.


Nor is it particularly desirable. One of the cornerstones of how a modern generational GC performs as well as it does is that blocks of objects are freed up all at once.

If they were deallocated in a piecemeal fashion then:

- Freeing up memory would take longer, because you can't just change a pointer back to the root of the generation.

- Allocating memory would take longer, because it involves searching for a big enough empty slot rather than just throwing the new object on top of the heap (see the sketch after this list).

- More memory is wasted overall, because of heap fragmentation.
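
A toy sketch of the point (not any particular collector's real code): allocation in the young generation is just bumping a pointer, and "freeing" the whole generation is resetting it.

    #include <cstddef>
    #include <cstdint>

    struct Nursery {
        std::uint8_t* base;
        std::uint8_t* top;     // next free byte
        std::uint8_t* limit;

        void* allocate(std::size_t n) {
            if (top + n > limit) return nullptr;  // a real GC would collect here instead
            void* p = top;
            top += n;                             // throw the new object on top of the heap
            return p;
        }

        void reset() { top = base; }              // free the whole generation at once
    };

    int main() {
        static std::uint8_t arena[1 << 20];
        Nursery n{arena, arena, arena + sizeof arena};
        void* obj = n.allocate(64);   // one pointer bump
        (void)obj;
        n.reset();                    // the whole generation is "empty" again
    }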


That depends on the GC implementation. Maybe you're saying that because most modern collectors tend toward generational collection and copy between generation heaps.

But copying is not required for GC. The trusty mark-and-sweep algorithm doesn't copy, for example.

See: http://en.wikipedia.org/wiki/Garbage_collection_%28computer_...


In fact, one can claim that mark-and-sweep is one of the few garbage collection algorithms that actually collect garbage.

Many others collect non-garbage instead. There is typically way less non-garbage than garbage, and that is what makes them competitive with alloc/free.


That's an interesting point, thanks.


I totally wonder why they don't have it this way. I have posted the same question on various forums but have never heard a real answer.


Well, deterministic deallocation is difficult to combine with non-deterministic garbage collection.

Going out of scope is deterministic, but the object that has gone out of scope might still be referenced from somewhere if a reference to it has "escaped". Sometimes automatic escape analysis can detect when references escape, but not always.

Would you like the object to have its destructor called even though it might be still be accessible? What would happen if the program subsequently tries to access the object?

Most languages are naturally uncomfortable with tidying up some resource while it might still be referenced. For this reason, languages prefer to leave it up to the programmer to explicitly manage destruction themselves.

A perfect example is C#'s IDisposable interface. The language obviously "knows" that a disposable object needs cleaning up after it is used - that's what IDisposable means - but it cannot work out when this should happen. So it lets the programmer decide when that actually happens. If a programmer wants destruction based on scope then she should use the using statement. If she wants destruction based on some other condition then that is possible too. The language cannot make this decision.

I hope that answers your question.


The CLR also has different concepts for destruction and deallocation. Not sure what the using block and IDisposable have to do with that.



