Reclaiming C++ Data Structures with Cycles (justsoftwaresolutions.co.uk)
55 points by criddell on Oct 13, 2016 | 92 comments



I recently watched Herb Sutter's talk "Leak-Freedom in C++", described in the article [1], and I can't help but notice that the C++ community has a strong bias against garbage collection. ... And then he describes the very problem you can't solve without GC: memory management for graphs with cycles. Solving this problem is equivalent to implementing GC. Of course, having your own specialized GC may help, just as you may benefit from your own memory allocator.

Why can't you acknowledge that there are problems that have GC as the only and best solution?

(Note: it's possible to mix memory management approaches, e.g. only pointers to graph nodes are garbage collected and everything else has a nice hierarchical ownership.)

[1] https://www.youtube.com/watch?v=JfmTagWcqoE


>I can't help but notice that C++ community has a strong bias against garbage collection. [...]

>Why can't you acknowledge that there are problems that have GC as the only and best solution?

Your prelude and the follow-up question are not well-formed.

C++ programmers do not have a bias against GC as a specific problem-solving technique. In fact, expert C++ programmers can embrace GC so much that they can write an entire virtual machine[1] with GC and a DSL[2] for that VM that takes advantage of memory safety. Both the CLR VM and the (original) C# compiler were written by C++ programmers.

What the C++ community doesn't want is GC in the C++ base language itself or the standard runtime. That's a very different concept from a generalized "C++ bias against GC".

In other words, the following is unacceptable:

  std::string x = "Hello, " + fullname; // cpu cycles spent on GC
The CPU cycles spent on constantly checking whether "x" is still reachable are CPU power taken away from rendering frames of a 60fps game, computing numeric equations, or high-speed quantitative trading. C++ programmers don't want GC as a global runtime that they can't opt out of. Also, global GC often requires 2x-3x the memory footprint of the working set, which is extremely wasteful in the resource-constrained domains where C++ is often used.

Herb Sutter's presentation is compatible with "pseudo-GC-when-you-need-it" without adding GC to the entire C++ standard runtime.

[1]https://en.wikipedia.org/wiki/Common_Language_Runtime

[2]https://en.wikipedia.org/wiki/C_Sharp_(programming_language)


I agree with you, but in this

> std::string x = "Hello, " + fullname; // cpu cycles spent on GC

you are already spending cycles on memory management (if C++ allocates the character data on the heap, which I think it does). You are searching for free space in the heap, possibly synchronising to do that, and so on.

With a GC you may even use fewer cycles here! For example, a copying GC could mean that you can allocate with a simple thread-local bump pointer.
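As a rough sketch (my own illustration, not from the thread), that fast path can be as simple as an aligned pointer increment:

  #include <cstddef>

  // Hypothetical thread-local bump allocator: the fast path is an aligned
  // pointer increment, much like a copying GC's allocation fast path.
  constexpr std::size_t kArenaSize = 1 << 20;

  thread_local unsigned char arena[kArenaSize];
  thread_local std::size_t offset = 0;

  void* bump_alloc(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
      std::size_t aligned = (offset + align - 1) & ~(align - 1);
      if (aligned + size > kArenaSize) return nullptr;  // arena exhausted
      offset = aligned + size;
      return arena + aligned;
  }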

So in your statement you are already paying an unknown cycle cost for memory management. Why do you care if it's GC?

Your answer is probably the variance in the number of cycles - the noticeable pauses - which is a reasonable concern.


Yes, but in this case we know when the allocations will occur, and when they will be freed. If using a GC we know when they will occur, but do not know when they will be freed. Which means that at some indeterminate point in the future there will be a large temporary slowdown due to processing the GC.

This is one of the bigger reasons people use C++, and even techniques within it to explicitly collect such items at a known point in time (techniques such as marking items as dead in an array but keeping them there until the end of the frame, etc.).
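A minimal sketch of that end-of-frame pattern (hypothetical Entity type, my own illustration):

  #include <algorithm>
  #include <vector>

  struct Entity { int id; bool dead = false; };

  std::vector<Entity> entities;

  void kill(Entity& e) { e.dead = true; }  // O(1) during the frame

  void end_of_frame() {
      // All reclamation happens here, at a point the engine chooses.
      entities.erase(std::remove_if(entities.begin(), entities.end(),
                                    [](const Entity& e) { return e.dead; }),
                     entities.end());
  }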


> If using a GC we know when they will occur, but do not know when they will be freed. Which means that at some indeterminate point in the future there will be a large temporary slowdown due to processing the GC.

This just isn't true anymore. Incremental collectors can achieve pause times in the single-digit millisecond range, and concurrent collectors can achieve pause times in the single-digit to tens of microseconds range, even for super-high-allocation rate programs. There are even real-time collectors suitable for audio and other hard real-time applications.

Azul GC (a high performance concurrent compacting collector): https://www.azul.com/products/zing/pgc/

Metronome (a real-time GC with guaranteed maximum pause times): http://researcher.watson.ibm.com/researcher/view_group_subpa...

GC scheduling in V8 (hides GC pause times between frames, reducing animation jank): http://queue.acm.org/detail.cfm?id=2977741

Go now ships with a very low-latency GC: https://blog.golang.org/go15gc


> This just isn't true anymore. Incremental collectors can achieve pause times in the single-digit millisecond range

Single digit milliseconds is millions of instructions, which /is/ a large slowdown in some applications.


How much do you think a page fault costs?


Page faults cost zero on locked pages.


A millisecond is a huge amount of time, and in that time we can do so many more things more useful to the application than just collecting its trash.


Again downvotes. It'd be nice if people actually replied instead of downvoting.


This is the advantage of C++ for me. Deterministic behaviour.


But is it deterministic? Will the allocator always have a deterministic amount of work to do to find enough free space for your string characters? I'm not sure that's the case.


Which allocator? Those who care replace the allocator.


You can deactivate GC momentarily and reclaim memory when you want (end of frame, every ten frames, ...). Most of the time you can write your code to minimize allocation, or make sure memory is allocated on the stack. Depending on your parameters, and with a little profiling, you can achieve stable memory usage over time and a bounded GC time.


malloc/free/etc. will only cost when they're called; GC can be a continuous expense even when there's no collection.


It is a matter of how it is implemented.


Most GCs will not cost you a single cycle if you never allocate.


This is technically correct (though in most GCs, if you allocate and keep even a single byte, you pay for it with various barriers, etc., forever), but because they have good GCs that behave like this, almost every GC'd language in use allocates all the time.

So it would be more accurate to say "Most GCs will not cost you a single cycle if you and the underlying language runtime and execution environment do not allocate".

I.e., your statement, while true in theory, makes literally no practical difference if allocations happen all the time without you knowing about or controlling them.


But in most GC languages there is nothing you can do without allocating. Creating an object already allocates it on the heap; printing a string will also allocate.


Not in GC languages that also have value types.

For example, you can do all of that in Modula-3, Active Oberon, Component Pascal and many others without allocating more than you would in C.

Lumping all GC-enabled languages into the same box is a mistake.


I don't know why they downvote you. Even in Java, with no value types yet, there are ways to write useful code with no or almost no managed heap allocation. And if you don't need ultra-low pauses, mixing these techniques, e.g. using the managed heap for the 80% of non-performance-critical stuff and careful manual dynamic allocation for the remaining 20% (e.g. large data buffers), typically gets you all the good things at once: high throughput, low pauses, almost no GC activity when it matters, and convenience when writing most of the code.


I just recently wrote my own memory allocator for my own String class. Now my String is at least 2x faster than the next fastest alternative (I have tried many alternatives for C++ strings, including std::string of course). String allocation can be made very fast with thread-local memory pools (and you just need a basic GC to free up memory if a lot of strings are allocated on one thread but destroyed on another).


It is very likely that in this particular case the small string optimization would keep the character data inside the string object itself, and no heap allocation would take place.
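One implementation-dependent way to observe this (a heuristic of mine, not guaranteed by the standard): with SSO the character buffer lives inside the string object, so data() points within the object's own footprint.

  #include <iostream>
  #include <string>

  // Heuristic: true if the buffer appears to live inside the object (SSO).
  bool probably_sso(const std::string& s) {
      const char* p = s.data();
      const char* obj = reinterpret_cast<const char*>(&s);
      return p >= obj && p < obj + sizeof(std::string);
  }

  int main() {
      std::string small = "Hello, Bob";  // typically fits the SSO buffer
      std::string big(100, 'x');         // forces a heap allocation
      std::cout << probably_sso(small) << ' ' << probably_sso(big) << '\n';
  }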


> For example a copying GC could mean that you can allocate with a simple thread local bump pointer.

This is equally true for explicit memory allocation. The point is that on some allocations under GC, it will have to collect garbage. And collecting garbage will tend to be more expensive than explicit frees, because it usually has to do work to discover what is garbage.


There is an advantage to universal GC: it can simplify interfaces.

For example, in C every time a `const char *` appears in an API there has to be an implicit contract about who owns the string and how to free it. A language like Rust improves on this by enforcing such contracts, but the contracts still complicate every interface.

In a GC'd language you can just forget about these issues and just treat strings (and other immutable objects) almost as if they were ordinary fixed-sized values.
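To illustrate (hypothetical functions, my own sketch): two C-style APIs can carry opposite ownership contracts that the type const char* cannot express, while a value type encodes the contract in the type itself.

  #include <cstdlib>
  #include <cstring>
  #include <string>

  const char* name_borrowed() { return "static storage: do NOT free"; }

  char* name_owned() {  // contract: caller MUST free
      const char* src = "heap copy";
      char* p = static_cast<char*>(std::malloc(std::strlen(src) + 1));
      std::strcpy(p, src);
      return p;
  }

  int main() {
      const char* a = name_borrowed();  // freeing this would be a bug
      char* b = name_owned();
      std::free(b);                     // forgetting this would leak
      std::string s = "value types own their data";  // no such questions
      (void)a; (void)s;
  }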


And C++ is all about choices and trade-offs.

Put very simply: make your choice for your problem and deal with its consequences.


Nobody says GC has no advantages; the claim is that its advantages don't always offset its disadvantages.


The way C/C++ programmers get around the implicit ownership of const char * is that everyone copies data around.


"Also, global GC often requires 2x-3x the memory footprint of working memory which is extremely wasteful for the resource constrained domains that C++ is often used in."

Memory fragmentation in manual memory management often comes at a similar or higher cost, which is often "forgotten" and neglected, because it is hidden:

https://bugzilla.mozilla.org/show_bug.cgi?id=666058#c31

And fragmentation may grow over time much more than the typical GC overhead. Then you have to do a "forced GC", i.e. restart the application. This often happens to my phone or browser. I'm not sure whether it is more due to fragmentation or to plain old memory leaks.


> Your prelude and the followup question is not well-formed.

Thank you for your feedback, I appreciate it.

> C++ programmers do not have a bias against GC as a specific problem-solving technique. Expert C++ programmers can embrace GC

Please watch this portion of the video, and the way the speaker is pronouncing the word "collect":

https://www.youtube.com/watch?v=JfmTagWcqoE#t=1h5m25s

Regarding general-purpose GC:

I don't support using GC for everything. If there is a problem that can only be solved with GC, it doesn't mean that we should express our entire running programs in terms of dynamic cyclic graphs. I believe we can do much better, with far fewer resources.


>Please watch this portion of the video, and the way the speaker is pronouncing the word "collect":

Herb was deliberately using circumlocution to avoid the phrase "garbage collection" so as to not poison the well. The latter part of the video explains his reasoning for presenting it that way:

https://www.youtube.com/watch?v=JfmTagWcqoE&t=1h32m55s

>I don't support using GC for everything. [...] I believe we can do much better, with much fewer resources.

This is a statement that sounds reasonable and balanced. How can anyone possibly disagree with it?! The issue is that it's very difficult to take that universal wisdom and construct a real programming language that satisfies all of the following properties seamlessly and elegantly:

1) language that has optional GC

2) has zero cost when not using it. This means no active tracing loop that checks object graphs, and no large memory blocks reserved to allow reallocation and defragmentation.

3) transparent syntax that's equally elegant for manual memory or GC in the same language

To make your GC-when-appropriate advice real, one has to show a real implementation. E.g. one could fork the GCC or LLVM compilers and add GC, or fork the Java OpenJDK compiler and show how syntax can remove the GC on demand, or construct a new language from scratch. There may be some lessons to learn from the D language backing away from GC. Apple's Objective-C retreated from GC as well.


> To make your GC-when-appropriate advice real, one has to show a real implementation.

I haven't tested it closely, but I believe that Rust with this crate: https://github.com/Manishearth/rust-gc is an implementation for "GC only when you need it".


Rust would be a perfect GC-optional language since, with per-thread heaps, threads that use GC can be completely segregated from those that don't.


Microsoft's Managed C++ is frankly scary, but its gc root type seems pretty close to this.


Right, but they pay a different cost there that isn't listed: "It's not entirely compatible with the existing language"

Private and multiple inheritance do not work, for example (yes, I know some of this is fixable, but some is not):

  public __gc class one { int i; };
  public __gc class two : private one { int h; }; // error: private inheritance not allowed

  __gc class a {};
  __gc class b {};
  __gc class c : public a, public b {}; // error: multiple inheritance of __gc classes
etc


Which one (or more) of those three properties does the D language fail to satisfy?


Another point against GC is determinism. An unmanaged runtime is more deterministic and is often favored in real-time applications.


Only if malloc()/free() don't need to call OS APIs; if they do, good luck with being deterministic.


At least in the real-time applications that I have experience with, this is often mitigated with custom memory allocators which pre-allocate large contiguous blocks before performance-critical sections and then parcel out segments of those blocks at runtime.
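A minimal sketch of such a pool (my own illustration): one up-front contiguous block, parcelled out in constant time via a free list.

  #include <cstddef>

  // Fixed-size pool: pre-allocates BlockCount slots, hands them out in O(1).
  template <std::size_t BlockSize, std::size_t BlockCount>
  class Pool {
      union Slot {
          Slot* next;
          alignas(std::max_align_t) unsigned char bytes[BlockSize];
      };
      Slot slots[BlockCount];  // the pre-allocated contiguous block
      Slot* free_head;
  public:
      Pool() {
          for (std::size_t i = 0; i + 1 < BlockCount; ++i)
              slots[i].next = &slots[i + 1];
          slots[BlockCount - 1].next = nullptr;
          free_head = &slots[0];
      }
      void* allocate() {          // O(1), no OS call
          if (!free_head) return nullptr;
          Slot* s = free_head;
          free_head = s->next;
          return s->bytes;
      }
      void deallocate(void* p) {  // O(1); bytes sits at offset 0 of the union
          Slot* s = static_cast<Slot*>(p);
          s->next = free_head;
          free_head = s;
      }
  };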


Which is also possible in GC-enabled languages, so one can make use of a GC and then use similar techniques for the code paths that really require them.

For example, in real-time JVMs.


One does not need to ever use malloc()/free() or new/delete in C++. For instance, it is very common to use just the DATA and BSS segments in embedded code, and rely on placement-new, which is always deterministic.

Dynamic memory allocation is also a choice in C++.
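A minimal sketch of that style (hypothetical type, my own illustration): the storage lives in the BSS segment, and placement-new only runs the constructor, so no allocator is involved at all.

  #include <new>

  struct Sensor { int id; int reading; };

  alignas(Sensor) static unsigned char storage[sizeof(Sensor)];  // BSS, no heap

  int main() {
      Sensor* s = new (storage) Sensor{1, 0};  // placement-new: construct only
      s->reading = 42;
      s->~Sensor();  // explicit destruction; the storage itself is static
  }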


Going a bit off topic, that is something you can also do in Ada or SPARK, while having a more type safe language.

My problem when I used to code in C++ was working with others in corporate projects.

It didn't matter how I strived to write modern code, making use of best practices to write safe code; there were always team members that never moved beyond C with Classes, and never mind having some processes in place to avoid precisely that.

Even nowadays, I get the feeling most developers don't make any use of static analysis.


Sadly, I don't think that's a problem that a language alone can solve. Ada, for instance, is awesome with its runtime enforcement of design contracts, but all a programmer needs to do is change the contract and chaos ensues.

I'm a huge fan of strong typing. I'm also actively trying to find ways to improve static analysis and formal methods. But, if you can't trust your developers, it all eventually breaks down.

I find that for most mature projects, a good developer needs 3-6 months of ramp-up time, which should include knowledge transfer, training, and restricted commit access. The point of this isn't to haze the developer, but instead to give him/her a chance to fully grok the intent of the mechanisms in the code base and to (hopefully) present and socialize better options in a controlled way. More and more, I've come to the conclusion that a strong team mechanic is one of the mandatory components of good software. DBC, code reviews, unit testing, formal analysis, and static analysis all help to reinforce this mechanism, but if the tribal knowledge of the team breaks down, then so ultimately will the quality of the software produced.


There are real-time GCs that depend on using uniform blocks of memory. All real-time allocators are going to involve similar pain.


They are not free, though, as they trade throughput for a guaranteed latency upper bound.

edit: clarify


A GC API definition is already in the standard. It was added in C++11.
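For reference, that C++11 support is the pointer-safety API in <memory> (std::declare_reachable and friends). It defines hooks a collector could rely on, though mainstream implementations never shipped one:

  #include <memory>

  int main() {
      int* p = new int(42);
      std::declare_reachable(p);        // a collector must keep *p alive
      p = std::undeclare_reachable(p);  // back under normal lifetime rules
      std::pointer_safety ps = std::get_pointer_safety();  // relaxed/preferred/strict
      (void)ps;
      delete p;
  }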


My usual argument is as follows:

There are indeed cases where you have to use GC on some part of your program (as you say, some graph with cycles). However, such things are still only a small part of a program, and most of your memory allocations can still be handled without a GC.

No programming language I am aware of (and I don't claim to know them all) is good at saying "GC this 5%, I'll explicitly handle everything else". In C++ the "GC" bit is painful; in most other languages the "I'll do everything else" bit is painful.

Of course, there is an argument that just GCing everything is as efficient as bothering with manual memory management, so why bother at all? I'm still to be convinced about that, particularly because GC systems tend to want to use quite a bit more memory, so they have room to breathe.


> No programming language I am aware of (and I don't claim to know them all) is good at saying "GC this 5%, I'll explicitly handle everything else"

I haven't used rust in production, but I think it allows you to do precisely this, using the following crate: https://github.com/Manishearth/rust-gc.

> Of course, there is an argument that just GCing everything is as efficient as bothering with manual memory management

I specifically said that it should be possible to mix GC with other types of memory management.

The problem is that I often hear from the C++ crowd "GC is bad, mmkay!", without any acknowledgement that there are specific cases where you have to use/implement GC.


Where to place a record (struct) in Modula-3:

    VAR
      stackOrGlobalMem: RECORD ... END;

      heapGC : REF RECORD ... END;

      somewhereManual : UNTRACED REF RECORD ... END;

      rawMem : ADDRESS;

Also valid for any other data type in the language.

So one can take advantage of GC productivity, while making use of C++ like features for manual allocation when required.

I remember reading a paper with a generics module for using smart references as well.

There are other not-so-well-known GC-enabled systems programming languages with similar feature sets.

EDIT: like was missing


> Why can't you acknowledge that there are problems that have GC as the only and best solution?

GC is traditionally bundled into a language runtime in a way that imposes global costs, and cannot be opted out of. The GC interrupts your program in a way you have no direct control over and scans the global heap of all objects you have ever created.

C++'s philosophy is zero-cost, opt-in abstractions. So naturally anything that you can't opt out of is going to rub C++ programmers the wrong way.

Implementing GC within the language, in a way that is entirely opt-in, is fine. Any muttering under the breath when discussing this is, IMO, just acknowledging that most of our experiences are of "bad" GC, to the point that we almost don't want to use the same word when referring to "good" GC.


> C++'s philosophy is zero-cost, opt-in abstractions.

It's a nice philosophy, but unfortunately C++ itself often fails to deliver unless the programmer is an absolute expert on the underlying semantics. E.g. forget to pass a vector by reference or as a pointer, and you get a complete O(n) copy underneath. With other data structures, one can implement efficient copy constructors to make this pretty cheap. When an abstraction leaks details of the underlying implementation that then lead to huge non-obvious costs, that is not an abstraction.
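For example (my own minimal illustration):

  #include <vector>

  long sum_by_value(std::vector<long> v) {       // silently copies all n elements
      long s = 0;
      for (long x : v) s += x;
      return s;
  }

  long sum_by_ref(const std::vector<long>& v) {  // passes one reference, no copy
      long s = 0;
      for (long x : v) s += x;
      return s;
  }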

Another example is that C++ heap allocation relies on an underlying allocator, which has overhead in managed chunks of memory, size classes, etc. The underlying allocator's alloc() and free() operations are not constant time. In fact, they are almost always outperformed by the allocation performance of GC'd languages, where bump-pointer allocation and generational collection make these overheads very, very cheap.


C++ is an expert's language, no doubt about it. But wielded properly, it does deliver on its promise pretty well, especially with recent language evolution.

C++ lets you use a bump allocator if you want. malloc()/free() are just functions that you don't pay for unless you call them. That is the point! And as others have pointed out, many common data structures let you swap out the calls to malloc()/free() for something else if you want to.


Every data structure in C++ has an allocator override. If you want to use a bump pointer allocator, you can use a bump pointer allocator. In fact, this is a very common optimization in game software during time-critical frame rendering. Allocate a large chunk of memory to act as a flywheel, then use a bump allocator against this chunk of memory while performing data-heavy crunching, then reset the bump pointer at the end of the render cycle. All memory is freed at once, without the use of a more intrusive garbage collector.
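Since C++17, the standard library even ships this pattern as std::pmr::monotonic_buffer_resource; a sketch of the flywheel idea:

  #include <cstddef>
  #include <memory_resource>
  #include <vector>

  void render_frame() {
      static std::byte buffer[1 << 20];  // the pre-allocated flywheel
      std::pmr::monotonic_buffer_resource arena{buffer, sizeof(buffer)};

      std::pmr::vector<int> scratch{&arena};  // bump-pointer allocations
      for (int i = 0; i < 10000; ++i) scratch.push_back(i);

      // No per-object frees: everything is released at once when the arena
      // goes out of scope at the end of the frame.
  }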

As they say, C++ gives you enough rope to hang yourself. It's pretty unapologetic about not being a language meant for everyone. But, sometimes, one needs to drop down to a lower level language to boost performance. I like to apply the Pareto Principle: 80% in a higher level language, and 20% in a language like C or C++.


Sure, we use our own arena allocators with varying degrees of success in my current project (V8, a JavaScript VM implemented in >800 KLOC of C++). The fact that C++ allows you to peer behind the curtain is itself an admission that it is inadequate to meet all use cases. It always allows one to resort to "manual" control. Usually that manual control comes at the cost of literally violating the language semantics and wandering into the territory of undefined behavior. As tempting as this manual control is, my 15 years of C++ versus my 10 years of Java and JVM implementation make me feel like C++ causes far more trouble than it's worth, especially for projects that have no business at all trying to outdo the defaults.


I think that's mostly your opinion. One can remain very tight to the language semantics and still get a lot of fine-grained control over performance in C++.

No language can meet all use cases. Every language has features that it is best suited for, and features that are a bit of a pain. For languages like C and C++, the pain points are higher-level programming semantics. For languages like Java / JVM, the pain points are fine-grained control and low-level bit twiddling.

I recommend an 80/20 split. Write 80% of your code in a high-level language of choice that mostly meets your goals. Profile the hell out of it, then use that language's FFI to refactor the slow bits in a lower level language like C/C++.

There will be constraints, such as system level programming or embedded system programming, where a language like C/C++/Rust can't be avoided. But for general purpose work, the Pareto Principle works pretty well for balancing time-to-market with performance.


> Usually that manual control comes at the cost of literally violating the language semantics and wandering into the territory of undefined behavior.

What makes you say this?


Along the same lines, the thing that is annoying me now is return value optimization. It's apparently required by the standard in specific circumstances, but there's no (obvious) syntactic indication of whether you are using it.

So I'm sticking with out params for now.


You can implement allocators which are very fast, much faster than generic malloc and free. Then people that do care about this level of performance can write their own ones, optimised specifically for the platform they are using and problems they encounter.


It'd be nice if people actually replied with arguments instead of downvoting.


> C++ community has a strong bias against garbage collection

Well, it's a bit of a selection bias on your part IMO. The people who use C++ have seen the benefits of Java, Python, C# and all the rest. And many of them (us) use Java, Python, C# and the like for meta-productivity build tools. GC is great for some enormous set of problem domains. But for problems that cannot endure high or unstable latencies, GC is a bad solution. I'm writing C++ because it hits a sweet spot in the intersection of (low/deterministic latency + ubiquity among developers).

> Why can't you acknowledge that there are problems that have GC as the only and best solution?

There are, but I probably wouldn't have used C++ for them. In some rare cases I would but only because of momentum on an existing legacy solution.


I write C++. I write a lot of it, I've been using it since 1987.

I love LISP and Smalltalk, but I'll never ship a product in those languages. I've written a ton of C# and Java and somewhat less Javascript; those environments are fine (though I found myself paying much closer attention to object lifetime than I wanted).

I'm dead set against garbage collecting C++ because the semantics just don't fit well with the available runtimes. A few times I've written garbage collected heaps for specific types of objects; crossing the boundary between the hand-allocation world of C++ and the dynamic world of a GC environment is not a happy or particularly efficient experience, especially in the presence of threads.


You choose C++ to be able to control performance more than in many other languages. In a game engine or other very performance-intensive software, people tend to think a lot about memory management. In very rare cases the optimal solution for part of a system might be something similar to a garbage collector, but that is quite irrelevant, because the important thing is not the final solution but the fact that you can (sometimes quite creatively) control and constantly fine-tune things, so that you can come up with an optimal solution.


Having been part of the community, but with experience in many other languages, I find another common trait seems to be ignoring that many other languages also have features for writing cache-friendly code just like C++, while being safer.

A good example is Modula-3, a systems programming language that, while it does have GC by default, also allows for global and stack allocation, value types, bit and structure packing and, if really necessary, naked pointers (only allowed in unsafe modules).

Or Mesa/Cedar at Xerox PARC with its mix of RC/GC, which is basically the same as C++ _ptr<>() + deferred_ptr<>() just implemented at the language level.

The same applies to RAII: yes, it was a very good invention in C++, but many other languages do offer good ways of doing resource management, for example bracket in Haskell or withResource in others.


I think the reason the C++ community has a strong bias against garbage collection, as you say, is that it has a bias against all sorts of unpredictable/unwanted behavior. That's the essence of C++: pay only for what you need/want/use. Garbage collection wouldn't be appropriate in some situations.

So re-implementing your own GC algorithm in the special case where you need it is seen as acceptable. That deferred pointer could be part of the standard library, as long as those who don't need it don't pay any performance overhead.


I don't know if I would acknowledge that there are problems that have GC as the only and best solution. Perhaps there are certain specific memory management problems for which a GC is the best solution. But what if you avoid that entire class of problems by adopting certain programming paradigms, or by the design of your programming language? For instance, Rust ensures memory safety without a GC. And while I write C++, I rarely find myself directly calling new -- most of my objects are allocated by STL containers or live on the stack.


Rust is horrifically bad at dealing with non-trivial cyclical structures for a number of reasons, not having a GC being one of them.


Beyond "not having GC", what are those reasons? I'm new to Rust and curious...


See http://smallcultfollowing.com/babysteps/blog/2015/04/06/mode... and its linked post; they cover some of the ways you can write general graphs in Rust. It's a bit different from how you'd do it elsewhere.

Of course, if you're willing to give up some safety internally, you can also implement them the same way as you would in C or C++.


Inherited mutability and no aliasing mean cyclical structures tend to be a fairly complicated mess of RefCell/Cells. The borrow checker is very good at handling ordered lifetimes but cannot handle more complicated access patterns without the runtime checks in RefCells.

There are graph libraries, but these only help with structures that are graphs in the strict sense (all edges are created equal and not actually part of the structure) as opposed to structs with one or more fields that reference (and have shared ownership of) another node in the graph.


Right, but then you also shouldn't be using the techniques described in the article, right? To put it another way, there's no case where deferred collection with smart pointers is better than real GC.


Because GC wastes cycles on users' machines instead of just hiring a competent developer who can manage memory properly. As we move to a more mobile world, wasted cycles matter more. So no. While GC is the definitive answer to "how do I allow my dog to make a 'press this button for fart sound' app?", it is most certainly not the definitive answer (though perhaps a passable one) to "how do I write good, fast, stable software?"


How does a "good programmer" properly manage cyclical graphs without very specific, static guarantees about how they're used (e.g. All cycles are parent/child)? Because I'd love to know a way to do this without using/implementing the equivalent of a GC.


In addition to what adrianratnapala said, it is perfectly fine to have specialized GCs for specific problems, and I suspect that any moderately complex C++ program in fact does (and no, that's not Greenspunning). The problem is using GC as the general memory management solution when 95% of the allocated objects have a single reference or a simple lifetime pattern.

A not-well-known fact is that even the Linux kernel has a GC implementation, used to collect file descriptors, but any suggestion of GCing all memory in the Linux kernel wouldn't be well received.


I completely agree, but the OP has been pretty clear in this thread that a GC is never the best solution and I'd like to know his alternative.


In most of the graph problems I have ever dealt with, the graph is constructed piecemeal but destroyed as a whole after you are finished with it.

It is true that there are specific problems (e.g. git repos) where you want a mutable graph over the long term -- but I have never actually had to deal with those cases, unless it was a special case with an obvious solution (like a doubly-linked list or tree).


Even in the case of a mutable graph, one can use copy-on-write semantics to create a DAG of graph revisions that can be managed or compacted. I believe this was the strategy used by Subversion.

Compaction and collection can both be trivially performed on such a structure by making a deep copy of the latest revision of the graph to a new memory arena, just as is commonly done with a Cheney collector. Old locations are back-patched with pointers to the new locations, which solves the cycle problem.
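A minimal sketch of that back-patching idea (my own types, not Subversion's): copy reachable nodes into a fresh arena, recording a forwarding entry per old node so shared nodes are copied once and cycles terminate.

  #include <memory>
  #include <unordered_map>
  #include <vector>

  struct Node {
      int value;
      std::vector<Node*> edges;
  };

  Node* copy_graph(Node* old, std::vector<std::unique_ptr<Node>>& arena,
                   std::unordered_map<Node*, Node*>& forward) {
      if (!old) return nullptr;
      if (auto it = forward.find(old); it != forward.end())
          return it->second;             // already moved: follow forwarding entry
      arena.push_back(std::make_unique<Node>());
      Node* fresh = arena.back().get();
      forward[old] = fresh;              // record before recursing: handles cycles
      fresh->value = old->value;
      for (Node* e : old->edges)
          fresh->edges.push_back(copy_graph(e, arena, forward));
      return fresh;
  }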


hey, that's a generational GC :)


Well, it's a Cheney GC. To be generational, there would need to be more than two heaps. But, that's hair splitting. Heh.

I doubt one would find many C++ folks who would disagree that GC is useful some of the time. It's all about controlling when and where GC is used, which sort of hits home with Sutter's point.


What are people using cyclical graphs for? Most of my daily work is in a sad language which has GC and structurally prohibits cyclical references[1], so what am I missing out on?

[1] unless you cheat...


You haven't answered my question. Please re-read it carefully.


I did. You wanted acknowledgement that there are problems for which GC is the best solution in C++. I explained why it is never best and likely never anything beyond "barely acceptable".


Do you understand that the linked article, and the talk referenced in the article describe a way to implement garbage collection as a solution to (memory management for) data structures with cycles?

Do you think that this problem is not equivalent to "how do I allow my dog to make a 'press this button for fart sound' app?"?

Do you have a better solution to reclaiming data structures with cyclic pointers?


Pool allocation, and delete the whole thing at the end.

Of course, sometimes (most times ?) you can't do this, and you really need a GC.

There is a reason why TAOCP spends a whole chapter on GC IIRC...


I am not doubting that you are right, but this argument does not quite amount to a proof. That is because you would have to show that there are cases where the lifetimes of the graph's elements cannot be determined from consideration of the semantics of the problem.


This is a very promising solution. If it has a fatal problem (I'm not sure it does), it would be the cost of searching back through the referrer chain to look for a root pointer. Am I wrong in thinking that that will be at least O(n), if not O(n^2)? And n could be extremely large—as large as the number of objects in your program.

To say it another way: what we (C++ programmers, stereotypically) don't like about GC is that it incurs long search costs at unpredictable intervals. That long search cost is not dissimilar in nature (i.e. in algorithm) to the cost of this search. So it's not clear to me that the performance characteristics of `internal_ptr` would be much different from those of GC. I'd like to see the performance characteristics of the solution filled out with more detail.


The author's "internal_ptr" template seems exactly equivalent to Objective-C's (and I assume Swift's) weak references.

And that's great! Automatic reference counting is my single favourite thing about Obj-C. Once you get the hang of using weak refs for parent objects, ARC is very easy to use, with consistent speed, low memory usage, and immediate deallocation of dead objects.

I'd go as far as to say that Objective-C is a big reason why iOS has historically been faster and smoother than Android. It's not the world's fastest language, but there are no big GC pauses and it doesn't waste a lot of memory.

So I bet this is a good memory management style for C++ too.


No, Obj-C/Swift use something more akin to C++'s `shared_ptr` and `weak_ptr`, as evidenced by the fact that cyclic references may result in memory leaks in each system. This is precisely the problem that `internal_ptr` seeks to resolve.


Ahh, rereading more carefully, I think I see. You could have a chain of objects connected by internal pointers, and they'd all be kept alive as long as the first object is strongly referenced, but that wouldn't work for weak pointers.


Really fascinating article. I've yet to look at the library, and Herb's library too, but it's great to have both.

The CppCon YouTube channel is full of videos that will take up my evenings, to my wife's disgust (I doubt she wants to watch Bjarne talk).


Such effort would be better spent on figuring out best practices for avoiding data structures with cycles in the first place.


I think you've misunderstood the article. It's talking about the object dependency (pointer) graph, not data structures in a more general sense. The object dependency graph is intrinsically cyclical in the way the OP describes. There's no avoiding it.


No need to be condescending.

The article duly covers the data structure in question and is perfectly interesting. What it doesn't address is when this data structure comes up in practice and whether it can be handled in different ways.

In many cases, it might be desirable to search for alternative memory representations, since this data structure obviously complicates things considerably (for C++).

For example, a graph is isomorphic to a sparse array, which could avoid embedding bidirectional pointers directly into the graph nodes.
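A sketch of such a representation (my own illustration): nodes live in one vector and edges are indices, so no pointers are embedded and the whole structure is reclaimed at once.

  #include <cstddef>
  #include <vector>

  struct Graph {
      struct Node { int value; std::vector<std::size_t> edges; };
      std::vector<Node> nodes;  // owns everything; freed as a whole

      std::size_t add_node(int v) {
          nodes.push_back({v, {}});
          return nodes.size() - 1;
      }
      void add_edge(std::size_t from, std::size_t to) {
          nodes[from].edges.push_back(to);  // cycles are fine: just indices
      }
  };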




