The same argument would apply to any non-compacting allocator, because the worst case memory blowup due to external fragmentation is huge. But such cases are extremely rarely observed in practice, so people use e.g. standard malloc()/free() implementations or non-compacting garbage collectors without being concerned about that.
In addition, there are plenty of cases where memory usage is unbounded or excessive, not because of allocator behavior, but because of programming mistakes. In fact, memory can sometimes blow up just because of large user inputs and very few systems are prepared for properly handling OOM conditions that happen legitimately.
Case in point: Both CRuby and Apple's JavaScriptCore have garbage collectors that use conservative stack scanning and are widely used in production systems without the world ending.
That said, you're probably not going to use conservative stack scanning because of collection speed alone. There are other trade-offs between conservative stack scanning and precise stack scanning that weigh more heavily.
I'll add the caveat that I would be very cautious about using a conservative GC on 32-bit systems, but on 64-bit systems this is about as much of a concern as memory fragmentation.
> But such cases are extremely rarely observed in practice
After long years of finding problems and trying to encourage, pressure, bribe, cajole and finally yell at people about said problems, my professional opinion is that people aren’t very observant, and either do not see problems or pretend not to see them.
They just reboot the process every 48 hours and walk away.
And sadly continuous deployment is making this worse. The collective We have problems that only show up on three- or four-day weekends, right when you don’t want to be on call.
As somebody who uses and likes both Kotlin and Python (and quite a few other languages), I'd be cautious with using a subjective term such as "more expressive", too, but I can possibly shed some light on where such feelings come from.
Personally, I see Kotlin as the closest thing to a statically typed Smalltalk that we have among major languages, and that's a major draw.
A key part here is that Kotlin closures are fully-featured equivalents of Smalltalk blocks (up to and including even non-local returns [1]), whereas in many other languages closures fall short: Java does not allow mutation of captured local variables, and Python restricts lambdas to a single expression.
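To illustrate (a small sketch, not from any particular codebase): `forEach` is an inline function, so the lambda below can both mutate a captured local and return from the enclosing function, much like a Smalltalk block.

    fun firstNegativeIndex(xs: List<Int>): Int {
        var index = 0                   // captured and mutated by the closure
        xs.forEach { x ->
            if (x < 0) return index     // non-local return from firstNegativeIndex
            index++
        }
        return -1
    }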
I find code whose behavior can be parameterized by code to be an essential feature of modern-day programming and this should be as frictionless as possible.
This is also a situation where syntax matters, and while it isn't quite as nice as Smalltalk, Kotlin's syntax (esp. with trailing lambdas) makes such code as readable as possible in a brace-style language with minimal additional syntactic noise.
In a similar vein, the functionality of Smalltalk's cascades is offered through scope functions [2], especially `.run {}`.
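For example (purely illustrative), several calls can be sent to the same receiver inside one block, and `run` returns the value of its last expression:

    val label = StringBuilder().run {
        append("x = ")      // repeated calls on the same receiver,
        append(42)          // roughly a Smalltalk cascade
        toString()          // run returns the last expression
    }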
But ultimately, fully-featured closures (and the fact that they are widely used in the standard library) power a lot of the things that people seem to like about Kotlin.
That does not mean that there aren't downsides. The limitations of running on the JVM are one (e.g. while Kotlin has workarounds for the JVM's type erasure, they're still workarounds), and then Gradle is arguably Kotlin's weakest point (which apparently even JetBrains are seeing, given their investment in Amper).
That said, personally I'd say Kotlin's static typing and performance would be the primary reasons for me to reach for Kotlin over Python, not necessarily expressiveness. Type annotations in Python + mypy etc. just aren't the same experience, and writing performance-sensitive code in Python can be very tricky/hacky when you can't delegate the hot paths to numpy or other existing C/C++/Rust libraries.
Conversely, Python often has a leg up when it comes to fast prototyping and scripting, even with Kotlin Worksheets in IntelliJ IDEA and with kscript.
[1] Which, to be clear, are a nice-to-have thing, not essential, but still impressive that even that was covered, when previously Ruby was the only major language I know of that did it.
sequence { for (x in 0..<10) for (y in 0..<10) yield(x*y) }.toList()
Now, technically, Kotlin doesn't have list comprehensions, only the equivalent of generator expressions in Python, so you have to tack an extra `.toList()` on at the end if you want a list, but you can write pretty much any Python comprehension in a similar way in Kotlin.
On the other hand, you're not limited to for loops/ifs inside such a generator, but can use fairly arbitrary control flow.
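For instance (just a sketch), a generator with an unbounded loop:

    val fibs = sequence {
        var a = 0; var b = 1
        while (true) {                   // not just for/if
            yield(a)
            val next = a + b
            a = b; b = next
        }
    }
    println(fibs.take(8).toList())       // [0, 1, 1, 2, 3, 5, 8, 13]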
What happens under the hood is that a `sequence {}` call creates an instance of `SequenceScope`, which has `yield()` and `yieldAll()` methods. When executing the block, `this` will reference that particular instance and `yield()` is essentially `this.yield()` and will call the method on the scope instance.
The actual functionality is then provided by the coroutine system, though a lot of the heavy lifting is done by the optimizer to eliminate all or most of the runtime overhead.
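Spelled out with the receiver made explicit (equivalent to the usual implicit form):

    val s = sequence {          // this: SequenceScope<Int>
        this.yield(1)           // the same as plain yield(1)
        yieldAll(listOf(2, 3))  // also a SequenceScope method
    }
    println(s.toList())         // [1, 2, 3]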
First of all, I recommend giving the paper a read, because I think you're misunderstanding the claim (plus, it is a very good paper). The claim is not that they are equivalent, but that tracing GC and reference counting are dual solutions to the same problem, two ends of the same spectrum if you will, with hybrid solutions existing in between.
Second, what you seem to consider to be fundamental characteristics of tracing GC and RC is not in fact so fundamental.
For starters, RC absolutely can handle cycles (e.g. through trial deletion). Such implementations may be difficult or impossible to implement as pure library solutions, but there is nothing that says it can't be done. The most prominent example of a programming language that uses such an approach is probably Python.
Nor does the claim that tracing GC cannot provide hard performance guarantees in the general case (while RC does) hold up under closer examination. Leaving aside the problem that it's already non-trivial to provide hard real time performance guarantees for malloc()/free() and ignoring the issue of cascading deletions, it doesn't hold under the more relaxed assumptions discussed downthread.
For starters, we have such predictability only for the single-threaded case, not for arbitrary multi-threaded situations. And even in the single-threaded case, there are real use cases where predicting performance becomes functionally intractable. Examples are implementations of binary decision diagrams or certain persistent data structures, where the presence of shared subgraphs of arbitrary size makes predicting the performance of individual deallocations impractical.
In contrast, in the single-threaded case we can absolutely bound individual operations of a tracing GC by either a constant or (in the case of arbitrarily sized allocations) make them linear in the number of bytes allocated (e.g. Baker's treadmill).
What is true is that in the absence of cycles, (naive) reference counting will free memory at the earliest opportunity, which is not something we can say for tracing GC.
> First of all, I recommend giving the paper a read
I'll put it on my list, thanks for the recommendation, but it really has no impact on my point (see next point).
> because I think you're misunderstanding the claim (plus, it is a very good paper)
Note: I wasn't criticizing the paper. I was criticizing the comment, which claimed "it’s better" to view these as special cases.
If it's not obvious what I mean, here's an analogy: it's the difference between having a great paper that reduces ~everything to category theory, vs. claiming "it's better" for the audience I'm talking to to view everything in terms of category theory. I can be impressed by the former while still vehemently disagreeing with the latter.
> For starters, RC absolutely can handle cycles (e.g. through trial deletion).
"Can handle" is quite the hedge. You "can" walk across the continent too, but at what cost?
> The most prominent example of a programming language that uses such an approach is probably Python.
You're saying Python uses RC to handle reference cycles, and doesn't need a GC for that? If so please ask them to update the documentation, because right now it specifically says "you can disable the collector if you are sure your program does not create reference cycles". https://docs.python.org/3/library/gc.html
> [...] hard real time [...]
Nobody said "real time". I just said "hard guarantee".
> Note: I wasn't criticizing the paper. I was criticizing the comment, which claimed "it’s better" to view these as special cases.
I didn't assume you were. My note about it being a good paper was just a general "this is worth reading" recommendation.
> "Can handle" is quite the hedge. You "can" walk across the continent too, but at what cost?
It's not a hedge. You claimed that (tracing) GC can handle cycles, while RC was "the opposite", which I read to mean that you believe it cannot.
While we are at it, let's go through the basics of trial deletion.
Trial deletion first looks at possible candidates for objects involved in a cycle (in the original algorithm, those were objects whose RC got decremented without reaching zero). Then, you do a recursive decrement of their children's (and their children's children's, and so forth) reference counts.
Unlike with regular reference counting decrements, you visit children even if the reference count doesn't reach zero. The net result is that reference counts are reduced only along internal paths, but that objects that are still reachable from external paths have reference counts > 0 after that.
Thus, any object with a reference count of zero after this step must be part of an internal cycle and can be deleted. All other objects have their original reference counts restored.
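In (simplified) code, roughly following Bacon and Rajan's synchronous cycle collector - this sketch leaves out candidate buffering and the concurrent variants, and the names are purely illustrative:

    enum class Color { BLACK, GRAY, WHITE }

    class Node(var rc: Int = 0) {
        var color = Color.BLACK
        val children = mutableListOf<Node>()
    }

    // Phase 1: trial decrement along internal edges; children are visited
    // even when their count does not reach zero.
    fun markGray(n: Node) {
        if (n.color == Color.GRAY) return
        n.color = Color.GRAY
        for (c in n.children) {
            c.rc--
            markGray(c)
        }
    }

    // Phase 2: anything whose count is still > 0 is externally reachable,
    // so its counts (and its children's) are restored; the rest turns white.
    fun scan(n: Node) {
        if (n.color != Color.GRAY) return
        if (n.rc > 0) {
            scanBlack(n)
        } else {
            n.color = Color.WHITE
            n.children.forEach(::scan)
        }
    }

    fun scanBlack(n: Node) {
        n.color = Color.BLACK
        for (c in n.children) {
            c.rc++                                   // undo the trial decrement
            if (c.color != Color.BLACK) scanBlack(c)
        }
    }

    // Phase 3: white nodes are reachable only through internal cycles and can be freed.
    fun collectWhite(n: Node, freed: MutableList<Node>) {
        if (n.color != Color.WHITE) return
        n.color = Color.BLACK
        n.children.forEach { collectWhite(it, freed) }
        freed += n
    }

    fun collectCycles(candidates: List<Node>): List<Node> {
        candidates.forEach(::markGray)
        candidates.forEach(::scan)
        val freed = mutableListOf<Node>()
        candidates.forEach { collectWhite(it, freed) }
        return freed
    }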
Because trial deletion operates on reference counts differently, it's not something that you can easily implement as a library, which is why you don't see it much except when a language implementation chooses to go with reference counting over a tracing GC.
> You're saying Python uses RC to handle reference cycles, and doesn't need a GC for that? If so please ask them to update the documentation, because right now it specifically says "you can disable the collector if you are sure your program does not create reference cycles". https://docs.python.org/3/library/gc.html
This is a terminology thing. Python uses a generational variant of the trial deletion approach [1]. It's not a traditional tracing GC, and the documentation's wording isn't inaccurate either, because GC can mean more than a traditional tracing GC.
> Nobody said "real time". I just said "hard guarantee".
I was not sure what you meant, so I answered both, as you may have noticed.
First, I have no desire to handle both memory and external resources in a unified way, because memory management and resource management have different needs.
Memory is not just "one kind of resource"; it's a very specific type of resource that, if it has to be managed manually, inherently creates crosscutting concerns. And memory allocation is pervasive, often implicit in other language constructs. Garbage collectors get to cheat here, because they have a global view that ignores module boundaries and information hiding.
The classical example is that introducing a caching mechanism usually introduces API breaks. Where a function normally returns a pointer/reference/unique pointer and makes the caller responsible for freeing memory (whether through convention such as in C or enforced/automated through language mechanisms such as in Rust), the moment you cache it, you need to return a reference-counted pointer, because now the memory can only be freed if both the caller and the cache don't use it anymore. And that change from a non-reference-counted pointer to a reference-counted pointer is a breaking API change.
There are plenty more situations where manual memory management interacts poorly with modularity, such as filter() style functions, or the various complications that arise from closures capturing their local environment.
Conversely, it is absolutely possible to have pretty straightforward resource management with guaranteed and predictable lifetimes in a GCed language (though, alas, there's a lack of direct language support for that).
The general approach is as follows: Each resource's constructor takes an explicit or implicit owner argument (implicit meaning the current scope, whether or not it is defined through a language construct). You can also transfer a resource to a different owner (reparenting) [1].
Owners of resources can be lifetime managers such as scopes (which do not need to correspond to lexical scopes and are more like transactions), managers with more complex lifetime logic (such as a resource pool), or objects that are themselves owned (e.g. if you have resources that depend on other resources). When the lifetime of the owner finishes, it calls a dispose function on all owned objects.
Because the constructor requires an owner in order to create such a resource object (unlike a C# using clause or Java's try-with-resources), it is impossible to accidentally create such a resource without a controlled lifetime [2].
Note that this is not the equivalent of RAII. You can have a number of non-owning references to such resource objects, essentially the equivalent of a weak pointer. In my experience, this is generally a good thing, because you do not want to have a hidden pointer secretly extending the lifetime of a potentially expensive resource. I prefer resource lifetimes to be explicit and to get an error if they are used past their intended lifetime.
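A minimal sketch of what that can look like in a GCed language (Kotlin here; every name is made up for illustration, this is not an existing library):

    interface Disposable { fun dispose() }

    class Owner : Disposable {
        private val owned = mutableListOf<Disposable>()

        fun <T : Disposable> adopt(resource: T): T { owned += resource; return resource }
        fun release(resource: Disposable) { owned -= resource }   // used when reparenting

        override fun dispose() {
            owned.asReversed().forEach(Disposable::dispose)       // reverse acquisition order
            owned.clear()
        }
    }

    // The constructor requires an owner, so the resource cannot exist
    // outside a controlled lifetime.
    class FileResource(owner: Owner, private val path: String) : Disposable {
        private var closed = false
        init { owner.adopt(this) }

        fun read(): String {
            check(!closed) { "use after dispose: $path" }         // non-owning users fail fast
            return "contents of $path"                            // placeholder for real I/O
        }

        override fun dispose() { closed = true }
    }

    fun main() {
        val scope = Owner()
        val file = FileResource(scope, "config.txt")
        println(file.read())
        scope.dispose()      // disposes everything the scope owns
        // file.read() here would now throw instead of silently extending the lifetime
    }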
Introducing caching is a semantic API change, regardless of how memory is managed, so it should be breaking. After introducing caching, you have now also accidentally introduced sharing, because two consecutive API calls can return a reference to the same object where earlier they couldn't. This could lead to a correctness issue.
This is not a problem if the returned object is immutable. If you're returning mutable objects, then that already needs to be documented as part of the API and not be an incidental discovery stemming from the object being reference counted.
In any event, that is hardly the only case of manual memory management breaking abstractions.
It may not be a problem in a language like Rust, where the compiler understands the concept of immutability. But most (if not all) mainstream languages except Rust don’t. Your object may be immutable in January but in February someone makes a change in a class used 10 layers below and suddenly your class is no longer immutable. Add invisible sharing to that and your code explodes.
I will happily pay a small price in breaking some abstractions and get the protection from fuckups like that every single time.
Abstractions understood as „I can change only X without changing Y” (aka GoF OOP patterns or most Clean Code OOP patterns) are overrated anyways. Readability and understandability of code is more important than ability to add something without changing something else. If code is readable and constraints are enforced by the compiler, it is easy and safe to change. Optimizing for ability to change is better than optimizing for not having to make changes. Because changes will happen no matter how beautiful an abstraction you make.
> It may not be a problem in a language like Rust, where the compiler understands the concept of immutability. But most (if not all) mainstream languages except Rust don’t. Your object may be immutable in January but in February someone makes a change in a class used 10 layers below and suddenly your class is no longer immutable. Add invisible sharing to that and your code explodes.
No offense, but this strikes me as a strawman argument. What software design methodology leads to immutable types suddenly being made mutable? Outside of dynamic languages, where is support for immutability not there?
Note that any language with proper information hiding can express immutability (literally going back to the days of Modula-2), plus of course several mainstream languages have first-class language features to represent immutability for added convenience.
Finally, this is only one example. One underlying problem is that if all reference counts need to be capped at one, you have to either switch to full reference counting or copy the underlying data.
You can see this play out in Rust's std::collections::HashSet interface. Operations like intersection, union, difference, and symmetric difference return iterators rather than sets. There are also operators that do return sets, such as BitOr, but those are implemented in terms of union() plus cloning the elements into a fresh set.
Because the result and the argument can't in general share references to elements, you end up essentially doing a deep copy for e.g. a set of strings, which you want to minimize as much as possible. Thus, limitations on the sharing of references dictate aspects of the API.
> Abstractions understood as „I can change only X without changing Y” (aka GoF OOP patterns or most Clean Code OOP patterns) are overrated anyways. Readability and understandability of code is more important than ability to add something without changing something else. If code is readable and constraints are enforced by the compiler, it is easy and safe to change.
So, you never work with third-party libraries where you do not control the API and have never written libraries to be consumed by other teams/third parties?
Java cannot express immutability. `final` is not transitive, so nothing stops an unrelated change in code from breaking something that was immutable earlier. Same with Golang.
What happens if a large std::unordered_map<std::string, std::string> has its destructor called?
The maximum number of references is a red herring. While having an RC capped at 1 allows you to elide the actual reference count and makes pointer assignment cheaper, it does not materially affect the primary source of latency in a reference counting implementation, namely cascading deletions.
1. Such data structures (or more generally, std::unordered_map<std::string, std::vector<std::string>> or something like that) are the natural way to represent e.g. dictionaries. So, you absolutely often need to do that and "don't do that" doesn't help here.
2. This general issue extends to virtually all collections. The idea that you should avoid large collections is not a practical solution to real world problems.
3. An alternative solution would be lazy destruction, but that comes with its own issues, such as a really bad worst-case memory overhead or making RC sufficiently more complex that it's not really a win over tracing GC anymore [1].
Still no, because STW GC can pause innocent code in arbitrary unexpected moments.
If you’re pausing the thread because the thread is doing some work eg calling into system to do I/O, this is not considered a pause (even though technically it is, as the thread may be simply paused and waiting for data) but rather just the cost of the operation. If the thread is being interrupted and paused because some unrelated operation has to run - now that’s bad and considered a true pause.
> Still no, because STW GC can pause innocent code in arbitrary unexpected moments.
Well, yes, that's the problem, isn't it? And that's the point I was making. Pauses in the main thread absolutely count for latency purposes. Note that in most stop-the-world GCs you also can have pretty good control over when the GC is invoked, and that doesn't make things better.
The idea that you can always predict when cascading deletes happen in RAII code is also misleading. They can easily be hidden behind an abstraction barrier. Do you exactly know what's happening under the hood in all third-party libraries that you use, just for an example?
> If you’re pausing the thread because the thread is doing some work eg calling into system to do I/O, this is not considered a pause
In real-time scenarios (soft or hard), it absolutely is. "Regular" operations are absolutely part of your latency budget, and if "regular" operations can exceed it, you absolutely have a problem. See e.g. pretty much any of Gil Tene's talks on the subject.
The kernel (as long as it's not realtime) also pauses innocent code in arbitrary, unexpected moments, and modern concurrent collectors like ZGC pause such "innocent" code for no longer than the OS does. These synchronisation pauses are of roughly constant duration (regardless of working set/heap size) and overall introduce unexpected latency to a lesser degree than the freeing work done by refcounting collections.
There is a lot in that statement, and it deserves challenge. Note two things: there is no universal right answer here (sometimes you are correct, sometimes you are wrong), and my responses will be contradictory (and also contradict things I didn't think to write). Each project will need to figure out which of the combinatorial points matter for that project in order to come up with the right answer. Not doing this is premature pessimization - getting your data structures wrong up front will force slow runtime on you in a way that no micro-optimization (which is what premature optimization really refers to) can ever fix, and getting out will cost a very expensive rewrite.
> 1. Such data structures (or more generally, std::unordered_map<std::string, std::vector<std::string>> or something like that) are the natural way to represent e.g. dictionaries.
They are natural and easy, but that doesn't mean they are the right way.
Often I find with a little thinking that there is an enum key under it all that I can hard code - and the enum key is also type safe so that the compiler can prove my code is correct against some class of bugs in a way a string key cannot.
Why are your keys and values std::string, which (barring small string optimization) will allocate more? Often you can place a maximum length on one or both that is small enough that you can replace std::string with a struct containing a char array (or std::string_view - I still have to support C++14 so I haven't been able to try it) and avoid a lot of allocations.
Failing that, often the correct data structure is a database (I reach for sqlite first, but there are plenty of options with pros and cons), which gives fast lookup, allows for more complex structures easily, and persists the data to disk so that on the next startup I don't have to spend all that time reconstructing the data (realistically, the performance of creating that large dictionary is of far greater concern than getting rid of it).
> 2. This general issue extends to virtually all collections. The idea that you should avoid large collections is not a practical solution to real world problems.
This article is about systems programming. In systems programming you rarely have such a large dictionary in that format. You often will in something like the filesystem, but there the point is to have the data persisted on disk, and typically you only read the subset you care about. You may have a copy in cache (and cache invalidation becomes an issue), but the in-memory portion of the data itself is never in a dictionary of that format.
> 3. An alternative solution would be lazy destruction, but that comes with its own issues, such as a really bad worst-case memory overhead or making RC sufficiently more complex that it's not really a win over tracing GC anymore [1].
That is one option, there are more.
Intentionally leak that whole dictionary. There is no reason to care whether all the destructors run for any of the examples you presented, and realistically, if you are getting rid of the whole thing, you are probably in shutdown anyway, so who cares if you leak memory? Many systems programs and libraries do this.
Move the data to a different thread that only handles reclaiming memory and then don't worry about it. (In discussions we talk about making that thread idle-priority so it won't affect performance - but in practice all schedulers I'm aware of handle this poorly in some way.)
Finally, if after reading all that and considering all your options (including things I didn't think of!) you still conclude a large dictionary of strings to strings is your correct answer, then you probably should demand a good tracing garbage collector. I'm not against using them where they are appropriate - I'm just arguing that if you think about your data a little more, you will discover they are not needed nearly as much as it seems - and this time spent thinking is usually worth it because it results in much faster programs anyway (even if there is one large dictionary in the code, how many others did you get rid of?).
> They are natural and easy, but that doesn't mean they are the right way.
> Often I find with a little thinking that there is an enum key under it all that I can hard code - and the enum key is also type safe so that the compiler can prove my code is correct against some class of bugs in a way a string key cannot.
There's a fundamental misunderstanding here, I think. By dictionaries I mean language dictionaries, e.g. English -> French. You won't find an underlying "enum key" here. Nor am I sure how an enum would ever get large.
> This article is about systems programming. In systems programming you rarely have such a large dictionary in that format.
First, the article is, but the subthread that started this discussion was more general than that.
Second, even in systems programming, you will commonly have arrays or other collections. Caches are a very common thing in systems programming, after all.
> That is one option, there are more.
It is trivially true that there are alternative options, but all these are workarounds for the problem, i.e. you have to complicate your design or make silent assumptions that may not hold in the future (e.g. when disposing of memory at process exit no longer works because you now need the code as a library).
> By dictionaries I mean language dictionaries, e.g. English -> French. You won't find an underlying "enum key"
That is a niche case, not a common one, though. There are a lot of niches, each with different requirements.
> It is trivially true that there are alternative options, but all these are workarounds for the problem, i.e. you have to complicate your design
This attitude of non-systems programmers is why people argue garbage collection is slow. Garbage collection can be faster, but people who work in garbage collected languages think that solves all problems and so they don't build efficient data structures. Sure they are not leaking memory, but garbage collection is not enough more efficient than reference counting to make up for thousands of destructions when the more complex reference-counted version only had a couple. Yes the code is more complex, but it is also faster and that is a trade off I'll take.
It was an example meant as an illustration. The general case of "collection of objects" is hardly niche.
> This attitude of non-systems programmers is why people argue garbage collection is slow.
I started programming writing Z80 assembler in the 1980s, counting T-states in order to make hard realtime code work. I wrote a graphics driver for Atari ST/TT hardware not too long after. I think I have a pretty good idea what working in a real-time and/or resource-constrained environment means.
> This attitude of non-systems programmers is why people argue garbage collection is slow.
That is an incorrect generalization. In fact, I see plenty of inefficient code in C++ and Rust (e.g. because a lot of the workarounds for not having GC require additional copying).
> Sure they are not leaking memory, but garbage collection is not enough more efficient than reference counting to make up for thousands of destructions when the more complex reference-counted version only had a couple.
This is some really unclear statement. If you're trying to make a connection between absence of (tracing) GC and having value types, they are not inherently related. You can have tracing GC and value types (e.g. C#) or reference counting and a lack of value types (e.g. Python).
What is true is that in general memory allocation is relatively expensive, so you want to avoid unnecessarily allocating objects, but that's true regardless of whether you use malloc()/free() or a garbage collector and the strategies for dealing with that are the same in both cases.
> Yes the code is more complex, but it is also faster and that is a trade off I'll take.
C# has structs with generics/explicit layout/object references/fixed buffers/etc., byrefs (special pointers that can point to managed memory, the stack, or unmanaged memory, and are still tracked by the GC despite allowing pointer arithmetic on them), and regular C pointers, yet it successfully uses a GC.
Much damage has been dealt to non-JVM languages by stereotypes and perception that the way JVM languages work is the way all GC languages work.
That isn't a choice though. The high-performance garbage collector is implemented by the language implementers and comes for free, while the delayed cascading deletion has to be done by the individual programmer, not the language designers.
The above problem is about latency of stop the world collectors in a domain that requires extremely low latency. And if you think that stop the world collections are representative of garbage collection as a whole (i.e. the "bad rap"), that view is simply not up to date with the state of the art. (It's just that generational/incremental collection is hard to impossible without having support for write barriers/read barriers/snapshotting on the compiler or hardware side, which makes that a practical no-go for a language that was never designed to have GC support.)
But in terms of throughput, the BDW GC actually performs pretty well, competitive with or beating malloc()/free() libraries. This is plenty enough for a number of applications, especially when it comes to batch processing. In fact, even the stop the world latency is (combined with parallel marking) plenty good enough for a number of types of regular GUI applications, where you don't have the fairly extreme latency requirements of standard video games.
It is also worth noting that manual deallocation (especially via RAII) isn't a panacea for latency, as that is just as prone to large pauses due to cascading deletes [1].
[1] While in theory you can do that lazily, in this case you're losing timeliness of deletion and may actually have worse worst case latency than a modern GC that can stagger its work to have bounded pause times. The benefit of RAII is that you may be able to plan these deletions, but even that can be surprisingly difficult at times, requiring extra management to keep data structures alive longer than normal. [2]
[2] Note that lazily deleting objects in RC to avoid pause times is not necessarily the answer; for one thing, you're introducing much of the complexity associated with incremental GC again (especially having to do X amount of deallocation work for each allocation), or risking considerable space overhead. https://dl.acm.org/doi/abs/10.1145/964001.964019 It also remains a difficult challenge when you're dealing with objects that aren't of bounded size (e.g. large arrays).
> It's just that generational/incremental collection is hard to impossible without having support for write barriers/read barriers/snapshotting on the compiler or hardware side, which makes that a practical no-go for a language that was never designed to have GC support.
It was demonstrated in the mid 80s that the MMU could be used to implement the write barrier (when I saw that I knew lisp machines were a dead end, though I kept using them for as long as was feasible). You can use this and other compiler support to implement it in a language not designed for GC support, no problem.
The issue I think you meant is that you can’t have a transporting collector (which also compacts your working set, even allowing you to return memory to the system). A language like C++ doesn’t anticipate an object’s address ever changing, so that’s a no go.
I suppose you could write special shared_ptr / unique_ptr that supported address mutation but you’d need a compiler flag to check for an object’s address being taken… but to some degree the destructor paradigm renders a lot of this moot.
> It was demonstrated in the mid 80s that the MMU could be used to implement the write barrier (when I saw that I knew lisp machines were a dead end, though I kept using them for as long as was feasible). You can use this and other compiler support to implement it in a language not designed for GC support, no problem.
Yes, that's why I wrote about support "on the compiler or hardware side" (i.e. MMUs). That said, while this might work in theory, the BDW GC does have support for it, and in practice it doesn't actually work well. Throughput suffers and pause times can actually get worse.
By the way, another alternative way of having incremental collection without compiler support is to use fork() for snapshotting (there's an option in D for that), but that has its own issues, of course.
> The issue I think you meant is that you can’t have a transporting collector (which also compacts your working set, even allowing you to return memory to the system). A language like C++ doesn’t anticipate an object’s address ever changing, so that’s a no go.
No, I wasn't talking about a compacting collector. I very specifically meant generational and/or incremental collection.
The compiler support you need in practice is quite limited. There is an implementation of incremental cycle collection in Rust: https://github.com/chc4/samsara It's made possible because Rust can tell apart read-only and read-write references (except for references to interior mutable objects, but these are known to the compiler and references to them can be treated as read-write). This avoids a global stop-the-world for the entire program.
Cascading deletes are rare in practice, and if anything they are inherent to deterministic deletion, which is often a desirable property. When they're possible, one can often use arena allocation to avoid the issue altogether since arenas are managed as a single allocation.
> The compiler support you need in practice is quite limited.
This is true, but it's absent especially in C. Of course, as long as you can intercept writes and/or reads through pointers, you are good to go in principle. However, that solution may not be efficient.
> Cascading deletes are rare in practice
An array of objects (e.g. std::vector<std::string>) already has unbounded deallocation cost. People mostly think of tree-like structures here, but such structures only add the risk of stack blowout to the lack of an upper bound. But in principle, any container that does not have a size limit can give rise to unbounded pause times.
I have been writing low latency code for a decade and never seen cascading deletes be a problem. GC on the other hand is a constant latency problem for people.
> I have been writing low latency code for a decade and never seen cascading deletes be a problem.
I don't know what your domain is, but the domains I'm familiar with (especially hard real-time) deal with the underlying constraints by way of restricting how you write programs. A side effect of these restrictions is that you generally avoid constructs that give rise to cascading deletes. But ultimately, having to deal with these restrictions is suboptimal.
I remember a talk by Gil Tene (CTO and co-founder of Azul Systems) about how the problem with writing real-time Java was that it historically required you to write non-idiomatic Java code, which came at a serious cost to reuse and where he argued that one of the benefits of Azul's GC was that you could write (mostly) idiomatic Java and still have GC noise be less than OS noise.
There is of course the actual problem that the stricter your latency requirements are, the fewer GCs actually meet them. Plenty of modern off-the-shelf GCs meet typical soft real-time requirements, but the harder your real-time requirements get, the fewer implementations actually meet them (and those that do are increasingly expensive).
I'll also add that depending on how strict your latency requirements are, off the shelf allocators may not meet them, either, even for a single malloc() (though there are allocators that offer hard upper bounds for allocation costs, such as TLSF).
Running a throughput-oriented GC instead of a latency-oriented/realtime one is a common error of those people (also, throughput-oriented ones are usually the default).
For all of the reasons you note above, this is why I've almost exclusively dealt with arena-style allocations of fixed-object-sizes when doing ultra low latency work.
Hi, I'm just starting a (long) PhD in computer music languages, and one of my hoped-for projects is improving the GC for the Scheme implementation I work in for soft realtime use. If you (or anyone else) wouldn't mind sharing suggestion on resources for learning about low-latency modern GCs, it would be most appreciated. :-)
You can look at the SGCL garbage collector for C++: https://github.com/pebal/sgcl. It works in a separate thread, is lock-free, and never stops the world.
Thanks, that's good to know. I'm working through his first GC book right now and plan to order the new edition of the gchandbook when I'm done with the other (which I read is a gentler intro to the topic).
Absolutely not. While it obviously should not be used for use cases that it isn't designed for, as a conservative mark-and-sweep GC it is extremely well engineered and has very good throughput.