
Modern garbage collection: Part 2 - mariuz
https://blog.plan99.net/modern-garbage-collection-part-2-1c88847abcfd
======
gsliepen
> For a video game running at 60 frames per second each frame has 16
> milliseconds in which to complete, so a pause of less than a millisecond is
> not going to be significant relative to other overheads.

This is definitely very significant: you just took away 6% of the CPU time
available to the game to calculate everything necessary for that frame. And
that's if there's only one pause per frame. If you are running close to 100%
CPU time and the pause happens right before the vsync, you might miss the
vsync and drop a whole frame.

This is not only a problem with GC. Even with more deterministic memory
management, like plain malloc() and free() in C, you can have pauses if the
operating system needs to go and find some free memory, update page tables,
flush TLB caches and so on. So for games and anything real-time, you probably
want to avoid memory allocations as much as possible, and you certainly don't
want your whole process being paused at random. So if you are stuck with GC,
one question would be: can GC be done for individual threads without blocking
other threads?

~~~
w0utert
True. This is why any serious game running on a VM with GC would avoid any
kind of per-frame memory allocation and use arena allocators instead.
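A per-frame arena is typically just a bump allocator that is reset wholesale at
the end of the frame. A minimal sketch (the class name and capacity are
illustrative, not from any particular engine):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-frame bump allocator: allocation is a pointer increment, and the whole
// arena is released in O(1) by resetting the offset at end of frame.
class FrameArena {
public:
    explicit FrameArena(std::size_t capacity) : buffer_(capacity) {}

    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned + size > buffer_.size()) return nullptr;  // arena exhausted
        offset_ = aligned + size;
        return buffer_.data() + aligned;
    }

    // Called once per frame: no per-object free, no GC, no pause.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<std::uint8_t> buffer_;
    std::size_t offset_ = 0;
};
```

Per-frame temporaries get placement-new'ed into the arena; anything with a
non-trivial destructor has to be destroyed manually before reset(), which is
why arenas pair best with plain-old-data frame state.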

The whole premise of a GC saving you from thinking about memory allocation is
moot when it comes to real-time-sensitive tasks; there it makes things harder
instead of easier. A reference counting system is much better IMO: even though
you may be able to get higher absolute throughput with GC, at least the small
overhead you incur with reference counting is stable and predictable.

~~~
LessDmesg
Reference counting is not better, it's a poor man's GC. Reference counting
means: 1) instruction pollution from all the refcount updates and their
synchronization, 2) circular references, i.e. memory leaks, 3) too much time
spent freeing memory (whereas a generational GC spends time on live objects
only), 4) memory fragmentation and hence slow allocation (whereas in a normal
GC allocation is just O(1)).

~~~
w0utert
Languages like Swift and Objective-C hide the reference counting from you, so
'instruction pollution' is not really something I care about. The overhead at
the instruction and memory level is fairly minimal for these languages too, by
means of tagged pointers. I'm pretty sure the compiler is smart enough to
factor out reference counting for complete sections where it can determine
objects cannot possibly go out of scope (e.g. sections where aliasing can be
ruled out and all assignments are to locals, such as in many loops).

Circular references can be a problem; this is just something you have to live
with and design for, just as in languages with manual memory management. In
the typical cases where this can be a problem (graphs of objects), it's very
straightforward to fix them using weak references.
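In C++ terms (an illustrative sketch using shared_ptr/weak_ptr; the same
pattern applies to Swift's weak references): owning pointers point downward,
and a weak back edge breaks the cycle that would otherwise leak both objects.

```cpp
#include <memory>

// Parent/child graph: an owning edge down, a non-owning weak edge back up.
struct Node {
    std::shared_ptr<Node> child;   // owning edge: keeps child alive
    std::weak_ptr<Node> parent;    // back edge: does not touch the refcount
};

bool back_edge_does_not_leak() {
    std::weak_ptr<Node> observer;
    {
        auto parent = std::make_shared<Node>();
        parent->child = std::make_shared<Node>();
        parent->child->parent = parent;  // were this shared_ptr: cycle, leak
        observer = parent;
    }
    // Both nodes were destroyed when `parent` left scope.
    return observer.expired();
}
```

If the back edge were a shared_ptr, each node would keep the other's refcount
above zero and neither would ever be freed.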

I don't understand points 3 and 4, or why they would be a property of
reference counting for memory management. They both seem like completely
orthogonal problems that have nothing to do with the mechanism that decides
when to free memory.

Anyway, my original point was not that reference counting is perfect, or even
more efficient compared to garbage collection. Just that it is predictable and
deterministic, which is very often much more important, especially for code
with real-time constraints.

~~~
liuliu
> Languages like Swift and Objective-C hide the reference counting for you, so
> 'instruction pollution' is not really something I care about. The overhead
> at the instruction & memory level is fairly minimal for these language too,
> by means of using tagged pointers. I'm pretty sure the compiler is smart
> enough to factor out reference counting for complete sections where it can
> determine objects can impossibly go out of scope (e.g. sections where
> aliasing can be ruled out, and all assignments are to locals, such as in
> many loops).

The new Swift ownership API can help, but the compiler is not good enough to
figure out exclusive ownership all by itself without any annotations. It is
common to spend more than 10% of your time in an RC environment on refcount
calculations and lock acquisition (some stats:
[http://iacoma.cs.uiuc.edu/iacoma-
papers/pact18.pdf](http://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf)).

------
nwallin
> For a video game running at 60 frames per second each frame has 16
> milliseconds in which to complete, so a pause of less than a millisecond is
> not going to be significant relative to other overheads.

First of all: bullshit. Second of all: bullshit.

99th percentile of 1ms means that 1% of the time you're worse than that. The
author says sometimes it's as bad as 8ms. A dropped frame every 15 seconds or
so will make your game unplayable. 99th percentile measurements are useless.
Tell me how often your users are subjected to any given latency.

One millisecond is enormous. My kingdom for a millisecond. If one in ten
frames loses 1ms to a GC and one in a thousand frames loses 4ms, that means
my budget is 12ms, down from 16ms. Thanks, you've sent my expected environment
back to 2010.

People think they can just say "99th percentile is bad but not terrible" and
assume that's the end of the argument but that's not how the world works. 99th
percentile frame time happens once every 1.7 seconds. If your web page loads
200 assets (which is basically all of them these days) nearly all page loads
will hit worse than 99th percentile for one of those asset loads.
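The 200-asset claim is just the complement rule: if each asset load
independently has a 1% chance of being slower than the 99th percentile, a
quick check (the function is mine, for illustration) shows how fast that
compounds:

```cpp
#include <cmath>

// P(at least one of n independent loads is worse than the 99th percentile).
double p_any_worse_than_p99(int n) {
    return 1.0 - std::pow(0.99, n);
}
// p_any_worse_than_p99(200) is roughly 0.87: about 87% of 200-asset page
// loads contain at least one worse-than-p99 asset load.
```

The same arithmetic gives the frame-time figure: at 60fps, a worse-than-p99
frame shows up about once per 100 frames, i.e. every 1.7 seconds.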

99th percentile analysis is useless.

~~~
ncmncm
As we say in fintech, a millisecond is an eternity. (Lately, I have been
hearing that a microsecond is an eternity.)

Garbage collection promotion is always, fundamentally, an exercise in
doublethink. How can an intolerable process be made to seem tolerable, so that
my dodgy language which depends on it can be used in place of a mature
language which has a robust mechanism for managing all resources, not just
memory?

It can't. GC is fine for things that don't matter, but things have a way of
coming, in time, to matter. Then you have a Problem.

------
gok
It's telling that of the 15 "things that matter in GC", power consumption
isn't mentioned, even in passing.

~~~
tom_mellior
What should be said about it that's not already covered by CPU and memory
overhead?

~~~
cesarb
Power consumption has some subtle details which make it different from both
CPU and memory overhead. For instance, to reduce power consumption it's
important to reduce wakeups.

~~~
tom_mellior
I'd like to learn more, if you're willing to provide something more than vague
hints.

------
pastrami_panda
Incremental GC is very promising for games compared to traditional stop-the-
world GC. It's essentially time-sliced GC that runs over multiple frames.
Unity supports it, and if you tweak it correctly for your needs, the GC can
run while the CPU is waiting for the GPU.

~~~
chrisseaton
I don't think there are _any_ GCs that do not stop-the-world at some point,
except those with special hardware support, are there? Even C4 requires in
theory stopping individual threads, and in practice even has a global stop-
the-world phase.

~~~
pcwalton
It's easy to write a GC that doesn't stop the world at all. A Baker treadmill
[1], for example, is a very simple type of incremental collector that can be
implemented quickly. (It's a fun exercise, by the way!) If you implement it
"by the book", it will have no stop-the-world pauses whatsoever (assuming the
mutator doesn't allocate faster than collection can happen).

The problem is that a simple GC like this will have low throughput, so it's
not worth it. These are some of the most common misconceptions about GCs: that
it's hard to write a low-latency GC and that pauses are all that matter. In
reality, GC engineering is _all_ about the tradeoff between throughput and
latency.

[1]:
[https://www.memorymanagement.org/glossary/t.html#treadmill](https://www.memorymanagement.org/glossary/t.html#treadmill)
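For the flavor of it, the heart of any incremental collector is tri-color
marking driven off a gray worklist, with a bounded amount of work per slice.
A toy sketch, not a real treadmill, and it omits the write barrier a real
mutator would need:

```cpp
#include <cstddef>
#include <vector>

// Toy incremental tri-color marking: white = unvisited, gray = queued,
// black = fully scanned. Each step() does at most `budget` units of work,
// so marking can be spread across many frames instead of one long pause.
enum class Color { White, Gray, Black };

struct Obj {
    Color color = Color::White;
    std::vector<Obj*> refs;
};

struct Marker {
    std::vector<Obj*> gray;  // the worklist

    void shade(Obj* o) {
        if (o != nullptr && o->color == Color::White) {
            o->color = Color::Gray;
            gray.push_back(o);
        }
    }

    // Returns true once the gray set is empty, i.e. marking is complete.
    bool step(std::size_t budget) {
        while (budget > 0 && !gray.empty()) {
            Obj* o = gray.back();
            gray.pop_back();
            o->color = Color::Black;
            for (Obj* ref : o->refs) shade(ref);  // shade children gray
            --budget;
        }
        return gray.empty();
    }
};
```

Everything still white when step() finally returns true is garbage. The
missing piece is exactly what makes real incremental GCs hard: a write barrier
so the mutator cannot store a white pointer into an already-black object
between slices.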

------
skybrian
It's odd how he goes out of his way to make the Go team's decisions seem
strange just because they're different. What's going on there?

I'm happy that the Go team doesn't want to expose tuning knobs. I've seen a
lot of people fiddle with JVM settings without doing the controlled
experiments needed to see if it actually helps on a particular machine, and
I've done that myself. It ends up as cargo-cult programming, like people
sharing magic JVM settings on the wiki to allegedly make IntelliJ faster. (It
worked for one person!)

~~~
apta
> I've seen a lot of people fiddle with JVM settings without doing the
> controlled experiments

That's a problem with what the people are doing then, not the JVM.
Furthermore, the new low latency JVM GCs only have 2-3 knobs to tune.

golang likes to pretend that complexity doesn't exist, and goes for the most
simplistic approach, at the cost of things like throughput, code size, speed,
code maintainability, etc. The JVM is suited for a much wider range of tasks.

~~~
hu3
That's not the impression I get from Go literature at all.

Go authors tend to be quite humble about Go being targeted mostly at a
specific class of software, read: servers. And Go is pretty darn successful at
it.

The lack of GC knobs is an informed decision within that context.

~~~
apta
"Servers" refers to a broad range of software. If you're writing a simple app
that parses JSON and handles REST, it can be an ok fit. Now if you're writing
server software that needs to be high-throughput, then that's where you hit
golang's limitations.

~~~
hu3
High throughput servers you say? I'll just leave this here:

> How We Built Uber Engineering’s Highest Query per Second Service Using Go

[https://eng.uber.com/go-geofence/](https://eng.uber.com/go-geofence/)

And that was in 2016. Go's GC performance characteristics improved quite a bit
since then.

~~~
apta
This doesn't negate the point I stated. You can throw more hardware at the
problem to reach higher throughput if your problem domain allows for it (such
as the use case you link to). It goes without saying that this is an
inefficient approach, not to mention that it won't apply if you're running
batch jobs, for instance, where you need high throughput (e.g. on individual
nodes).

------
ryanseys
I was hoping this was about the infrastructure and systems around collecting
human garbage in the modern world.

~~~
symplee
Anyone have any details on recent developments in the field?

For example:

  - Reclaiming raw materials from landfills
  - Machines/robots that can sort recyclable materials from a heap of mixed garbage
  - Better insulated landfills
~~~
mzi
In the area of Stockholm, Sweden where I live, we throw "all" garbage (we of
course separate glassware, metal, paper, and electronics first) in the same
bin, but food waste goes in green bags. I don't know if it's robots that do
the final sorting, but I would believe so. Less than 1% of household trash
ends up in landfills here.

------
brighteyes
The technical detail in this article is excellent! A great read.

But I think it would have been an even better article without the negativity
about Go and how the author thinks "the Java guys are winning" in his words.
That felt a little petty.

~~~
randomidiot666
Why? Did he hurt Go's feelings? It's a programming language, not a child. I
prefer his honest assessment of their relative strengths and weaknesses.

~~~
uluyol
Except that it isn't necessary nor is it completely honest. Go and Java take
different approaches here, but the article focuses on the merits of the Java
approach and the downsides of the approach taken by Go.

Example 1. The article talks about compaction and generational collection as
being Good Things(TM), but it doesn't talk about the costs associated with
them. Looking at the linked Go article, these approaches suffer from high
write barrier overhead. For Go, this isn't worthwhile because escape analysis
allocates many young objects on the stack (which btw is effectively bump-
pointer allocation) so trying to further reduce GC overhead by increasing the
overhead of every pointer write is just not worth it. It may, however, be the
right trade-off for Java.

Example 2. Java's many tuning parameters mean that programmers who care about
performance have to choose the right GC and tune it. If better GCs come out or
the algorithms are tweaked, these configurations have to be updated. In
contrast, Go programs get these benefits for free. The best approach seems to
be to offer a small number of high-level knobs, but it's hard to determine
what those are, leading to the two (suboptimal) extremes you see with Go and
Java.

~~~
pcwalton
These are common misconceptions.

> Example 1. The article talks about compaction and generational collection as
> being Good Things(TM), but it doesn't talk about the costs associated with
> them. Looking at the linked Go article, these approaches suffer from high
> write barrier overhead.

You need a write barrier no matter what for any sort of incremental or
concurrent GC, to maintain the tricolor invariant. Otherwise there is no way
for the runtime system to detect a store from a black object to a white
object. Typical GCs will fold the write barrier needed for generational GC
into the write barrier needed for incremental/concurrent GC, so there is no
need for extra overhead if properly implemented.
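Concretely, that folding can be pictured as one barrier function doing double
duty (an illustrative sketch, not any real runtime's code):

```cpp
#include <vector>

// One write barrier serving both collectors. On every pointer store
// `*slot = target`, it (1) keeps the tricolor invariant for the concurrent
// marker and (2) maintains the remembered set for the generational collector.
struct GCObj {
    bool black = false;       // already scanned by the marker
    bool old_gen = false;     // tenured object
    bool remembered = false;  // already in the remembered set
};

struct Heap {
    bool marking = false;            // a concurrent mark phase is running
    std::vector<GCObj*> gray;        // marker's worklist
    std::vector<GCObj*> rem_set;     // old objects that point into young gen

    void write_barrier(GCObj* holder, GCObj** slot, GCObj* target) {
        // Concurrent half (Dijkstra-style insertion barrier): shade the new
        // target so a black holder never references an unshaded object.
        if (marking && target != nullptr && !target->black)
            gray.push_back(target);

        // Generational half: record old->young stores so the next young-gen
        // collection can treat `holder` as an extra root.
        if (target != nullptr && holder->old_gen && !target->old_gen &&
            !holder->remembered) {
            holder->remembered = true;
            rem_set.push_back(holder);
        }
        *slot = target;  // the actual store
    }
};
```

Both checks hang off the same instrumentation point, which is why a
well-engineered generational barrier adds little on top of the concurrent one.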

> For Go, this isn't worthwhile because escape analysis allocates many young
> objects on the stack (which btw is effectively bump-pointer allocation)

Java HotSpot has done the same thing for a long time! It's just that in
HotSpot escape analysis doesn't really help allocation performance, because
the generational GC already offers bump allocation in the nursery. Escape
analysis in the JVM does open up more optimizations, though, because it serves
as the scalar-replacement-of-aggregates transformation.

> so trying to further reduce GC overhead by increasing the overhead of every
> pointer write is just not worth it.

This is only because of their specific implementation. There is no need for
increased overhead.

> If better GCs come out or tweaks to the algorithms are made, these
> configurations have to be updated. In contrast, Go programs gets these
> benefits for free.

There is no reason why Java can't do the same by updating defaults. In fact,
they often do.

~~~
randomidiot666
>> If better GCs come out or tweaks to the algorithms are made, these
configurations have to be updated. In contrast, Go programs gets these
benefits for free.

> There is no reason why Java can't do the same by updating defaults. In fact,
> they often do.

Correct. The JVM guys always update the default GC to be the nearest to 'one
size fits all'. Obviously if you've made a custom GC configuration then you
want a level of tuning that Go does not provide.

