If your "deterministic amount of time" can tolerate single-digit microsecond pauses, then Go's GC is just fine. If you're building hard real time systems then you probably want to steer clear of GCs. Also, "developer velocity" is an important criteria for a lot of shops, and in my opinion that rules out Rust, C, C++, and every dynamically typed language I've ever used (of course, this is all relative, but in my experience, those languages are an order of magnitude "slower" than Go, et al with respect to velocity for a wide variety of reasons).
I had seen some benchmarks several years ago, around the time the significant GC optimizations were made, and I could've sworn they were on the order of single-digit microseconds; however, I can't find any of those benchmarks today, and indeed benchmarks of any kind are hard to come by except for some pathological cases with enormous heaps. Maybe that single-digit µs figure was a misremembering on my part. Even if it's sub-millisecond, that's plenty for a 60 Hz video game.
If it can really guarantee single-digit microsecond pauses in my realtime thread no matter what happens in other threads of my application, that is indeed a game changer. But I'll believe it when I see it with my own eyes. I've never even used a garbage collector that can guarantee single-digit millisecond pauses.
Have you measured the pause times of free()? Because they are not deterministic, and I have met few people who understand in detail how complex it can be in practice. In the limit, free() can be as bad as GC pause times because of chained deallocation--i.e. not statically bounded.
In practice, almost all real-time systems use this strategy, even going so far as to allocate all memory at compile time. The first versions of Virgil (I and II) used this strategy and compiled to C.
When doing this, the whole debate of memory-safe (automatic) vs unsafe (manual) memory management is completely orthogonal.
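To make the preallocation strategy concrete, here is a minimal Go sketch (the names and sizes are invented for illustration, not taken from Virgil or any particular real-time system): everything is allocated once at startup, and the steady-state loop only reuses that memory, so neither a GC nor free() ever has work to do on the hot path.

    // Minimal sketch: allocate everything up front, reuse it forever.
    package main

    import "fmt"

    const maxSamples = 4096

    // samplePool is a fixed-size buffer created once at startup.
    type samplePool struct {
        buf  [maxSamples]float64
        used int
    }

    // next hands out the next slot without any allocation.
    func (p *samplePool) next() *float64 {
        s := &p.buf[p.used%maxSamples]
        p.used++
        return s
    }

    func main() {
        var pool samplePool // fixed footprint, decided before the loop starts
        for i := 0; i < 10; i++ {
            s := pool.next()
            *s = float64(i) * 0.5 // hot path touches only preallocated memory
        }
        fmt.Println(pool.buf[:3])
    }

With this style the safety question really is orthogonal: the same pattern works in C, Go, or Virgil, because nothing is ever freed.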
Hard to say without more details, but those graphs look very similar to what you get when nproc's worth of goroutines interact with the CFS CPU scheduler in the Linux kernels of the time. I've seen latency graphs improve anywhere from significantly to entirely just by setting GOMAXPROCS to account for the CFS behavior. Unfortunately the blog post doesn't even make a passing mention of this.
Anecdotally, the main slowdown we saw with Go code running in Kubernetes at my previous job was not "GC stalls" but "CFS throttling". By default[1], the runtime will set GOMAXPROCS to the number of cores on the machine, not to the CPU allocation of the cgroup the container runs in. When you hand out 1 core on a 96-core machine, bad things happen. Well, you end up with non-smooth progress. Setting GOMAXPROCS to ceil(cpu allocation) alleviated a LOT of problems (rough sketch of that calculation after this comment).
Similar problems with certain versions of Java and C#[1]. Java was exacerbated by its tendency to make everything wake up in certain situations, so you could get to a point where the runtime was dominated by CFS throttling, with only occasional work being done.
I did some experiments with a roughly 100 Hz increment of a Prometheus counter metric, and with a GOMAXPROCS of 1 the rate was steady at ~100 Hz down to a CPU allocation of about 520 millicores, then dropped off (~80 Hz down to about 410 millicores, ~60 Hz down to about 305 millicores, and then I stopped doing test runs).
[1] This MAY have changed; this was a while (and multiple compiler/runtime versions) ago. I know that C# had a runtime release sometime in 2020 that should've improved things, and I think Java now also does the right thing when in a cgroup.
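For what it's worth, the "ceil(cpu allocation)" adjustment boils down to something like the sketch below. It assumes cgroup v2 and its /sys/fs/cgroup/cpu.max "quota period" format; a real deployment would also have to handle cgroup v1, which is what the library mentioned in the next comment does for you.

    // Rough sketch (cgroup v2 assumed): size GOMAXPROCS from the container's
    // CPU quota instead of the machine's core count.
    package main

    import (
        "fmt"
        "math"
        "os"
        "runtime"
        "strconv"
        "strings"
    )

    // cgroupCPUQuota returns the fractional CPU allocation, e.g. 1.5 for
    // "150000 100000" in cpu.max, or ok=false when no quota is set.
    func cgroupCPUQuota() (quota float64, ok bool) {
        data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
        if err != nil {
            return 0, false
        }
        fields := strings.Fields(string(data))
        if len(fields) != 2 || fields[0] == "max" {
            return 0, false
        }
        q, err1 := strconv.ParseFloat(fields[0], 64)
        p, err2 := strconv.ParseFloat(fields[1], 64)
        if err1 != nil || err2 != nil || p == 0 {
            return 0, false
        }
        return q / p, true
    }

    func main() {
        if q, ok := cgroupCPUQuota(); ok {
            // ceil(cpu allocation): a 1.5-core limit becomes GOMAXPROCS=2,
            // a 520-millicore limit becomes GOMAXPROCS=1.
            runtime.GOMAXPROCS(int(math.Ceil(q)))
        }
        fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))
    }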
AFAIK it hasn't changed; this exact situation with cgroups is still something I have to tell fellow developers about. Some of them have started using [automaxprocs] to detect the cgroup quota and set GOMAXPROCS automatically.
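For reference, automaxprocs is typically wired in with a blank import; the package adjusts GOMAXPROCS at init time based on the container's CPU quota:

    package main

    import (
        "fmt"
        "runtime"

        // Blank import: at init time this caps GOMAXPROCS to the cgroup's
        // CPU quota (and is a no-op outside a CPU-limited container).
        _ "go.uber.org/automaxprocs"
    )

    func main() {
        fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))
    }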
Ah, note, said program also had one goroutine trying the stupidest possible way of finding primes (and then not actually doing anything with the found primes, apart from appending them to a slice). It literally trial-divided (well, modded) each candidate by all numbers between 2 and isqrt(n) to see if it was a multiple. Not designed to be clever, explicitly designed to suck up about one core.
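Putting those two descriptions together, the test program presumably looked something like the sketch below (metric name, port, and structure are guesses, not the original code): one goroutine increments a Prometheus counter on a 10 ms ticker (~100 Hz), another burns roughly one core on naive trial division.

    // Sketch of the described experiment (not the original code): a ~100 Hz
    // counter increment plus one goroutine deliberately burning about a core.
    package main

    import (
        "math"
        "net/http"
        "time"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // Hypothetical metric name; the point is only the steady ~100 Hz Inc() rate.
    var ticks = promauto.NewCounter(prometheus.CounterOpts{
        Name: "ticker_increments_total",
        Help: "Incremented on a 10ms ticker (~100 Hz).",
    })

    // isPrime is intentionally the dumbest trial-division check: mod the
    // candidate by every number from 2 up to isqrt(n).
    func isPrime(n int) bool {
        if n < 2 {
            return false
        }
        limit := int(math.Sqrt(float64(n)))
        for d := 2; d <= limit; d++ {
            if n%d == 0 {
                return false
            }
        }
        return true
    }

    func main() {
        // Goroutine 1: burn roughly one core finding primes and appending
        // them to a slice that is never otherwise used.
        go func() {
            var primes []int
            for n := 2; ; n++ {
                if isPrime(n) {
                    primes = append(primes, n)
                }
            }
        }()

        // Goroutine 2: increment the counter at ~100 Hz.
        go func() {
            t := time.NewTicker(10 * time.Millisecond)
            defer t.Stop()
            for range t.C {
                ticks.Inc()
            }
        }()

        // Expose /metrics so the increment rate can be observed externally.
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":2112", nil)
    }

With GOMAXPROCS=1 and a sub-core CPU limit, the prime goroutine and the ticker goroutine share the single P, so CFS throttling shows up directly as a drop in the observed counter rate.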
I found this go.dev blog entry from 2018. It looks like the average pause time they were able to achieve was significantly less than 1ms back then.
"The SLO whisper number here is around 100-200 microseconds and we will push towards that. If you see anything over a couple hundred microseconds then we really want to talk to you.."
Shenandoah is in the same latency category as well. I haven't seen recent numbers, but a few years ago it had a little better latency and a little worse throughput.