If your "deterministic amount of time" can tolerate single-digit microsecond pauses, then Go's GC is just fine. If you're building hard real time systems then you probably want to steer clear of GCs. Also, "developer velocity" is an important criteria for a lot of shops, and in my opinion that rules out Rust, C, C++, and every dynamically typed language I've ever used (of course, this is all relative, but in my experience, those languages are an order of magnitude "slower" than Go, et al with respect to velocity for a wide variety of reasons).
I had seen some benchmarks several years ago, around the time the significant GC optimizations were made, and I could've sworn they were on the order of single-digit microseconds; however, I can't find any of those benchmarks today, and indeed benchmarks of any kind are hard to come by except for some pathological cases with enormous heaps. Maybe that single-digit µs figure was a misremembering on my part. Even if it's sub-millisecond, that's plenty for a 60 Hz video game.
If it can really guarantee single-digit microsecond pauses in my realtime thread no matter what happens in other threads of my application, that is indeed a game changer. But I'll believe it when I see it with my own eyes. I've never even used a garbage collector that can guarantee single-digit millisecond pauses.
Have you measured the pause times of free()? Because they are not deterministic, and I have met few people who understand in detail how complex it can be in practice. In the limit, free() can be as bad as GC pause times because of chained deallocation--i.e. not statically bounded.
In practice, almost all real-time systems use this strategy, even going so far as to allocate all memory at compile time. The first versions of Virgil (I and II) used this strategy and compiled to C.
When doing this, the whole debate of memory-safe (automatic) vs unsafe (manual) memory management is completely orthogonal.
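To make the preallocation strategy concrete, here is a minimal Go sketch (the names and sizes are invented for illustration, not taken from Virgil or any particular real-time system): everything is allocated once at startup, and the steady-state loop only reuses that memory, so neither a GC nor free() ever has work to do on the hot path.

    // Minimal sketch: allocate everything up front, reuse it forever.
    package main

    import "fmt"

    const maxSamples = 4096

    // samplePool is a fixed-size buffer created once at startup.
    type samplePool struct {
        buf  [maxSamples]float64
        used int
    }

    // next hands out the next slot without any allocation.
    func (p *samplePool) next() *float64 {
        s := &p.buf[p.used%maxSamples]
        p.used++
        return s
    }

    func main() {
        var pool samplePool // fixed footprint, decided before the loop starts
        for i := 0; i < 10; i++ {
            s := pool.next()
            *s = float64(i) * 0.5 // hot path touches only preallocated memory
        }
        fmt.Println(pool.buf[:3])
    }

With this style the safety question really is orthogonal: the same pattern works in C, Go, or Virgil, because nothing is ever freed.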
Hard to say without more details, but those graphs look very similar to what you get when nproc's worth of goroutines interact with the CFS CPU scheduler in the Linux kernels of the time. I've seen latency graphs improve anywhere from significantly to entirely just by setting GOMAXPROCS to account for the CFS behavior. Unfortunately the blog post doesn't even make a passing mention of this.
Anecdotally, the main slowdown we saw with Go code running in Kubernetes at my previous job was not "GC stalls" but "CFS throttling". By default[1], the runtime will set GOMAXPROCS to the number of cores on the machine, not to the CPU allocation of the cgroup the container runs in. When you hand out 1 core on a 96-core machine, bad things happen. Well, you end up with non-smooth progress. Setting GOMAXPROCS to ceil(cpu allocation) alleviated a LOT of problems (rough sketch of that calculation after this comment).
Similar problems with certain versions of Java and C#[1]. Java was exacerbated by its tendency to make everything wake up in certain situations, so you could get to a point where the runtime was dominated by CFS throttling, with only occasional work being done.
I did some experiments with a roughly 100 Hz increment of a Prometheus counter metric, and with a GOMAXPROCS of 1 the rate was steady at ~100 Hz down to a CPU allocation of about 520 millicores, then dropped off (~80 Hz down to about 410 millicores, ~60 Hz down to about 305 millicores, and then I stopped doing test runs).
[1] This MAY have changed; this was a while (and multiple compiler/runtime versions) ago. I know that C# had a runtime release sometime in 2020 that should've improved things, and I think Java now also does the right thing when in a cgroup.
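For what it's worth, the "ceil(cpu allocation)" adjustment boils down to something like the sketch below. It assumes cgroup v2 and its /sys/fs/cgroup/cpu.max "quota period" format; a real deployment would also have to handle cgroup v1, which is what the library mentioned in the next comment does for you.

    // Rough sketch (cgroup v2 assumed): size GOMAXPROCS from the container's
    // CPU quota instead of the machine's core count.
    package main

    import (
        "fmt"
        "math"
        "os"
        "runtime"
        "strconv"
        "strings"
    )

    // cgroupCPUQuota returns the fractional CPU allocation, e.g. 1.5 for
    // "150000 100000" in cpu.max, or ok=false when no quota is set.
    func cgroupCPUQuota() (quota float64, ok bool) {
        data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
        if err != nil {
            return 0, false
        }
        fields := strings.Fields(string(data))
        if len(fields) != 2 || fields[0] == "max" {
            return 0, false
        }
        q, err1 := strconv.ParseFloat(fields[0], 64)
        p, err2 := strconv.ParseFloat(fields[1], 64)
        if err1 != nil || err2 != nil || p == 0 {
            return 0, false
        }
        return q / p, true
    }

    func main() {
        if q, ok := cgroupCPUQuota(); ok {
            // ceil(cpu allocation): a 1.5-core limit becomes GOMAXPROCS=2,
            // a 520-millicore limit becomes GOMAXPROCS=1.
            runtime.GOMAXPROCS(int(math.Ceil(q)))
        }
        fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))
    }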
AFAIK it hasn't changed; this exact situation with cgroups is still something I have to tell fellow developers about. Some of them have started using [automaxprocs] to detect the cgroup quota and set GOMAXPROCS automatically.
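For reference, automaxprocs is typically wired in with a blank import; the package adjusts GOMAXPROCS at init time based on the container's CPU quota:

    package main

    import (
        "fmt"
        "runtime"

        // Blank import: at init time this caps GOMAXPROCS to the cgroup's
        // CPU quota (and is a no-op outside a CPU-limited container).
        _ "go.uber.org/automaxprocs"
    )

    func main() {
        fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))
    }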
Ah, note, said program also had one goroutine trying the stupidest possible way of finding primes (and then not actually doing anything with the found primes, apart from appending them to a slice). It literally trial-divided (well, modded) each candidate by all numbers between 2 and isqrt(n) to see if it was a multiple. Not designed to be clever, explicitly designed to suck up about one core.
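Putting those two descriptions together, the test program presumably looked something like the sketch below (metric name, port, and structure are guesses, not the original code): one goroutine increments a Prometheus counter on a 10 ms ticker (~100 Hz), another burns roughly one core on naive trial division.

    // Sketch of the described experiment (not the original code): a ~100 Hz
    // counter increment plus one goroutine deliberately burning about a core.
    package main

    import (
        "math"
        "net/http"
        "time"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // Hypothetical metric name; the point is only the steady ~100 Hz Inc() rate.
    var ticks = promauto.NewCounter(prometheus.CounterOpts{
        Name: "ticker_increments_total",
        Help: "Incremented on a 10ms ticker (~100 Hz).",
    })

    // isPrime is intentionally the dumbest trial-division check: mod the
    // candidate by every number from 2 up to isqrt(n).
    func isPrime(n int) bool {
        if n < 2 {
            return false
        }
        limit := int(math.Sqrt(float64(n)))
        for d := 2; d <= limit; d++ {
            if n%d == 0 {
                return false
            }
        }
        return true
    }

    func main() {
        // Goroutine 1: burn roughly one core finding primes and appending
        // them to a slice that is never otherwise used.
        go func() {
            var primes []int
            for n := 2; ; n++ {
                if isPrime(n) {
                    primes = append(primes, n)
                }
            }
        }()

        // Goroutine 2: increment the counter at ~100 Hz.
        go func() {
            t := time.NewTicker(10 * time.Millisecond)
            defer t.Stop()
            for range t.C {
                ticks.Inc()
            }
        }()

        // Expose /metrics so the increment rate can be observed externally.
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":2112", nil)
    }

With GOMAXPROCS=1 and a sub-core CPU limit, the prime goroutine and the ticker goroutine share the single P, so CFS throttling shows up directly as a drop in the observed counter rate.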
I found this go.dev blog entry from 2018. It looks like the average pause time they were able to achieve was significantly less than 1ms back then.
"The SLO whisper number here is around 100-200 microseconds and we will push towards that. If you see anything over a couple hundred microseconds then we really want to talk to you.."
Shenandoah is in the same latency category as well. I haven't seen recent numbers, but a few years ago it had a little better latency and a little worse throughput.