
An in-depth look at OCaml’s new “best-fit” garbage collector strategy - testcross
http://www.ocamlpro.com/2020/03/23/ocaml-new-best-fit-garbage-collector/
======
StreamBright
"Remember that whatever works best for you, it’s still better than having to
malloc and free by hand. Happy allocating!"

Nice, they are saying exactly the same thing as those pesky game developers.

[https://www.youtube.com/watch?v=tK50z_gUpZI](https://www.youtube.com/watch?v=tK50z_gUpZI)

~~~
pjmlp
Yeah, like Tim Sweeney.

"It's interesting that many games can afford a constant 10x interpretation
overhead for scripts, but not a spikey 1% for garbage collection."

[https://twitter.com/timsweeneyepic/status/880607734588211200](https://twitter.com/timsweeneyepic/status/880607734588211200)

[https://wiki.unrealengine.com/Garbage_Collection_Overview](https://wiki.unrealengine.com/Garbage_Collection_Overview)

Which engine was it again that Nintendo, Microsoft and Google chose as a
first-party option for their 3D APIs?

[https://developer.nintendo.com/tools](https://developer.nintendo.com/tools)

[https://docs.microsoft.com/en-us/windows/mixed-reality/unity...](https://docs.microsoft.com/en-us/windows/mixed-reality/unity-development-overview)

[https://stadia.dev/blog/unity-production-ready-support-for-s...](https://stadia.dev/blog/unity-production-ready-support-for-stadia-now-available/)

[https://developer.android.com/games/develop/build-in-unity](https://developer.android.com/games/develop/build-in-unity)

The anti-GC crowd in the games industry is no different from the ones who
fought the adoption of C/Modula-2/Pascal over Assembly, and then fought the
adoption of C++ and Objective-C over C.

Eventually they will suck it up when the major platform owners tell them it is
time to move on.

~~~
StreamBright
I guess this mentality leads to the current state of play, where all UI-related
latencies are through the roof compared to the 80s. At work, I deal with
systems that require 16G of heap as a minimum. Funnily enough, when things get
rewritten in Rust, providing the exact same functionality and the same or
better performance, the memory requirement goes down 10x (or more). It is up
to us how much garbage our systems produce, and how much CO2 is wasted on
this. I guess many of us are OK with it, while some of us are not.
[https://blog.discordapp.com/why-discord-is-switching-from-go...](https://blog.discordapp.com/why-discord-is-switching-from-go-to-rust-a190bbca2b1f)

~~~
whateveracct
I've seen multiple production systems rewritten in Rust, and the consensus of
the developers working on them is that, while the Rust version was more
performant, the majority of that performance was attributable to the rewrite
itself and not the language. And that a performance-focused rewrite in the
original language would also have seen huge performance gains.

As the truism goes, if you require your software to be performant, you must
first make Performance a Requirement.

~~~
StreamBright
Sure, I bet they feel this way. However, if you look at rewrites like the one
at Discord[1], it is dead obvious that the rewrite helped not because of the
rewrite by itself, but because there is (mostly) no GC, plus some other
smaller things (better data structures for certain tasks). My experience from
working a lot for the Fortune 500 over the last 15 years is that developers
are usually not aware of the low-level details of their stack, including (but
not limited to) networking, garbage collection, concurrency, and parallelism.
It is the exception (mostly in FAANG companies) when these low-level details
are very well understood and dealt with. It is no surprise that these
companies are picking up Rust: those developers actually understand how much
less mental overhead it is to work in Rust (once it compiles successfully :) ).
I know it is a new language and there are hoops (async/await), but at least we
can be sure that there is no need for GC tuning, that memory corruption is
non-existent, and a few other nice properties.

In my mind, performance is always a requirement (and a feature). We just
stopped caring a long time ago, because it is easier to think about
performance as a hardware or capacity problem, memory as a GC problem, and so
on.

1. [https://blog.discordapp.com/why-discord-is-switching-from-go...](https://blog.discordapp.com/why-discord-is-switching-from-go-to-rust-a190bbca2b1f)

------
twic
I'm not aware of any other industrial-strength GC using this strategy. Are
there any?

If not, is there something about OCaml that makes this strategy more suitable
than it is for other languages?

If not, is this a case of this being the best strategy they have the resources
to implement, rather than the best possible strategy?

~~~
the8472
I think HotSpot's CMS old-gen allocator used a best-fit strategy, since its
collector didn't compact. But CMS has been deprecated because newer,
compacting low-pause collectors have taken over its use cases while being less
fragile.
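
(For the curious, here is a minimal sketch in C of what a best-fit search over
a plain free list looks like, with hypothetical types; the article's allocator
indexes free blocks in a splay tree precisely to avoid this linear scan:)

    #include <stddef.h>

    /* Hypothetical free-list node; real allocators store this header
       inside the free block itself. */
    struct block {
        size_t size;
        struct block *next;
    };

    /* Return the smallest free block whose size still satisfies the
       request, or NULL if none fits. O(n) scan; production best-fit
       allocators use an ordered index to make this logarithmic. */
    struct block *best_fit(struct block *free_list, size_t want) {
        struct block *best = NULL;
        for (struct block *b = free_list; b != NULL; b = b->next) {
            if (b->size >= want && (best == NULL || b->size < best->size))
                best = b;
        }
        return best;
    }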

~~~
hinkley
If memory serves, the new one uses an extra object header that points from the
old object to the new one during move operations, and any reads of the old
object get forwarded to the new one.

I'm pretty sure that would not have performed well without the aggressive
prediction logic in modern processors.

Java 1's object accesses _always_ read through an indirect pointer, but that
went away in the name of performance, either when HotSpot was introduced or in
the next round of GC improvements.
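
(Roughly, the scheme gives every object a forwarding slot that initially
points back at the object itself, so every read goes through one unconditional
extra load. A sketch of the idea in C, with a hypothetical layout rather than
the actual HotSpot one:)

    /* Hypothetical layout: the forwarding slot points at the object
       itself until the collector moves it, after which it points at
       the new copy. */
    struct object {
        struct object *forward;  /* Brooks-style forwarding pointer */
        /* ... payload fields ... */
    };

    /* Read barrier: always chase the forwarding slot. Unmoved objects
       forward to themselves, so the cost is one dependent load per
       access, which is why modern prediction/speculation matters. */
    static inline struct object *resolve(struct object *obj) {
        return obj->forward;
    }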

~~~
_old_dude_
There are two _new_ GCs, Shenandoah and ZGC.

Indirect pointers, or Brooks pointers as they are called, were used in
Shenandoah v1 so that an application thread performing a read did not have to
move the object during the evacuation phase. This strategy was removed in
Shenandoah v2 for better throughput, so now both reads and writes by the
application move the object during the evacuation phase.

ZGC has never used Brooks pointers.

~~~
MaxBarraclough
I had to look up 'Brooks pointers'. For anyone else in this position, these
two blog posts seem a good place to start:
[https://rkennke.wordpress.com/2013/10/23/shenandoah-gc-brook...](https://rkennke.wordpress.com/2013/10/23/shenandoah-gc-brooks-pointers/) , [https://blog.plan99.net/modern-garbage-collection-part-2-1c8...](https://blog.plan99.net/modern-garbage-collection-part-2-1c88847abcfd)

------
jjoonathan
Hell is other people's algorithmic choices. My GC-fu isn't high-level enough
to comment on this one, but I just spent the last two days suffering in
dependency hell because someone thought it would be a good idea to use a
full-blown SAT solver for package management. Grr.

~~~
hinkley
Sometimes I have lamented the fact that 80% of everything we're ever going to
properly discover in Computer Science was already discovered forty years ago.

What hasn't been explored very well is how to formulate these solutions so
mere mortals can comprehend how they work. Algorithm accessibility is, I
believe, the limiting factor on building systems any bigger than the ones we
have now. When there is one tricky bit in the code, you can get away with
asking people to dive in and learn it. When there are 50? 100? Just figuring
out the consequences of how those systems _interact_ is a full-time job, let
alone how they function internally.

Give me a SAT solver, or a parser, or half a dozen other things, that
decomposes a problem the way a human would do it, just faster and with far
more accuracy, and I could learn it in a week. Take all my Paxoses and replace
them with Raft, then swing by and make a second pass on a few.

~~~
imtringued
The real problems are conventions and ease of use. You're not going to solve
these problems by pointing at a paper from the 80s and shouting "We did this
40 years ago! Why is everyone so dumb nowadays?".

Let's take WebAssembly as an example. Its only reason for existence is
convention. You need everyone to agree on an IR that is not only
cross-platform and stable but also low-level and allows execution of untrusted
code within a sandbox.

If you look at the competitors, it becomes obvious that they do not follow
these core principles. LLVM IR just isn't meant to be used in a browser. JVM
bytecode just wasn't meant to be used in a browser. So what are we going to
do? Use them anyway? That's how you get situations like the horribly insecure
JVM plugin. You could restrict the JVM plugin to a subset that is secure
enough for browsers and add additional opcodes for low-level behaviors, but
then you are no longer using JVM bytecode. It's a completely new invention at
that point, but Oracle will still hunt you down.

~~~
hyperman1
This is a bit harsh. JVM bytecode was meant to be in the browsers of that
time. Plugins were a respected part of the ecosystem back then. There were two
types of security problems.

The majority were not bytecode-related, but were bugs in the interface to the
outside world. There is no reason why WebAssembly is better in this regard,
e.g. WebGL vs Java's graphics APIs. This mainly comes down to corporate
culture prioritizing security. If financial hardship or other stresses befall
the browser maker, WebAssembly will probably give the same trouble as Java
did.

The minority were related to the soundness of the JVM itself. Most of these
have been fixed. These bugs are in general nasty, as the basics have been
validated by a mathematical proof. A few are still there and very hard to fix,
like locking system objects such as Thread.class. I think this was a learning
experience for all secure VMs that followed, and WebAssembly knew what to
avoid because of Java. Only time will tell how well WebAssembly withstands the
nasty ideas humanity throws at it.

------
hyustan
Surprised not see any talk about slab allocations.

Typical malloc implementations today use slabs:

A variety of allocation-classes is defined; for example, 1B, 2B, 4B, 8B, ...

Each allocation-class is essentially its own independent heap.

Slabs are really good:

Allocation is fast: a few cycles to determine the slab, then pick the first
available cell, done.

Compaction is easy: all cells have the same size!

And I repeat, all cells within an allocation-class have the same size! This
means things like pin cards, etc...

Compared to the pointer-chasing inherent to a splay-tree... I do wonder.
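
(To make the idea concrete, a toy size-class allocator in C; structures and
constants are hypothetical, and real slab allocators add per-thread caches,
page-granular refill from the OS, and metadata to recover a cell's class on
free:)

    #include <stddef.h>
    #include <stdlib.h>

    #define NUM_CLASSES 10          /* power-of-two classes: 8B .. 4KB */
    #define MAX_CELL    4096

    /* One free list per class; a free cell's first word links to the
       next free cell in the same class. */
    static void *free_lists[NUM_CLASSES];

    static int class_for(size_t n) {
        int c = 0;
        while (((size_t)8 << c) < n)
            c++;
        return c;
    }

    /* Allocation: a few cycles to find the class, then pop the first
       available cell. All cells in a class have the same size, so
       there is no searching. */
    void *slab_alloc(size_t n) {
        if (n == 0 || n > MAX_CELL)
            return malloc(n);               /* out of slab range */
        int c = class_for(n);
        if (free_lists[c] == NULL)
            return malloc((size_t)8 << c);  /* refill path elided */
        void *cell = free_lists[c];
        free_lists[c] = *(void **)cell;
        return cell;
    }

    void slab_free(void *p, size_t n) {
        if (n == 0 || n > MAX_CELL) { free(p); return; }
        int c = class_for(n);
        *(void **)p = free_lists[c];        /* push back on the class list */
        free_lists[c] = p;
    }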

------
fjfaase
What about the following strategy: find the first space that is large enough.
If it is smaller than double the required size, take it. (A little more space
is allocated than would be strictly needed.) If it is larger than double the
size, split it. This leaves a piece that is at least as big as the current
request. Assuming that the allocations follow some distribution, it is likely
that another piece of memory of this size will be allocated in the future. In
this way, the distribution of available spaces will remain about the same as
that of the wanted spaces. (Of course, one should also first round the size up
to some power of two, and possibly impose a minimum size.)
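
(A rough sketch of that rule in C, with hypothetical free-list types and the
header bookkeeping for the allocated piece elided; the power-of-two rounding
mentioned above would happen before the call:)

    #include <stddef.h>

    struct chunk {
        size_t size;               /* bytes, including any slack */
        struct chunk *next;
    };

    /* First-fit with a split-only-when-it-pays rule: take the whole
       chunk if it is under double the request (accepting some slack),
       otherwise split off the tail, leaving a remainder that is at
       least as large as the request itself. */
    void *alloc_split_if_double(struct chunk **list, size_t want) {
        for (struct chunk **pp = list; *pp != NULL; pp = &(*pp)->next) {
            struct chunk *c = *pp;
            if (c->size < want)
                continue;
            if (c->size < 2 * want) {
                *pp = c->next;     /* take the whole chunk, slack and all */
                return c;
            }
            c->size -= want;       /* split: remainder stays on the list */
            return (char *)c + c->size;
        }
        return NULL;               /* no chunk large enough */
    }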

~~~
judofyr
Sounds similar to a buddy allocator:
[https://en.wikipedia.org/wiki/Buddy_memory_allocation](https://en.wikipedia.org/wiki/Buddy_memory_allocation)

------
adultSwim
Shout out to Damien Doligez, a phenomenal engineer

------
jeffdavis
This reminds me somewhat of the main postgresql allocator. It keeps segregated
freelists for smaller allocations, and then larger allocations are handled by
malloc.
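
(Roughly this shape, as a sketch in C: the same size-class idea as the slab
sketch above, plus the large-allocation fallthrough. Constants are
hypothetical, and PostgreSQL's actual implementation is considerably more
involved:)

    #include <stddef.h>
    #include <stdlib.h>

    #define SMALL_MAX 8192         /* hypothetical cutoff */
    #define NUM_LISTS 11           /* doubling classes: 8B .. 8KB */
    static void *freelists[NUM_LISTS];

    /* Small requests are served from segregated per-size-class
       freelists; anything larger goes straight to malloc. */
    void *ctx_alloc(size_t n) {
        if (n == 0 || n > SMALL_MAX)
            return malloc(n);
        int i = 0;
        while (((size_t)8 << i) < n)
            i++;
        if (freelists[i] == NULL)
            return malloc((size_t)8 << i);  /* refill path elided */
        void *p = freelists[i];
        freelists[i] = *(void **)p;         /* pop the list head */
        return p;
    }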

