
A Hardware Accelerator for Tracing Garbage Collection - andrekandre
https://ieeexplore.ieee.org/document/8695831
======
Causality1
This is the paper linked in
[https://news.ycombinator.com/item?id=19864830](https://news.ycombinator.com/item?id=19864830)

------
BubRoss
I'm skeptical that cpu usage is really the problem with multicore CPUs. I
would think it is much more about concurrency.

I'm also a bit surprised there isn't more of a transition to modern C++ for
problems where so much time is spent fighting the garbage collection.
Ownership and memory management in C++11 and up really isn't a big deal. Keep
your allocations to a minimum, make sure you are moving instead of copying.
Shared pointers are there, but I actually never use them.

~~~
mruts
Isn’t that the same as with C? You can use alloca to allocate on the stack,
giving you the same benefit as RAII (I think?). I haven’t used C++, so maybe
I’m speaking out of my ass here.

But regardless, you could use the Boehm GC for objects you don’t feel like
tracking the lifetime of.

~~~
monocasa
RAII can help you protect heap memory (and other resources besides memory)
because it's more about registering code to run at the end of an object's
lifetime than about just deallocating the memory. Alloca isn't really a great
option for async flows, for instance, to say nothing of file descriptors, or
abstract concepts like 'the idea of this mutex being locked'. Alloca is also
obviously terrible for memory passed on to other threads.

Though most of that is moot when compared to a GC'd language, which is what
the broader discussion is about.

------
cryptonector
This is one of those cases where active memory might prove extremely useful.

Imagine store instructions that can tell memory some metadata about what's
stored. "This is a pointer in an object at offset O to another object". "This
is an object with a header you can use to paint objects". "This is no longer
an object or a pointer". Also imagine instructions like "clear all object
paint", "do a one layer trace", and "report unreachable objects". The one-
layer trace would have every cell that has a pointer paint the pointed to
object with either a "reachable by a reachable object" or "reachable by an
object whose reachable state is unknown".

Then GC could proceed as follows: clear all object painting, use the CPU to
scan roots to the first objects on the heap and mark those reachable, do two
runs of one-layer traces, report unreachable objects and reclaim them.

Something like that.

~~~
monocasa
The thermal budget of something like that would be bananas I would think.

~~~
cryptonector
It needn't all happen at once. Memory talking to memory is still subject to
bandwidth limits. It would take time, but not CPU, and it should run faster
than a CPU could do it.

------
exabrial
Anyone have some info about how the Vega from Azul did gc acceleration? Came
up in a discussion awhile back and we couldn't seem to find any technical
details.

~~~
monocasa
Mainly boils down to write barriers with interesting semantics, and an MMU
that lets you update a subset of the page permissions from user space, without
a transition to kernel mode, so you can quickly mark pages as "under GC
migration" and know you'll get traps if another thread/core tries to modify
the references on that page out from under you.

------
tyingq
Would CPU support for tagged memory make this easier?

------
0815test
"Those who don't know Lisp Machines are doomed to reinvent them, poorly."

~~~
monocasa
Eh, this is pretty different than Lisp Machines of old.

Most of the minicomputers they were competing with were more or less true von
Neumann architectures without caches. That meant that on those systems the
instruction fetch of the interpreter itself was constantly competing for
memory bandwidth with the actual Lisp code it was interpreting and with that
code's data. By pushing the interpreter into microcode, you've created a
Harvard arch where the interpreter can fetch out of uCode entirely and not
pollute the main system bus. IMO this is why Lisp machines died off coinciding
with the adoption of caches (particularly I-caches) in those kinds of
computers. At that point the hot path of the interpreter can live entirely in
I-cache, so it's not polluting system bandwidth the same way anymore, and the
biggest reason in favor of a dedicated Lisp machine is moot.

The work in the article is actually really cool and goes way beyond any Lisp
machine by offloading GC to an independent unit that runs in parallel with the
main CPU, interleaving into the regular system memory bandwidth in a clean
way.

------
stephc_int13
Memory allocation management should not be a big deal, and it is not the kind
of problem that can easily be solved at the language level.

Correct memory management needs information about the allocation context. In
some cases it can be handled automatically by a framework (like a game engine)
or manually by the coder, but it is really cumbersome to force this into the
language...

In my opinion, garbage collection is a bane.

~~~
mruts
Functional programming can’t really exist in any meaningful way without GC. If
you’re not an FP developer, it’s easy to think that memory allocation falls
into two categories: a) short-term objects that can be allocated on the stack,
or b) long-term objects whose lifetime can be tracked easily. But with FP a
lot of short-term garbage is created that isn’t known at runtime, thus
necessitating GC.

I program professionally in Scala and without GC, the paradigm of strong FP
would be (almost) impossible. Fortunately, the JVM handles this pretty
exceptionally.

I think a GC in hardware is a fantastic idea and hope Intel and others
incorporate it into x86-64. I don’t know how practical this is as I’m not a
hardware guy, but a hyper-efficient pauseless concurrent GC that could keep
garbage out of the cache sounds like a massive win in my book.

~~~
weberc2
> that isn’t known at runtime

Typo: s/runtime/compile time

Also, this isn't particular to FP; every style of programming requires dynamic
memory management for real-world programs (modulo pedantry about embedded
programming).

