The Metronome GC from IBM Research (http://www.research.ibm.com/people/d/dfb/papers/Bacon05High....) also uses a read-barrier to fix up pointers to objects which have moved.
Although I don't think it does marking in the read barrier.
More generally, putting things off often means being able to handle them without interruption when the time is right; it's only a problem if the right time doesn't come up often enough (you're too busy). And context switching is pretty much guaranteed to be expensive in every circumstance, be it computers, your life, or a country.
Back to computers: that would mean the problem is being too busy, not the deferral itself. Deferred tasks will likely take about the same amount of time no matter when they're handled; if anything, they're likely to be faster when done all at once rather than bit by bit, due to cache behavior. At that point you know you have other, larger problems.
On topic: so they traded occasional large, unexpected pauses for somewhat lower throughput. I like the lazy object remapping idea. I usually prefer stability over raw throughput, and this scheme seems to provide just that.
A compiler of course! :) [And then once you have your compiler, like Click did, GC becomes an obvious next hack!]
But it's hard to recommend a vendor who's opaque about pricing. If we have to ask, we probably can't afford it.
The remaining problem I see is cache thrashing: on every minor page fault you have to run part of your mark-and-sweep algorithm, which will surely kick the application's data and code out of the caches. That's inevitably less efficient than a stop-the-world approach.
On the other hand, did you know you can turn off the cycle-detecting garbage collector in CPython with gc.disable()? (Reference counting still runs regardless.)
From what I gathered, when they compact a page of memory, moving all the objects within it to new locations, they mark all pointers into that page as "not yet remapped". Then, while program execution continues, the GC thread busily goes through those pointers, correcting them to the new locations as necessary and setting the marker flag. If, during this period, the executing code tries to use a pointer whose marker is still unset, a "read barrier" is hit in the VM, and the GC code corrects that pointer ("self-heals"), sets the marker, then allows execution to continue.
Do I have this right? What about the initial unsetting of all these markers? It would seem to require going through all pointers before you want to compact a page, and I would suspect they're being more clever than that.
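The scheme described above can be sketched in a few lines. This is a toy model, not Azul's actual implementation: I'm assuming a forwarding table from old addresses to new ones, and a "remapped" flag stored alongside each heap reference so the barrier can heal it in place.

```python
# Forwarding table built when a page is compacted: old address -> new address.
# (Addresses and table are made up for illustration.)
forwarding = {0x1000: 0x5000, 0x2000: 0x5040}

# The heap stores references as [address, remapped_flag] cells so the
# barrier can fix them in place.
heap_slot = [0x1000, False]

def load_reference(slot):
    """Read barrier: heal the reference on first use after compaction."""
    addr, remapped = slot
    if not remapped:
        addr = forwarding.get(addr, addr)  # follow the forwarding pointer
        slot[0] = addr                     # self-heal: fix the stored reference
        slot[1] = True                     # mark it so the trap never fires again
    return addr

print(load_reference(heap_slot))  # first access pays the remap cost
print(load_reference(heap_slot))  # later accesses find a healed reference
```

The point of the self-healing store is that each stale reference pays the trap cost at most once; after that, loads proceed at full speed.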
What I'd like to know is how you can guarantee that all garbage is eventually collected in a system like this, and how you can guarantee that you've fixed up all pointers to an unmapped virtual page so you can reuse it.
Maybe they don't.
amd64 effectively has a 48-bit addressing limit. You can unmap 1,000 4K pages each second for over two years before you need to reuse a page address.
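A quick back-of-the-envelope check of that claim (48-bit address space, 4 KiB pages, 1,000 unmaps per second):

```python
# How long until a 48-bit address space runs out of fresh 4 KiB pages
# at 1,000 unmaps per second?
pages = 2**48 // 4096              # number of distinct 4 KiB page addresses
seconds = pages / 1000             # unmapping 1,000 pages each second
years = seconds / (365 * 24 * 3600)
print(pages)                       # 68719476736 page addresses
print(round(years, 2))             # 2.18 years before reuse is forced
```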
The purpose of the read barrier is to allow the program to keep running while collection is happening. It lets the VM trap access to the parts of memory that the GC is working on, and do little bits of the collection algorithm in the program threads so they don't have to stop and wait for the GC thread to finish what it's doing.
If the program modifies an object that's already been scanned during this mark phase, does the GC have to go back and re-scan that object? If so, then how can you guarantee the mark phase will complete without stopping the program at some point?
From the interview: "We will find all the live objects in the heap in a single pass. We will never have to revisit any reference." So no, the GC never has to revisit an object; it completes once it has scanned all live objects.
The read barrier is what prevents the program from interfering with the marking phase. That bit about unmapping virtual memory happens later, when marked objects are being moved. As the GC walks through memory it marks references by setting (or clearing) a bit in the pointer itself. Meanwhile the program goes about modifying objects by copying references: first reading them into a register, then writing them back out to the object's storage on the heap. The read triggers the read barrier. If the reference being read has already been marked, great. If not, the trap handler marks the reference before allowing the program to continue. The write then stores a marked reference to the heap, so the GC thread never needs to return to that object.
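Here's a toy model of that mark-bit-in-the-pointer idea (my own sketch, not Azul's code). Since real object pointers are aligned, the low bit is free to serve as the mark bit, and the barrier heals the heap slot so it never traps twice:

```python
MARK_BIT = 0x1  # low bit of an aligned pointer, free to carry mark state

def is_marked(ref):
    return ref & MARK_BIT != 0

def mark(ref):
    return ref | MARK_BIT

def read_barrier(slot, worklist):
    """On a reference load: if the reference isn't marked yet, mark it,
    queue its target for scanning, and store the marked reference back
    so the GC never revisits this slot."""
    ref = slot[0]
    if not is_marked(ref):
        ref = mark(ref)
        slot[0] = ref                      # self-heal the heap slot
        worklist.append(ref & ~MARK_BIT)   # GC will scan the target once
    return ref

slot = [0x2000]            # an unmarked reference sitting on the heap
pending = []
r = read_barrier(slot, pending)
print(hex(r))              # 0x2001: the loaded value carries the mark bit
print(is_marked(slot[0]))  # True: the stored copy was healed too
```

This is what makes the single-pass guarantee work: any reference the program touches gets marked on the way through the barrier, so the program can't smuggle an unmarked reference past the collector.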
> What I'd like to know is how you can guarantee that all garbage is eventually collected in a system like this.
Me too. If you can't guarantee that, it seems that when you unmap a page of virtual memory, you'd have to make sure never to use that page again. You'd also have to keep around your table of "mappings from old pointers to new pointers" forever, just in case you encounter a lingering bad pointer and need to correct it.
The hard part has always been dealing with reads (which are much more common, and expensive to put a software barrier around), and Azul has quite brilliantly figured out a way to handle this, first in their specialized hardware and now in their VM.
Gil Tene: At Azul, we've been building high performance Java virtual machines (JVMs) for the past eight years now. We started the company in 2002, and we aimed to solve several scalability and infrastructure problems for Java back then....
Anyone else really hate it when people don't actually answer the question? They should have edited that useless part out.
Azul continues to impress.