Hacker News new | comments | show | ask | jobs | submit login

As someone woefully uninformed about these things, why can't GC be implemented as a separate thread, maybe with a lower priority than the primary interpreter? Would a separate thread not be able to count references to objects or something?

Quoting ko1 (https://bugs.ruby-lang.org/issues/8339#note-11):

    > Parallel tracing needs an assumption that "do not move (free) memory
    > area except sweeping timing". Current CRuby does.
    > For example: "ary << obj". Yes, the CRuby's memory management strategy
    > (assumption) is different from normal interpreters.

Isn't memory only shuffled around during compaction? Why not mark/sweep in parallel, then stop-the-world at compaction time?

In a nutshell, generational GCs works by scanning live data. Thus, anything not scanned is garbage. While the application is running, it also allocates new data.

If you allocate an object somewhere --after-- that area has been scanned, the GC will treat that area as garbage, and then you will have memory corruption, leading to segfaults and other nice things.

While parallel mark/sweep is certainly doable (Java does this) it's not easy to get right.

notice that refers to parallel (multiple GC threads) but doesn't rule out concurrent (one gc thread concurrent to the application thread)

Wouldn't two threads possibly run in parallel in a multi-core system?

Concurrency, as I understand it, is dividing things into tasks that can run (to some degree) independantly of eachother.

Parallelism is two or more concurrent tasks running simultaniously.

This if you have one gc thread and one application thread, you have the possibility of them running in parallel.

yes, they can run in parallel.

But in the context of GC (in my understanding) the distinction goes like

* concurrent GC is when (some phases of) the garbage collection happen in a separate thread from the main program.

* parallel GC is when (some phases of) the collection are performed by multiple threads.

So e.g. in a mark&sweep collector you can have a background thread that performs the mark phase concurrently or you can have a parallel collector that uses multiple threads to perform the mark phase.

In MRI you can't do the latter but it could be possible to do the former, according to the thread referenced earlier.

GC needs to know about all references in the program. If the mutator (the program) is running concurrently with the collector, it's difficult (though not impossible) for the collector to construct this consistent view.

Here is a good introductory talk: http://www.infoq.com/presentations/Understanding-Java-Garbag...

Many styles are implemented that way. Check out Java for (many) examples.[1]

Naively, with Ruby's mark-sweep collector[2], you might want to do the sweep phase in parallel, adding garbage blocks to the free list. You could do the mark phase as well, with I believe a pause to make the marks a consistent snapshot.

[1] http://www.oracle.com/technetwork/java/javase/tech/g1-intro-..., for one.

[2] http://tmm1.net/ruby21-rgengc/

One issue is shared mutable state. Typically a GC requires some sort of snapshot of the world state to determine liveness of memory objects.

still, some of the work can be done in parallel I.e. the jvm has had UseConcMarkSweepGC for many many years.

EDIT: what I mean is that there is still a need for locks, but a lot of work can be done concurrently.

Applications are open for YC Summer 2018

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact