

New release of Memory Pool System Garbage Collector - BruceM
http://mailman.ravenbrook.com/pipermail/mps-discussion/2012-September/000118.html

======
zbowling
This line from the readme made me giggle a little:

> The MPS has been in development since 1994 and deployed in successful
> commercial products since 1997. Bugs are almost unknown.

Then halfway down the page is a list of known bugs.

~~~
rptb1
I should probably clarify that little bit of hyperbole. We do track known
issues (of course), but bugs in production are extremely rare -- about one per
year.

------
eternalban
@rptb1:

MPS appears to have been designed in a single-core era, and it would be great
if you could describe any architectural changes that were made to address the
current prevailing multi-core platforms.

/tia

[edit: removed ref to the azul q]

~~~
rayiner
MPS is thread-safe, but doesn't do concurrent collection. Not really that
different from most GCs besides the JVM's.

~~~
rptb1
This is a really interesting topic and I could fill your screens with a wall
of text. It's true that the MPS currently does not collect concurrently;
however, the only thing that makes it non-concurrent is a critical point in
the Shield abstraction where the MPS seeks to gain privileged access to memory
(usually in order to scan it for GC). The critical point is where ShieldExpose
in shield.c has to call ShieldSuspend to preserve the shield invariants. This
is the _only_ point in the MPS that prevents concurrency, and the rest of the
MPS is designed to support it.

The restriction could be removed if either:

      * the MPS used a different set of protections from the mutator program
      * the mutator program used a software barrier

The first one is tricky, and the second one just hasn't come up in any
implementation we've been asked to make yet. Given a VM, it could happen, and
the MPS would be concurrent.

So, I believe there's nothing fundamentally non-concurrent about the MPS
design. It's kind of waiting to happen.

[I've put this in a comment in the code. Thanks for making me write it down
:)]

------
rwallace
Is there any documentation that says what this does/how it works? The
documentation on the web site seems to be for people already familiar with the
code.

e.g. how does it compare with the Boehm collector?

~~~
pcwalton
Looks like it's a precise GC, not a conservative GC like Boehm. That means
that your language has to provide a list of roots manually, while Boehm
discovers them automatically. The downside is that Boehm can misidentify
pointers (and it also has to jump through extra hoops to achieve incremental
and generational collection, while in MPS it should be more straightforward).

~~~
rptb1
It's both a precise and conservative GC, depending on what you declare to it,
and which pool classes you use. In commercial deployment and in Open Dylan
it's a mostly-copying GC (i.e. a mixture of both); see
[http://www.memorymanagement.org/glossary/m.html#mostly-copying.garbage.collection](http://www.memorymanagement.org/glossary/m.html#mostly-copying.garbage.collection)

------
pwpwp
This includes an example Scheme interpreter, so it's finally possible to see
MPS in action.

~~~
BruceM
OpenDylan (<http://opendylan.org>) has been using MPS since MPS was first
created. (Both were created at the now-defunct Harlequin.) We've been working
on reviving OpenDylan over the last year as well and this MPS update is a big
boon for us.

------
nickmain
Looks like a viral license that would prevent this being used in any closed-
source applications:
<http://www.ravenbrook.com/project/mps/master/license.txt>

~~~
BruceM
This seems pretty clear to me:

        If the licensing terms aren't suitable for you (for
        example, you're developing a closed-source commercial
        product or a compiler run-time system) you can easily
        license the MPS under different terms from Ravenbrook.
        Please write to us <mps-questions@ravenbrook.com> for
        more information.

(And you can see from the exception that we got for OpenDylan that such things
are possible.)

~~~
nickmain
Do those terms cost money?

I was excited to use MPS in the runtime of an open-source compiler, but if
users of the compiler had to pay in order to use it for closed-source apps
then that would be a big turn-off.

~~~
rptb1
Perhaps you could write to mps-questions and we can have a chat about it.

------
aidenn0
Alright, now that I found this, my last excuse for not writing my own CL
implementation is gone.

------
ryanpers
How does MPS-GC compare to low-pause, no-global-pause GC systems like GPGC
from Azul?

~~~
rptb1
I can't give you figures right now, but the reason we have commercial clients
for the MPS is that we have extremely low pause times. Side-by-side
comparisons are expensive to arrange. I'm working on that :)

~~~
ryanpers
One of the largest problems with the Sun JVM GC implementations is that they
have global pauses to rewrite pointers. This is (in part) because there is no
read barrier, and thus all threads must be paused to be 'fixed' after object
moves.

Additionally, with CMS there is global heap fragmentation and recompaction,
which can take literally minutes on a large (8 GB+) heap or a slow machine.

Setting aside exact figures, can you comment on the above challenges and
issues?

~~~
rptb1
Sure. The MPS approach that's deployed commercially is to use an unfashionable
hardware read barrier to amortize the cost of the pointer rewrite, allowing
the heap to be compacted incrementally. There's nothing that takes minutes or
even seconds -- the cost is spread through the entire execution. In some
sense, all this can take "minutes" of real time, but not at the cost of
stopping the program doing stuff. The MPS design has always been to spread the
cost evenly, not push it into a cataclysmic event in the future.

That said, we have an abstract framework that could be made to work in other
ways, depending on requirements. It wouldn't be much work to hook in a write-
barrier-only pool class, etc. etc. and have it co-operate. However, most of
the development effort so far has gone on the read barrier approach.

As to how this affects overall run-time, well, that's where we'd have to
arrange a side-by-side comparison, and make sure it included one of those
compactions :P

~~~
ryanpers
I am not really sure the current state of things, but to "get real" I feel any
GC needs to be able to handle:

- multicore/multithreaded
- heap sizes of 20-200 GB and larger, ideally

Think about it this way... We have the same goal of diminishing the use of
manually allocated RAM to the smallest possible niche. Think of GC vs malloc
as compilers vs assembly language. Assembly has its place, but no longer
because compilers can't generate efficient object code in a vast number of
circumstances. Let's do the same for GC!

~~~
rptb1
I think we'd need a more careful definition of "handle" there!

And we're definitely not proselytizing garbage collection here. The MPS is a
framework for both manual and automatic memory management (and co-operation
between the two). One of our main high performance commercial applications is
all about the manual management, and for very good reasons.

But nobody should be rejecting GC out of hand, that's for sure.

