
New Garbage Collector - dochtman
http://wiki.luajit.org/New-Garbage-Collector
======
pcwalton
This is very interesting, because it's extremely similar to the design that we
need for Rust. We need:

* A generational GC, preferably bump-allocating so that we can win on allocation-heavy workloads.

* A non-thread-safe GC, because all GC'd data is local to a single task in Rust (and we enforce this in the type system).

* A precise GC, because (a) we don't want accidental or malicious leaks due to misinterpreting non-pointers as pointers; and (b) because we need precise GC metadata to clean up shared data and run destructors when tasks fail.

* An incremental GC, because minimizing pause times is absolutely critical for browsers.

* A non-copying GC, because LLVM doesn't currently support copying GC satisfactorily (i.e. when allowing pointers to live in registers). Fixing this would require a significant portion of all target code to be rewritten.

Also of particular interest is the ability for individual allocation sites to
choose to bump allocate only, in the interests of performance (but at the
expense of fragmentation). This could potentially allow Rust programmers to
fine-tune their allocation behavior and trade performance for fragmentation
when it makes sense to do so.
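The bump-allocation idea can be sketched in a few lines of C. This is a hypothetical illustration of the technique only, not LuaJIT's or Rust's actual allocator; all names here are invented:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump arena: allocation is a bounds check plus a
 * pointer increment, which is why bump allocation wins on
 * allocation-heavy workloads. Nothing is freed per-object; the whole
 * arena is reclaimed at once, trading fragmentation for speed. */
typedef struct {
    uint8_t *base, *cur, *end;
} BumpArena;

static int arena_init(BumpArena *a, size_t size) {
    a->base = malloc(size);
    if (!a->base) return -1;
    a->cur = a->base;
    a->end = a->base + size;
    return 0;
}

static void *arena_alloc(BumpArena *a, size_t n) {
    n = (n + 7) & ~(size_t)7;          /* round up to 8-byte alignment */
    if ((size_t)(a->end - a->cur) < n)
        return NULL;                   /* arena exhausted */
    void *p = a->cur;
    a->cur += n;
    return p;
}

static void arena_reset(BumpArena *a) {
    a->cur = a->base;                  /* reclaim everything at once */
}
```

The fast path is two instructions plus a branch; the cost shows up later as fragmentation, since individual objects are never returned, only the whole arena.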

Very cool, and I'll be sure to keep an eye on this.

~~~
Raphael_Amiard
> A non-copying GC, because LLVM doesn't currently support copying GC
> satisfactorily

It's a shame that this is so badly documented. I stumbled upon it way
after having coded the whole code generator of my Z3 project
(<https://github.com/raph-amiard/Z3> \- OCaml bytecode -> LLVM compiler). The
fact that you have to explicitly handle live roots with stack
instructions defeats a lot of the purpose of LLVM IR.

~~~
pcwalton
I've got a bunch of code in a branch to allow register roots:
<https://github.com/pcwalton/llvm/tree/noteroots-ir>

It just automatically roots live values that are pointers in a nonzero
addrspace. Still doesn't support copying GC, and only works in the fast
instruction selector at the moment, but it goes a long way toward making GC
performant in LLVM.

~~~
cwzwarich
The fast instruction selector only supports some constructs, whereas others
are handled by fallback to the standard SelectionDAG instruction selection. In
order to use this, do you have to ensure that you never trigger any of these
fallbacks?

~~~
pcwalton
Yes.

This is just for now, as fast isel is much easier to work with. I have other
branches that implement most of the SelectionDAG support needed as well, but
they aren't updated to the most recent work.

------
Raphael_Amiard
This is great news! I don't know if Mike Pall reads Hacker News, but a couple
of ideas come to mind:

\- Might it be a good idea to create a Kickstarter page? I know LuaJIT has its
own donation mechanism, but it may help it gain further exposure. I know I'd
be willing to donate for a better GC in either case.

\- Mike mentions it has to be a non-moving collector, which is usually known
for having worse performance than moving collectors. Also, a non-moving
generational collector (and incremental, since this is also a requirement) is
hard to get right.

I tried implementing a GC along the lines of this paper:
[http://www.pllab.riec.tohoku.ac.jp/papers/icfp2011UenoOhoriO...](http://www.pllab.riec.tohoku.ac.jp/papers/icfp2011UenoOhoriOtomoAuthorVersion.pdf)
which presents a fast non-moving collector. Incidentally, it has ideas similar
to those presented in this article. Maybe it could be a good source of
inspiration? I realize the ideas are maybe not as production-grade, since
there is only one language implementation with a GC of this sort.

~~~
mikemike
Yes, I thought about crowdfunding. But I fear that a) the Lua community is way
too small to gather enough interest and b) it looks like the whole
crowdfunding idea is rapidly deteriorating into an arms race of marketing
experts. So many people are now jumping on that bandwagon. You'll never make
it, unless you stay on the frontpages somehow.

Alas, I'm not good at marketing and a garbage collector is a very technical
and very unsexy project (for most people, anyway). I should make up a silly
name for it, that bears no relation to what it does. Yeah, that would do ...

Thank you for the link to the paper! I'll check it out.

I have good evidence that it's possible to create a non-moving GC that doesn't
suck. It's a bit like marrying the best-of-breed of malloc implementations
with the best-of-breed of incremental mark & sweep collectors. Plus some crazy
ideas I'll have to experiment with first ...
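One best-of-breed malloc idea such a design might borrow is segregated size classes. A minimal hypothetical sketch of the size-to-class mapping (the class table here is invented for illustration, not from any particular allocator):

```c
#include <stddef.h>

/* Segregated size classes: map a request size to a small class index,
 * keep one free list per class, and allocation becomes O(1). Internal
 * fragmentation is bounded by the spacing between classes. */
#define NUM_CLASSES 8
static const size_t class_size[NUM_CLASSES] = {
    16, 32, 48, 64, 96, 128, 192, 256
};

/* Return the smallest class that fits n bytes, or -1 for large
 * objects, which would take a separate big-object path. */
static int size_to_class(size_t n) {
    for (int i = 0; i < NUM_CLASSES; i++)
        if (n <= class_size[i]) return i;
    return -1;
}
```

Real allocators replace the linear scan with a table lookup or bit tricks, but the segregated-fit structure is the part a non-moving GC can reuse for its free lists.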

~~~
richcollins
Any ideas why you don't get more sponsorship from gaming companies? Seems like
this would be really useful for mobile devices.

Also, has anyone tried to pre-jit then load an image on iOS?

~~~
jacktoole1
Unfortunately, I think most mobile devices still forbid JIT (aside from
integrated JavaScript implementations). The technical limitation was removed
in iOS 5.0, but I believe whether Apple will accept apps using JIT is still an
open question. Where I learned this:
<http://news.ycombinator.com/item?id=3818994>

~~~
richcollins
Right, that's why you JIT ahead of time and then load the result into
executable memory when the app starts (vs. while it's running).

~~~
jacktoole1
I didn't know that was possible - thanks for the insight!

------
simcop2387
The more I read about modern garbage collection design, the more it seems
reminiscent of filesystem design. It makes me wonder if there are large
parallels between them that can be used advantageously to benefit both:
things that people have learned work well on filesystems might apply to
garbage collection, and vice versa.

~~~
ajross
I don't see the technical reasoning there, but I agree that there's a thematic
parallel: they're both reasonably straightforward ideas with very clean user-
facing stories. They can both be implemented very simply (literally as part of
an undergraduate course -- I remember writing both).

And they both have undesirable performance problems in certain regimes, the
solution to which has been the unending quest of generations of incredibly
smart programmers and academics. They have led to staggering complexity, weird
bugs, huge near-unmaintainable codebases, and uncounted PhD theses. Whole
careers have been aimed at this stuff.

And frankly it's been, from many perspectives, mostly a waste. ZFS/btrfs are,
for 95% of problems, indistinguishable from FFS/ext2 implementations that are
a tiny fraction of their size. Modern Java and .NET VMs still have latency
issues under GC pressure that were equally visible in Emacs Lisp a quarter
century ago.

Applications which have hard requirements that fly in the face of these
systems don't use them. Serious storage servers don't use the filesystem
except vestigially (i.e. they do manual sync on very large files, or they use
a block device directly). Serious realtime apps don't do GC and manage memory
themselves. And that's not going to change no matter how many PhD's we throw
at the problem.

~~~
cokernel_hacker
I am afraid that is not even close to accurate. If you define "95% of
problems" to be "reading and writing data such that data is read and written"
sure, great.

However, there are two little, minor things that file systems care about:
performance and safety. FFS/ext2 offer neither.

Neither ext2 nor FFS contains a journal, nor do they copy-on-write their
metadata. Heck, if you "upgrade" to ext3, you get the journal but nothing
that protects you from bit-rot.

If you look at most drives on the market, you will see devices capable of
corrupting data _for certain_ after three years. Your journaled filesystem
does jack in this case; all it can do is ensure proper ordering of writes to
metadata, with no guarantee that the data will be any good once written.

How about performance? Well, if you look at FFS/ext2, they are essentially
terrible: block-at-a-time allocators with no extent support. Good luck
getting the most out of your storage media with a block-tree data structure.
Granted, ZFS suffers from the same issue, but btrfs's extent-tree design
certainly does not. IIRC, state-of-the-art ext[23] implementations use
read-ahead to ameliorate the problem, but that does not fundamentally cure
it. ext4 has adopted extent trees alongside its HTree structure.

A filesystem like ZFS/btrfs is pretty immune to bit-rot: they can easily
mirror their metadata, and they avoid overwriting metadata in place the way
FFS/ext[234] do, making torn writes non-issues. They avoid the many
pathologies of your "simpler" filesystems and trade the complexity for not
needing an fsck mechanism in the face of data corruption; one should only be
needed in the face of implementation bugs (note that ZFS has no fsck, and
btrfs only recently gained one).

Oh, and if any of you think soft updates work, they don't. It would be great
if they really did, but in a world where drives actively reorder writes _and_
do not respect the SCSI/SATA/misc transport commands to flush their internal
caches, you do not get safety. That set of drives is considerable.

tl;dr You are oversimplifying the complex.

~~~
ajross
Clearly I hit a nerve.

>If you define "95% of problems" to be "reading and writing data such that
data is read and written"

Pretty much. You have a competing definition? With the added point that "95%
of problems" are mostly I/O bound reads of unfragmented write-once files and
won't see benefit from extent allocation. And of the remaining 5% most of them
are database systems which are doing their own journaling and block
allocation.

Does btrfs have nice features (I like snapshots myself)? Sure. Do I use it in
preference to ext4? Yes. But be honest with yourself: it's only incrementally
better than ext2 for pretty much everything you use a computer for. And yet it
sits at the pinnacle of 40 years of concerted effort.

And garbage collection is much the same: a staggeringly huge amount of effort
for comparatively little (but not zero, thus "worth it" by some metric)
payout.

Edit: just to throw some gas on the^H^H^H^H point out the narrowness of vision
that I think is endemic in this kind of thought:

> If you look at most drives on the market, you will see devices capable of
> corrupting data for certain after three years.

If you look at most actual filesystems on the market, you'll find they're
embedded in devices which will be thrown out in two years when their contract
expires. They'll also be lost or stolen with vastly higher frequency than that
at which they will experience NAND failure. If you look at most high-value
storage deployments in the real world, they have redundancy and backup regimes
in place which make that filesystem feature merely a downtime/latency
improvement.

Basically, if someone waved a magic wand and erased all fancy filesystems and
GC implementations from the world... how much would really change? Apple has
deployed a pretty good mobile OS without GC, after all. Oracle made a business
out of shipping reliable databases over raw block devices 20 years ago. Try
that same trick with other "difficult" software technologies (video
compression, say) and things look much more grim.

~~~
cokernel_hacker
My competing definition includes safety and performance: not just whether the
system works a single epsilon after an operation, but whether we can read
data we wrote last month, or last year, and do it using the hardware's
resources efficiently.

Clearly either you or the ext[234] developers are mistaken, as they have gone
ahead and implemented extent-based allocation and file management.

btrfs's design is inherently safe while ext2's design is inherently unsafe. To
make the analogy work, I would say that ext2 does not fit the design spec for
a file system; it would be like a garbage collector that does not actually
collect garbage.

The payout for data integrity, which no filesystem really handled until the
WAFL era, is huge. If you cannot see that, this discussion has no point.

The only thing that I must be _honest_ about is that I would not trust ext2 to
store my data any more than I would trust you to implement a scheme to store
my data.

~~~
ajross
It's really not worth continuing the argument. But I'll just point out that
again you're arguing about features in isolation (e.g. "safety" instead of the
value of what you want to keep safe or dangers from things other than
filesystem failure). It's easy to justify one filesystem as better than
another. Our disconnect is that I'm talking about whether or not that
filesystem is so much better as to justify the effort spent to develop it.

To make the point again: without btrfs (or really "the last 30 years of
filesystem research") we'd all be OK. Likewise if V8 or LuaJIT had to do mark
& sweep, or if everyone had to write in a language with manual storage
allocation, things would look mostly the same.

Without video compression probably two thirds of the market for semiconductor
devices and half the internet would simply evaporate (I pulled those numbers
out of you-know-where, but I frankly doubt they're all that wrong). That tells
me something about the relative value of these three applications of software
engineering talent.

~~~
nightski
The question is, would we have known any of this if that research had not
taken place? It is easy to have 20/20 hindsight. Even with it, though, I am
pretty sure we still would have expended the effort to get where we are today.

------
dllthomas
>It must be non-copying, due to various constraints in the Lua/C API.

I don't see this. The Lua GC has to know what's visible from C, in order to
avoid freeing those objects. If it knows what's visible from C, then it
strictly just needs to avoid copying _those_ objects. If it moves them to an
uncollected area when C references them, and back when C dereferences them,
then it can use a copying collector on the Lua-only objects.

This actually might work very well, if C-referenced objects are going to tend
to be much longer lived. Of course, copying the objects into and out of the
uncollected region will itself be some work.

I have no idea if the ideal algorithm is in this space, I just think that
either they discarded copying collectors prematurely, or there's something I'm
missing, so either I get to learn something or they do; is there a constraint
I've missed?
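The evacuate-on-escape scheme proposed above could be sketched roughly like this in C. Everything here is hypothetical and invented for illustration; it is not part of Lua's actual implementation or API:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: when an object escapes to C, evacuate it into pinned
 * (uncollected) storage so the copying collector remains free to move
 * everything that stays Lua-only. */
typedef struct {
    int pinned;     /* set once the object has escaped to C */
    size_t size;
    void *payload;  /* lives in the copying heap until pinned */
} Obj;

/* Called at the API boundary, when a pointer to obj's payload is
 * about to cross into C. Idempotent: a pinned object stays put. */
static void *pin_for_c(Obj *obj) {
    if (!obj->pinned) {
        void *stable = malloc(obj->size);
        memcpy(stable, obj->payload, obj->size);
        obj->payload = stable;  /* the GC no longer relocates this */
        obj->pinned = 1;
    }
    return obj->payload;
}
```

The copy at the boundary is exactly the "some work" the comment mentions; whether it pays off depends on how rarely objects actually escape.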

~~~
mikemike
There's no explicit API to pin objects from the C side. Ok, so there's an
implicit contract on which pointers to internal VM objects are still valid,
once the C side has gotten them (const char *lua_tostring() is the biggest
offender). But this contract a) does not lead to a practical implementation
and b) is violated by plenty of C modules for Lua, because Lua didn't have a
moving GC, so nobody bothered to follow it.
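A self-contained mock (deliberately not using the real Lua headers) of why that implicit contract blocks a moving GC: the C side keeps the raw pointer it got from the VM, so relocating the string leaves that pointer dangling.

```c
#include <stdlib.h>
#include <string.h>

/* Mock VM string object; in real Lua the C side receives a raw
 * const char * via lua_tostring() and may hold it indefinitely. */
typedef struct {
    char *data;
} VMString;

static VMString vm_newstring(const char *src) {
    VMString s;
    s.data = malloc(strlen(src) + 1);
    strcpy(s.data, src);
    return s;
}

static const char *vm_tostring(const VMString *s) {
    return s->data;  /* raw pointer escapes to the C side */
}

/* What a moving GC would do: relocate the bytes and update the VM's
 * own reference. (The old block would be reclaimed at the next sweep;
 * it is deliberately not freed here.) Any raw pointer previously
 * handed to C now points at a stale location. */
static void gc_move_string(VMString *s) {
    size_t n = strlen(s->data) + 1;
    char *to = malloc(n);
    memcpy(to, s->data, n);
    s->data = to;
}
```

Since the VM cannot see or update the copies of that pointer held in C locals and structs, the only safe choices are pinning such objects or never moving them, which is the constraint described above.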

Now ... the interesting objects, that benefit most from the defragmentation
effects of a moving GC, are the variable-sized objects. Sadly, userdata cannot
be moved at all and strings are the only other interesting object type. Well
... see above.

[Table objects are not variable-size. Only their array and hash parts are. But
these are singly-linked from the table object and can easily be moved around,
anyway. The addresses of these memory blocks are never exposed by the VM.]

~~~
tedunangst
If you had a bit free, you could track which objects escape to C. But it's
probably not worth going that way. Well, maybe. Programs that generate tons of
strings may only occasionally escape them, so maybe moving them as part of the
C call would be a good idea.

------
snprbob86
This document uses a bunch of jargon that I couldn't find good
explanations/definitions for. Is there a good GC survey out there? If I wanted
to write a modern GC and knew nothing about it, where would I look to get
started? What should I read to get up to speed?

~~~
cjensen
"Garbage Collection: Algorithms for Automatic Dynamic Memory Management" by
Richard Jones and Rafael D. Lins.

Very readable and approachable. I read the entire book, which I almost never
do with technical books.

~~~
me2i81
There's a newer book (2011), "The Garbage Collection Handbook: The Art of
Automatic Memory Management" by Richard Jones, Antony Hosking, and Eliot Moss.
I haven't read the new one, but it seems to cover a lot of the recent activity
around Java GC.

------
kokoloko
Each language is rolling out its own GC. Why not share some efforts in a
common library?

~~~
pjmlp
It's called JVM and .NET.

~~~
warmwaffles
Can't really use that with Lua and Ruby

~~~
pjmlp
Really?!

Lua:

<http://sourceforge.net/projects/luaj/>

<https://bitbucket.org/xixs/anlua>

<http://luaforge.net/projects/luanet/>

Ruby:

<http://jruby.org/>

<http://www.ironruby.net/>

------
caf
It would be easier to follow if you used actual colours rather than several
shades of grey in your description of your 4-colour design.

