
Show HN: A simple garbage collector for C - mkirchner
https://github.com/mkirchner/gc
======
davnicwil
Just out of interest, for people not familiar with C or the latest goings-on
there: what's the use case where you'd want a garbage collector in C?

My (very surface-level) understanding was always that the trade-off for the
increased manual effort of using C - manual memory management being one
example - was that you could tailor your solution exactly to your use case for
increased performance / lower resource use.

If you're going for a garbage collector, why not also benefit from some of the
increased language power/features of a higher level language?

~~~
gwbas1c
Typically, new garbage collected languages use the Boehm garbage collector (GC
for C) in early versions, and then implement their own once they have time to
fully optimize their runtime.

This is what Mono and Golang did. (They both used Boehm until they had time
and resources to implement their own runtime-optimized GC.) I suspect Java did
too, but I'm not sure.

In this case: "The original motivation for gc is my desire to write my own
LISP in C, entirely from scratch - and that required garbage collection."

What basically happens is that gc (and Boehm) are "conservative garbage
collectors." They treat all values in a data structure as a potential pointer,
because they don't know the contents of the data structure. It's a good "quick
and dirty" way to have garbage collection if you can accept the risk that some
of your memory will remain uncollected if some of your data happens to have
the same value as one of your pointers. In practice, it's a good tradeoff.
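
To make the conservative idea concrete, here is a minimal sketch in C. It is not code from gc or Boehm: the `block` table and the names `track`, `find_block`, `mark_range`, and `conservative_demo` are hypothetical, and it only recognises pointers to the exact start of a block. Every aligned word in a range is treated as a possible pointer, so an integer that merely equals a block's address keeps that block alive.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 16

/* Hypothetical allocation table: the collector knows only start and size. */
typedef struct {
    void  *start;
    size_t size;
    int    marked;
} block;

block  blocks[MAX_BLOCKS];
size_t block_count = 0;

/* Register a memory region as a collectable block. */
int track(void *start, size_t size) {
    if (block_count >= MAX_BLOCKS) return -1;
    blocks[block_count].start  = start;
    blocks[block_count].size   = size;
    blocks[block_count].marked = 0;
    block_count++;
    return 0;
}

/* Return the tracked block starting exactly at address `word`, or NULL. */
block *find_block(uintptr_t word) {
    for (size_t i = 0; i < block_count; i++)
        if ((uintptr_t)blocks[i].start == word) return &blocks[i];
    return NULL;
}

/* Treat every aligned word in [mem, mem + size) as a potential pointer. */
void mark_range(const void *mem, size_t size) {
    const unsigned char *p = mem;
    for (size_t off = 0; off + sizeof(uintptr_t) <= size; off += sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, p + off, sizeof word);
        block *b = find_block(word);
        if (b && !b->marked) {
            b->marked = 1;
            mark_range(b->start, b->size);  /* scan the block's contents too */
        }
    }
}

void conservative_demo(void) {
    static uintptr_t a[2], b[2], c[2];   /* stand-ins for heap blocks */
    uintptr_t roots[1];                  /* stand-in for the scanned stack */

    block_count = 0;
    a[0] = a[1] = b[0] = b[1] = c[0] = c[1] = 0;
    track(a, sizeof a);
    track(b, sizeof b);
    track(c, sizeof c);

    roots[0] = (uintptr_t)a;  /* a genuine reference */
    a[1]     = (uintptr_t)b;  /* just a number, but indistinguishable from a pointer */
    mark_range(roots, sizeof roots);
    /* Now a and b are marked; c is not and could be freed. */
}
```

After `conservative_demo()` runs, the blocks for `a` and `b` are marked and `c` is not, even though `a[1]` was stored as a plain integer.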

(Other tradeoffs are that you can't have things like real-time garbage
collection, generational garbage collection, or compacting garbage
collection.)

> If you're going for a garbage collector, why not also benefit from some of
> the increased language power/features of a higher level language?

Ironically, one of Boehm's use cases is looking for memory leaks. You
basically #ifdef Boehm into a test build, and if a GC finds garbage, you know
that you didn't free something correctly.
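
A rough sketch of what "#ifdef Boehm into a test build" could look like. The build flag `USE_BOEHM_LEAK_CHECK` and the `APP_*` macros are made up for illustration; the Boehm calls (`GC_set_find_leak`, `GC_INIT`, `GC_MALLOC`, `GC_FREE`, `GC_gcollect`) are the ones its leak-detection documentation describes as far as I recall, so check them against the version you have installed.

```c
#include <stdlib.h>

#ifdef USE_BOEHM_LEAK_CHECK           /* hypothetical flag for the test build */
#include <gc.h>
#define APP_INIT()    do { GC_set_find_leak(1); GC_INIT(); } while (0)
#define APP_MALLOC(n) GC_MALLOC(n)
#define APP_FREE(p)   GC_FREE(p)
#define APP_CHECK()   GC_gcollect()   /* unreachable, unfreed blocks get reported */
#else
#define APP_INIT()    ((void)0)
#define APP_MALLOC(n) malloc(n)
#define APP_FREE(p)   free(p)
#define APP_CHECK()   ((void)0)
#endif

/* Sum 1*1 + 2*2 + ... + n*n using a temporary buffer; returns -1 on failure. */
long sum_squares(int n) {
    if (n <= 0) return 0;
    long *buf = APP_MALLOC((size_t)n * sizeof *buf);
    if (!buf) return -1;
    long total = 0;
    for (int i = 0; i < n; i++) {
        buf[i] = (long)(i + 1) * (i + 1);
        total += buf[i];
    }
    APP_FREE(buf);   /* dropping this line is the kind of bug the test build reports */
    return total;
}
```

In this sketch you would call `APP_INIT()` at the start of `main` and `APP_CHECK()` before exiting; in a normal build both expand to nothing and the code uses plain `malloc`/`free`.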

~~~
weberc2
> Ironically, one of Boehm's use cases is looking for memory leaks. You
> basically #ifdef Boehm into a test build, and if a GC finds garbage, you
> know that you didn't free something correctly.

So a "garbage checker"? :) This is interesting, I never really thought about
using it as a checking tool.

------
tekknolagi
Is there a simple GC like this one that uses handles instead of C stack
scanning to keep track of local references?
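
For readers unfamiliar with the handle approach, here is a minimal sketch, assuming a fixed-size "shadow stack" of registered locals. All names (`root_stack`, `roots_enter`, `root_push`, `roots_leave`, `count_live_roots`) are hypothetical and not taken from gc or Boehm. The collector would walk this list of slots instead of scanning the C stack.

```c
#include <stddef.h>

#define MAX_ROOTS 64

/* Addresses of local pointer variables that the collector should treat as roots. */
void **root_stack[MAX_ROOTS];
size_t root_top = 0;

/* Remember the current depth so the scope can be unwound later. */
size_t roots_enter(void) { return root_top; }

/* Register one local pointer variable as a root. */
int root_push(void **slot) {
    if (root_top >= MAX_ROOTS) return -1;
    root_stack[root_top++] = slot;
    return 0;
}

/* Drop every root registered since the matching roots_enter(). */
void roots_leave(size_t saved_top) { root_top = saved_top; }

/* What a collector's root scan would look like: visit registered slots only. */
size_t count_live_roots(void) {
    size_t live = 0;
    for (size_t i = 0; i < root_top; i++)
        if (*root_stack[i] != NULL) live++;
    return live;
}

size_t handle_demo(void) {
    static int heap_object;            /* stand-in for a GC-managed object */
    void *x = &heap_object;
    void *y = NULL;

    size_t scope = roots_enter();
    root_push(&x);
    root_push(&y);
    size_t live = count_live_roots();  /* only x is non-NULL */
    roots_leave(scope);
    return live;
}
```

The cost is that every function holding GC pointers has to register them and unwind on every exit path, which is exactly the bookkeeping that stack scanning avoids.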

~~~
shakna
The Boehm GC seems to do both internally.

------
swiley
I just wish C had something like defer in Go. That would cover most cases.

~~~
skolskoly
If you're willing to use macros, this can do the trick. A return or break will
skip the deferred expression, but continue should work fine.

    #define DEFER(EXPR) for(int _tmp=1; _tmp; _tmp=0,(EXPR))

Example:

    char *data = malloc(32);
    DEFER(free(data))
    {
        // do stuff
    }

~~~
adtac
Perhaps I'm misunderstanding this, but this doesn't solve the problem at all
-- returning within the block statement wouldn't call the deferred expression,
which is entirely the point of a defer statement, no? It works if control is
guaranteed to reach the end of the block statement, of course, but requiring
such a constraint would make this defer very handicapped. You have tons of
function exit points and you most likely want to free memory at every single
one.
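
A small sketch that shows the limitation directly. The `DEFER` macro is the one posted above; the counter, `cleanup`, and `run` are hypothetical names added only to observe whether the deferred expression ran on each exit path.

```c
#define DEFER(EXPR) for (int _tmp = 1; _tmp; _tmp = 0, (EXPR))

int cleanups = 0;

/* Stand-in for free(): just counts how often it was called. */
void cleanup(void) { cleanups++; }

/* how: 0 = fall off the end, 1 = continue, 2 = break, 3 = return.
 * Returns the number of times the deferred expression ran. */
int run(int how) {
    cleanups = 0;
    DEFER(cleanup()) {
        if (how == 1) continue;         /* jumps to the for-loop step: cleanup runs */
        if (how == 2) break;            /* leaves the loop: cleanup is skipped */
        if (how == 3) return cleanups;  /* leaves the function: cleanup is skipped */
    }
    return cleanups;
}
```

Here `run(0)` and `run(1)` report one cleanup, while `run(2)` and `run(3)` report none, which is the leak described above.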

~~~
simias
Yes, the only way I could imagine implementing a "proper" defer in C would be
to use (like for most insane C hacks) setjmp/longjmp. I'm fairly sure it's
explicitly forbidden by the Geneva Conventions though.

Alternatively you might be able to use nested functions to guard against a
stray return but that's not standard.

~~~
saagarjha
How _would_ you implement defer with setjmp/longjmp? I can’t think of anything
off the top of my head.

~~~
simias
You'd probably have to #define "return" to call longjmp instead. Although I
guess at this point you might be better off just having return call the cleanup
code. You'd have to be careful to handle the nesting correctly though, which
might be slightly easier with setjmp contexts.
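
A minimal sketch of the "have return call the cleanup code" variant, without setjmp/longjmp and without redefining the `return` keyword. Everything here (`defer_scope`, `defer_push`, `defer_run`, the `RETURN` macro, and the trace helpers) is hypothetical. Each function keeps its own scope on the stack, which sidesteps the nesting problem, and cleanups run in reverse order of registration.

```c
#include <stddef.h>

#define DEFER_MAX 8

/* Per-function list of pending cleanups. */
typedef struct {
    void (*fn[DEFER_MAX])(void *);
    void  *arg[DEFER_MAX];
    int    count;
} defer_scope;

/* Register a cleanup; returns -1 if the scope is full. */
int defer_push(defer_scope *s, void (*fn)(void *), void *arg) {
    if (s->count >= DEFER_MAX) return -1;
    s->fn[s->count]  = fn;
    s->arg[s->count] = arg;
    s->count++;
    return 0;
}

/* Run pending cleanups, last registered first. */
void defer_run(defer_scope *s) {
    while (s->count > 0) {
        s->count--;
        s->fn[s->count](s->arg[s->count]);
    }
}

/* Cleanups run BEFORE `value` is evaluated, so `value` must not
 * depend on anything a cleanup releases. */
#define RETURN(scope, value) do { defer_run(&(scope)); return (value); } while (0)

/* Trace buffer so the order of cleanups can be observed. */
char   trace[16];
size_t trace_len = 0;

void record(void *arg) {
    if (trace_len + 1 < sizeof trace) {
        trace[trace_len++] = *(const char *)arg;
        trace[trace_len]   = '\0';
    }
}

int early_exit_demo(int bail_out) {
    defer_scope scope = {0};
    trace_len = 0;
    trace[0]  = '\0';
    defer_push(&scope, record, "a");
    defer_push(&scope, record, "b");
    if (bail_out) RETURN(scope, -1);   /* early exit still runs both cleanups */
    RETURN(scope, 0);
}
```

The obvious weakness is that a plain `return` anywhere in the function silently bypasses the scope, so this relies on discipline rather than on the compiler.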

------
Rochus
How about efficiency? Looks like the collector has to scan through all
allocated memory and the whole stack to check whether there could be a
pointer. And if any data looks like such a pointer (even by coincidence) the
corresponding memory cannot be collected. Anyway: isn't this just the same
concept as the Boehm conservative GC? What's the difference/improvement?

~~~
hu3
Second paragraph in README:

> The focus of gc is to provide a conceptually clean implementation of a mark-
> and-sweep GC, without delving into the depths of architecture-specific
> optimization (see e.g. the Boehm GC for such an undertaking). It should be
> particularly suitable for learning purposes and is open for all kinds of
> optimization (PRs welcome!).
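
For anyone who has not seen mark-and-sweep before, a toy sketch of the two phases. This is not the code from the gc repository: the `object` type and all function names are hypothetical, and it follows explicit child pointers from a single root rather than conservatively scanning the stack and heap.

```c
#include <stdlib.h>

typedef struct object {
    struct object *next;      /* links every allocated object, used by sweep */
    struct object *child[2];  /* outgoing references */
    int marked;
} object;

object *all_objects = NULL;
size_t  live_count  = 0;

/* Allocate an object and add it to the all-objects list. */
object *new_object(void) {
    object *o = calloc(1, sizeof *o);
    if (!o) return NULL;
    o->next = all_objects;
    all_objects = o;
    live_count++;
    return o;
}

/* Mark phase: flag everything reachable from o (cycles are handled
 * because already-marked objects are not visited again). */
void mark(object *o) {
    if (!o || o->marked) return;
    o->marked = 1;
    mark(o->child[0]);
    mark(o->child[1]);
}

/* Sweep phase: free unmarked objects, clear marks on survivors.
 * Returns the number of objects freed. */
size_t sweep(void) {
    size_t freed = 0;
    object **link = &all_objects;
    while (*link) {
        object *o = *link;
        if (o->marked) {
            o->marked = 0;
            link = &o->next;
        } else {
            *link = o->next;
            free(o);
            live_count--;
            freed++;
        }
    }
    return freed;
}

size_t collect(object *root) {
    mark(root);
    return sweep();
}
```

Note that the sweep walks every allocated object on each collection, which is the cost the parent comment is asking about.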

------
jhammond1
I was hoping for a link to the exit() manpage.

------
mrobot
I am actually doing this exact same thing right now and will reference this
for comparison once I am done with the first version of mine. I am doing stop-
and-copy though.

------
kickscondor
Cool! Thanks for this - and the link to orangeduck’s work (which further
points to Cello.)

------
touchpadder
Can it help with compiling TypeScript to WASM? E.g. in
https://github.com/AssemblyScript/assemblyscript

~~~
maxgraey
AssemblyScript already has a tiny hybrid garbage collector (PureRC), which uses
deferred ARC, and GC only for cyclic references:
https://github.com/dcodeIO/purerc/blob/master/papers/Bacon03Pure.pdf

