

Chicken Scheme internals: the garbage collector - bhrgunatha
http://www.more-magic.net/posts/internals-gc.html
An overview of how the garbage collector in Chicken Scheme works. It contains some good diagrams and code samples to help understand the topic, plus a set of links for further reading.
======
gjm11
Henry Baker's "Cheney on the M.T.A." paper (which is the basis for the stuff
described here) is very nice. There's a collection of Baker's research papers
here:

[http://www.pipeline.com/~hbaker1/](http://www.pipeline.com/~hbaker1/)

which includes the "Cheney on the M.T.A." paper
([http://www.pipeline.com/~hbaker1/CheneyMTA.html](http://www.pipeline.com/~hbaker1/CheneyMTA.html))
and a bunch of other interesting things (mostly with a lispy flavour). A few
specific examples:

[http://www.pipeline.com/~hbaker1/Prag-Parse.html](http://www.pipeline.com/~hbaker1/Prag-Parse.html)
\-- a nice technique (not invented by Baker) for simple parsing tasks, and a
neat Lisp implementation.

[http://www.pipeline.com/~hbaker1/LazyAlloc.html](http://www.pipeline.com/~hbaker1/LazyAlloc.html)
\-- you might notice that the "M.T.A." paper is subtitled "CONS should not
CONS its arguments, part II"; this is the original "CONS should not CONS its
arguments" and is also about allocating things on the stack that you might not
expect to be stack-allocatable.

[http://www.pipeline.com/~hbaker1/Use1Var.html](http://www.pipeline.com/~hbaker1/Use1Var.html)
\-- "use-once" variables (related to "linear logic", which is not Baker's
creation), not a million miles from C++'s auto_ptr/unique_ptr but Baker was
advocating this stuff as early as 1993.

And some other interesting things due to people other than Baker, such as a
copy of the famous HAKMEM
([http://www.pipeline.com/~hbaker1/hakmem/hakmem.html](http://www.pipeline.com/~hbaker1/hakmem/hakmem.html)).

------
girvo
I was looking at Chicken Scheme the other day for building an
Entity-Component-System 2D game engine, as I love Lisps, but alas, the SDL
bindings were quite out of date, and I didn't want my first project to be
writing bindings for a C library!

It's a very cool system, and this article is awesome. I find garbage
collectors fascinating.

~~~
Flow
Perhaps Gambit Scheme is a bit more up to date on this?
[http://dynamo.iro.umontreal.ca/wiki/index.php/Main_Page](http://dynamo.iro.umontreal.ca/wiki/index.php/Main_Page)

[https://news.ycombinator.com/item?id=7361947](https://news.ycombinator.com/item?id=7361947)

[http://jlongster.com/Open-Sourcing-My-Gambit-Scheme-iOS-Game...](http://jlongster.com/Open-Sourcing-My-Gambit-Scheme-iOS-Game-from-2010)

------
dman
Any pointers on how the diagram for the stack was generated?

~~~
t__r
<!-- Created with Inkscape
([http://www.inkscape.org/](http://www.inkscape.org/)) -->

------
userbinator
Wow. As someone with a background of mostly C/Asm, it is rather shocking to
see the amount of extra code that compiling higher-level languages produces;
summing the contents of a list (array or linked) shouldn't be more than half a
dozen instructions in a loop... and then I see things like dynamic allocation.
Seriously, _wow_.
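
For reference, the kind of hand-written loop being described might look like the sketch below. The `struct node` layout and the function names are assumptions made up for illustration; nothing here comes from the article or from Chicken's output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node holding a plain machine integer. */
struct node {
    long value;
    struct node *next;
};

/* Sum every element: per iteration, one load, one add, one pointer chase. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        total += p->value;
    return total;
}

/* Small self-check: builds 1 -> 2 -> 3 on the stack and sums it. */
void sum_list_check(void)
{
    struct node c = { 3, NULL };
    struct node b = { 2, &c };
    struct node a = { 1, &b };
    assert(sum_list(&a) == 6);
    assert(sum_list(NULL) == 0);
}
```

Note that this version only works because every element is known to be a `long` at compile time and nothing needs to be allocated.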

~~~
rbehrends
This is not inherent in the process. What you're seeing is the confluence of
several factors:

(1) Scheme is dynamically typed, not statically, so some operations need to
operate on objects whose properties are not known at compile time and which
need to be constructed/wrapped. The same does not necessarily hold for a
statically typed language.

(2) C is not really designed to be the target language of a compilation
process. In particular, there's no portable access to stack frames, so any
mechanism that inspects the stack (in this case, for precise garbage
collection) needs to recreate the necessary machinery. If you target a
representation that is actually designed for this purpose (such as LLVM), a
lot of these problems go away. Likewise, C does not have native support for
continuations, tail calls, or a number of other mechanisms that more
sophisticated backends do support.

(3) The compiler does not need to generate human readable code; code
generation that targets C is generally not concerned with producing minimal
output, since the C optimizer will strip away superfluous stuff, so the focus
is more on keeping the backend simple.
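
To make point (1) a bit more concrete, here is a minimal sketch of what "constructed/wrapped" values can mean in C. The tagging scheme (low bit set means small integer) and all names are assumptions chosen for illustration, not Chicken's actual macros or data layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical uniform representation: every value is one machine word. */
typedef intptr_t value;

/* Assumed tagging scheme: low bit 1 = small integer (fixnum),
   low bit 0 = anything else (e.g. a pointer to a heap object). */
int is_fixnum(value v) { return (v & 1) != 0; }

/* Box a C integer; assumes n is small enough that 2n + 1 does not overflow. */
value make_fixnum(intptr_t n) { return n * 2 + 1; }

/* Unbox; the assert documents the precondition that v really is a fixnum. */
intptr_t fixnum_value(value v)
{
    assert(is_fixnum(v));
    return (v - 1) / 2;
}

/* Generic "+": must test the tags at run time. Returns 1 on success and
   stores the boxed sum in *out; returns 0 if either operand is not a fixnum. */
int generic_add(value a, value b, value *out)
{
    if (!is_fixnum(a) || !is_fixnum(b))
        return 0;
    *out = make_fixnum(fixnum_value(a) + fixnum_value(b));
    return 1;
}
```

Every addition now carries two tag checks plus unboxing and reboxing, which is the sort of overhead a statically typed language can often avoid because the types are known up front.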

~~~
userbinator
> so some operations need to operate on objects whose properties are not known
> at compile time and which need to be constructed/wrapped

But in the given example, shouldn't the compiler be able to see, via a simple
analysis, that this function is only being applied to lists of numbers?

> since the C optimizer will strip away superfluous stuff

That's a _huge_ assumption; in practice I have not seen any optimisation that
would take that Scheme-compiled C program and turn it into one a C programmer
would write (if there were, I think it could certainly make Scheme a _lot_ more
popular), and I've been looking at compiler output for over two decades now.
There has been improvement but not _that_ much. There's also the question "why
even generate 'superfluous stuff' if it's going to disappear anyway?"

To paraphrase an old saying, "with great flexibility comes great complexity."

~~~
_vya7
> _But in the given example, shouldn't the compiler be able to see, via a
> simple analysis, that this function is only being applied to lists of
> numbers?_

No, that's the beauty and curse of dynamic languages such as the Scheme
mentioned here. Of course, you could do special optimizations, such as
inlining bindings that never get used out of scope or defining special
functions that don't operate on real Lisp data structures but plain C
structures, but they add extra complexity and duplicated logic to the
compiler. I'm sure those kinds of optimizations would be out of scope of this
article.

> _in practice I have not seen any optimisation that would take that
> Scheme-compiled C program and turn it into one a C programmer would write_

Obviously Scheme code is going to look different from C code; they're
different languages. I think the point being made was that it's okay to emit
some C code that's a little extra verbose, doing things like defining extra
vars and such, knowing that the compiler will simplify many such things.

> _There's also the question "why even generate 'superfluous stuff' if it's
> going to disappear anyway?"_

To keep the compiler's implementation simple and understandable. The more
complex output it has to produce, the harder it is to read/write/debug.
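
A toy illustration of the "extra vars" point: the first function below imitates the style a simple code generator might emit, with one temporary per intermediate value, and the second is what a person would write. Both functions and their names are made up for this sketch. An optimizing C compiler will typically reduce both to the same machine code, though that depends on the compiler and flags.

```c
#include <assert.h>

/* Style a simple code generator might emit: one temporary per step. */
long area_generated(long w, long h)
{
    long t1 = w;
    long t2 = h;
    long t3 = t1 * t2;
    long t4 = t3;
    return t4;
}

/* What a person would write by hand. */
long area_handwritten(long w, long h)
{
    return w * h;
}

/* Self-check: the two versions agree on a sample input. */
void area_check(void)
{
    assert(area_generated(3, 4) == area_handwritten(3, 4));
}
```

The generator only has to know how to emit one assignment per intermediate result; cleaning up the copies is left to the C compiler.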

------
tsenkov
If you are having trouble reading this because of the dark theme and you use
Chrome, install StyleBot and apply this CSS:
[https://gist.github.com/nicroto/9885777](https://gist.github.com/nicroto/9885777)
(I inverted the colors of the images, too).

Awesome read. Thanks.

~~~
agumonkey
Since Chromium supports it, I often use printfriendly.com for this:

[http://www.printfriendly.com/print/?source=site&url=http%3A%...](http://www.printfriendly.com/print/?source=site&url=http%3A%2F%2Fwww.more-magic.net%2Fposts%2Finternals-gc.html)

You lose some worthwhile CSS (code syntax coloring) but otherwise it's pretty
nice. (Requires JavaScript.)

------
mercurial
Really good technical article. I'm surprised to see the CPS representation
(apparently) used to generate code; my understanding is that it is usually
reserved for the intermediate representation of some functional languages, in
order to facilitate optimisation.
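
For anyone unfamiliar with CPS, here is a rough sketch of what continuation-passing style looks like when written directly in C. The `struct cont` type and every function name are assumptions for illustration only; this is not the code Chicken generates.

```c
#include <assert.h>

/* Hypothetical continuation: a function pointer plus captured state. */
struct cont {
    void (*fn)(struct cont *self, long result);
    long *dest;   /* where the final continuation stores the answer */
};

/* Final continuation: record the result. */
void store_result(struct cont *self, long result)
{
    *self->dest = result;
}

/* CPS sum of 1..n with an accumulator: instead of returning a value,
   the function passes its result to the continuation k. */
void sum_to_cps(long n, long acc, struct cont *k)
{
    assert(n >= 0);                 /* precondition for this sketch */
    if (n == 0) {
        k->fn(k, acc);              /* "return" by invoking the continuation */
        return;
    }
    sum_to_cps(n - 1, acc + n, k);  /* tail call; C does not guarantee it is optimized */
}

/* Direct-style wrapper so ordinary C code can call the CPS version. */
long sum_to(long n)
{
    long answer = 0;
    struct cont k = { store_result, &answer };
    sum_to_cps(n, 0, &k);
    return answer;
}
```

Because plain C gives no guarantee that the tail call is eliminated, the stack keeps growing with each call in a sketch like this, which is the problem the "Cheney on the M.T.A." approach mentioned elsewhere in the thread is designed around.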

------
pjmlp
Very interesting read for any wannabe compiler implementer.

------
tempodox
That link is broken.

~~~
agumonkey
archive.is snapshot in case the server goes down (again)

[http://archive.is/7C5B4](http://archive.is/7C5B4)

------
nutjob123
CHICKEN, not to be confused with the CHICKEN CHICKEN CHICKEN programming
language defined here: [http://torso.me/chicken](http://torso.me/chicken)

~~~
gjm11
Regrettably, that language is in fact also called Chicken (not CHICKEN CHICKEN
CHICKEN). But Chicken Scheme has been around a lot longer.

