
Garbage collection in a large Lisp system (1984) - lispm
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.125.2438
======
hprotagonist
One day a student came to Moon and said: “I understand how to make a better
garbage collector. We must keep a reference count of the pointers to each
cons.”

Moon patiently told the student the following story:

“One day a student came to Moon and said: ‘I understand how to make a better
garbage collector...

~~~
hawkice
If something is purely functional and doesn't have any fixed-point loopholes
or other nonsense, you know that no program can create circular pointers.

~~~
tachyonbeam
I don't think that's true. You could design a purely functional language
where everything is immutable, but something is circular by definition. You
could just declare that l is the list of integers going from 0 to 5 and then
back to 0 ad infinitum. IMO, circular references can exist without mutation
if the language specifies that something is circular/recursive at the moment
it's created: the circular linked list is defined declaratively rather than
in terms of the instructions that build it. The list is created all at once
and comes into existence with a cycle in it.
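A sketch of that idea (names are ours, for illustration only): in a lazy
functional language the declaration would be a self-referential binding,
e.g. Haskell's `l = [0,1,2,3,4,5] ++ l`; the closest mutation-free analogue
in Python is an infinite iterator declared circular at creation.

```python
from itertools import cycle, islice

# The "0 to 5 and then back to 0 ad infinitum" list, expressed
# declaratively: the sequence is circular from the moment it exists
# and is never mutated afterwards.
l = cycle(range(6))
print(list(islice(l, 14)))   # [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1]
```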

~~~
junke
Like, say, Prolog?

    
    
        A = [a | A]
    

Prolog implementations provide destructive operations and mutation, but the
example above uses unification only. Arguably, you could say that
unification performs mutation (but only once).

~~~
spyrosg
Better than that, you can "mutate" if you're not invalidating information
about the variable. Oz (not incidentally, developed by Prolog people) lets you
add constraints to a variable as long as they don't contradict the previous
ones. Binding to a specific value is just a very strong constraint. Re-binding
to the same value is okay.

This is less strict than pure functional programming, but still feels
declarative and makes concurrent programming easy: no piece of code that
looked previously at your variable will have its assumptions about it broken.
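A hypothetical sketch of that single-assignment discipline (the class and
method names here are invented, not Oz's API): binding adds information and
may never contradict it, so re-binding to the same value is a no-op.

```python
# Oz-style dataflow variable: monotonic, single-assignment.
class DataflowVar:
    _UNBOUND = object()

    def __init__(self):
        self._value = DataflowVar._UNBOUND

    def bind(self, value):
        if self._value is DataflowVar._UNBOUND:
            self._value = value                # first bind: a new constraint
        elif self._value != value:
            raise ValueError("binding contradicts earlier constraint")
        # else: same value again -- nothing already known is invalidated

    def value(self):
        if self._value is DataflowVar._UNBOUND:
            raise ValueError("variable is still unbound")
        return self._value

x = DataflowVar()
x.bind(42)
x.bind(42)          # fine: no code that saw x has its assumptions broken
print(x.value())    # 42
```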

------
13of40
Having switched to managed code a little over a decade ago, I can say that
garbage collection is a godsend, but something in this article flipped a
switch for me. A conventional C programmer is taught, in a somewhat moralistic
way, that it's their _responsibility_ to track every resource and manage its
lifetime, but with managed code we can delegate that to an admittedly very
complicated program that takes care of it for us. Just like if I live in
Manhattan, I don't need to know how to take my bag of garbage to the dump. I
wonder what other metaphors are lurking right under our noses like that.

~~~
pjmlp
One example I started to think about is driving.

I wonder if C programmers only drive manual.

~~~
willtim
It's true that in the UK, the vast majority of us drive manuals for no other
reason than to maintain our practice at driving manuals.

~~~
pjmlp
I drive mostly manual, because of the price difference between manuals and
automatics here in Europe.

------
Someone
(1984), so large=small. The Symbolics machine used had a whopping 1M words
(=4.5 Megabytes) of real and 15M words (= 67.5 Megabytes) of virtual memory.

They used that 1:15 real:virtual ratio in their experiments and claim
typical usage was closer to 1:30.

I think this system swapped quite a bit more than most modern systems. That
must have had an impact on optimizing the garbage collector.

~~~
lispm
1MW RAM was very expensive in 1984. It was actually 36bit + 8bit ECC memory.
That was a limiting factor. Disks with >100 MB capacity were also quite
expensive. With RAM, Disk, Tape and possibly other extensions the price was
easily $100k for a new system.

1 MW was not really enough for practical development use. After a few years,
4 MW = 18 MB was a useful RAM size. Disks might then be 300 MB or even
larger, which made virtual memory of 200-400 MB possible, mid 80s onwards.

> That must have had an impact on optimizing the garbage collector.

That is one of the topics of the linked paper... ;-) We are not only talking
about 'the' garbage collector, but the whole memory management system, which
provided static/manual memory management and various forms of GCs.

------
PaulHoule
This is basically the system Java uses today, isn't it?

~~~
wahern
Yes. The general term these days is tracing garbage collection.
([https://en.wikipedia.org/wiki/Tracing_garbage_collection](https://en.wikipedia.org/wiki/Tracing_garbage_collection))

There are all kinds of bells & whistles you can add on top of the general
approach, mostly time-space tradeoffs, such as with a copying collector. But
the overall time-space complexity generally remains the same. Sometimes it can
get worse, like when you add concepts like ephemerons.
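A minimal sketch of the tracing idea itself (toy object graph, our own
names): mark everything reachable from the roots, and everything unmarked
is garbage, including cycles, which pure reference counting cannot reclaim.

```python
def trace(roots, edges):
    """Mark phase: `edges` maps each object name to the names it points to."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(edges.get(obj, ()))
    return marked

heap = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}   # a<->b is a dead cycle
live = trace(roots=["c"], edges=heap)
print(sorted(live))   # ['c', 'd'] -- a and b are unreachable garbage
```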

~~~
ysleepy
Well, defining the Java GC as tracing is underspecifying it a bit. Virtually
all GCs are non-refcount, AFAIK mostly because touching refcounters all the
time destroys memory performance. A tracing GC can do its bookkeeping in a
batched and parallel fashion.

And then you realize that they are more or less the same in the end:

[https://pdfs.semanticscholar.org/91dc/a25f8cb407fc68218f7d5a...](https://pdfs.semanticscholar.org/91dc/a25f8cb407fc68218f7d5adb912e7db35e81.pdf)

"A Unified Theory of Garbage Collection" TL;DR: Refcount and tracing are a
duality.

~~~
wahern
That paper discusses a reference counting GC which also has a trace phase in
order to reclaim cycles.

    
    
      "Reference counting must perform extra graph iterations in
      order to complete collection of cyclic data. This is
      typically viewed as a major drawback of reference counting"
    

What that paper really shows is the equivalence of any kind of tracing
collector.

In practice, when people refer to reference counting it's usually implied
that there's no trace phase, and that the GC cannot reclaim cycles. Perl,
Swift, etc., are the typical examples. Python is the counterexample because
it does both reference counting and tracing.
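That counterexample can be shown concretely in CPython, where reference
counts alone cannot free a cycle but the `gc` module's tracing pass can:

```python
import gc

class Node:
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a   # a reference cycle: refcounts never drop to zero

gc.collect()              # flush any pre-existing garbage first
del a, b                  # drop the only external references
found = gc.collect()      # the tracing pass finds and reclaims the cycle
print(found >= 2)         # True: at least the two Nodes were unreachable
```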

