Hacker News new | comments | show | ask | jobs | submit login
Garbage collection in a large Lisp system (1984) (psu.edu)
70 points by lispm on Dec 21, 2016 | hide | past | web | favorite | 38 comments



One day a student came to Moon and said: “I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons.”

Moon patiently told the student the following story:

“One day a student came to Moon and said: ‘I understand how to make a better garbage collector...


If something is purely functional and doesn't have any fixed point loopholes or other nonsense, you can know that there is no program with circular pointers.


I don't think that's true. I think you could design a purely functional language where everything is immutable, but something contains a circular loop by definition. You could just declare that l is the list of integers going from 0 to 5 and then back to 0 ad infinitum. IMO, circular references can exist without mutation, if the language specifies that something is circular/recursive at the moment it's created. Defining a circular linked list in a declarative way rather than in terms of the instructions that build the linked list. The list is created all at once. It comes into existence with a cycle into it.


Like, say, Prolog?

    A = [a | A]
Prolog implementations provide destructive operations and mutation, but here above this is done with unification only. Arguably, you could say that unification performs mutation (but only once).


Better than that, you can "mutate" if you're not invalidating information about the variable. Oz (not incidentally, developed by Prolog people) lets you add constraints to a variable as long as they don't contradict the previous ones. Binding to a specific value is just a very strong constraint. Re-binding to the same value is okay.

This is less strict than pure functional programming, but still feels declarative and makes concurrent programming easy: no piece of code that looked previously at your variable will have its assumptions about it broken.


Arguably we could argue that everything flips 1's to 0's and back under the hood, though. Instantiation of a new lexical environment with fresh bindings mutates something in the machine.


Make that a functional programming language without mutually recursive first class local functions. (In the case of Lisp/Scheme, no labels or letrec)


xs = 1:xs

Lazy evaluation makes it possible to construct circular data structures without mutation.


Or in Lisp one could write #1=(1 . #1#), eg:

    CL-USER> (subseq '#1=(1 2 3 . #1#) 0 10)
    (1 2 3 1 2 3 1 2 3 1)


Ah that's interesting. I was going to say that in principle a strict language could have a special syntax for defining circular data structures, but I didn't know about that particular corner of CL.


You also know that your program is not Turing-complete.


Had a complicated discussion here, but it's ridiculous, because:

Church-Turing thesis says lambda calculus is turing complete and it is purely functional with no pointers at all.


But it does have fixpoints... right? You need recursion somehow.


In Standard ML, all recursive value definitions must have function literals as their right-hand sides. Thus, at runtime, recursion is guaranteed to go through functions (which normally aren't allocated in the heap) and/or mutable data structures (which can be handled in a special way by the garbage collector). Immutable data structures are generally guaranteed to be cycle-free.

That being said, reference counting is a really bad fit for functional languages for several other reasons.


You don't need the type of fixed points that cause the issues with circular pointers. Recursion can be done with (appropriately for HN) the Y combinator. The kind of tricks I'm talking about rely on laziness and are pretty complicated. See: http://stackoverflow.com/questions/37366222/why-is-this-vers...


How would you implement recursion then?


Having switched to managed code a little over a decade ago, I can say that garbage collection is a godsend, but something in this article flipped a switch for me. A conventional C programmer is taught, in a somewhat moralistic way, that it's their responsibility to track every resource and manage its lifetime, but with managed code we can delegate that to an admittedly very complicated program that takes care of it for us. Just like if I live in Manhattan, I don't need to know how to take my bag of garbage to the dump. I wonder what other metaphors are lurking right under our noses like that.


One example I started to think about is driving.

I wonder if C programmers only drive manual.


C, the 1980's Porsche 911 of programming languages.

Drive it well and it's scary fast. Make a mistake and you're upside down in a ditch.

And it takes 30s for a dedicated amateur to break in and steal it.


It's true that in the UK, the vast majority of us drive manuals for no other reason than to maintain our practice at driving manuals.


I drive mostly manual, because of the price difference on car prices here in Europe.


Driving conditions can vary a huge lot. Stuck in city traffic, there's nothing to gain from manual stick. But if you are coming down a steep mountain road or running a car race, you want to use your engine to brake as well as to accelerate. As far as I know, in such a case, manual is unbeatable.


> As far as I know, in such a case, manual is unbeatable.

I guess that was a big mistake for F1 and other sport cars to have adopted semi-automatic gear boxes then.


From Wikipedia:

> A semi-automatic transmission ... is an automobile transmission that does not change gears automatically, but rather facilitates manual gear changes by dispensing with the need to press a clutch pedal at the same time as changing gears.

I wasn't talking about clutch-and-stick vs. push-a-button. I was talking about controlling which gear is on vs. letting the car guess alone. I've never heard about race cars where the driver does not decide himself which gear is on at which moment; if you have, thanks for letting me learn something new.


Just like reference counting...


I don't know of a lot of F1/automatic gear cars being involved in steep terrain where the engine braking scenario would come into its own.


The grandpartent comment clearly says "or running a car race" as another example.

That said, I see no reason why a (semi-)automatic couldn't engine brake.


> I see no reason why a (semi-)automatic couldn't engine brake.

The semi surely can. The full automatic, I don't know. How can the car understand that you want a lower gear before you touch the brake?


I understand, my point was that manual shift cars are used in every discipline where such things matter like rally driving afaik


You mean like WRC?

Better update yourself.

The Subaru Impreza WRC2004 gearbox does a gear change in 0.05s vs 1s taken by the best manual ones.

http://m.crash.net/wrc/feature/112516/1/technical-talk-semia...

Do you know any WRC driver able to beat that?


> I wonder if C programmers only drive manual.

FWIW, I mostly program in C and I drive automatic. C enables me to be a control freak with my code, but I don't really have a need or desire to be a control freak with other things in my life…


> Just like if I live in Manhattan, I don't need to know how to take my bag of garbage to the dump.

Except when your garbage is full of rotting fish and you want that garbage disposed right now. In other words, RAII.

I'd really like a language where you can choose GC or RAII for the types you define.


(1984), so large=small. The Symbolics machine used had a whopping 1M words (=4.5 Megabytes) of real and 15M words (= 67.5 Megabytes) of virtual memory.

They used that 1:15 real:virtual ratio in their experiments and claim typical use used 1:30.

I think this system swapped quite a bit more than most modern systems. That must have had an impact on optimizing the garbage collector.


1MW RAM was very expensive in 1984. It was actually 36bit + 8bit ECC memory. That was a limiting factor. Disks with >100 MB capacity were also quite expensive. With RAM, Disk, Tape and possibly other extensions the price was easily $100k for a new system.

1 MW was not really enough for practical development use. After a few years 4 MW = 18MB was a useful RAM size. Disks might then be 300MB or even larger. Which made virtual memory of 200-400 MB possible. Mid 80s onwards.

> That must have had an impact on optimizing the garbage collector.

That is one of the topics of the linked paper... ;-) We are not only talking about 'the' garbage collector, but the whole memory management system, which provided static/manual memory management and various forms of GCs.


This is basically the system Java uses today, isn't it?


Yes. The general term these days is tracing garbage collection. (https://en.wikipedia.org/wiki/Tracing_garbage_collection)

There are all kinds of bells & whistles you can add on top of the general approach, mostly time-space tradeoffs, such as with a copying collector. But the overall time-space complexity generally remains the same. Sometimes it can get worse, like when you add concepts like ephemerons.


Well defining the java GC as tracing is underspecifying it a bit. Virtually all GCs are non-refcount, AFAIK mostly because touching refcounters all the time destroys memory performance. A tracing GC can do that in a batched and parallel fashion.

And then you realize that they are more or less the same in the end:

https://pdfs.semanticscholar.org/91dc/a25f8cb407fc68218f7d5a...

"A Unified Theory of Garbage Collection" TL;DR: Refcount and tracing are a duality.


That paper discusses a reference counting GC which also has a trace phase in order to reclaim cycles.

  "Reference counting must perform extra graph iterations in
  order to complete collection of cyclic data. This is
  typically viewed as a major drawback of reference counting"
What that paper really shows is the equivalence of any kind of tracing collector.

In practice when people refer to reference counting it's usually implied that there's no trace phase, and that the GC cannot reclaim cycles. Perl, Swift, etc, are the typical examples. Python is the counter example because it does reference counting and tracing.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: