Hacker News
Baby’s First Garbage Collector (2013) (stuffwithstuff.com)
124 points by pcr910303 4 days ago | 29 comments

This is awesome. I'm working my way through The Garbage Collection Handbook and it's always nice to see a GC link on HN.

"What does “low on memory” even mean, especially on modern computers with near-infinite virtual memory?" is a question that I've been wondering about for a while.

Basically, memory management on modern computers with virtual memory has to self-impose some constraints to create internal pressure to clean up memory. The implementation may have an option to set an absolute maximum heap size, or other parameters that set growth hurdles. The memory manager tries to keep things within a given hurdle, but when it's obvious that more memory is needed, it moves to the next hurdle: e.g. it allocates a new heap area from the OS with a big allocation.

There is a performance trade-off there: if lots of memory is available but we tighten the belt anyway, the application may perform worse, running garbage collection when it could have gotten away with allocating more memory from the OS. On the other hand, the application is then less of a "pig" with regard to other things in the system.
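The hurdle policy described above can be sketched in a few lines. This is a toy model with hypothetical names and numbers, not any real runtime's logic: collect when allocation would cross the current soft limit, and only grow the limit when collection didn't reclaim enough.

```python
# Toy GC trigger policy: try to stay under a soft limit ("hurdle");
# collect when an allocation would cross it; if collection doesn't
# help, grow the limit (stand-in for asking the OS for a bigger heap).
class ToyHeap:
    def __init__(self, soft_limit=1024, growth_factor=2.0):
        self.soft_limit = soft_limit      # current hurdle, in bytes
        self.growth_factor = growth_factor
        self.live = []                    # (size, reachable) pairs

    def used(self):
        return sum(size for size, _ in self.live)

    def collect(self):
        # Drop unreachable objects (stand-in for a real mark-sweep pass).
        self.live = [(s, r) for s, r in self.live if r]

    def alloc(self, size, reachable=True):
        if self.used() + size > self.soft_limit:
            self.collect()                # first, try to reclaim garbage
        if self.used() + size > self.soft_limit:
            # Still over the hurdle: grow the heap instead of collecting
            # again and again ("tightening the belt" has a limit).
            self.soft_limit = int(self.soft_limit * self.growth_factor)
        self.live.append((size, reachable))
```

With a small `growth_factor` the process stays lean but collects often; with a large one it collects rarely but hogs memory, which is exactly the trade-off above.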

Memory is actually a very finite resource. Pretty sure virtual memory and overallocation are a major issue with virtual machines and containerization nowadays.

Software preemptively grabs huge amounts of memory because virtual memory is assumed to be unlimited and free. Except the software is running within a container or an OS, within a virtual machine, on a physical machine that's running hundreds of other things like that, more or less isolated from one another.

While the application may be aware of what memory it's really using, the layers above may not be able to tell exactly what's used or sitting idle. It's a major challenge to estimate and distribute resources appropriately.

near-infinite virtual memory, yes, but there are practical limits to what the hardware can support.

After exhausting actual RAM, and swap - there's nowhere else to put things... so you'll crash anyway.

If you mean the jones/lins book, it's excellent[0]. Fully recommended to all.

[0] https://www.amazon.co.uk/gp/product/0471941484 I think there's a 2nd edition but I can't find it.

> 2nd edition but I can't find it

Perhaps you're referring to this one [1]?

[1] https://www.amazon.co.uk/Garbage-Collection-Handbook-Managem...

Thanks, I think I was confusing that with a 2nd edition, which doesn't seem to exist. (In fact there is a 1999 2nd printing of the 1996 book with a different cover but the same ISBN, just to make things worse: https://www.cs.kent.ac.uk/people/staff/rej/gc.html. So if you want to buy the later printing, which incorporates the errata from the first, check his website first. Publishers do make a mess of things sometimes.)

Jones' website https://www.cs.kent.ac.uk/people/staff/rej/gc.html is excellent too.

From the same guy: https://craftinginterpreters.com great stuff

Reachability in graphs is quite an interesting topic in applied sciences.

For example, recently I had to figure out (nontrivial) investment proportions in funds and realised two things:

1. You can use a weighted DAG [1] and sum the products of the edge weights along the paths from each leaf to the root to get the investment proportions; the total over all such leaf-to-root paths should be 100% of the underlying investing entities.

2. Tax offices should use weighted DAGs to figure out cases where there is suspected fraud. Whenever you sum and don't get to 100%, then you know that people are doing fishy things with their investment vehicles and shell companies. [2]

[1] https://en.wikipedia.org/wiki/Directed_acyclic_graph

[2] https://en.m.wikipedia.org/wiki/CumEx-Files
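Point 1 can be sketched quickly. The fund structure and numbers here are hypothetical: edge weights are ownership fractions, and the effective proportion invested in each underlying entity is the sum over all root-to-leaf paths of the products of the weights along each path.

```python
# Edges point from an investor to its holdings; weights are the
# fractions of the investor's money placed in each holding.
# "fund" is the root; the stocks are the underlying entities (leaves).
holdings = {
    "fund": {"subA": 0.6, "subB": 0.4},
    "subA": {"stockX": 0.5, "stockY": 0.5},
    "subB": {"stockY": 1.0},
}

def proportions(graph, root):
    """Effective proportion of the root invested in each leaf:
    sum of products of edge weights over every root-to-leaf path."""
    result = {}
    def walk(node, weight):
        children = graph.get(node)
        if not children:                      # leaf: an underlying entity
            result[node] = result.get(node, 0.0) + weight
            return
        for child, frac in children.items():
            walk(child, weight * frac)
    walk(root, 1.0)
    return result

props = proportions(holdings, "fund")
# stockX: 0.6 * 0.5 = 0.30;  stockY: 0.6 * 0.5 + 0.4 * 1.0 = 0.70
assert abs(sum(props.values()) - 1.0) < 1e-9  # totals 100%
```

If the totals come out under 100%, some ownership is unaccounted for somewhere in the structure, which is exactly the red flag point 2 is about.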

I'm no great programmer, so my stupid question of the day is: why do people depend on GC so much? Garbage collection is one of those weird things that everyone seems to accept must be necessary in higher level languages, and end up being quite a burden in operation. The more people depend on dynamic memory, the more likely GC problems are. Meanwhile, the simplest fix in the world - restart the process frequently - prevents memory leaks from accumulating, and requires no complicated tuning.

> Meanwhile, the simplest fix in the world - restart the process frequently - prevents memory leaks from accumulating, and requires no complicated tuning.

This is just a degenerate case of garbage collection, really.

A novel and valid perspective.

I have simply relied on leaking wildly before, knowing it's a small program that will end soon. It's an acceptable solution in some cases.

Agree. The best way to manage memory in trivial, one-off C programs is not managing it at all. Don't try to free() everything. When I first started writing C, I attempted to free() whenever I had the chance, no matter how trivial the program. Later a textbook convinced me that the effort is not worthwhile (except for learning C) if all your program does is some simple calculation or processing.

The JVM now has the Epsilon GC, which does basically that: it just never GCs. It's unfortunate though that there's no explicit way to tell Java to free a particular reference; if there were, then in combination with Epsilon you could write real-time programs. Maybe you can already do something like that in the embedded Java world? Not sure.

Interesting. First I've heard of it. Well, I guess if you can ensure bounded space by limiting allocation, then simply shove manually 'freed' stuff back on a free list, you could do that, sure.

But a quick google gets https://openjdk.java.net/jeps/318 which does say

  It is not a goal to introduce manual memory 
  management features to Java language and/or JVM...
but it doesn't say you can't or shouldn't. Game on then.

Sure, if all your objects are mutable, that can work, but a lot of stuff in the core libraries is not, and you can't control those allocations short of not using them. There is also sun.misc.Unsafe, which has freeMemory, but I don't know whether that works for arbitrary addresses or just ones previously allocated from the allocateMemory method, and I don't think there's a reliable way to get the address of an arbitrary object in the first place anyway. But adding both of those pieces of functionality should be a pretty small patch to the JVM, so if someone were really motivated, it may not be all that much work.

> and you can’t control those allocations

You're totally right. Rather stupidly I hadn't thought of that.

Restarting the process frequently sounds like a very messy solution, and it is not acceptable in all cases. For server software the rule is that it is not acceptable. Also note that the more your software is used, the more frequent the restarts it would need, which will generally not make users happy. Maybe you could sometimes get away with starting a new process for every login, though memory usage would be higher in that case, perhaps unacceptably so. For desktop software it would generally be frowned upon by users as well.

I suppose it is possible to avoid garbage collection if you are willing to accept kind of old-style programming practices. You could have arrays of objects and when an object is no longer needed you don't deallocate it but allow it to be reused when another object of that kind is needed. I guess this would not lead to clearer code for most purposes. But, who knows, some people have been complaining about the lack of clarity in object oriented code.
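That array-of-reusable-objects idea is essentially an object pool. A minimal sketch (the `Particle` type and pool API are hypothetical, just to illustrate the pattern):

```python
# A tiny object pool: objects are allocated once up front and never
# deallocated; a "freed" object is just marked free and handed out
# again on the next request.
class Particle:
    def __init__(self):
        self.x = self.y = 0.0
        self.in_use = False

class ParticlePool:
    def __init__(self, size):
        self.slots = [Particle() for _ in range(size)]

    def acquire(self):
        for p in self.slots:
            if not p.in_use:
                p.in_use = True
                return p
        raise MemoryError("pool exhausted")   # fixed capacity, by design

    def release(self, p):
        p.in_use = False                      # slot becomes reusable
```

No allocation happens after startup, so there is nothing for a GC to do, at the cost of a fixed capacity and code that must remember to release objects. That's the clarity trade-off mentioned above.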

I've seen many advocates of C++ claim that RAII [0] is a viable alternative to garbage collection, or even a silver bullet for resource management in general. I find it useful, but I don't believe it's a silver bullet; my skill level is not high enough to come up with a counterargument, but you may find the approach interesting.

[0] https://en.wikipedia.org/wiki/Resource_acquisition_is_initia...

RAII works as long as you don't have closures, which are prevalent in higher-order/functional settings.

If I adhere to RAII, I guarantee that if I allocate something in a scope then it will be de-allocated when I leave the scope. But if I want to return a closure (a function pointer + state) then I can't de-allocate it myself.

I think C++ and Rust are trying to address this with move semantics and ownership. Perhaps functional languages will solve it with linear types in the future.

I'm currently looking into 'defunctionalization', 'lambda lifting', etc. to see if they are viable alternatives.


You can infer where you should allocate data. It was done 25 years ago.

The problem is that for complex programs you end up with one big region, and this is not useful.

Linear types that are visible to the user will not solve these problems. In fact, they will complicate things greatly, because you have to provide two functions for mapping, for example: for normal and linear types. And their mix with normal types is problematic at best.

The inference of linear operations in the backend, on the other hand, can be more efficient - you may end up using linear memory references not knowing about that.

Many reasons:

* It's much faster than malloc/free. A fragmented free is much, much slower than a full stop-the-world major GC, and GC allocation is nearly zero cost.

* It makes your program memory safe; memory safety is too hard to be left to the user.

* It makes your program run faster with a compacting collector (though not this trivial one). It defragments the heap and puts used arrays/strings together, so that they often end up on the same cache line; typically 2x faster.
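The "nearly zero cost" allocation claim refers to bump allocation: after compaction all live data sits at the bottom of a contiguous region, so allocating is just incrementing a pointer and checking a bound. A rough sketch, with hypothetical names and no real VM machinery:

```python
# Bump-pointer allocation over a contiguous region. Compare with
# malloc, which may have to search free lists of varying-size holes.
class BumpRegion:
    def __init__(self, capacity):
        self.capacity = capacity
        self.top = 0                  # offset of the next free byte

    def alloc(self, n):
        if self.top + n > self.capacity:
            return None               # a real VM would trigger a GC here
        addr = self.top
        self.top += n                 # the "bump": allocation is this cheap
        return addr
```

Consecutive allocations also land at adjacent addresses, which is where the cache-line locality benefit in the last bullet comes from.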

One of the Lisp machines of the '80s required you to save your work and reload so it would load back in a more compact form. This was for the first little while, until they built the GC for that machine.

In a sane program without GC, you'll be freeing the memory when you don't need it any more, and your less talented colleagues will get it wrong.

I'm not entirely sure why you're being downvoted; at the very least the downvotes prove your original point... A lot of console games do this intentionally, and various console SDKs have special reboot modes to support it. It's falling out of favor as games are architected for streaming, but they still kind of do the same thing internally (free a large chunk and start loading a new one from disk). It actually seems like a valid approach for back-end services, especially if those services are mostly short-lived and have quick start-up times. It's a hard thing to do with a lot of the runtimes most web applications are built with, but might be a valid approach with Rust or Go. I've heard of people doing similar things with Erlang/OTP.

GC gives you memory safety in practice. It’s not possible to slip up in normal circumstances.

GC and memory safety are orthogonal problems.

I was going to upvote your comment and blame HN for not being able to upvote more than once.

Then you ruined it all with the last sentence...
