
Rust has a static “garbage collector” - steveklabnik
https://words.steveklabnik.com/borrow-checking-escape-analysis-and-the-generational-hypothesis
======
ridiculous_fish
> I don’t generally find writing Rust to be significantly harder than using a
> GC’d language. I’ve been trying to figure out why that is.

That's quite surprising! Here are some examples of where things would be easier
to write if Rust had a GC:

1. Any sort of lock-free algorithm. This is a big one - hazard pointers and
the like are much harder to get right than letting the GC clean up.

2. Data structures which may contain "back" pointers (e.g. parent pointers in
a tree).

3. Data structures which may be cyclic. For example, the classic LRU cache is
best implemented with a cyclic linked list, which is hard to express in Rust.

4. Any sort of refactoring that may adjust ownership, e.g. going from T to
Rc<T>. GC requires fewer choices, so these refactorings are easier.

Surely this pain is real, even if Rustaceans think it's worth tolerating?

In fairness precious resource cleanup (file descriptors, etc) is easier
without a GC.
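For reference, the usual non-GC workaround for point 2 is Rc for the "down"
edges and Weak for the "back" edge, so the parent/child cycle never keeps
itself alive. A minimal sketch (names invented for illustration):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Children are held by strong Rc edges; the parent edge is Weak, so
// dropping the root frees the whole tree without a tracing GC.
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn build_tree() -> Rc<Node> {
    let root = Rc::new(Node {
        value: 1,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        value: 2,
        parent: RefCell::new(Rc::downgrade(&root)),
        children: RefCell::new(Vec::new()),
    });
    root.children.borrow_mut().push(child);
    root
}

fn main() {
    let root = build_tree();
    let child = Rc::clone(&root.children.borrow()[0]);
    // Following the back pointer is an upgrade that can fail if the
    // parent is already gone; here it succeeds.
    let parent = child.parent.borrow().upgrade().unwrap();
    assert_eq!(parent.value, 1);
}
```

It works, but it illustrates the complaint: every back edge forces a
Weak/RefCell ceremony that a GC'd language never asks for.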

~~~
pcwalton
Crossbeam makes lock-free data structures easier to write than in any other
mainstream language I know of.

Regarding graphs, just use indices and vectors. It's often better to use
indices for graph nodes anyway, for example in games, where an ECS design
using IDs is generally preferable to direct references to objects.

For a while, I was kind of obsessed with showing that Rust could do doubly
linked trees and graphs just as well as C++ could. I now realize that this was
a mistake, and I made a big mess of a lot of code in the process. Having a
single owner and using IDs for "secondary" references is often preferable even
in languages that easily allow multiple strong references to objects. In the
small, direct references and an OO style can be convenient, but in the large,
you often want to break up your graph code into components and systems anyway,
and it's kind of nice to have the language push you toward front-loading that
kind of design.
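A minimal sketch of what "indices and vectors" looks like in practice (all
names invented here, not from any particular crate):

```rust
// One Vec owns every node; edges, including "back" edges, are plain
// indices into it, so no node holds a reference to another node.
struct NodeData {
    value: i32,
    parent: Option<usize>, // the "back pointer", as an index
}

struct Graph {
    nodes: Vec<NodeData>,
}

impl Graph {
    fn new() -> Self {
        Graph { nodes: Vec::new() }
    }

    fn add(&mut self, value: i32, parent: Option<usize>) -> usize {
        self.nodes.push(NodeData { value, parent });
        self.nodes.len() - 1
    }
}

fn main() {
    let mut g = Graph::new();
    let root = g.add(10, None);
    let child = g.add(20, Some(root));
    // Following the back edge is just an index lookup; the borrow
    // checker never sees a long-lived reference between nodes.
    let parent_value = g.nodes[g.nodes[child].parent.unwrap()].value;
    assert_eq!(parent_value, 10);
}
```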

~~~
chc
I know this works, but whenever I hear this advice, I can't help thinking:
Aren't indexes and vectors basically just a way of smuggling pointers past the
borrow checker?

~~~
kibwen
It's not bypassing the borrow checker; it's bypassing references. C++ doesn't
have a borrow checker, and the same pattern of forgoing references for integer
IDs is seen there. References are just a tool to express certain pointer
patterns, and like every other tool they have contexts in which they work
well and contexts in which they don't. Really the most surprising result of
all this is the broadening realization that references aren't the end-all
abstraction for reasoning about pointer-like behavior (familiarity with smart
pointers like C++'s shared_ptr makes this less of a surprising realization,
but it's easy to initially dismiss a library-level feature as being less
intrinsically fundamental than a language-level feature).

This isn't to say that futzing with integers is the best imaginable solution
for these tasks. Language-level support (or at least stdlib support) for
generational indices would be an interesting subject to pursue.
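A minimal sketch of what such a generational index might look like (this is
an illustration, not any particular crate's API; a real arena would also
reuse freed slots rather than always appending):

```rust
// A slot remembers which "generation" of value it holds; a stale
// handle whose generation no longer matches is detected instead of
// silently pointing at recycled data.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Handle {
    index: usize,
    generation: u32,
}

struct Arena<T> {
    slots: Vec<(u32, Option<T>)>, // (generation, value)
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Handle {
        self.slots.push((0, Some(value)));
        Handle { index: self.slots.len() - 1, generation: 0 }
    }

    fn remove(&mut self, h: Handle) {
        let slot = &mut self.slots[h.index];
        if slot.0 == h.generation {
            slot.0 += 1; // invalidate all outstanding handles
            slot.1 = None;
        }
    }

    fn get(&self, h: Handle) -> Option<&T> {
        let slot = &self.slots[h.index];
        if slot.0 == h.generation { slot.1.as_ref() } else { None }
    }
}

fn main() {
    let mut arena = Arena::new();
    let h = arena.insert(42);
    assert_eq!(arena.get(h), Some(&42));
    arena.remove(h);
    assert_eq!(arena.get(h), None); // stale handle caught at runtime
}
```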

~~~
pjmlp
C++ is getting one via static analyzers though.

~~~
kibwen
If this is referring to the proposed C++ lifetime profile, it seems as though
recently it has scaled back its ambitions from detecting all misuses of
references to now detecting only common misuses of references. Still useful
for C++ programmers, certainly, but it's no longer really comparable to Rust's
borrow checker. And even with it, it's not going to change anyone from using
an ECS to using a reference scheme; the point is that references _aren't_ the
best pointer abstraction for these sorts of thoroughly-dynamic graphs with
elements of dynamic lifetime.

~~~
pjmlp
Yes I am referring to that proposal, as for ECS, indexes are not a must.

https://www.gamedev.net/blogs/entry/2265481-oop-is-dead-long-live-oop/

In this particular case, currently Rust's borrow checker isn't of much help
either, hence the workaround with generations.

------
pcwalton
One thing to note is that borrowing in Rust is more powerful than escape
analysis can ever be in other languages, because in Rust it's in the type
system and therefore works well with separate compilation and higher-order
functions.

To give an example, suppose we have the following:

    struct A { ... }

    fn g(a: &A);

    fn f() {
        g(&A { ... });
    }

And let's say g() is defined in another crate and separately compiled. In
Rust, we can safely allocate the instance of A on f()'s stack, because we know
via the type system that that instance can never escape. But compare the
equivalent example in, say, pseudo-Java:

    class A { ... }

    class G {
        public static void g(A a);
    }

    class F {
        public static void f() {
            G.g(new A());
        }
    }

Can we promote A to the stack? Well, it depends. If the compiler can see the
source of G.g and prove that A never escapes, then it can. Otherwise, it has
to conservatively assume that A could escape.

(Incidentally, this sort of thing is one of the main reasons why the JVM
usually uses a JIT: because Java allows you to replace the bodies of classes
at runtime via classloaders, you really want to be able to do these kinds of
interprocedural optimizations based on the information you know at the time,
but reserve the right to _back them out_ if the class bodies change. Only a
JIT is able to do this.)

This gets even more difficult when you get to higher-order functions:

    class A { ... }

    interface G {
        void g(A a);
    }

    class F {
        public static void f(G g) {
            g.g(new A());
        }
    }

Can we allocate A on the stack? Well, it depends on whether _any possible
instance_ of G could possibly have its argument escape. Java's HotSpot
compiler is quite clever here and can actually make assumptions based on the
classes that are currently loaded (as a side effect of devirtualization). But
Go, for example, will always allocate that A instance on the heap, as far as
I'm aware.

This is not a problem for Rust, because the type system ensures that A can
always be allocated on the stack in the analogous code:

    struct A { ... }

    trait G {
        fn g(&self, a: &A);
    }

    fn f(g: Box<dyn G>) {
        g.g(&A { ... });
    }

Because the signature of the trait method G::g() ensures that every implementer
not let the instance of A escape, the compiler can soundly place A on the
stack. In this way, lifting the escaping behavior of values into the type
system is a very powerful technique that allows Rust to go beyond what typical
escape analysis can do.
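For completeness, here's a compilable variant of that last sketch (field and
implementer names are invented; it takes &dyn G rather than Box<dyn G> so the
caller can inspect state afterwards):

```rust
use std::cell::Cell;

struct A {
    n: i32,
}

trait G {
    fn g(&self, a: &A);
}

struct Sum(Cell<i32>);

impl G for Sum {
    // The &A borrow cannot outlive this call, so every implementer is
    // forced to let the caller's A die with the caller's frame.
    fn g(&self, a: &A) {
        self.0.set(self.0.get() + a.n);
    }
}

fn f(g: &dyn G) {
    // A is a temporary; the signature of G::g guarantees it cannot
    // escape, so the compiler never needs to heap-allocate it.
    g.g(&A { n: 7 });
}

fn main() {
    let sum = Sum(Cell::new(0));
    f(&sum);
    assert_eq!(sum.0.get(), 7);
}
```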

~~~
ridiculous_fish
This claim is too strong, because Rust can't reduce heap allocations to stack
allocations. Java, Swift, and others can optimize apparent heap allocations
into stack allocations. But (in my understanding) Box, Rc, etc. can never
allocate on the stack.

~~~
kibwen
I'm confused; in Rust, you use Box when you _want_ a value to escape the
function. That's the whole point of Box! Likewise I can't think of any
situation where you'd ever want to construct a value in an Rc without that
value escaping somehow, because in a single scope you're better off just
handing out shared references (I think you may be confusing Rc with RefCell,
since they are often used together and RefCell can be useful within a single
scope--but note that RefCell never allocates). You have no need to cleverly
promote these things to the stack because nobody is using them for data that
they don't want to live on the heap; you have to go out of your way to use
them! :)
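To make that concrete, here's a tiny sketch of Box doing exactly what it's
for (names invented):

```rust
// Box exists so a value can outlive the frame that built it; the
// function hands back a single pointer to a heap allocation.
fn make_buffer() -> Box<[u8; 4096]> {
    Box::new([0u8; 4096]) // heap, precisely because it escapes
}

fn main() {
    let buf = make_buffer();
    assert_eq!(buf.len(), 4096);
    assert_eq!(buf[0], 0);
}
```

"Promoting" that allocation to make_buffer's stack would defeat the whole
point of writing Box in the first place.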

------
flying_sheep
It feels like the title abuses the term "garbage collector", which implies
something automatic and dynamic. It sounds like calling a bike a
"human-powered motorcycle".

However, the article is well-written and very informative. You just need to
skip the title :-)

~~~
dang
Ok, I took a crack at replacing the title with a representative sentence from
the article. If anyone can suggest a better title, we can change it again.

"Better" here means more accurate and neutral, and preferably using
representative language from the article.

~~~
kristianp
What about

    Rust has a static "garbage collector"

It reflects Klabnik's thesis here, whereas the current "I don't find writing
Rust to be significantly harder..." title is probably more controversial.

~~~
dang
Ok, we'll give that a try.

~~~
steveklabnik
This is better than the last one, for sure. Thanks.

------
brianpgordon
> Historically, there has been two major forms of GC: reference counting, and
> tracing. The argument happens because, with the rise of tracing garbage
> collectors in many popular programming languages, for many programmers,
> “garbage collection” is synonymous with “tracing garbage collection.” For
> this reason, I’ve been coming around to the term “automatic memory
> management”, as it doesn’t carry this baggage.

It's fairly confusing to refer to this as automatic memory management. That
term already exists to refer to stack variables getting allocated and
initialized in C/C++.

~~~
kibwen
I think the comparison is quite apt (and may be deliberate). Automatic memory
management for locals in C/C++ uses the scope of a variable to determine when
reclamation should occur. Likewise, ownership-based automatic memory
management in Rust also simply leverages scope to determine when memory should
be freed (though you can pass the memory into other scopes, at the end of the
day it's all just lexical scoping). It's RAII for memory.
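A small sketch of that scope-based reclamation (names invented; the log
records the order in which Drop runs):

```rust
use std::cell::RefCell;

// Reclamation is tied to scope: Drop runs exactly when a value's
// owner goes out of scope, with no collector deciding the timing.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Drop for Tracked<'a> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn run(log: &RefCell<Vec<&'static str>>) {
    let _outer = Tracked { name: "outer", log };
    {
        let _inner = Tracked { name: "inner", log };
    } // "inner" is reclaimed here, deterministically
} // "outer" is reclaimed here

fn main() {
    let log = RefCell::new(Vec::new());
    run(&log);
    assert_eq!(*log.borrow(), vec!["inner", "outer"]);
}
```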

------
saosebastiao
Just out of curiosity, how many rust users have run into memory fragmentation
problems with long running processes? I love the performance and low memory
profile of deterministic/static memory allocation, but without a compactor, I
can't see it turning out well with long running processes. Has it turned out
to be a problem in practice?

~~~
kibwen
It's a good question. As others have noted, jemalloc attempts to mitigate
fragmentation, but you can't ever compact in a language where you can hand out
pointers willy-nilly, so you would expect fragmentation to strictly increase
as uptime approaches infinity. That said, Rust code doesn't tend to be
especially allocation-heavy in the first place, and repeatedly
allocating/freeing in a hot loop is something that tends to be avoided as a
core tenet of performance tuning anyway, so allocator traffic may not be
very heavy.

I don't have any long-running Rust code myself, but I do know people who have
had Rust servers that have been running for upwards of six months and they
seem to be pleased at both how little memory it consumes and at how reliable
it's been (you don't ever get tremendous uptime if your code isn't robust in
the first place :P ).

~~~
saosebastiao
I'm definitely sympathetic to the argument that by avoiding allocation in the
first place you don't have to worry as much about memory profile.

I allocate and destroy a bunch of arrays in my code (~100MB every minute), so
it's not huge but definitely quite a few pages every time it happens. And so
far I've got one process that's been running for 6 months. For the most part,
whenever I get new data, I take a slice of an old array and append the new
data to it to create a new array. It's fast enough, but more importantly it's
just so much easier to do it that way and with a compacting GC I don't have to
worry about anything.

Of course I don't know how good jemalloc is at avoiding fragmentation, and I
don't have the time to rewrite my code (maybe enough time to simulate, not
sure). But my code creates a ton of fragment-y garbage, and I would imagine
that with Rust I would end up not just translating my code, but changing
semantics to mutate in place, just to avoid fragmentation. I guess maybe
/r/rust would be the place to ask.

------
heavenlyblue
Has anyone used Rust as a replacement for their Go management utilities? I'd
like to find something that could be used in a set of different contexts, but
using the same language.

------
zamalek
I prefer the correct definition of a GC:

> [1]: Garbage collection is simulating a computer with an infinite amount of
> memory.

Chen even provides the "memory reclamation" definition in his post, but points
out that it is incomplete. The article could be made much shorter (although
the full read is still very interesting): as Rust has no free() call, Rust
simulates infinite memory and is hence garbage collected.

[1]: https://blogs.msdn.microsoft.com/oldnewthing/20100809-00/?p=13203

~~~
amelius
I always think that GCs are incomplete in a way. They can reclaim memory in
cases such as:

    x = [1, 2, 3]
    ...
    x = []

But not in cases such as:

    x = [1, 2, 3]
    ... (x not used)

Of course, determining if x is used might be uncomputable in general, but in
practice it might be computable in a lot of cases.

~~~
kazinator
A compiler can determine that a local variable _x_ has no "next use", and
insert the x = [] at that point to help garbage collection (or else provide
some map of what is live).

https://en.wikipedia.org/wiki/Live_variable_analysis

~~~
amelius
Yes, but it's too simple. Consider:

    a = { x: [1, 2, 3], y: [1] }
    ... (a.y used, but a.x not used)

~~~
arghwhat
This is theoretically solvable, assuming those arrays are by reference.

However, it's just not worth the effort of a significant increase in GC
complexity.

------
aaronblohowiak
I'd suggest that uniqueness typing is a special case of refcounting.

