
Removing garbage collection from the Rust language - graue
http://pcwalton.github.io/blog/2013/06/02/removing-garbage-collection-from-the-rust-language/
======
pron
This seems like the right decision: simplify the language, flatten the
learning curve, and delegate complex functionality to libraries.

But it's important to point out the general importance of good garbage
collectors. GCs are important not only because they help avoid a common class
of bugs, but because they are essential to many concurrent data structures.
The performance of modern software is now (and will be even more so in the
future) largely determined not by the single-thread speed of an algorithm,
but by how well the algorithm scales as it parallelizes across cores. Many of
the most scalable concurrent data structures are almost impossible to
implement robustly without a really good garbage collector.

As of now, the only really good GCs are found in the JVM, and Java does indeed
provide very low-level access to important memory primitives (direct control
of memory fences is now possible through an undocumented API, and will
probably be made public in the forthcoming Java 8). But as good and as
powerful as the JVM is, such algorithms will be necessary in circumstances
where the JVM is not the best choice, namely embedded systems. So a garbage
collector as good as those available for the JVM (or almost as good), but for
a low-level systems language, will be a boon.

A good GC for C++ would probably be a headache because of the language's
inherent unsafety in just about anything.

Go is not an option in those circumstances, either. For one, while Go compiles
to native binaries, the language actually operates at a higher level than Java
(for example, it offers no control over thread scheduling like Java does).
More importantly, choices made by the language designers might preclude a GC
implementation good enough for high-performance concurrent data structures.

Rust is a great language to try to build a good GC for. If it can be done in
a library rather than in the core language, all the better.

~~~
dmpk2k
I'm not sure I agree with the point regarding concurrent data structures and
garbage collectors. The idea is nice, but ignores a rather serious problem in
implementation: such garbage collectors are _serious_ engineering
undertakings.

To see how hard the implementation of a parallel and concurrent garbage
collector can be, take a look at the Garbage-First Garbage Collection paper.
Try to really understand all the moving parts in there and how they interact.
Once you're done wiping your brains off the wall, realize that this is the
extent that GC engineers need to go to remove most bottlenecks on scaling,
regardless of your opinion of G1 itself. This is a serious, hard-core
engineering effort that absorbs several man-years of expert work (and I
emphasize the expert part) to do properly. Rust would probably have to reach
the size and importance of the Java world to ever get this level of attention.
Despite this, the JVM's collectors still routinely throw roadblocks in front
of the mutator.

And that ignores the other tradeoffs that GCs present. E.g. most GCs need at
least 3x the working set to run reasonably efficiently. That's memory you
could otherwise be keeping in a local memory bank, or using for something else
(like a disk cache, because page faults sure are crazy expensive...).

The best way to avoid this problem is to not play the game at all. The reason
I became interested in Rust in the first place is that Graydon and co. wisely
did not follow that pied piper. Optional thread-local garbage collectors are a
much simpler problem.

As an unrelated aside, I recall Andrei Alexandrescu making the argument that
GC is important for type safety, unless the type system can prove that
dangling pointers are impossible.

~~~
pron
Go's designers seemed to take a somewhat defeatist approach in this regard.
Instead of writing a really good GC, they opted to design the runtime in such
a way that more memory sharing is possible, and the pressure on the GC is
hopefully reduced. But they've done this at a cost: they've deliberately
introduced aliasing, which all but completely precludes the possibility of
ever implementing a really good GC for Go. That's why I think Go might be good
enough for current multi-core hardware, but will not scale in a many-core
world (there are other aspects of Go that are major roadblocks for future
scaling as well).

I remember seeing a talk by Cliff Click where he explains how a small change
to the CPU can greatly assist with developing pauseless GCs (though his former
employer, Azul Systems, has moved on to implement the whole thing in software,
probably at some cost to performance).

Regardless of the undertaking, the question remains -- as we move into a many-
core era -- whether we'll find a programming paradigm good enough to take
advantage of many cores. I think that whatever the winning paradigm may be, it
will require a "really good" GC.

~~~
hga
Going by what I know of Azul's systems, their pauseless GC avoids a 1 second
delay per GB of heap by focusing on solving the hardest case vs. deferring it.
When running on their custom hardware (generic 64 bit RISC + their special
sauce) it uses an instruction to cheaply implement a read barrier. According
to a paper describing this system, which they've since improved, doing that on
stock (x86_64) hardware would incur a ~20% performance penalty. If your
application uses a lot of memory and can't afford pauses, that's probably OK.

~~~
nickik
Their collector, C4, is soft real-time: they cannot guarantee hard real-time
bounds, but they can be quite sure that pauses stay very small. I truly
believe in that vision. The problem is that all the parts of the system have
to be updated.

At the moment it can run on x86_64, but not as fast as it could be. They have
a special build for every Linux distribution.

I am not sure how well this all works if you have very few cores and not a
lot of threads. All in all, however, C4 definitely shows the way of the
future. Hardware developers should learn from Azul, the OS guys should work
more on supporting managed runtimes, and the programming language community
should do so too. Managed memory is the future, except in very, very special
cases.

------
copx
Rust continues to shape up as a real C++ competitor. And C++ certainly needs
some competition. It is way too lonely in its domain.

I hope the next version of Rust will have a proper Windows package, though. I
don't understand why they don't bundle the version of MinGW they depend on. I
was pretty disappointed to be greeted by missing DLL errors after running
Rust's Windows installer.

There are countless MinGW builds, with different thread libraries, exception
models, etc. I guess when they say "MinGW" they mean the binaries from the
original MinGW project. However, those get updated all the time, so the
packages the MinGW installer would download today might not be compatible with
the Rust 0.6 binaries, which were built some time ago... probably with a
different version of said packages. They should really release a complete
package.

Even if there is no compatibility issue, it is just bad policy. I actually
program C/C++ on Windows AND have a MinGW build on my machine.. but not the
one Rust needs. Which is common because other builds are much more up to date
and full-featured than those from the original MinGW project. And I really
didn't feel like doing an additional MinGW installation just to play around a
little with Rust.

~~~
Ygg2
I asked a similar question on the IRC and got the answer that they are aiming
for the Microsoft C compiler so it would work as natively with the system as
possible. MinGW has some slight issues when launching applications, IIRC.

~~~
copx
By the way, why does Rust need a C compiler anyway?

I admit I am ignorant here because I never cared. I know that many
academic/niche programming languages bundle MinGW/depend on a MinGW or even
Cygwin installation.. I always assumed that was to cut some corners. I mean, I
have used multiple programming languages on Windows which compile to native
code and have no dependency on a C compiler.

~~~
steveklabnik
> By the way, why does Rust need a C compiler anyway?

I'm not 100% sure, but I do know that zero.rs requires 'the following libc
functions: malloc, free, abort, memcpy, and memcmp'. So I'd imagine it's that.

~~~
pjmlp
Those functions could be easily done via syscalls to the underlying OS or
simple Assembly routines, as they are quite simple to implement.

I have not yet looked at Rust's code, but I guess it is required for some
bootstrapping code.

~~~
colanderman
_Those functions could be easily done via syscalls to the underlying OS or
simple Assembly routines, as they are quite simple to implement._

None of those four functions use syscalls, _nor_ are any of them simple to
implement efficiently in assembly.

Consider that memcpy, under GCC,

* generates different code based on any known alignment of the source or destination

* generates simpler code if the size is known at compile time

* generates only register accesses if one or both of the arguments live in registers

* is elided entirely if GCC determines that it may do so safely

The other three have similar complex behavior. The Rust developers didn't use
them because they were too dumb to write a for loop in assembly; they used
them because these functions are already heavily optimized.

~~~
pjmlp
Simple is not the same as easy.

Anyone with compiler development experience can implement them without much
trouble, hence simple.

Sadly that is a skill many seem to lack nowadays.

~~~
Confusion
Yeah, back in the old days everyone had that kind of compiler development
experience, didn't they?

~~~
pjmlp
Compared to what kids seem to know nowadays, yes.

They can grok an HTML5 page full of JavaScript stuff, but then they mix up
language with implementation, think that strong typing is only possible with
VM-based languages, and faint at the sight of assembly code.

------
haberman
I like it!

It sounds like ~ pointers are basically like unique_ptr in C++11 or scoped_ptr
at Google ([http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Smart_Pointers](http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Smart_Pointers)).
In practice, I find that these are the most commonly useful semantics, and as
Patrick mentions it is simple and predictable.

I fully agree that the landscape of GC/refcounting solutions is diverse. If
indeed there is a way to allow for different approaches without favoring one,
I think that would definitely make Rust more widely useful and future-proof.

One question: one distinguishing factor of GC (vs refcounting) is the need to
"stop the world" during the mark phase. Would there be a way of doing this
from the standard library, or would the GC scheme itself be implemented
outside the language? Likewise with any barriers that might be required for
mutations, scanning the stack, and other tricky parts of implementing GC?

~~~
steveklabnik
> It sounds like ~ pointers are basically like unique_ptr in C++11 or
> scoped_ptr at Google

I am not mega familiar with all of the details of {unique,scoped}_ptr, but my
current understanding of ~ is this: basically, the compiler inserts a malloc
when something declared with ~ comes into scope and a free when it goes out of
scope, and it's the only pointer allowed to point to that memory.
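A rough sketch of those semantics, using `Box<T>` as a stand-in for `~T` (an
assumption on my part, so take the exact names with a grain of salt):

```rust
// Owning-pointer sketch: allocation on creation, a single owner,
// and an automatic free when the owner goes out of scope.
fn boxed_square(x: i32) -> Box<i32> {
    Box::new(x * x) // heap allocation ("malloc") happens here
}

fn main() {
    let a = boxed_square(5); // `a` is the sole owner of the allocation
    let b = a;               // ownership moves; `a` is no longer usable
    // println!("{}", a);    // would be a compile error: value moved
    assert_eq!(*b, 25);
}                            // `b` goes out of scope: memory freed ("free")
```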

~~~
dbaupp
> it's the only pointer allowed to that memory

Not quite true with borrowing (i.e. & and &mut), which allows a temporary
pointer to the memory to be created. Unlike C++, however, the compiler makes
sure all borrows are safe via a fairly intricate borrow checker, which
guarantees (among other things) that the borrowed pointers don't outlive the
original object (i.e. no dangling references).
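A small sketch of what borrowing allows, and what the checker rejects (the
names here are illustrative, not from any real API):

```rust
// Borrowing sketch: `&` creates a temporary pointer into owned data;
// the borrow checker guarantees it cannot outlive that data.
fn first_byte(v: &[u8]) -> u8 {
    v[0] // read through a borrowed pointer; ownership stays with the caller
}

fn main() {
    let owned = vec![10u8, 20, 30];
    let borrowed = &owned;            // temporary, checked borrow
    assert_eq!(first_byte(borrowed), 10);
    // A borrow that outlived the owner would be rejected at compile time:
    // fn bad() -> &'static [u8] { let v = vec![1u8]; &v }
    // error: `v` does not live long enough
}
```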

~~~
steveklabnik
Ah! Yes. That's what I get for posting at 3am, thank you. :)

------
specialist
No mention of escape analysis. That's where I'm placing my bets.

<http://en.wikipedia.org/wiki/Escape_analysis>

Most short-lived objects can be stack allocated. Then the heap gets much less
of a work out.

The runtime can figure out heap vs stack. Progressively better over time.
Definitely better than I can do.

C# has (had?) the option of explicit stack allocated data (pseudo objects).
Terrible idea. Premature optimization that prevents the runtime from doing a
better job.

If I was doing embedded, realtime, or kernel dev work, I'd want to fiddle the
bits myself. But I don't so I don't.

~~~
pron
Java does escape analysis, but if I remember correctly, it is no longer used
for stack allocation (it is used for other advanced stuff, like lock elision,
and replacing objects with scalars) because the JVM GCs have gotten so good
that stack allocation provided no significant benefit. The reason is that
stack allocation is relevant only for short-lived objects anyway, and for
generational GCs, short-lived objects are (almost) a non-issue. GCs struggle
much more with long-lived objects, which eventually require compaction, whose
cost rises linearly with the size of the live data set. This is the cause of
the problems with large heaps mentioned in the comments below.

~~~
specialist
Very interesting. Thank you for the update. I'm more than a few years out of
date. I may have to place new bets. :)

FWIW, Azul has presented to our local user group (seajug.org) a few times,
most recently Nov 2012. There's video
<http://www.nimret.org/seajug/index.jsp?p=2012%2Fnov%2F> By all accounts,
their allocation and garbage collection implementations are the best
available.

------
oddthink
In every C++ project I've worked on, the vast majority of allocations go
immediately into a shared_ptr. This article seems to assert that most C and
C++ programs stick to malloc/free or auto_ptr-style semantics. This seems to
be a contradiction, so I'm confused. I can see it being true for C, but
definitely not for C++.

Am I misunderstanding the thrust of this article?

Edit: deleted comment about cycles, since they are discussed a bit at the end.

~~~
plorkyeran
shared_ptr used to be commonly accepted as a reasonable default choice, but
that hasn't been the case for years. unique_ptr/scoped_ptr is nontrivially
faster (thread-safe reference counting is fairly expensive), and much less
error prone. These days the usual advice is to only use shared_ptr if you
absolutely need it.
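In Rust terms, the same tradeoff shows up as `Box` vs. `Arc` (a rough analogy
on my part, not an exact mapping to the C++ types):

```rust
use std::sync::Arc;

fn main() {
    // Unique ownership (unique_ptr-like): no reference count at all.
    let unique = Box::new(vec![1, 2, 3]);
    assert_eq!(unique.len(), 3);

    // Shared ownership (shared_ptr-like): every clone is an atomic
    // increment and every drop an atomic decrement -- the thread-safe
    // refcounting cost mentioned above.
    let shared = Arc::new(vec![1, 2, 3]);
    let alias = Arc::clone(&shared);
    assert_eq!(Arc::strong_count(&shared), 2);
    drop(alias); // atomic decrement; memory freed when the count hits zero
    assert_eq!(Arc::strong_count(&shared), 1);
}
```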

~~~
oddthink
I'm sure there are many places where other pointer types would be better, but,
like I said, I only tend to see shared pointers, usually typedef'd to something
like FooPtr and used indiscriminately. It's an uphill battle to even use
something like a pointer to const.

I've never seen refcounting overhead show up in callgrind, so I think the
choice to uniformly use the most general version is OK.

~~~
marshray
But the insidious thing is that they'll be inlined all over the place and may
not show up in callgrind. The atomic operations will contribute to cache lock
contention inside the processor. Of course it all depends on how often you
perform operations on the shared_ptrs. Use them only as handles to large
components, keep them out of inner loops, and you'll be fine.

------
marshray
Single-owner, transferred-ownership pointers are not going away.

Reference counted pointers are not going away.

I don't care if Java programmers don't know the difference between the stack
and the heap, _please_ keep the @ syntax to save me from having to type

    
    
        void f(std::shared_ptr<my_namespace::my_object_type> const & p)
    

another 50,000 times in my career.

~~~
pron
Reference counted pointers may very well be going away. See comments below.
TL;DR they totally suck in multi-threaded settings.

~~~
marshray
How many times have you seen a large application grind to a halt or become
unusably slow due to reference counting overhead?

Perhaps it happens sometimes, but compare that to leak-it-all-as-garbage and
scan-all-the-address-space style collection.

------
portmanteaufu
I'm really excited that Rust has decided to position itself to be usable at a
C-level. Not only will I be able to write a kernel module with strong memory
safety guarantees (unless I need otherwise), I'll have access to basic data
structures like strings and hashmaps.

~~~
steveklabnik
Here's a minimal Linux kernel module in Rust, by the way:
<https://github.com/tsgates/rust.ko>

------
jhasse
"This could be fixed by switching to keywords, which I prefer for this reason"

~ could be named unique_ptr and @ something like shared_ptr. Genius!

~~~
copx
..or given that they like 1970s style abbreviated identifiers so much: "upt"
and "spt"

Looking forward to writing: "let mut foo = upt .."

~~~
pcwalton
I don't really care for excessive use of abbreviation myself. I would prefer
"heap" and "Gc".

------
jeltz
I look forward to the proposed solution. While I understand the arguments
about simplifying the language and allowing for external garbage collectors, I
am not sure I am convinced that the GC type will be as simple to work with as
the current built-in GC'd type. Sometimes you want GC when coding, and as
currently implemented in Rust it is easily accessible and simple to
understand, with readable code.

------
chad_oliver
The article mentions that Rust is at least partially designed for low-level
applications such as writing kernels. If garbage collection is shifted into
the standard library, would this allow Rust to be used on real-time embedded
systems such as the newer ARM microcontrollers?

~~~
pjmlp
At least for ARM Cortex-M3 and NXP LPC2000 microcontrollers you do have
GC-enabled languages available (Oberon):

<http://www.astrobe.com/default.htm>

But having Rust as option is also great.

------
ambrop7
I'm completely in favor of this. Garbage collection has no place in the core
definition of any language that targets OS kernels and other high-performance
applications. It's sad how we have so many languages that fix many of C++'s
problems (e.g. D and Go), but not without the addition of a garbage collector.

~~~
pron
With regards to OS kernels you may be right (though not for performance
reasons), but some high-performance applications would be very hard to write
without a GC, especially if they're to take advantage of multicore. See my
explanation above about scalable, concurrent data structures. Even hard real-
time systems can benefit from a GC, and Java has a few GC implementations for
hard real-time apps.

FYI, object allocation and de-allocation for short-lived objects is much
faster with a good GC than with dynamic allocation, and even for long-lived
objects GC gives a higher throughput than malloc/free. The problem with GC is
latency issues (pauses) when maintaining very large heaps with many long-lived
objects.

