
Why writing a linked list in safe Rust is so damned hard - michael_fine
https://rcoh.me/posts/rust-linked-list-basically-impossible/
======
ekidd
Yeah, a doubly-linked list is basically the worst possible choice for your
first program in Rust. This is because Rust likes all of your memory to have
one clear owner. And this means that cyclic data structures are unusually
difficult compared to almost any other first program you might choose.

But if you really want to know how to do it, there's a really good tutorial on
the subject titled "Learning Rust With Entirely Too Many Linked Lists":
[http://cglab.ca/~abeinges/blah/too-many-lists/book/](http://cglab.ca/~abeinges/blah/too-many-lists/book/)

This will walk you through multiple different kinds of linked lists, and show
you how to implement each in Rust. Along the way, you'll learn more about
Rust's ownership system than most working Rust programmers generally need to
know.

Personally, I've written quite a bit of production Rust code, and I've done so
without ever touching 'unsafe' or needing a linked list. If I really did need
a linked list, I'd either grab a good one off crates.io, or just knock out a
quick one using `unsafe` and raw pointers. I mean, Rust can do basically
anything C does if I ask nicely. It's just that if I choose to write "C in
Rust", then I get about the same safety guarantees that C offers.

~~~
rusbus
OP here. In hindsight, this is 100% correct. Hopefully this post will help
people avoid my mistakes.

~~~
amelius
Just out of curiosity, didn't you do a google-search for "linked list in Rust"
when you ran into problems? Or isn't that your style? :)

Anyway, thanks for the thought-provoking post.

~~~
rusbus
I did. I found the book of linked lists at the bottom of the post. After
reading and appreciating the complexity I didn't want to deal with, I just
wrote it in Go

~~~
littlestymaar
No judgement here, but I'm curious to understand your situation: if you didn't
want to deal with the complexity of the implementation, why didn't you just
import a linked list from the standard library? Did you just want to
implement a linked list for learning, but still put the minimum effort into the
process?

~~~
rusbus
The standard library linked list doesn't support inserting in the middle

------
madez
That Rust's compiler fights against intuitive doubly linked lists is
reasonable, because they wouldn't work when working in parallel on the list,
while Rust by default also ensures safety doing that.

Having that in mind makes it easier for me to come up with ways to appease
the compiler. What measures would you take to write a thread-safe doubly
linked list in C? Would you make it entirely exclusive and blocking? Well,
then you might want to let one "list" structure own all nodes. Would you want to
allow parallel execution of operations on the list? Well, then break the
ownership of the nodes up into smaller parts that you can hand out in parallel
when they don't affect each other. What granularity of allowed and prepared-
for parallelism gives the best performance depends on the use-case.

~~~
tyingq
I respect your expertise here, but pulling up 20,000 feet...shouldn't Rust
feel some urgency to provide something like simple-to-use common data
structures without pain? Or a reasonable alternative? Current state feels a
little science projecty.

~~~
nicoburns
There are simple-to-use data structures available in Rust. In fact, they are
particularly easy to access due to the excellent package manager. What is not so
easy is to write some of these data structures yourself. Which is a publicity
problem for Rust, as these tend to be exercises that people from a C/C++
background want to try in Rust.

~~~
tyingq
Ahh, okay. So maybe a _"c++ common approach"_ vs _"rust best practices"_
FAQ? Like _"maybe you don't need a c++ like doubly linked list, and here's
why"_ write-up?

~~~
quodlibetor
Linked lists are in the standard library[1]. The source code is linked to from
that page (`src`, on the top-right corner), although the implementation is
optimized so it's probably not a great learning resource.

If, for some reason, you actually _need_ a linked list there are tons of
implementations available on crates.io[2].

It's not that there aren't linked lists, and it's not that they can't be
implemented as efficiently as in C, it's that implementing them is an
inherently tricky problem where the obvious implementation can't be proved by
Rust's type system so they're an extremely unpleasant way to _learn_ Rust.

1: [https://doc.rust-lang.org/std/collections/struct.LinkedList....](https://doc.rust-lang.org/std/collections/struct.LinkedList.html)
2: [https://crates.io/search?q=linked%20list](https://crates.io/search?q=linked%20list)

~~~
tyingq
I'm not sure that really addresses my question. You seem to be saying C++
folks need to bear the burden of figuring it out. That's fine, but it explains
the conflict. If the Rust team's position is that C++ diehards need to fully
understand Rust first, then the conflict seems expected. On the other hand, if
the Rust team is interested in serious evangelism, something seems missing.

------
Animats
I've previously discussed on HN how Rust could support backpointers safely.[1]
The idea is to have a "backpointer" attribute, with some additional checking.
Backpointers are non-owning pointers locked in an invariant relationship with
an owning pointer.

You need backpointers not only for doubly linked lists, but for some tree-type
data structures. The backpointer invariant is easy to check at run-time, and
it's often possible to eliminate that check at compile time. With this
feature, few if any tree-type data structures need "unsafe". Tree update code
is error-prone; you need all the help you can get there.

The other basic thing you can't express safely in Rust is a partially
initialized array. If you had syntax for "This array is initialized up to
element N", with suitable checking, "vec" could be written safely.

Rust is pretty close on this. Those are the two main gaps in the safety
system. Calling external non-Rust code is a separate problem, one which
hopefully will decline as more low-level libraries are rewritten in Rust.

[1]
[https://news.ycombinator.com/item?id=14303858](https://news.ycombinator.com/item?id=14303858)

~~~
bbatha
I don’t think your solution works. It will make sure the pointer is never null
(assuming these back pointers are !Sync), sure, but Rust guarantees far more
than that. Each ownership type (value, &, &mut) is a capability, and it’s very
important to Rust’s guarantees that you can’t go up the chain. Each node having an
owned pointer to the next means that it can mutate it at any time; the back
pointer would always alias the previous node, meaning that it has to be a
weakened “&” reference (allows aliasing, lifetime rules relaxed). Because all
of the nodes are owned, it’s possible to traverse the back pointer, then move
forward again and recover ownership or a &mut of the same node twice.

You can however allocate the nodes in an arena and store your forward and back
pointers in Cells. You can’t move the arena, but you’ll get nice back
pointers. You could also make the linked list be backed by a slab and store
indexes instead of node pointers. Both of these solutions will likely be more
cache and allocation efficient than a traditional linked list, which does an
allocation for every node.
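
To make the slab idea concrete, here is a minimal sketch of an index-linked list in safe Rust. The names `IndexList`, `push_back`, `insert_after`, `to_vec`, and `to_vec_rev` are made up for illustration, and removal and slot reuse are left out, so treat it as an outline of the technique rather than a finished container.

```rust
/// One list node; links are indices into `IndexList::nodes`, not pointers.
struct Node<T> {
    value: T,
    prev: Option<usize>,
    next: Option<usize>,
}

/// Hypothetical list type: the `Vec` is the single owner of every node.
struct IndexList<T> {
    nodes: Vec<Node<T>>,
    head: Option<usize>,
    tail: Option<usize>,
}

impl<T> IndexList<T> {
    fn new() -> Self {
        IndexList { nodes: Vec::new(), head: None, tail: None }
    }

    /// Appends a value and returns the index of its node.
    fn push_back(&mut self, value: T) -> usize {
        let idx = self.nodes.len();
        self.nodes.push(Node { value, prev: self.tail, next: None });
        match self.tail {
            Some(t) => self.nodes[t].next = Some(idx),
            None => self.head = Some(idx),
        }
        self.tail = Some(idx);
        idx
    }

    /// Inserts a value directly after the node at `at`, without traversal.
    /// Panics if `at` is not a valid node index.
    fn insert_after(&mut self, at: usize, value: T) -> usize {
        let idx = self.nodes.len();
        let next = self.nodes[at].next;
        self.nodes.push(Node { value, prev: Some(at), next });
        self.nodes[at].next = Some(idx);
        match next {
            Some(n) => self.nodes[n].prev = Some(idx),
            None => self.tail = Some(idx),
        }
        idx
    }

    /// Collects references front to back by following `next` links.
    fn to_vec(&self) -> Vec<&T> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while let Some(i) = cur {
            out.push(&self.nodes[i].value);
            cur = self.nodes[i].next;
        }
        out
    }

    /// Collects references back to front by following `prev` links.
    fn to_vec_rev(&self) -> Vec<&T> {
        let mut out = Vec::new();
        let mut cur = self.tail;
        while let Some(i) = cur {
            out.push(&self.nodes[i].value);
            cur = self.nodes[i].prev;
        }
        out
    }
}
```

Because the links are plain `usize` values, the `Vec` stays the only owner and the borrow checker has nothing to object to. The trade-off is that a stale index becomes a logic bug the compiler will not catch.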

~~~
Animats
Right, you can go backwards, but not with mutability. Weak refs have some of
the same problems.[1] You don't get the ability to mutate doubly linked lists
this way. Navigate trees, yes; mutate them from below, no.

[1]
[https://www.reddit.com/r/rust/comments/3csud3/how_do_rust_we...](https://www.reddit.com/r/rust/comments/3csud3/how_do_rust_weak_references_work/)

------
vvanders
Option #3 (indexed tree) looks non-ideal but it actually has some really cool
properties if you do it right.

1\. If you know traversal order you can get _really_ great cache coherency. We
used to do this all the time with animation DAGs and the like in gamedev.

2\. If everything is an "offset"/index instead of a pointer you can do things
like in-place loading where creating something is just a single read() call.
No constructors or annoying allocations to slow down loading. If you want to
take this even further you can mmap() the entire file on disk and use
massive (500mb+) files without using much actual memory overhead through
letting the kernel page it in/out for you.
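
A rough sketch of point 1, assuming a tree stored as a flat array in which every parent appears before its children. The `Joint` type, its fields, and `world_offsets` are invented for this example and are not taken from any real animation system.

```rust
/// A tree node stored in a flat array; `parent` is an index, not a pointer.
struct Joint {
    parent: Option<usize>, // assumed to refer to an earlier slot in the array
    local_offset: f32,     // position relative to the parent
}

/// Computes absolute positions in a single front-to-back pass.
/// Panics if a parent index does not come before its child.
fn world_offsets(joints: &[Joint]) -> Vec<f32> {
    let mut world: Vec<f32> = Vec::with_capacity(joints.len());
    for joint in joints {
        let base = match joint.parent {
            Some(p) => world[p], // already computed, because parents come first
            None => 0.0,
        };
        world.push(base + joint.local_offset);
    }
    world
}
```

The pass touches memory strictly in order, which is where the cache friendliness comes from, and since the struct holds only plain data and indices, nothing needs pointer fix-ups when moved or copied.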

~~~
Narishma
> 1\. If you know traversal order you can get really great cache coherency.

Cache locality, not coherency.

~~~
vvanders
Sorry yes, had a brain fart there.

------
squiguy7
There is some good discussion on /r/rust [0] as well. I'm not trying to
invalidate the author's post but rather share some more insight.

[0]:
[https://www.reddit.com/r/rust/comments/7z7p5m/why_writing_a_...](https://www.reddit.com/r/rust/comments/7z7p5m/why_writing_a_linked_list_in_rust_is_basically/)

~~~
madez
> I'm not trying to invalidate the author's post but rather share some more
> insight.

You don't need to excuse yourself ahead of time. If you wanted to invalidate
anything the content of your comment should speak for itself. Every argument
is expected to give some insight by default.

~~~
always_good
Agreed. I'm sure some people think you're being petty but we've wandered way
too far into this ultra defensive language on the internet.

I think it's a habit formed from the toxic behavior of people responding to
you to try to pin you on some stupid detail, like pointing out that there are
exceptions to some uncontroversial statement you made that was only 1% of the
point of your post.

------
zerosanity
Just a friendly reminder that you can get the Programming Rust book for only
$15 along with a bunch of other good books for 3 more days at the Humble
Bundle Functional Programming Bundle. I've been working my way through it and
learned a lot so far about Rust.

------
kovrik
I've struggled with Rust just a couple of times, nothing serious, so I'm not
an experienced Rustacean by any means.

Question:

Ownership system obviously imposes some limitations, but gives safety in
return.

Are there any data-structures or algorithms or something that you simply
cannot implement in Rust without using unsafe?

~~~
justinpombrio
> Are there any data-structures or algorithms or something that you simply
> cannot implement in Rust without using unsafe?

No. If you wrap every piece of data in your whole program in RefCell, the
borrow checker will leave you alone, and it will be like programming in most
other languages. (There are some minor differences, like the fact that your
program will be refcounted rather than garbage collected, which doesn't deal
with cyclic references, but let's ignore those.) Alternatively, you can wrap
your whole program in "unsafe{...}", and use raw pointers everywhere, and it
will be similar to programming in C.

EDIT: My comment is trying to give a general understanding that will hold most
of the time. See the other comments for fun edge cases :-).

~~~
kovrik
But if you use RefCell, then you won't get any useful compile-time checks,
will you?

In other words: if you have Rust _without_ unsafe and without RefCell (and
similar stuff), will you still be able to implement anything in it and keep
compile-time checks and other benefits of ownership system?

~~~
justinpombrio
> But if you use RefCell, then you won't get any useful compile-time checks,
> will you?

If you use RefCell, then the compile-time checks won't be necessary. For
example, in Java there are no compile-time checks: everything is just garbage-
collected at runtime. Likewise, RefCell is lightweight garbage-collection
(modulo cyclic references).

> In other words: if you have Rust _without_ unsafe and without RefCell (and
> similar stuff), will you still be able to implement anything in it and keep
> compile-time checks and other benefits of ownership system?

Ah, in that case there are a lot of things you can't implement: Strings,
doubly-linked lists (which is the point of this article), trees with
backpointers, graphs, vectors, etc. Fortunately, you rarely need to: if you
need a data structure, it's probably already implemented in Rust. The standard
library has most common data structures, and there are often crates for less
common ones. If you do need to write unsafe Rust, it's about as scary as C.
I've written a reasonable amount of Rust code, and only ran into one situation
where I (think I) need unsafe code.

~~~
steveklabnik
You’re confusing Rc and RefCell, RefCell is “borrow checking at runtime”, Rc
is “lightweight garbage collection”.
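
The split can be sketched with a two-node list: `Rc` shares ownership, `Weak` gives a non-owning back pointer, and `RefCell` moves the borrow check to run time. The `Node` type and the helpers `new_node`, `link`, and `prev_value` are hypothetical names for this example only.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    value: i32,
    next: Option<Rc<RefCell<Node>>>,   // strong: keeps the next node alive
    prev: Option<Weak<RefCell<Node>>>, // weak: avoids a reference-count cycle
}

fn new_node(value: i32) -> Rc<RefCell<Node>> {
    Rc::new(RefCell::new(Node { value, next: None, prev: None }))
}

/// Links `first -> second` strongly and `second -> first` weakly.
fn link(first: &Rc<RefCell<Node>>, second: &Rc<RefCell<Node>>) {
    first.borrow_mut().next = Some(Rc::clone(second));
    second.borrow_mut().prev = Some(Rc::downgrade(first));
}

/// Follows the weak back pointer, if the previous node still exists.
fn prev_value(node: &Rc<RefCell<Node>>) -> Option<i32> {
    let prev = node.borrow().prev.as_ref()?.upgrade()?;
    let value = prev.borrow().value;
    Some(value)
}
```

With this layout, taking a mutable borrow of a node while another borrow is still alive is caught when the program runs (`borrow_mut` panics, `try_borrow_mut` returns an error) rather than when it compiles.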

~~~
justinpombrio
Aaagh, yes! I meant Rc everywhere :-(.

------
mehrdadn
> I find a bit of solace in the fact that implementing a data structure like
> this in a non-garbage collected language _without_ Rust is also quite tricky

What? No it isn't. The entire problem here is incorrectly assuming that "A
owns B" implies "A has a pointer to B". I don't know if this is a Rust-imposed
constraint, but it certainly isn't a logically necessary one. Just do what C++
(and C#, etc.) do: the data structure (std::list, etc.) owns all the nodes,
and the nodes merely reference each other. The nodes don't own their siblings.
It makes sense and it doesn't get tricky.

~~~
bfrog
Semantically that's what the Rust std library linked list does as well
[https://doc.rust-lang.org/std/collections/struct.LinkedList....](https://doc.rust-lang.org/std/collections/struct.LinkedList.html)

However that LinkedList implementation requires unsafe{} to be implemented.
All unsafe really means is that the compiler isn't going to hold your hand,
the usual memory ownership footgun is available at your discretion.

unsafe shouldn't be this mythical thing you don't touch like people seem to
think it is. If you need to escape the compiler's very helpful guidance you can
and should, but test thoroughly!
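
For a sense of scale, here is a small sketch of a list that owns its nodes and links them with raw pointers. It is not the standard library's implementation; `RawList`, `Node`, and the method names are invented for illustration, and the element type is fixed to `i32` to keep it short.

```rust
use std::ptr;

struct Node {
    value: i32,
    prev: *mut Node,
    next: *mut Node,
}

/// The list owns every node; the raw pointers inside nodes own nothing.
struct RawList {
    head: *mut Node,
    tail: *mut Node,
}

impl RawList {
    fn new() -> Self {
        RawList { head: ptr::null_mut(), tail: ptr::null_mut() }
    }

    fn push_back(&mut self, value: i32) {
        // Turn the Box into a raw pointer; freeing it is now our job.
        let node = Box::into_raw(Box::new(Node {
            value,
            prev: self.tail,
            next: ptr::null_mut(),
        }));
        if self.tail.is_null() {
            self.head = node;
        } else {
            // SAFETY: `tail` is non-null and points to a live node of this list.
            unsafe { (*self.tail).next = node; }
        }
        self.tail = node;
    }

    fn pop_front(&mut self) -> Option<i32> {
        if self.head.is_null() {
            return None;
        }
        // SAFETY: `head` came from `Box::into_raw` and is reclaimed only here.
        let node = unsafe { Box::from_raw(self.head) };
        self.head = node.next;
        if self.head.is_null() {
            self.tail = ptr::null_mut();
        } else {
            // SAFETY: the new head is a live node of this list.
            unsafe { (*self.head).prev = ptr::null_mut(); }
        }
        Some(node.value)
    }

    fn pop_back(&mut self) -> Option<i32> {
        if self.tail.is_null() {
            return None;
        }
        // SAFETY: `tail` came from `Box::into_raw` and is reclaimed only here.
        let node = unsafe { Box::from_raw(self.tail) };
        self.tail = node.prev;
        if self.tail.is_null() {
            self.head = ptr::null_mut();
        } else {
            // SAFETY: the new tail is a live node of this list.
            unsafe { (*self.tail).next = ptr::null_mut(); }
        }
        Some(node.value)
    }
}

impl Drop for RawList {
    fn drop(&mut self) {
        // Free any nodes that are still linked in.
        while self.pop_front().is_some() {}
    }
}
```

The public methods are safe to call; the `unsafe` blocks are confined to a few pointer dereferences, each with a stated reason it is sound, and those are the lines that deserve the thorough testing.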

~~~
mehrdadn
Going on a tangent, but I honestly think 'unsafe' might suffer from a naming
issue. It should've been called 'unchecked' or 'unverifiable' or something
that says the code is merely not verified to be safe, not that it is actually
unsafe.

~~~
xenadu02
Nope, unsafe does exactly what it says on the tin.

C# tackled this problem 15 years ago. I'm sure other languages (Haskell) did
it even earlier. When to use unsafe is a judgement call. Each developer and
team will have to set their own standards. Some people will abuse it. None of
this is new. At first it scares people. They think this is the brave new
world, using unsafe feels gross and backwards! Eventually they understand
where it is and isn't appropriate.

You might think "so what? Why even bother with a safe-by-default language?"

Because it greatly restricts the problem space. Rather than being forced to
examine every line of code for every possible bit of undefined behavior or
every path of flow control for memory errors you only need to think really
hard about edge cases inside the unsafe blocks. Simply by virtue of being a
relatively small number of blocks of few lines the problem of safety and
correctness becomes easier to understand. Easier to test. Easier to reason
about.

Unsafe is a tool. It's a dangerous tool so you should always wear your gloves
and safety goggles. But when faced with a problem for which it is the best
tool you should use it without regret.

~~~
madez
I hope we don't settle for unsafe being okay forever. Right now, sometimes it
is the right thing to do and there shouldn't be any regret. But in the future,
I hope Rust's compilers become better.

There are two things I consider necessary for that. First, that the Rust
compilers become smarter in proving the safety of things by themselves.
Second, that the Rust compilers become capable of verifying proofs given to
them that show the safety of a given piece of code the compilers can't prove
as safe on their own.

~~~
Retra
The Rust compiler will not solve the halting problem. It is pretty trivial to
write programs which are safe if and only if they halt. So 100% safe is simply
absurd.

~~~
andrewflnr
You don't need to solve the halting problem just to verify an existing proof
of a semantic property, nor to use smarter heuristics to avoid requiring such
a proof. 100% safe is totally reachable, though I bet the syntax would be
pretty hairy added to today's Rust.

~~~
jononor
Would need to get rid of C FFI, as that cannot be 'safe' in Rust?

~~~
andrewflnr
Hmm, good point. You'd have to extend the formal verification into the C code,
at least. If you can do that, it might be easier to just write verified C.

~~~
Retra
Inline ASM can't be verified either.

~~~
andrewflnr
Inline ASM is exactly as verifiable as the underlying CPU, given an adequate
model in the verifier. That's probably easier than verifying C, which
introduces extra ambiguity in its semantics. But yeah, verifiable CPUs would
be nice.

------
zanny
Nobody mentioned that the Rust std has a doubly linked list in it? [1]

It uses shared pointers to reference other nodes in the sequence.

[1] [https://doc.rust-lang.org/src/alloc/linked_list.rs.html#46-5...](https://doc.rust-lang.org/src/alloc/linked_list.rs.html#46-51)

~~~
rusbus
Tragically, this doesn't solve one of the only reasons to use a linked-list:
constant-time insertion in the middle.
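
For illustration, a middle insert can still be sketched with the stable `split_off` and `append` methods of `std::collections::LinkedList`; the helper name `insert_at` is made up. It shows the limitation rather than avoiding it, since `split_off` has to walk to the split point.

```rust
use std::collections::LinkedList;

/// Inserts `value` so that it ends up at position `at`.
/// `split_off` walks the list to reach `at`, so this is O(n), not O(1).
/// Panics if `at` is greater than the list length.
fn insert_at<T>(list: &mut LinkedList<T>, at: usize, value: T) {
    let mut tail = list.split_off(at);
    list.push_back(value);
    list.append(&mut tail); // relinks the two halves without copying elements
}
```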

------
teacpde
I recently started to actually write some Rust code after reading about Rust
here and there. The experience has been quite unique, it constantly forces me
to think about the code at low level, which I find refreshing. And the
compiler is truly impressive, it pinpoints where things go wrong, and
conveys the error messages in a very human-like fashion.

------
hcs
> // I actually don't understand why the line below compiles.

> // Since `head` was moved into the box, I'm not sure why I can mutate it.

> head.next = Some(Box::new(next));

I'm fairly new to Rust myself, but it's my understanding that since it was
moved into the Box, the variable "head" is now just considered effectively
uninitialized, so you can go ahead and set its fields, or overwrite it
entirely with head = Node {...}, without affecting the value that was moved
into the Box.
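
A small sketch of the whole-variable case, with a hypothetical `Node` type and `rebuild` function. Whether assigning to a single field of a moved-from variable is accepted has, as far as I know, changed across compiler versions, so this sticks to reassigning the entire variable.

```rust
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

fn rebuild() -> (i32, Option<i32>) {
    let mut head = Node { value: 1, next: None };
    // `head` is moved into the Box; the variable is now treated as uninitialized.
    let boxed = Box::new(head);
    // Assigning a complete new value re-initializes the variable.
    // The value that was moved into the Box is not affected by this.
    head = Node { value: 2, next: Some(boxed) };
    (head.value, head.next.map(|n| n.value))
}
```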

~~~
Rusky
That is precisely correct. It also works before the variable has even been
initialized to begin with:
[https://play.rust-lang.org/?gist=faf1642e4e1f48decbeac704505...](https://play.rust-lang.org/?gist=faf1642e4e1f48decbeac7045051113b&version=stable)

------
AceJohnny2
Tangentially, I'd love to see some list of "what does this language make easy"
(C: raw memory manipulation!) and "makes hard" (C: memory-safe code)...

Does one exist for Rust?

~~~
leetcrew
tbh I don't know enough about rust or the whole field of software development
to give an authoritative answer, but if you don't know much about rust, this
might help you.

in general, rust is great for pretty much anything c is great for. you can do
raw memory manipulation in unsafe blocks if you want, but you write memory
safe code by default. at present, rust is definitely slower than c, but
there's no inherent reason that it has to be so; mainly it's just the
consequence of being a new and immature language.

one neat thing you can do with rust is build safe interfaces to the c
libraries that you know and love for a relatively small performance penalty. I
am writing a toy graphics application in rust, and it is so much nicer than
bare OpenGL, although there are some serious pain points with library
maturity.

it can also be a decent substitute for problems you might solve in C# / Java /
other statically-typed languages, although it is a bit more strict and
explicit than those.

the main things I can think of that rust makes hard are applications where you
really don't want static typing (web dev, scripting, etc.) or you have a need
to use a lot of specific libraries that you don't feel like writing interfaces
for.

~~~
zenhack
I will contest the idea that you don't want static typing for web dev.
Curious, what statically typed languages do you have experience with? I ask
because you only mention C# and Java, and I often find people who form their
ideas about types from those languages think they have to be much more
cumbersome than they really do.

~~~
leetcrew
> what statically typed languages do you have experience with?

c, c++, java, rust, c#, so the assumption you're making is probably correct.

~~~
zenhack
If you're at all interested in taking the red pill: elm and reason/ocaml are
some things worth checking out. There is so much more out there than the C
family tree.

------
kazinator
"safe" means more than just memory safety.

Free from deadlocks, for one thing; who cares if no memory is misused if the
show locks up.

Also, free from problems like thread A only traversing half the list because B
removed a node in the middle which derailed A into a null pointer that looked
like the list terminator (Even though no memory was misused.)

------
amelius
Also interesting is to figure out how Rust deals with closures, which
reference the parent scopes, and how the corresponding (cyclic) data
structures are managed.

Does Rust eliminate the cycles by copying? (expensive, and doesn't allow for
writing)

~~~
Retra
Why would there be cycles? Rust captures closure environments however you
like. You can copy them, share them, reference them, whatever.
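
Two capture modes, sketched with made-up function names: a `move` closure that owns its state and can therefore be returned from the function that created it, and a closure that only borrows a local for the duration of a call.

```rust
/// Returns a closure that owns `count`, so it can outlive this call.
fn make_counter() -> impl FnMut() -> u32 {
    let mut count = 0;
    move || {
        count += 1;
        count
    }
}

/// The closure borrows `total` mutably; nothing escapes this function.
fn sum_with_closure(values: &[u32]) -> u32 {
    let mut total = 0;
    values.iter().for_each(|v| total += *v);
    total
}
```

Neither form needs a cycle: the first closure carries its environment by value, and the second holds a reference that the compiler checks against the enclosing scope.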

~~~
kazinator
Why would there be cycles when you have closures?

Because a function's environment can end up having a reference back to the
same function.

This can be set up without assignment, given just a lambda operator. Hint:
look at the domain name in the URL in the browser address bar.

~~~
viraptor
If you used it in a way that allows the closure to escape the scope, you'd get
one of the "foo can't outlive bar in scope ..." errors. (On a mobile, can't write an
example easily)

~~~
kazinator
If I can't have escaping closures, I want to be working in C.

~~~
viraptor
You can use unsafe if you want to feel like working with C :-) There's nothing
stopping you from doing things you know are correct. You just sometimes need
to tell the compiler that you know better and guarantee that the code is fine.

------
GreaterFool
You don't have to go as far as a doubly-linked list. Writing a simple cons-list
is hard enough:

    
    
        enum List<T> {
            Nil,
            Cons(T, Box<List<T>>)
        }
    

Imagine you're writing `Iterator`. You have a `&mut List<T>`. For `Nil`,
you're done. For `Cons`, you take it apart, return the `T`, deref the `Box`
and move your `&mut List<T>` to point to that value. Nothing could be easier,
right?

Except in Rust you can't do that! One can resort to unsafe code or use ugly
and inefficient workarounds to remain in safe-land.

~~~
anaphylactic
I don't understand what your concern is - this only took me about a minute to
write and it looks completely safe and efficient.

[https://play.rust-lang.org/?gist=674f4b88876614f603fd70368cb...](https://play.rust-lang.org/?gist=674f4b88876614f603fd70368cb6a067&version=stable)
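
For readers who cannot open the playground link, one safe approach is a borrowing iterator along these lines. This is a sketch and not necessarily what the gist contains; the `Iter` name is an assumption.

```rust
enum List<T> {
    Nil,
    Cons(T, Box<List<T>>),
}

/// Iterates by shared reference; the list itself is left untouched.
struct Iter<'a, T>(&'a List<T>);

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        // Copy the shared reference out so the match borrows the list,
        // not the iterator.
        let current: &'a List<T> = self.0;
        match current {
            List::Nil => None,
            List::Cons(value, rest) => {
                self.0 = &**rest; // step to the tail of the list
                Some(value)
            }
        }
    }
}
```

The items are references tied to the lifetime of the list, so this covers read-only traversal but not taking the elements out by value.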

~~~
GreaterFool
Thanks for this snippet. Didn't think about that.

If I understand correctly that's overly restrictive though. You're limiting
the lifetime of list elements to the lifetime of the spine of the list.

What I want is this:

    
    
        enum List<T> {
            Nil,
            Cons(T, Box<List<T>>)
        }
    
        struct IntoIter<T>(List<T>);
    
        impl<T> IntoIterator for List<T> {
            type Item = T;
            type IntoIter = IntoIter<T>;
            fn into_iter(self) -> Self::IntoIter {
                IntoIter(self)
            }
        }
    
        impl<T> Iterator for IntoIter<T> {
            type Item = T;
            fn next(&mut self) -> Option<T> {
                match std::mem::replace(&mut self.0, List::Nil) {
                    List::Nil => None,
                    List::Cons(x, l) => {
                        std::mem::replace(&mut self.0, *l);
                        Some(x)
                    }
                }
            }
        }
    
    

But without the `replace` calls.

Also, in general I may be working with a data type for which I don't have a
value I can conjure out of thin air (like `Nil`). What then?

------
jokoon
I don't understand why there is a need for linked lists. I'm reading about
fast insert in the middle, but there are other ways to insert data quickly.
Maybe it's a need on hardware with specific memory management?

There are so many drawbacks to linked lists: poor cache locality, the use of
pointers, no fast random access...

The single fact that rust makes it hard to implement a linked list should show
that this data structure is a bad idea. Even when the C++ author is saying it,
that should be enough, no?

------
amelius
Firefox is written in Rust, and I suspect that their DOM implementation has
backpointers (from children back to parents), for performance reasons. It
might be interesting to check how they did it.

~~~
pcwalton
This is a good question, even if the details are wrong (the question should be
about Servo, not Firefox/Gecko). The answer is somewhat idiosyncratic: Servo
uses the SpiderMonkey garbage collector to manage DOM objects, which, like all
tracing GCs, can deal with cycles just fine.

This ends up simultaneously solving the ever-annoying problem of "how do you
manage memory when both JS and Rust can hold strong references to objects?"
(In Servo's case, the answer is simply "just punt all of the logic to the JS
engine.")

------
dmitrygr

      struct node{
        uint64_t val;
        struct node *next;
        struct node *prev;
      };
    

:)

