
Rust Sucks If I Fail to Write X - Sindisil
http://llogiq.github.io/2017/02/15/sucks.html
======
Animats
You can do linear lists and trees in Rust. The problem is backlinks. If you
refcount everything, you can have backlinks, but otherwise there's a safety
problem. Backlinks require an invariant which covers two variables, and you
can't express that in Rust.
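For the refcounted route mentioned above, here is a minimal sketch using std's `Rc` for the owning child links and `Weak` for the non-owning parent backlink. The `Node` type and the `add_child` / `parent_value` helpers are hypothetical names chosen for illustration. The two-pointer invariant is still only maintained by convention inside `add_child`; the compiler just guarantees that following a stale backlink yields `None` instead of a dangling pointer.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical tree node: children are owning `Rc` links, the parent
// backlink is a non-owning `Weak`, so parent <-> child is not a strong cycle.
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

impl Node {
    fn new(value: i32) -> Rc<Node> {
        Rc::new(Node {
            value,
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(Vec::new()),
        })
    }
}

// Forward pointer and backpointer are set together, so on return
// "child.parent refers to the node whose children contain child" holds.
fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

// Follow the backlink; `upgrade` returns None once the parent has been dropped.
fn parent_value(node: &Node) -> Option<i32> {
    node.parent.borrow().upgrade().map(|p| p.value)
}
```

The cost is the refcount bookkeeping and the runtime `RefCell` borrow checks, which is the overhead the raw-pointer approach avoids.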

Back pointers are a special sort of pointer from an ownership perspective.
They don't carry ownership, but are locked in an invariant relationship with
the pointer that does. If you could express that in Rust, it would be compile-
time checkable. Rust needs a way to say "field Q of struct T2 is a backpointer
to field P of struct T1".

During pointer manipulation, that rule will momentarily be broken. But it's
still checkable. It just needs some analysis. First, the checker would have to
identify a block of code in which a backpointer was being manipulated, and
locate the corresponding manipulation of the forward pointer. This is the code
block of interest. It's an atomic transaction, in the database sense. Maybe
the user would have to identify the code block with something like "atomic",
with syntax like "unsafe".

Then, the checker would have to establish that if the invariant held at entry
to the code block, it will hold at exit from the code block. This is a simple
application of program verification technology. If the atomic section is
small, this is not difficult.

Typical uses would be updating doubly-linked lists, rebalancing trees, and
pulling part of a DOM-like tree out and moving it somewhere else.

This is one class of unsafe code - transient. It's the easiest to check,
because it's a local problem. At the end of the code block, a safe state has
been reestablished.
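To make the "transient" class concrete, here is a minimal sketch of a hand-rolled raw-pointer doubly linked list. `DList` and its methods are hypothetical and are not how std's `LinkedList` is written. Inside `push_front` the back link and the forward link briefly disagree, and the `unsafe` block ends only after they agree again. Today that invariant lives only in comments; the checker proposed above would verify it.

```rust
use std::ptr;

// Hypothetical raw-pointer doubly linked list node.
struct Node<T> {
    value: T,
    prev: *mut Node<T>,
    next: *mut Node<T>,
}

// Invariant: for every node n, n.next.prev == n, head.prev is null,
// tail.next is null, and head/tail are both null iff the list is empty.
struct DList<T> {
    head: *mut Node<T>,
    tail: *mut Node<T>,
    len: usize,
}

impl<T> DList<T> {
    fn new() -> Self {
        DList { head: ptr::null_mut(), tail: ptr::null_mut(), len: 0 }
    }

    fn push_front(&mut self, value: T) {
        let node = Box::into_raw(Box::new(Node { value, prev: ptr::null_mut(), next: self.head }));
        unsafe {
            // Start of the "atomic" section: the invariant is broken here.
            if self.head.is_null() {
                self.tail = node;
            } else {
                (*self.head).prev = node; // back link updated first
            }
            self.head = node; // forward link now agrees: invariant restored
        }
        self.len += 1;
    }

    fn pop_back(&mut self) -> Option<T> {
        if self.tail.is_null() {
            return None;
        }
        unsafe {
            // Reclaim ownership of the tail node so it is freed at scope end.
            let old = Box::from_raw(self.tail);
            self.tail = old.prev;
            if self.tail.is_null() {
                self.head = ptr::null_mut();
            } else {
                (*self.tail).next = ptr::null_mut();
            }
            self.len -= 1;
            Some(old.value)
        }
    }
}

impl<T> Drop for DList<T> {
    fn drop(&mut self) {
        // Free every remaining node through the same code path.
        while self.pop_back().is_some() {}
    }
}
```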

The second class of unsafe code involves partially valid data structures. This
problem lies underneath "Vec", where the underlying block of storage has
preallocated, uninitialized space for later growth. This is harder, because
unsafe state persists outside unsafe code sections. To deal with this, it's
necessary to somehow attach invariants to the data structure. If you have an
array A which is valid from elements 0 to n, you need something like
"valid_array(A,0,n)". Then, when you initialize an element n+1, you change the
validity limits. The checker needs a few simple theorems, such as
"valid_array(A,0,n) and valid_element(A[n+1]) implies valid_array(A,0,n+1)" to
check this. This is old and completely automatable technology; we did it in
the Pascal-F verifier 35 years ago, and Dafny does it today.
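For comparison, here is how the `valid_array` invariant looks in Rust today: it is stated in comments and upheld by hand around `unsafe`. This is a minimal sketch assuming a fixed capacity `N` and using std's `MaybeUninit`. The `PrefixBuf` type is hypothetical, and it tracks the half-open range `0..len` rather than the inclusive `0..n` used above.

```rust
use std::mem::MaybeUninit;

// Hypothetical fixed-capacity buffer. Invariant (valid_array(data, 0..len)):
// slots 0..len are initialized, slots len..N are not.
struct PrefixBuf<T, const N: usize> {
    data: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> PrefixBuf<T, N> {
    fn new() -> Self {
        // Every slot starts uninitialized, so the valid prefix is empty.
        PrefixBuf { data: std::array::from_fn(|_| MaybeUninit::uninit()), len: 0 }
    }

    // Initializing slot `len` and then bumping `len` extends the valid prefix
    // by one: valid(0..len) and valid(data[len]) implies valid(0..len + 1).
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value); // full: hand the value back
        }
        self.data[self.len].write(value);
        self.len += 1;
        Ok(())
    }

    fn get(&self, i: usize) -> Option<&T> {
        if i < self.len {
            // SAFETY: i < len, so slot i lies inside the initialized prefix.
            Some(unsafe { self.data[i].assume_init_ref() })
        } else {
            None
        }
    }
}

impl<T, const N: usize> Drop for PrefixBuf<T, N> {
    fn drop(&mut self) {
        // Only the initialized prefix holds values that must be dropped.
        for slot in &mut self.data[..self.len] {
            // SAFETY: every slot below len was initialized by `push`.
            unsafe { slot.assume_init_drop() };
        }
    }
}
```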

Handling these two classes of unsafe code takes care of a sizable fraction of
the unsafe code really needed in Rust. With the support described above, most
of the unsafe code in pure Rust could be eliminated. Outside of those two
classes, most unsafe code in Rust involves external interfaces to other
languages.

Unsafe code which doesn't fit into any of those classes needs to be looked at
very hard.

~~~
Manishearth
> The checker needs a few simple theorems, such as "valid_array(A,0,n) and
> valid_element(A[n+1]) implies valid_array(A,0,n+1)" to check this.

This is basically dependent types. Rust doesn't really want to go that far
from a language level.

> Handling these two classes of unsafe code takes care of a sizable fraction
> of the unsafe code really needed in Rust.

Not really. All the other datastructures in the stdlib would still need
unsafe. Not really a good metric.

The RustBelt project at MPI-SWS is doing formal verification of Rust including
unsafe code, and that approach will be far better IMO. It won't let you
eliminate unsafe from your code, but might make it easier to prove that your
unsafe code is performing safe operations.

~~~
dbaupp
> FWIW you can use Weak for this. There's a performance cost though.

The sentence before your quote mentions ref-counting. This is referring to
Weak.

~~~
Manishearth
Somehow missed that, thanks.

------
ekidd
In my programming experience, there are two kinds of code:

1\. Code where my objects form a tree. Rust's ownership model is great for
this. 95% of my code looks this way naturally, and maybe another 3% can be
rewritten to look like this.

2\. Code where my objects form a complex graph. At this point, I need to make
a choice between manual pointer management (C++, unsafe Rust) and a garbage
collector (lots of languages). Happily, Rust does have regular pointers and
'unsafe'.

If most of your code looks like (1), and only a small amount looks like (2),
then Rust can be a big win. Personally, I really like the combination of
low-level control, performance and safety.

But if I encounter a problem with a lot of (2), my first instinct is to reach
for crates.io and look up an appropriate library. I _do_ know how to use
pointers in Rust and work with 'unsafe', but it's easier to let somebody else
do it for me.

If you're really curious, then "Learning Rust with too many lists"
([http://cglab.ca/~abeinges/blah/too-many-lists/book/](http://cglab.ca/~abeinges/blah/too-many-lists/book/))
is a great introduction to more advanced techniques.

~~~
jstimpfle
A common approach to 2) is the database approach ("data-oriented design")
where you basically have tables and replace pointers by offsets into these
tables.
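A minimal sketch of the "tables instead of pointers" layout, assuming a simple directed graph. `Graph`, `NodeId` and the methods are hypothetical names. Every node lives in one `Vec`, and links are plain indices into it, so cycles and back edges raise no ownership questions at all.

```rust
// Hypothetical table layout: one Vec per table, links are offsets (indices).
type NodeId = usize;

struct Graph {
    names: Vec<String>,           // node table: one row per node
    edges: Vec<(NodeId, NodeId)>, // edge table: one row per directed edge
}

impl Graph {
    fn new() -> Self {
        Graph { names: Vec::new(), edges: Vec::new() }
    }

    fn add_node(&mut self, name: &str) -> NodeId {
        self.names.push(name.to_string());
        self.names.len() - 1
    }

    fn add_edge(&mut self, from: NodeId, to: NodeId) {
        self.edges.push((from, to));
    }

    // Cycles are fine: edges are just numbers, so no aliasing references exist.
    fn successors(&self, n: NodeId) -> Vec<&str> {
        self.edges
            .iter()
            .filter(|&&(from, _)| from == n)
            .map(|&(_, to)| self.names[to].as_str())
            .collect()
    }
}
```

The trade-off is that an index is not tied to the table that issued it, so a stale or foreign index is a logic bug rather than a compile error.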

That might not work if the situation is very uncontrolled and objects live and
die very quickly. But in most cases you can just let a few objects die, and
every once in a while do "garbage collection" manually by renumbering the
still-alive objects to be consecutively indexed.
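The manual "garbage collection" step described above can be sketched as a single compaction pass. This is a minimal version under stated assumptions: liveness is given as a `bool` per row, and all references into the table are collected in one `links` vector. The `compact` function is hypothetical.

```rust
// Hypothetical compaction pass: drop dead rows, renumber survivors 0..k in
// their original order, and rewrite every stored index through the remap.
fn compact<T>(rows: &mut Vec<T>, alive: &[bool], links: &mut Vec<usize>) -> Vec<usize> {
    assert_eq!(rows.len(), alive.len());
    let mut remap = vec![usize::MAX; rows.len()]; // usize::MAX marks a dead row
    let mut next = 0;
    for (old, &is_alive) in alive.iter().enumerate() {
        if is_alive {
            remap[old] = next;
            next += 1;
        }
    }
    // `retain` visits each element once, in order, so `i` tracks the old index.
    let mut i = 0;
    rows.retain(|_| {
        let keep = alive[i];
        i += 1;
        keep
    });
    // References to dead rows are dropped; the rest get their new index.
    links.retain(|&l| alive[l]);
    for l in links.iter_mut() {
        *l = remap[*l];
    }
    remap // returned so indices stored elsewhere can be rewritten too
}
```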

Usually the result is very clean, performant and modular code.

It's clean and modular for all the reasons that E.F. Codd preached all his
life.

It's performant because the tables approach is not micro-managing allocations
- each table is only one allocation. You will be hard pressed to detect a
difference of (single array + relative index) to raw pointers (= absolute
index). There's even machine level support for relative addressing.

You also write most of your code to operate on slices (tables or contiguous
subsets of tables) instead of only one row per function call. Mike Acton
rightfully says "where there's one, there's many". This approach is obviously
great for performance because it avoids function-call overhead and because
it's cache-friendly.
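A minimal sketch of "where there's one, there's many" in Rust, assuming a struct-of-arrays particle table. `Particles` and `integrate` are hypothetical. One call walks two contiguous slices in lockstep instead of making one call per row, and the borrow checker accepts mutable and shared borrows of the two disjoint fields at once.

```rust
// Hypothetical struct-of-arrays table: each column is its own contiguous Vec.
struct Particles {
    pos: Vec<f32>,
    vel: Vec<f32>,
}

// Operates on whole columns (slices), not on one row per call.
fn integrate(pos: &mut [f32], vel: &[f32], dt: f32) {
    for (p, v) in pos.iter_mut().zip(vel) {
        *p += v * dt;
    }
}

impl Particles {
    fn step(&mut self, dt: f32) {
        // Disjoint field borrows: `pos` mutably, `vel` immutably.
        integrate(&mut self.pos, &self.vel, dt);
    }
}
```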

By the way, what's Rust's story to avoid referencing dead items in these
tables?

~~~
notriddle
> By the way, what's Rust's story to avoid referencing dead items in these
> tables?

One option is Option<T>.
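A minimal sketch of the `Option<T>` idea, assuming slots are never reused. `Slots` is a hypothetical type. Removing a row leaves `None` behind instead of shifting later rows, so existing indices stay stable, and looking up a dead index returns `None` rather than stale data. If slots were reused, an old index could silently hit a new occupant. A common extension, not discussed in this thread, is to pair each index with a generation counter.

```rust
// Hypothetical slot table: dead rows become `None`, indices never shift.
struct Slots<T> {
    rows: Vec<Option<T>>,
}

impl<T> Slots<T> {
    fn new() -> Self {
        Slots { rows: Vec::new() }
    }

    fn insert(&mut self, value: T) -> usize {
        self.rows.push(Some(value));
        self.rows.len() - 1
    }

    // Takes the value out and leaves `None` in its slot.
    fn remove(&mut self, i: usize) -> Option<T> {
        self.rows.get_mut(i)?.take()
    }

    // Out-of-range and dead indices both yield `None`.
    fn get(&self, i: usize) -> Option<&T> {
        self.rows.get(i)?.as_ref()
    }
}
```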

------
vvanders
> Rustaceans usually opt for continuous data layout (known in C/C++ lingua as
> array-of-struct or struct-of-array depending on priorities), which is more
> cache-friendly than reference-heavy data structures anyway.

Yes! This is a point that I try to hammer home that's missed by many people
who write C/C++ on a daily basis. Everyone wants to use the fanciest data
structures when most of the times arrays will be faster and simpler to use.

~~~
charles-salvia
C++ programmers usually prefer contiguous data layout as well. I mean
std::vector is just a contiguous array that dynamically reallocates and
copies/move-constructs everything as needed.

But many _interesting_ data structures are hard to write in a memory-efficient
manner without resorting to non-contiguous nodes. Even a hashtable often will
use linked lists within each bucket for collision resolution. You can argue
that an open-addressing scheme is more cache-friendly, but it also has
downsides, i.e. performance degrades faster as the load factor gets higher.

Many other interesting data structures, especially some lock-free structures,
are simply impractical to implement as single contiguous arrays.

Of course, any node-based structure can make use of a memory pool that
allocates blocks from one or more larger contiguous buffers, but there will
still be pointers interleaved throughout the structure.

All in all, it remains true that Rust doesn't really provide any safety above
C++ in regard to writing these kinds of node-based data structures, and saying
"don't ever write node-based data structures" is just pointless. Yes, Rust has
some downsides and tradeoffs. Is it so bad to just say that out loud?

~~~
vvanders
How often do you _really_ need those interesting structures (and the memory
fragmentation that comes with them)? I've seen countless times where a
developer reached for std::hash_map/linked_list when there will never be more
than 10 values in their dataset. In that case an array would be at least as
fast and much easier on your data layout.
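For the "never more than 10 values" case, here is a minimal sketch of a map backed by a `Vec` of pairs and a linear scan. `SmallMap` is hypothetical. The claim that it is at least as fast as a hash map for tiny sizes is the commenter's and is not measured here. What the sketch shows is that it needs no hashing and keeps all entries in a single allocation.

```rust
// Hypothetical small map: a Vec of (key, value) pairs with linear search.
struct SmallMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: PartialEq, V> SmallMap<K, V> {
    fn new() -> Self {
        SmallMap { entries: Vec::new() }
    }

    // Replaces and returns the old value if the key exists, like HashMap::insert.
    fn insert(&mut self, key: K, value: V) -> Option<V> {
        for (k, v) in self.entries.iter_mut() {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }
}
```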

Also if you're trying to implement lockfree data structures then safe/unsafe
pointer access are going to be the least of your worries :).

~~~
charles-salvia
When you need them you _really_ need them. A Patricia Trie for example, which
includes back pointers to ancestor nodes, is simply an ideal structure for
prefix search.

~~~
Manishearth
How about a rephrasing: how many times do you really need to _write_ these?

Rust will make it tricky to implement these, but then you can use the
datastructure safely as much as you want. It's a "write once use everywhere"
thing.

~~~
jstimpfle
Usually there's no good library that fits your use case. There are always
subtle differences (to quote John Carmack).

Or it's really hard to find the good one amongst a hundred bad ones (to
paraphrase ESR, when he tried Rust in all seriousness).

"Write once use everywhere" is wishful thinking; it's important to make
adaptations (to paraphrase Knuth).

------
ianbertolacci
I think this misses the problem. Should you write your own data-structures?
Not unless absolutely necessary. But mostly everyone knows that.

So why are people complaining about data-structures?

For me, writing the data-structures is the canary in the mine. It's the next
step from hello world when trying to pick up a new language. Most importantly,
trying to write a few simple data-structures hints at the difficulty that will
be encountered writing a 'real' program.

For people like me, if writing trivial data-structures in Rust is a great
challenge, it shows that writing other, less trivial things in Rust will be
much more challenging than might be preferred.

Hand waving the comment "writing data-structures in Rust is hard" with "well
you shouldn't be doing that anyway" misses, I think, the point that Rust
(while very neat) is generally a difficult language to pick-up.

[Edit]

Again, the issue isn't data-structures or about them.

The issue is: Can I write something from scratch in this language 1) at all
and 2) with some amount of daily progress?

Writing data-structures is a test case of my ability to think and program in
the context of Rust. It serves to answer the question: Can I write a program
that implements a well defined, well understood construct so that I may learn
the building blocks of Rust?

This leads up to an application, the behavior and design of which may not be
extremely well defined in the context of Rust and requires more thinking when
working out the data-structure in Rust.

If the answer to "can I write a data-structure in Rust" is "Ya totally got
this makes sense" then writing an application will be relatively easy.

However if the answer is "Wow I did it but there were a lot of pain points and
I still have no idea if I've done it in the right or canonical way" then
writing an application is going to be very difficult.

~~~
ecnahc515
> For people like me, if writing trivial data-structures in Rust is a great
> challenge, it shows that writing other, less trivial things in Rust will be
> much more challenging than might be preferred.

And this is the real issue, because your assumption doesn't really have any
real basis. I don't mean to belittle your point, I can completely understand
where you're coming from, but assuming that because writing data structures in
Rust is hard, that "real" problems will be difficult as well, is wrong.
There's nothing to back this up. Data structures are a very specific domain,
often very far away from what you'll be doing in the average "real" program.
There's a reason common data structures are often implemented and included in
the stdlib of most languages.

What I find is that it's more common that writing data structures in other
languages is easy, because most of the time, people aren't implementing them
correctly, or safely. 99% of the linked lists I see in C++ fail to even
implement the copy constructor, meaning they're going to be broken the moment
you do a copy of the list.
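The copy-constructor pitfall has a direct Rust counterpart. Here is a minimal sketch of a singly linked list built from owned `Box`es; `List` and `Node` are hypothetical. `#[derive(Clone)]` generates a deep copy, so a cloned list never shares nodes with the original. Without `Clone`, an assignment is a move, and the compiler rejects any later use of the original instead of letting two lists free the same nodes.

```rust
// Hypothetical singly linked list made of owned boxes.
#[derive(Clone, Debug, PartialEq)]
struct Node<T> {
    value: T,
    next: Option<Box<Node<T>>>,
}

// Derived Clone recursively clones every node: a deep, independent copy.
#[derive(Clone, Debug, PartialEq)]
struct List<T> {
    head: Option<Box<Node<T>>>,
}

impl<T> List<T> {
    fn new() -> Self {
        List { head: None }
    }

    fn push(&mut self, value: T) {
        let old = self.head.take();
        self.head = Some(Box::new(Node { value, next: old }));
    }

    fn pop(&mut self) -> Option<T> {
        self.head.take().map(|node| {
            self.head = node.next;
            node.value
        })
    }
}
```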

~~~
ianbertolacci
Again, the issue isn't data-structures or about them.

The issue is: Can I write something from scratch in this language 1) at all
and 2) with some amount of daily progress?

Writing data-structures is a test case of my ability to think and program in
the context of Rust. It serves to answer the question: Can I write a program
that implements a well defined, well understood construct so that I may learn
the building blocks of Rust?

This leads up to an application, the behavior and design of which may not be
extremely well defined in the context of Rust and requires more thinking when
working out the data-structure in Rust.

If the answer to "can I write a data-structure in Rust" is "Ya totally got
this makes sense" then writing an application will be relatively easy.

However if the answer is "Wow I did it but there were a lot of pain points and
I still have no idea if I've done it in the right or canonical way" then
writing an application is going to be very difficult.

~~~
dbaupp
> If the answer to "can I write a data-structure in Rust" is "Ya totally got
> this makes sense" then writing an application will be relatively easy.

> However if the answer is "Wow I did it but there were a lot of pain points
> and I still have no idea if I've done it in the right or canonical way" then
> writing an application is going to be very difficult.

I feel like one of the implicit points of the OP is that this isn't obviously
true: writing a data structure is often a very different type of programming
to writing a normal application. Most of the code I write isn't like a data
structure, and is definitely not like a really good data structure: I can just
glue together such code that others have written (or even I personally wrote
once, a while ago) without having to worry about the details that it packages
up/manages for me. (This is true in both Rust and C++, the latter of which I
use day-to-day.)

------
comex
It's interesting what this issue reveals about Rust culture.

After all, you _can_ write basic (tree-like) data structures without resorting
to `unsafe`. It's not that hard: just wrap all your nodes in `Rc`, plus either
`RefCell` or `Cell` if you need mutability. Yeah, this adds some overhead, but
really not very much. It'll probably still run faster than equivalent code in
most safe languages, often using less memory, without GC pauses. Or even if it
loses to some language in some tree microbenchmark, in a real application Rust
will probably make up for it with better performance elsewhere.

But culturally, the baseline for comparison isn't safe languages. It's C/C++.
Rust is supposed to be about zero cost abstractions, so non-zero-cost
abstractions are suspect. And so people recommend raw pointers and `unsafe` -
which is no less safe than C++, but still leaves newcomers with a bad taste in
their mouths.

To some extent this attitude is built into the language itself.
`Rc<RefCell<Foo>>` is ugly; needing two nested generics, one with a rather
abstruse name, for what in other languages is just `Foo`, i.e. a basic object
reference, makes you feel like you're doing something wrong. In the olden days
of Rust, `RefCell` was called `Mut`, and there was a type `RcMut<Foo>` which
combined `Rc` and `Mut`. Much more appealing to a newbie: `Mut` is what you
use to add (inner) mutability, and `RcMut` is the mutable version of `Rc`, for
the common case where you need both reference counting and mutability. No need
for ugly nesting. But well before 1.0, `Mut` was renamed and `RcMut` deemed
unnecessary, and Rust ended up where it is today. Arguably this is a good
thing, as it discourages unnecessary use of both reference counting and inner
mutability, each of which comes with a runtime cost. But for newbies, I think
it makes Rust look more intimidating than it needs to be.
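The nesting can be hidden again with a plain type alias. This is a minimal sketch: `RcMut` and `rc_mut` below are hypothetical user-side names echoing the old API, not anything in today's std. The runtime costs of both the refcount and the `RefCell` borrow flag are unchanged.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical alias recovering the old `RcMut<Foo>` spelling; not part of std.
type RcMut<T> = Rc<RefCell<T>>;

fn rc_mut<T>(value: T) -> RcMut<T> {
    Rc::new(RefCell::new(value))
}

// A tree node whose children are shared, mutable handles.
struct TreeNode {
    value: i32,
    children: Vec<RcMut<TreeNode>>,
}

// Recursively sums the tree; each `borrow()` is checked at runtime.
fn sum(node: &RcMut<TreeNode>) -> i32 {
    let n = node.borrow();
    let children_total: i32 = n.children.iter().map(sum).sum();
    n.value + children_total
}
```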

------
pklausler
Remember, if your computation is short-lived and forkable, exit() is a really
fast garbage collector.

------
webkike
Okay, first off, before I read the rest of the article, I want to complain about
the statement:

> While you can hack together a “list” that will be backed by a Vec of nodes
> with indices to the next / previous item, this approach is quite wasteful –
> and gains little compared to using the Vec directly.

How is this wasteful or hacky?? This is THE proper way of writing data
structures that are not strict trees. Sure, this isn't the correct way to
write a list data structure, but there's almost no reason to use a linked list
in general.
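A minimal sketch of the "Vec of nodes with indices to the next / previous item" approach. `IndexList` and `NIL` are hypothetical names, and unlinked slots are not reused in this sketch. Unlinking a known node is O(1) with no `unsafe` and no `Rc`. The price is that `unlink` must be called at most once per index, a rule the type system does not enforce.

```rust
// Hypothetical index-based doubly linked list backed by one Vec.
const NIL: usize = usize::MAX; // sentinel meaning "no node"

struct Node<T> {
    value: T,
    prev: usize,
    next: usize,
}

struct IndexList<T> {
    nodes: Vec<Node<T>>,
    head: usize,
    tail: usize,
}

impl<T> IndexList<T> {
    fn new() -> Self {
        IndexList { nodes: Vec::new(), head: NIL, tail: NIL }
    }

    // Appends and returns the node's index, usable later for O(1) unlink.
    fn push_back(&mut self, value: T) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { value, prev: self.tail, next: NIL });
        if self.tail == NIL {
            self.head = id;
        } else {
            self.nodes[self.tail].next = id;
        }
        self.tail = id;
        id
    }

    // Splices node `id` out of the chain; its slot stays allocated.
    fn unlink(&mut self, id: usize) {
        let (prev, next) = (self.nodes[id].prev, self.nodes[id].next);
        if prev == NIL { self.head = next; } else { self.nodes[prev].next = next; }
        if next == NIL { self.tail = prev; } else { self.nodes[next].prev = prev; }
    }

    // Walks the chain from head and collects references in list order.
    fn to_vec(&self) -> Vec<&T> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while cur != NIL {
            out.push(&self.nodes[cur].value);
            cur = self.nodes[cur].next;
        }
        out
    }
}
```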

~~~
tyoverby
When most people ask for a linked list data structure, they probably don't
mean "a vector of elements and a linked list of indexes into that vector"
because it totally defeats the purpose of using a linked list in the first
place.

~~~
webkike
Well when most people ask for a linked list my first response would be "why?"

~~~
charles-salvia
an LRU cache?

~~~
webkike
I'm not saying there aren't reasons to use linked lists, but often the correct
approach is a hybrid one. Additionally, for cache-aware programs a vector of
entries with indirection integers is going to be more local.

------
klodolph
Let's get this out of the way: Rust is great. Rust apologia like this article
is not so great.

> As an aside, remember that the only difference to C/c++ is that if you write
> a “basic linked list” in them, all of your code will be unsafe.

There's a bit of mental gymnastics going on here. The word "unsafe" is
performing double duty, since it means "memory safety not guaranteed by the
compiler" in Rust and it means something else entirely when you are talking
about C++, since memory safety was never guaranteed by the compiler in the
first place. The other problem with this statement is that linked lists in C
or C++ aren’t really that hard to get right, in fact, they’re easy. Maybe you
draw out a diagram on pen and paper before you write the code, but you’re
unlikely to be facing segfaults.

I admit I’m biased here, because I’ve been using Haskell for something like 15
years now, but I feel like the Haskell community acknowledges that Haskell’s
type system gets in the way and prevents you from doing useful, interesting
work, and that even a great library ecosystem isn’t enough to overcome this.
That’s how safety generally works. It’s harder to write programs that do
useful things, but in exchange, it’s also harder to write programs that behave
unpredictably or do dangerous things. Because Rust and Haskell put you in such
restrictive type systems, sometimes you have to break out to get real work
done.

Haskell’s pitch, in my mind, is, “Let’s make it easy to reason about side
effects and value semantics.” From the article, Rust’s pitch could be, “Let’s
make it easy to reason about control- and data flow.” These are both
evolutionary steps in the development of programming languages, all
programming languages being somewhat flawed. Future languages will steal ideas
from Rust the same way modern languages have stolen ideas from Haskell.

But apologia still leaves a bad taste in my mouth. The article says, “Is this
a problem with Rust? Not at all.” There’s a worrying unwillingness to
acknowledge that Rust is flawed, and the article describes Rust users as
“Rustaceans” and makes broad generalizations about how they behave. This
reminds me of the excesses of 2000s-era object-oriented programming. The
comment about “Rust’s facilities for code reuse” could have been taken
straight out of a press release for Java back in the late 1990s for all I
know.

Rust is great, but this article is further cementing my distaste for the Rust
community.

By comparison, here is Simon Peyton Jones talking about how Haskell is
useless:
[https://www.youtube.com/watch?v=iSmkqocn0oQ](https://www.youtube.com/watch?v=iSmkqocn0oQ)

------
BuuQu9hu
Imagine there's no unsafe blocks

It's easy if you try

No dangling pointers below us

Above us, only sky

Imagine all the programmers writing memory-safe~

Imagine there's no memory

It isn't hard to do

Abstracted away all the pointers

A memory-safe CPU

Imagine all the programmers using their GCs~

You may say I'm a dreamer

But I'm not the only one

I hope someday you'll put down your C

And our buffers won't overrun~

Imagine only capabilities

I wonder if you can

Only objects referencing each other

And remote calls using Capn

Imagine all the programmers sharing across the world~

~~~
pagnol
Has this been recorded? Can't find it on YouTube.

------
jstimpfle
> As an aside, remember that the only difference to C/c++ is that if you write
> a “basic linked list” in them, all of your code will be unsafe.

I stopped reading here.

~~~
tyoverby
In the rust world "unsafe" is synonymous with "isn't proven to be safe by the
compiler". Under this definition, every C/C++ program that uses pointers is
"unsafe" because the languages make no memory safety guarantees.

~~~
pklausler
Raw pointers, maybe. But not std::unique_ptr<>, if I understand Rust's concept
here.

~~~
dbaupp
unique_ptr is unfortunately still unsafe (in the Rust sense): there's nothing
in the language stopping use-after-move of the unique_ptr value itself (which
is a null-dereference and undefined behaviour), nor references to the interior
becoming dangling.

