
The Rustonomicon: The Dark Arts of Advanced and Unsafe Rust - valarauca1
https://doc.rust-lang.org/nightly/nomicon/
======
Animats
Interesting. Other than interfaces to non-Rust code, the big safety issues
revolve around partial allocation of arrays. That's the machinery underneath
the ability to grow a vector, and comes up with some other data structures.

It's not that hard to formalize safety in that area. You need some way to talk
about an array being partially initialized. This requires some simple design-
by-contract machinery - object invariants, basically. With some proof
machinery, it should be possible to determine at compile time that a data
manipulation is safe.

Rust already has the concept of raw memory space, which is treated as write-
only. That takes care of initialization for non-array types. Partially
initialized arrays require some additional annotation.

Suppose we have a primitive for talking about initialization:

    
    
        isvalid(T, arr, lo, hi)
    

This means that array _arr_ consists of valid values of type _T_ is
initialized and valid for subscripts in the range _lo_ to _hi_ inclusive.

Now there's a way to talk about partially initialized arrays:

    
    
        pub struct Vec<T> {
            ptr: *mut T,   // oversimplifying here
            cap: usize,
            len: usize,
            invariant isvalid(T, *ptr, 0, len-1)
        }
    

This says that the array is only initialized up to len-1. The current
requirement for "unsafe" comes from the lack of any way to talk about partial
initialization. It's an expressive problem. Much of Rust's success comes from
finding ways to talk about safety issues such as ownership within the
language. This is an extension to that.

Then, a minimal theorem prover is needed, one which at least knows these
theorems:

    
    
        j < i ==> isvalid(T, a, i, j) // the null case
        isvalid(T, a, i, j) and k >= i and k <= j ==> isvalid(T, a[k]) // element valid in range
        isvalid(T, a, i, j) and isvalid(T, a[j+1]) ==> isvalid(T, a, i, j+1) // extend by adding new valid element
    

With that info, it's possible to check operations such as push, pop, and array
growth, where an array is partially initialized in a controlled way. Array
references can also be checked; this implies that access to an element index
>= len is an error, and the compiler now knows this.

When the compiler knows something like that, it can optimize. If you're
iterating a vec, from 0 to len-1, the subscript check required is

    
    
       i < len
    

Within a FOR statement, the compiler already knows the range of i is within
0..len-1, so the subscript check becomes

    
    
       len-1 < len
    

which simplifies to True, eliminating the subscript check. That gets rid of
the need for most unchecked subscripting.

This sort of proof machinery used to be exotic technology. (I was doing this
kind of thing 35 years ago.[1]) Now it's well understood. It's quite possible
to do this in a compiler today.

[1]
[http://www.animats.com/papers/verifier/verifiermanual.pdf](http://www.animats.com/papers/verifier/verifiermanual.pdf)

------
kibwen
This book is the final product (and (not) master's thesis) of Alexis
Beingessner (Gankro)'s internship at Mozilla this summer. He's gone a long way
towards helping to define the guarantees that `unsafe` code must uphold and
thereby auditing the soundness of the various `unsafe` bits in the stdlib.

~~~
Gankro
Not my thesis by verdict of my supervisors -- I am now on my third attempt at
a thesis.

(attempt 2 was [http://cglab.ca/~abeinges/blah/too-many-
lists/book/](http://cglab.ca/~abeinges/blah/too-many-lists/book/))

~~~
samth
I think the claims about linked lists there are rather oversold, especially
the parts about functional languages. First, linked lists are common, useful,
and efficient in languages such as Racket or Ocaml. I'd really want to see
performance results before accepting your claims. Second, Haskell is really
not limited in the way you say. You can write all the programs you'd write in
an imperative language if you're willing to have IO in your type.

~~~
aturon
Regarding performance, one thing to keep in mind is that all of the languages
you mention use GC, which can help with cache locality issues. In Rust, of
course, the situation is different. Gankro also addresses the common and
useful idioms you see in most functional languages through corresponding
vector-centric functions on the Rust side. (Scala is perhaps a reasonable
midpoint.)

That all said, I too would love to see some substantiation of the performance
intuitions here.

~~~
samth
I think one of the problems here is that the discussion of how bad linked
lists are switches back and forth between taking about Rust specifically and
linked lists generally.

Also, I think performance claims, especially ones as strong as the ones in
that document, should be avoided without substantiation.

------
noobymatze
Am I right in assuming, that the name is a reference to the Normanomicon from
Fable II and Fable III? It's just priceless.

Thanks to all involved with Rust (also in light of the 1.3 release). From what
I can tell (passively reading about language decisions and seeing the
responsiveness of all involved on various communication channels), it's a
great lesson in building a community while simultaneously creating a fun
programming language.

I hope, I'll come to use it at work at some point.

P.S. First comment ever on the internet anywhere, hope I didn't violate any
guidelines.

~~~
kibwen
The name is actually an alteration of "The Necronomicon", which is from the
works of H.P. Lovecraft (the C'thulhu guy). The quote at the beginning of the
book is also an altered Lovecraft quote (which, historically, are quite
popular as easter eggs in the Rust community). In the literature, it is an
evil and otherworldly book which eventually drives its reader insane, which
should give you some idea as to the content of the OP. :)

Also, welcome to the internet!

~~~
noobymatze
Ah, obviously I have some reading up to do, thanks. :)

The one from Fable actually seems to be an alteration as well, giving me about
the same idea about the content as knowing the actual reference would have.
Fun stuff.

------
Gankro
Primary author here. AMA how Rust is unsound.

~~~
infinity0
I really like your writing style, I wish more people did this. It makes it fun
to read, almost like a story. :)

------
valarauca1
A really interesting read. Its worth mentioning how little unsafe code you
need in Rust. Even implementing strange data structures normally only requires
5-10 lines just to handle the manual memory management.

